Role of storage devices
Some technology vendors are already addressing sustainability in their product design. For example, all-flash storage solutions are considerably more efficient than their spinning-disk HDD counterparts.
Some vendors are even going beyond off-the-shelf SSDs, creating their own flash modules that allow all-flash arrays to communicate directly with raw flash storage. This maximises the capabilities of flash and delivers better performance, power utilisation and efficiency.
As well as being more sustainable than HDD, flash storage is also much better suited to running AI projects. This is because the key to results is connecting AI models or AI-powered applications to data.

To do this successfully requires large and varied data types, streaming bandwidth for training jobs, write performance for checkpointing and checkpoint restores, and random read performance for inference; crucially, it all needs to be reliable 24x7 and easily accessible across silos and applications.

This set of characteristics is not possible with HDD-based storage underpinning your operations; all-flash is needed.
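To make that concrete, the following is a minimal sketch of training-time checkpointing, assuming a PyTorch model and an arbitrary checkpoint path; none of these specifics come from the article. Each checkpoint is a large sequential write that must finish before training can continue, and the same file must be read back quickly if the job has to restart.

```python
# Minimal checkpointing sketch (illustrative; the model, path and interval are assumptions).
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)                     # stand-in for a much larger model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

CHECKPOINT_PATH = "checkpoint.pt"                 # hypothetical location on shared storage

def save_checkpoint(epoch: int) -> None:
    # Training pauses here: the full model and optimizer state must be
    # written out (a large sequential write) before the job can continue.
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        CHECKPOINT_PATH,
    )

def restore_checkpoint() -> int:
    # Restart/recovery point: read performance determines how quickly a
    # crashed or stalled job can resume.
    state = torch.load(CHECKPOINT_PATH)
    model.load_state_dict(state["model_state"])
    optimizer.load_state_dict(state["optimizer_state"])
    return state["epoch"]

for epoch in range(10):
    # ... training steps would run here ...
    if epoch % 2 == 0:
        save_checkpoint(epoch)
```

In a real cluster the saved state can run to many gigabytes per checkpoint, so the storage layer's write throughput directly determines how long expensive GPUs sit idle.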
Challenges of energy consumption
Data centres are now facing a secondary but equally important challenge that will be exacerbated by the continued rise of AI and ML: water consumption, which is set to become an even bigger problem, especially when you take into consideration the continued rise in global temperatures.

Many data centres use evaporative cooling, which works by spraying fine mists of water onto cloth strips; the ambient heat is absorbed by the water, cooling the air around it. It is a smart idea, but it is problematic given the added strain that climate change is placing on water resources, especially in built-up areas.

As a result, this method of cooling has fallen out of favour in the past year, resulting in a reliance on more traditional, power-intensive cooling methods such as air conditioning. This is yet another reason to move to all-flash data centres, which consume far less power and do not have the same intensive cooling requirements as HDD and hybrid environments.
As AI and ML continue to evolve rapidly, the focus will increase on data security, to ensure that rogue or adversarial inputs cannot change the output, and on model repeatability. This is possible using techniques like Shapley values, to gain a better understanding of how inputs alter the model, and stronger ethics, to ensure this powerful technology is used to actually benefit humanity.
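As a rough illustration of the Shapley-value idea, the sketch below uses the open-source shap library on a small synthetic model; the data set, model and library calls are assumptions for the example rather than anything specified in the article.

```python
# Illustrative sketch: Shapley-value attributions for a simple model
# (the data, model and shap usage here are assumptions for the example).
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                       # four synthetic input features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=50).fit(X, y)

# Shapley values estimate how much each input feature pushed a prediction
# up or down, which helps spot inputs with outsized or unexpected influence.
explainer = shap.Explainer(model, X)
shap_values = explainer(X[:10])
print(shap_values.values.shape)                     # (10 samples, 4 features)
```

In this toy setup the first two features dominate the attributions, mirroring how the synthetic target was constructed; on a production model, unexpectedly large attributions can help flag rogue or adversarial inputs.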
All these worthy goals will increasingly place new demands on data storage. Storage vendors are already factoring this into their product development roadmaps, knowing that CTOs will be looking for secure, high-performance, scalable, efficient storage solutions that help them towards these goals. The focus should therefore not be entirely on the capabilities of data storage hardware and software; the big picture in this case is very big indeed.
Key takeaways
• Any machine learning capability requires a training data set.
• In the case of generative AI, the data sets need to be very large and complex.
• Generative AI relies on complex models, and the algorithms on which it is based can include a very large number of parameters.
• The greater the number of features and the size and variability of the output, the greater the volume of training data, batch size and number of epochs required (see the sketch after this list).
• Generative AI is in essence being tasked with making an educated guess, or running an extrapolation, regression or classification based on the data set.
• The more data available, the greater the chance of an accurate outcome and the lower the error, or cost function.
• The introduction of large language models, upon which generative AI platforms rely, has seen size and complexity increase by an order of magnitude.
• Learned knowledge patterns that emerge during the AI model training process need to be stored in memory.
• Checkpointing large and complex models puts pressure on underlying network and storage infrastructure.
• The model cannot continue until the internal data has all been saved in the checkpoint.
• Checkpoints function as restart or recovery points if the job crashes or the error gradient is not improving.
• It is important that organisations use the densest, most efficient data storage possible.
• This will limit sprawling data centre footprints, and spiralling power and cooling costs that go with them.
• The latest GPU servers consume 6-10kW each, and most existing data centres are not designed to deliver more than 15kW per rack, so a single rack may only be able to house one or two such servers.
• There is a large and looming challenge for data centre professionals as GPU deployments increase in scale.
• Storage vendors are factoring in that CTOs will be looking for secure, high-performance, scalable, efficient storage solutions.
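To make the batch-size, epoch and cost-function bullets a little more concrete, here is a purely illustrative sketch: a linear model trained with mini-batch gradient descent on synthetic data, where the amount of data processed is roughly batch size x batches per epoch x epochs, and the mean-squared-error cost on held-out data falls as the training set grows. All numbers and names are assumptions for the example.

```python
# Minimal illustration of batch size x epochs and a mean-squared-error cost
# function (synthetic data; all values are assumptions for the example).
import numpy as np

rng = np.random.default_rng(0)
TRUE_W = np.array([2.0, -1.0, 0.5])

def make_data(n: int):
    X = rng.normal(size=(n, 3))
    y = X @ TRUE_W + rng.normal(scale=0.5, size=n)
    return X, y

X_test, y_test = make_data(5000)          # held-out data for measuring error

def train_and_score(n_train: int, batch_size: int = 32, epochs: int = 20) -> float:
    X, y = make_data(n_train)
    w = np.zeros(3)
    lr = 0.05
    for _ in range(epochs):                       # epochs x batches controls how
        for s in range(0, n_train, batch_size):   # much data the model sees
            xb, yb = X[s:s + batch_size], y[s:s + batch_size]
            grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient of the MSE cost
            w -= lr * grad
    return float(np.mean((X_test @ w - y_test) ** 2))   # MSE cost on held-out data

# More training data generally means a more accurate model (a lower cost/error).
for n in (50, 500, 5000):
    print(n, round(train_and_score(n), 4))
```

Even in this toy setup the error falls as the training set grows, which is the dynamic behind generative AI's appetite for data and, in turn, for fast, dense storage.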