EDITOR'S QUESTION
To understand the challenges that AI presents from a data storage perspective, we need to look at its foundations. Any machine learning capability requires a training data set. In the case of generative AI, the data sets need to be large and complex, including different types of data.
Generative AI relies on complex models, and the algorithms on which it is based can include a large number of parameters to be learned. The greater the number of features, and the size and variability of the anticipated output, the larger the data batch sizes and the more epochs in the training runs before inference can begin.
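To make that relationship concrete, here is a small back-of-the-envelope sketch (illustrative numbers, not figures from this article) of how dataset size, batch size and epoch count determine the number of training steps, each of which must stream a fresh batch from storage:

```python
import math

def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """One optimiser step per batch, repeated for every epoch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

# A modest model: 1 million examples, batch size 512, 10 epochs.
print(total_training_steps(1_000_000, 512, 10))      # 19540 steps

# A larger generative workload: 1 billion examples, batch size 2048, 3 epochs.
print(total_training_steps(1_000_000_000, 2048, 3))  # 1464846 steps
```

Every one of those steps is a read against the storage layer, which is why batch size and epoch count translate directly into sustained bandwidth requirements.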
Because data volumes are increasing exponentially, it is more important than ever that organisations use the densest, most efficient data storage possible, to limit sprawling data centre footprints and the spiralling power and cooling costs that go with them. This presents another challenge that is beginning to surface as a significant issue: the implications of massively scaled-up storage requirements for achieving net zero carbon targets by 2030–2040.
Some technology vendors are already addressing sustainability in their product design. For example, all-flash storage solutions are considerably more efficient than their spinning-disk HDD counterparts. Some vendors are even going beyond off-the-shelf SSDs, creating their own flash modules.
As well as being more sustainable than HDD, flash storage is much better suited to running AI projects, because the key to results is connecting AI models or AI-powered applications to data.
Doing this successfully requires support for large and varied data types, streaming bandwidth for training jobs, write performance for checkpointing and checkpoint restores, random read performance for inference and, crucially, 24x7 reliability and easy accessibility across silos and applications. This set of characteristics is not possible with HDD-based storage underpinning your operations; all-flash is needed.
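As an illustration of why checkpoint write performance matters, here is a minimal sketch (a hypothetical helper, Python standard library only) that writes a checkpoint durably and atomically and reports the write throughput achieved; a training job is typically paused until this write completes, so storage write bandwidth directly bounds the length of that pause:

```python
import os
import pickle
import tempfile
import time

def save_checkpoint(state: dict, path: str) -> float:
    """Write a training checkpoint durably and atomically; return MB/s achieved."""
    blob = pickle.dumps(state)
    start = time.perf_counter()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())    # force the data to the device, not just the page cache
    os.replace(tmp_path, path)  # atomic rename: a restore never sees a torn file
    elapsed = time.perf_counter() - start
    return len(blob) / 1e6 / elapsed

# Illustrative use: checkpoint a dummy model state to a temporary directory.
state = {"step": 1000, "weights": [0.0] * 100_000}
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
mb_per_s = save_checkpoint(state, ckpt_path)
with open(ckpt_path, "rb") as f:
    restored = pickle.load(f)
assert restored == state  # a checkpoint restore round-trips the full state
```

Real frameworks checkpoint multi-gigabyte states on a schedule, so the random read performance needed to restore them quickly after a failure is just as important as the write path.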
Data centres now face a secondary but equally important challenge that will be exacerbated by the continued rise of AI and ML: water consumption, which is set to become an even bigger problem.
As AI and ML continue to evolve rapidly, the focus will increase on data security, to ensure that rogue or adversarial inputs cannot change the output; on model repeatability, using techniques like Shapley values to gain a better understanding of how inputs alter the model; and on stronger ethics, to ensure this powerful technology is used to genuinely benefit humanity.
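The Shapley values mentioned above can be computed exactly for small models by averaging each feature's marginal contribution over every ordering in which features are revealed. A minimal sketch follows, using a hypothetical toy scoring model; in practice, libraries such as SHAP approximate this for real models:

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction: average each feature's
    marginal contribution over all n! feature orderings. Features not
    yet revealed are held at a baseline (reference) input."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]     # reveal feature i's true value
            now = model(current)
            phi[i] += now - prev  # its marginal contribution in this ordering
            prev = now
    return [p / factorial(n) for p in phi]

# Hypothetical toy model: a linear risk score over three inputs.
score = lambda f: 2 * f[0] + 3 * f[1] - f[2]
vals = shapley_values(score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(vals)  # [2.0, 6.0, -3.0]; contributions sum to score(x) - score(baseline)
```

The per-feature attributions show exactly how each input moved the prediction away from the baseline, which is the kind of insight into "how inputs alter the model" that makes results repeatable and auditable.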
As well as being more sustainable than HDD, flash storage is better suited to running AI projects.
ALEX MCMULLAN, CTO INTERNATIONAL,
PURE STORAGE
www.intelligentcio.com INTELLIGENTCIO MIDDLE EAST 35