DISRUPTIVE TECH
As a technology with huge but unrealised potential, AI has been on the corporate agenda for a long time. This year it has undoubtedly gone into overdrive, due to Microsoft's $10 billion investment in OpenAI, together with strategic initiatives in generative AI by Meta, Google and others.
Although we have seen many advances in AI over the years, and arguably just as many false dawns in terms of its widespread adoption, there can be little doubt now that it is here to stay. As such, now is the time for CTOs and IT teams to consider the wider implications of the coming AI-driven era.
In terms of its likely impact on the technology sector and on society in general, AI can be likened to the introduction of the relational database: like that earlier breakthrough, it is a spark that has ignited a widespread appreciation of large data sets, resonating with both end users and software developers.
Over the last few years, AI has steadily driven the size of these data sets upwards, but the introduction of large language models, upon which ChatGPT and the other generative AI platforms rely, has seen their size and complexity increase by an order of magnitude.
This is because the learned knowledge patterns that emerge during the AI model training process need to be stored in memory, which can become a challenge with larger models.
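To put those memory demands in perspective, here is a minimal back-of-the-envelope sketch in Python. The parameter count and the per-parameter byte costs are purely illustrative assumptions, not figures from any specific model or platform.

```python
# Rough, illustrative estimate of the memory needed to hold a model's
# training state (weights plus optimizer state), before activations
# and gradients are even counted. All constants are assumptions.

def training_memory_gb(num_params: float,
                       bytes_per_weight: int = 2,           # e.g. FP16/BF16 weights
                       bytes_per_optimizer_state: int = 12  # e.g. Adam moments + master weights
                       ) -> float:
    """Approximate training-state footprint in gigabytes."""
    total_bytes = num_params * (bytes_per_weight + bytes_per_optimizer_state)
    return total_bytes / 1e9


# A hypothetical 70-billion-parameter model:
print(f"{training_memory_gb(70e9):.0f} GB")  # roughly 980 GB of training state
```

Even under these simplified assumptions, a model at that scale cannot fit in the memory of a single server and has to be spread across many GPUs.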
Checkpointing large and complex models also puts huge pressure on the underlying network and storage infrastructure, because the model cannot continue training until all of its internal state has been saved in the checkpoint. These checkpoints function as restart or recovery points if the job crashes or the error gradient stops improving.
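As a minimal sketch of what a checkpoint involves, the following assumes a PyTorch-style training job; production training of very large models typically shards these writes across many nodes, but the principle is the same.

```python
import torch

def save_checkpoint(model, optimizer, epoch, path):
    """Persist the full training state so that a crashed or diverging job
    can be restarted from this point rather than from scratch."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),          # learned weights
        "optimizer_state": optimizer.state_dict(),  # optimizer moments etc.
    }, path)

def load_checkpoint(model, optimizer, path):
    """Restore model and optimizer state from a saved checkpoint."""
    state = torch.load(path)
    model.load_state_dict(state["model_state"])
    optimizer.load_state_dict(state["optimizer_state"])
    return state["epoch"]
```

Every byte of that state has to reach persistent storage before training can proceed, which is why checkpoint frequency ends up being traded off against storage and network bandwidth.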
Large data sets
Alex McMullan, CTO International, Pure Storage
AI and ML can be viewed in similar terms: they provide the foundation not only for building powerful new applications, but also for improving the way we engage with groundbreaking technology and with large, disparate data sets. We are already seeing how these developments can help us solve complex problems much faster than was previously possible.
Given the connection between data volumes and the accuracy of AI platforms, it follows that organisations investing in AI will want to build their own very large data sets to take advantage of the opportunities that AI affords. This is achieved by using neural networks to identify the patterns and structures within existing data and create new, proprietary content.
Understanding how AI works
To understand the challenges that AI presents from a data storage perspective, we need to look at its foundations. Any machine learning capability requires a training data set. In the case of generative AI, the data sets need to be very large and complex, including different types of data.
Generative AI relies on complex models, and the algorithms on which it is based can include a very large number of parameters that it is tasked with learning. The greater the number of features, and the greater the size and variability of the anticipated output, the larger the data batch size and the number of epochs in the training runs before inference can begin.
Generative AI is in essence being tasked with making an educated guess: running an extrapolation, regression or classification based on the data set. The more data the model has to work with, the greater the chance of an accurate outcome and of minimising the error, or cost function.
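A minimal training-loop sketch, assuming PyTorch and synthetic regression data (the data shapes, layer sizes and hyperparameters below are arbitrary illustrations), shows how batch size, epochs and the cost function described above fit together.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data set: 10,000 samples, 32 features, one regression target.
X = torch.randn(10_000, 32)
y = torch.randn(10_000, 1)

BATCH_SIZE = 256   # samples processed per training step
EPOCHS = 10        # full passes over the data set before inference begins

loader = DataLoader(TensorDataset(X, y), batch_size=BATCH_SIZE, shuffle=True)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()  # the 'cost function' the training run tries to minimise
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(EPOCHS):
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_X), batch_y)  # error between prediction and target
        loss.backward()                          # compute gradients
        optimizer.step()                         # update the model's parameters
```

More features, more output variability or a larger model simply multiplies the number of these steps and the volume of data each one has to move.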
Because data volumes are increasing exponentially, it is more important than ever that organisations can use the densest, most efficient data storage possible, to limit sprawling data centre footprints and the spiralling power and cooling costs that go with them.
This presents another challenge that is beginning to surface as a significant issue: the implications that massively scaled-up storage requirements have for achieving net zero carbon targets by 2030–2040.
It is clear that AI will have an impact on sustainability commitments because of the extra demands it places on data centres, at a time when CO2 footprints and power consumption are already a major issue. This is only going to increase pressure on organisations, but it can be accommodated and managed by working with the right technology suppliers.
The latest GPU servers consume 6–10kW each, while most existing data centres are not designed to deliver more than 15kW per rack, meaning only one or two such servers can be housed in each rack. This is a large and looming challenge for data centre professionals as GPU deployments increase in scale.