DISRUPTIVE TECH
The use of AI-powered applications, from virtual assistants and chatbots to coding copilots and autonomous agents, is gaining popularity among businesses. However, as their adoption grows, their flaws are becoming increasingly apparent.
Issues such as incomplete, offensive, or inaccurate responses (the last often referred to as hallucinations), security risks, and overly generic replies present barriers to enterprise-wide implementation, and for good reason.
Just as the rise of cloud-based platforms and applications introduced a range of innovative tools to assess, debug, and monitor their functionality, the widespread adoption of AI demands a dedicated suite of observability tools tailored to its own unique requirements.
With the UAE poised to become the third most important country for AI integration, thanks to its abundant access to capital, computing power, and data, the need for AI observability becomes even more pressing. AI-powered applications require the same level of oversight and administration as any other business-critical application. In other words, AI needs observability.
Observability refers to the technologies and business practices used to understand the complete state of a technical system, platform, or application. For AI-powered applications specifically, observability means understanding all aspects of the system, from end to end.
Observability helps companies evaluate and monitor the quality of inputs, outputs, and intermediate results of LLM-based applications, and can flag and diagnose hallucinations, bias, and toxicity, as well as performance and cost issues.
We need observability in AI because the technology is starting to show its limitations at the precise moment that it is becoming indispensable, and for enterprises, these limitations are simply unacceptable.
When LLMs are used in place of search engines, users approach them expecting accurate and helpful results. When AI fails to deliver, it erodes trust. In one egregious example, two lawyers were fined for submitting a legal brief, written by AI, that cited nonexistent cases.
Hallucinations, security leaks, and incorrect answers undermine the trust businesses need to have in the AI-powered applications they build, and present roadblocks to bringing AI into production. If an LLM produces inappropriate answers, it also erodes consumers' trust in the company itself, damaging the brand.
As one corporate LLM user told me: "We want an easy way to evaluate and test the accuracy of different models and applications instead of taking the 'looks good to me' approach." From evaluation to ongoing monitoring, observability is increasingly important to any organisation using AI applications.
AI observability gives the owners of AI applications the power to monitor, measure, and correct performance, helping in three different aspects of corporate AI use:
Evaluation and experimentation
With so many AI models and tools on the market, it is important that enterprises can easily determine which elements work best for their specific AI app use case. Observability is critical for evaluating different LLMs, configuration choices, code libraries, and more, enabling users to optimise their tech choices for each project.
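To illustrate what such an evaluation can look like, here is a minimal sketch in Python; the candidate model names, the query_model() helper, and the golden question set are all hypothetical placeholders, and a real harness would call each provider's API and use richer scoring than substring matching.

```python
# A minimal sketch of side-by-side model evaluation against a small
# "golden" set of questions with known-good answers. Everything named
# here is a placeholder, not a reference to any particular product.

GOLDEN_SET = [
    {"question": "What year was the UAE founded?", "expected": "1971"},
    {"question": "What does RAG stand for?",
     "expected": "retrieval-augmented generation"},
]

def query_model(model_name: str, question: str) -> str:
    """Placeholder: swap in a real API call for each candidate model."""
    return "retrieval-augmented generation"  # canned reply for the sketch

def evaluate(model_name: str) -> float:
    """Score a model as the fraction of answers containing the expected text."""
    hits = 0
    for item in GOLDEN_SET:
        answer = query_model(model_name, item["question"]).lower()
        if item["expected"].lower() in answer:
            hits += 1
    return hits / len(GOLDEN_SET)

for model in ["model-a", "model-b"]:  # hypothetical candidate models
    print(f"{model}: {evaluate(model):.0%} on {len(GOLDEN_SET)} golden questions")
```

The point of even a crude harness like this is to replace the "looks good to me" approach with a repeatable, comparable number per model and configuration.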
Monitoring and iteration
Once an AI app has been deployed and is in use, observability helps with logging execution traces and monitoring its ongoing performance. When problems crop up, observability is crucial for diagnosing the source, fixing it, and then validating that it was correctly resolved.
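As one illustration of execution-trace logging, the sketch below records each step's inputs, output, and latency as JSON lines; the traced() decorator, the llm_traces.jsonl file, and the stubbed generate_answer() function are assumptions for this sketch, and a production system would ship such records to a dedicated observability backend instead.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def traced(step_name: str):
    """Decorator that records each call's inputs, output, and latency."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            record = {
                "trace_id": str(uuid.uuid4()),
                "step": step_name,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "inputs": [repr(a) for a in args],
                "output": repr(result)[:500],  # truncate long outputs
                "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            }
            with open("llm_traces.jsonl", "a") as log:
                log.write(json.dumps(record) + "\n")
            return result
        return inner
    return wrap

@traced("generate_answer")
def generate_answer(question: str) -> str:
    # Placeholder: a real app would call the LLM here.
    return "stub answer"

generate_answer("What does observability mean for AI applications?")
```

Traces like these are what make diagnosis possible: when an answer goes wrong, the owner can replay exactly which inputs, intermediate steps, and latencies produced it.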
Hallucinations, in particular, warrant scrutiny. I teach a computer science course on Trustworthy Machine Learning at Stanford University and advise my students to consider LLMs' answers hallucinatory unless proven otherwise. Why? Because LLMs are trained to generalise from large bodies of text, generating original text modelled on the general patterns found in the text they were trained on. They are not built to memorise facts.
For retrieval-augmented generation (RAG) applications, the RAG Triad offers a structured way to catch such failures. It comprises three metrics: context relevance, groundedness, and answer relevance.
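To make those three metrics concrete, here is a minimal sketch, assuming an llm_judge() helper that stands in for an LLM-as-judge call (the approach popularised by tools such as TruLens); the helper, the judging prompts, and the canned score are all hypothetical.

```python
# A minimal sketch of scoring a single RAG interaction on the triad.
# llm_judge() is a hypothetical stand-in for a call to a judge model;
# here it returns a canned score so the sketch runs on its own.

def llm_judge(prompt: str) -> float:
    """Placeholder: ask a judge model to rate 0.0-1.0; canned for the sketch."""
    return 0.9

def rag_triad(question: str, context: str, answer: str) -> dict:
    return {
        # Did retrieval surface context that matters to the question?
        "context_relevance": llm_judge(
            f"Rate 0-1 how relevant this context is to the question.\n"
            f"Question: {question}\nContext: {context}"),
        # Is every claim in the answer supported by the retrieved context?
        "groundedness": llm_judge(
            f"Rate 0-1 how well the answer is supported by the context.\n"
            f"Context: {context}\nAnswer: {answer}"),
        # Does the answer actually address the question asked?
        "answer_relevance": llm_judge(
            f"Rate 0-1 how well the answer addresses the question.\n"
            f"Question: {question}\nAnswer: {answer}"),
    }

scores = rag_triad(
    question="What does observability mean for AI applications?",
    context="Observability covers the inputs, outputs, and intermediate "
            "results of an LLM-based application, end to end.",
    answer="It means understanding every part of the system, end to end.")
print(scores)
```

Low groundedness is the classic signature of a hallucination: the answer may read fluently and even address the question, yet nothing in the retrieved context supports it.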