Intelligent CIO Middle East Issue 112

DISRUPTIVE TECH

Anupam Datta, Principal Research Scientist and AI Research Team Lead, Snowflake

…fixed, an iterative process of continuous improvement familiar to anyone who has worked with cloud software.

Tracking costs and latency

Technology leaders are becoming increasingly practical about their AI efforts. Gone are the days of unchecked AI spending; leaders are now deeply concerned with the ROI of their AI investments and with understanding which use cases are delivering business results.

From this perspective, the two essential dimensions to measure are how much an application costs and how long it takes to deliver answers, known as latency.

IN ONE EGREGIOUS EXAMPLE, TWO LAWYERS WERE FINED FOR SUBMITTING A LEGAL BRIEF WRITTEN BY AI THAT CITED NON-EXISTENT CASES.

Throwing more GPUs and servers at an application can reduce latency, but it drives up cost. You cannot find the right balance for your application unless you can measure both accurately. Observability gives enterprises a clearer picture of both, enabling them to maximise results and minimise costs.
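To make the cost and latency trade-off measurable, one option is to wrap every model call so that it is timed and its cost estimated from token counts. The Python sketch below does this under stated assumptions: the per-token prices are hypothetical placeholders, and `llm_call` is a hypothetical stand-in that returns the answer together with its input and output token counts. A real deployment would take token counts from the provider's response and prices from its rate card.

```python
import math
import statistics
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's actual rates.
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

# Hypothetical stand-in for an LLM call: prompt -> (answer, input_tokens, output_tokens).
LLMCall = Callable[[str], tuple[str, int, int]]


@dataclass
class CallRecord:
    latency_s: float
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        # Cost estimated purely from token counts and the placeholder prices above
        return (self.input_tokens * PRICE_PER_1K_INPUT
                + self.output_tokens * PRICE_PER_1K_OUTPUT) / 1000


@dataclass
class UsageTracker:
    records: list[CallRecord] = field(default_factory=list)

    def track(self, llm_call: LLMCall, prompt: str) -> str:
        """Run one call, recording its wall-clock latency and token-based cost."""
        start = time.perf_counter()
        answer, input_tokens, output_tokens = llm_call(prompt)
        elapsed = time.perf_counter() - start
        self.records.append(CallRecord(elapsed, input_tokens, output_tokens))
        return answer

    def summary(self) -> dict[str, float]:
        """Aggregate call count, median and 95th-percentile latency, and total cost."""
        if not self.records:
            return {"calls": 0, "p50_latency_s": 0.0, "p95_latency_s": 0.0, "total_cost_usd": 0.0}
        latencies = sorted(r.latency_s for r in self.records)
        # Nearest-rank 95th percentile
        p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
        return {
            "calls": len(self.records),
            "p50_latency_s": statistics.median(latencies),
            "p95_latency_s": p95,
            "total_cost_usd": sum(r.cost_usd for r in self.records),
        }


if __name__ == "__main__":
    def fake_llm(prompt: str) -> tuple[str, int, int]:
        # Hypothetical stand-in for a real model call
        time.sleep(0.01)
        return "stub answer", len(prompt.split()), 20

    tracker = UsageTracker()
    for question in ["What were Q3 sales?", "Summarise the latest incident report."]:
        tracker.track(fake_llm, question)
    print(tracker.summary())
```

Reporting a high percentile alongside the median is a common design choice, because a small share of slow requests can dominate the user experience while barely moving the average.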

As enterprises bring AI applications into production, they must expect and demand more than good enough. For AI to become a reliable, trustworthy component of business infrastructure, LLM application answers must align with the 3H rule: they must be honest, harmless, and helpful.

Answers need to be honest, meaning factually accurate and free of hallucinations. Enterprises must be able to use LLMs for tasks where their generalisation is desirable: summarising, generating inferences, and planning. Honest AI also means the system recognises and acknowledges when it cannot accurately answer a question. For example, if the answer simply does not exist, the LLM should say “I cannot answer that” rather than spitting out something random.

For tasks where memorisation of facts is more important, we need to supplement LLMs with additional information and data sources to ensure that responses are accurate. This is an active field of research known as retrieval-augmented generation, or RAG: combining LLMs with databases of factual data that they can retrieve from to answer specific questions.

The RAG Triad is one example of a set of metrics that helps evaluate RAG applications to ensure that they are honest and helpful. It includes three metrics (context relevance, groundedness, and answer relevance) that measure the quality of the three steps of a typical RAG application:

• Context relevance measures how relevant each piece of retrieved context from a knowledge base is to the query that was asked.

• Groundedness measures how well the final response is grounded in, or supported by, the retrieved pieces of context.

• Answer relevance measures how relevant the final response is to the query that was asked.

By decomposing a composite RAG system into its components (query, context, and response), this evaluation framework can triage failure points, show more clearly where the RAG system needs improvement, and guide targeted optimisation.

AI needs to be harmless, meaning answers do not leak personally identifiable information and are not vulnerable to jailbreak attacks designed to circumvent their designers’ guardrails. Those guardrails must ensure that answers do not embody bias, hurtful stereotypes, or toxicity.

Protecting against harm means aligning AI models, leveraging tools such as Llama Guard, and introducing safeguards to address issues like toxicity, stereotyping, adversarial attacks, and more.

Finally, AI needs to be helpful. It needs to deliver answers that match the queries users give it, that are concise and coherent, and that provide useful results.
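The RAG Triad’s decomposition can be illustrated with a small Python sketch that scores context relevance, groundedness, and answer relevance for a single query, then maps each low score back to the step it implicates. The `support` function is a deliberately crude word-overlap stand-in, an assumption for illustration only; in practice these scores are usually produced by an LLM acting as a judge or by embedding similarity. The function names, stop-word list, and 0.5 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list for the toy scorer below.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "what", "which",
              "when", "does", "do", "for", "on", "at", "by", "with", "it", "its"}


def words(text: str) -> set[str]:
    """Lower-cased content words, ignoring punctuation and stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def support(source: str, claim: str) -> float:
    """Fraction of the claim's content words that also appear in the source (0.0 to 1.0).
    A crude lexical stand-in for an LLM-as-judge score."""
    claim_words = words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & words(source)) / len(claim_words)


@dataclass
class TriadScores:
    context_relevance: float
    groundedness: float
    answer_relevance: float


def rag_triad(query: str, contexts: list[str], response: str) -> TriadScores:
    # Context relevance: how much of the query each retrieved chunk covers, averaged over chunks
    context_relevance = sum(support(c, query) for c in contexts) / len(contexts) if contexts else 0.0
    # Groundedness: how well each response sentence is supported by the retrieved context
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    all_context = " ".join(contexts)
    groundedness = sum(support(all_context, s) for s in sentences) / len(sentences) if sentences else 0.0
    # Answer relevance: how much of the query the final response addresses
    answer_relevance = support(response, query)
    return TriadScores(context_relevance, groundedness, answer_relevance)


def triage(scores: TriadScores, threshold: float = 0.5) -> list[str]:
    """Map each score below the threshold to the RAG step it most likely implicates."""
    issues = []
    if scores.context_relevance < threshold:
        issues.append("retrieval: retrieved context is not relevant to the query")
    if scores.groundedness < threshold:
        issues.append("generation: response is not supported by the context (possible hallucination)")
    if scores.answer_relevance < threshold:
        issues.append("generation: response does not address the query")
    return issues


if __name__ == "__main__":
    scores = rag_triad(
        query="What port does the API server use?",
        contexts=["The API server listens on port 8443 by default.", "Logs are rotated daily."],
        response="The API server uses port 8443.",
    )
    print(scores)
    print(triage(scores))
```

In this made-up example the second retrieved chunk is off-topic, which pulls context relevance down to 0.375 while groundedness (0.8) and answer relevance (0.75) stay above the threshold, so the triage points at retrieval rather than generation.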

Significant advancements have been made in meeting the goals of the 3H rule. With AI observability, we can detect hallucinations, incomplete or irrelevant responses, and security gaps. The increasing use of autonomous workflows adds another layer of complexity: it requires verifying that tools are used accurately, with the correct settings and in the proper order; tracking actions across multi-agent distributed systems; and ensuring that the entire process works as intended.
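Verifying an agentic workflow can start with something as simple as comparing a recorded trace of tool calls against an expected plan. The Python sketch below checks that the right tools are called in the proper order with the required settings. The `ToolCall` and `ExpectedStep` types and the tool names in the usage are hypothetical, and the check assumes a single, strictly ordered trace; tracking actions across multi-agent distributed systems would additionally need correlated traces, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


@dataclass
class ExpectedStep:
    tool: str
    required_args: dict = field(default_factory=dict)  # argument name -> required value


def verify_trace(trace: list[ToolCall], plan: list[ExpectedStep]) -> list[str]:
    """Compare an agent's recorded tool calls with an expected plan, step by step.
    Returns human-readable problems; an empty list means the trace matched the plan."""
    problems = []
    for i, step in enumerate(plan):
        if i >= len(trace):
            problems.append(f"step {i}: expected call to {step.tool!r}, but the trace ended")
            continue
        call = trace[i]
        # Wrong tool at this position: report it and skip the argument checks
        if call.tool != step.tool:
            problems.append(f"step {i}: expected {step.tool!r}, got {call.tool!r}")
            continue
        # Right tool: check that every required setting has the expected value
        for name, value in step.required_args.items():
            actual = call.args.get(name)
            if actual != value:
                problems.append(f"step {i}: {step.tool!r} arg {name!r} = {actual!r}, expected {value!r}")
    # Any calls beyond the plan are flagged as unexpected
    for j in range(len(plan), len(trace)):
        problems.append(f"step {j}: unexpected extra call to {trace[j].tool!r}")
    return problems


if __name__ == "__main__":
    plan = [ExpectedStep("lookup_customer"), ExpectedStep("issue_refund", {"currency": "USD"})]
    trace = [
        ToolCall("lookup_customer", {"customer_id": "C-1"}),
        ToolCall("issue_refund", {"currency": "EUR", "amount": 25}),
    ]
    for problem in verify_trace(trace, plan):
        print(problem)
```

Here the second step is flagged because the refund was issued with currency EUR instead of the required USD. Strict positional matching keeps the sketch simple; workflows that allow optional or parallel steps would need a looser comparison.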

All of this highlights the critical role of AI observability in helping AI realise its full potential to transform businesses, streamline operations, cut costs, and create new revenue opportunities.

