DISRUPTIVE TECH
Anupam Datta, Principal Research Scientist and AI Research Team Lead, Snowflake
fixed, an iterative process of continuous improvement familiar to anyone who has worked with cloud software.
Tracking costs and latency
Technology leaders are becoming increasingly practical about their AI efforts. Gone are the days of unchecked AI spending: leaders are now deeply concerned with the ROI of their AI investments and with understanding which use cases are delivering business results.
From this perspective, the two essential dimensions to measure are how much an application costs and how much time it takes to deliver answers, known as latency.
IN ONE EGREGIOUS EXAMPLE, TWO LAWYERS WERE FINED FOR SUBMITTING A LEGAL BRIEF WRITTEN BY AI THAT CITED NON-EXISTENT CASES.
Throwing more GPUs and servers at an application can reduce latency, but it drives up cost. You cannot find the right balance for your application unless you can measure both accurately. Observability gives enterprises a clearer picture of both of these elements, enabling them to maximise results and minimise costs.
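To make the measurement concrete, here is a minimal Python sketch of recording both dimensions around a single model call. Everything in it is an assumption for illustration: the per-token prices are hypothetical, `fake_model` stands in for a real LLM call, and `count_tokens` is a crude word count rather than a real tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical per-token prices in USD; real prices depend on provider and model.
PRICE_PER_INPUT_TOKEN = 0.000001
PRICE_PER_OUTPUT_TOKEN = 0.000002


@dataclass
class CallRecord:
    latency_s: float
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        # Cost is the token counts weighted by the assumed prices above.
        return (self.input_tokens * PRICE_PER_INPUT_TOKEN
                + self.output_tokens * PRICE_PER_OUTPUT_TOKEN)


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace-separated words, not a real tokenizer.
    return len(text.split())


def tracked_call(model_fn: Callable[[str], str], prompt: str) -> Tuple[str, CallRecord]:
    """Call model_fn(prompt) and record its latency and token counts."""
    start = time.perf_counter()
    answer = model_fn(prompt)
    latency = time.perf_counter() - start
    return answer, CallRecord(latency, count_tokens(prompt), count_tokens(answer))


def fake_model(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "Paris is the capital of France"


answer, record = tracked_call(fake_model, "What is the capital of France?")
print(f"latency={record.latency_s:.6f}s cost=${record.cost_usd:.8f}")
```

Collecting one such record per request is what lets a team see whether added hardware actually buys enough latency to justify its cost.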
The RAG Triad is one example of a set of metrics that helps evaluate RAG applications to ensure that they are honest and helpful. It includes three metrics (context relevance, groundedness, and answer relevance) that measure the quality of the three steps of a typical RAG application.
As enterprises bring AI applications into production, they must expect and demand more than good enough. For AI to become a reliable, trustworthy component of business infrastructure, LLM application answers must align with the 3H rule: being honest, harmless, and helpful.
They need to be honest, meaning factually accurate and free of hallucinations. Enterprises must be able to use them for tasks where their generalisation is desirable: summarising, generating inferences, and planning. Honest AI also means the system recognises and acknowledges when it cannot accurately answer a question. For example, if the answer just does not exist, the LLM should say "I cannot answer that" as opposed to spitting out something random.
For tasks where memorisation of facts is more important, we need to supplement LLMs with additional information and data sources to ensure that responses are accurate. This is an active field of research known as retrieval-augmented generation, or RAG: combining LLMs with databases of factual data that they can retrieve to answer specific questions.
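The retrieve-then-answer pattern, including the honest refusal when nothing relevant is found, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: the two documents are made up, keyword overlap stands in for a real retriever such as a vector search, and `fake_generate` is a placeholder for an LLM call.

```python
import re
from typing import Callable, List, Set

# Tiny in-memory "knowledge base"; the contents are illustrative only.
DOCUMENTS = [
    "The refund window for online orders is 30 days.",
    "Support is available by email on weekdays.",
]

STOPWORDS = {"the", "is", "for", "a", "an", "of", "on", "by", "what", "to", "in"}


def keywords(text: str) -> Set[str]:
    """Lower-cased words of text with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, min_overlap: int = 2) -> List[str]:
    """Return documents sharing at least min_overlap keywords with the query."""
    q = keywords(query)
    return [d for d in DOCUMENTS if len(q & keywords(d)) >= min_overlap]


def answer(query: str, generate: Callable[[str], str]) -> str:
    context = retrieve(query)
    if not context:
        # Nothing relevant was retrieved: acknowledge it instead of guessing.
        return "I cannot answer that."
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + "\nQuestion: " + query)
    return generate(prompt)


def fake_generate(prompt: str) -> str:
    # Placeholder for an LLM call: echoes the first retrieved context line.
    return prompt.splitlines()[1]


print(answer("What is the refund window for online orders?", fake_generate))
print(answer("Who won the 1998 World Cup?", fake_generate))
```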
AI needs to be harmless, meaning answers do not leak personally identifiable information and the application is not vulnerable to jailbreak attacks designed to circumvent its designers’ guardrails. Those guardrails must ensure that the answers do not embody bias, hurtful stereotypes, or toxicity.
Finally, AI needs to be helpful. It needs to deliver answers that match the queries users give it, that are concise and coherent, and that provide useful results.
• Context relevance measures how relevant each piece of retrieved context from a knowledge base is to the query that was asked.
• Groundedness measures how well the final response is grounded in or supported by the retrieved pieces of context.
• Answer relevance measures how relevant the final response is to the query that was asked.
By decomposing a composite RAG system into its components (query, context, and response), this evaluation framework can triage the failure points, provide a clearer understanding of where improvements are needed in the RAG system, and guide targeted optimisation.
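As a rough illustration of how the three metrics support triage, the Python sketch below scores a query, its retrieved context, and a response, then maps low scores to the component most likely at fault. It rests on loud assumptions: in practice these scores are often produced by an LLM or another model acting as a judge, whereas this sketch substitutes simple keyword overlap, and the 0.5 threshold and the triage labels are hypothetical.

```python
import re
from typing import Dict, List, Set

STOPWORDS = {"the", "is", "for", "a", "an", "of", "on", "what", "to", "in"}


def words(text: str) -> Set[str]:
    """Lower-cased keywords of text, with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(a: str, b: str) -> float:
    """Fraction of the keywords in a that also appear in b (0.0 if a has none)."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa) if wa else 0.0


def rag_triad(query: str, contexts: List[str], response: str) -> Dict[str, float]:
    # Context relevance is scored per retrieved piece, then averaged.
    per_piece = [overlap(query, c) for c in contexts]
    return {
        "context_relevance": sum(per_piece) / len(per_piece) if per_piece else 0.0,
        "groundedness": overlap(response, " ".join(contexts)),
        "answer_relevance": overlap(query, response),
    }


def triage(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Name the likely failure point for every metric below the threshold."""
    hints = {
        "context_relevance": "retrieval: context does not match the query",
        "groundedness": "generation: unsupported claims",
        "answer_relevance": "generation: off-topic answer",
    }
    return [hints[name] for name, score in scores.items() if score < threshold]


query = "What is the refund window for online orders?"
contexts = ["The refund window for online orders is 30 days."]

good = rag_triad(query, contexts, "The refund window is 30 days.")
bad = rag_triad(query, contexts, "Our support team works on weekdays.")
print(good, triage(good))
print(bad, triage(bad))
```

In the second case retrieval scores well while groundedness and answer relevance collapse, which points at the generation step rather than the knowledge base.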
Protecting against harm means aligning AI models, leveraging tools such as Llama Guard, and introducing safeguards to address issues like toxicity, stereotyping, adversarial attacks and more.
Significant advancements have been made in meeting these goals. With AI observability, we can detect hallucinations, incomplete or irrelevant responses, and security gaps. The increasing use of autonomous workflows adds another layer of complexity: verifying that tools are used accurately, using the correct settings in the proper order, tracking actions in multiagent distributed systems, and ensuring the entire process works as intended.
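One simple way to picture the tool-verification part of this is to compare a recorded trace of tool calls against the workflow that was expected. The Python sketch below does that for order and settings only. The tool names, settings, and the `check_trace` helper are all hypothetical, and a real multiagent system would need far richer tracing than a flat list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolCall:
    tool: str
    settings: Dict[str, object] = field(default_factory=dict)


def check_trace(trace: List[ToolCall], expected: List[ToolCall]) -> List[str]:
    """Return a list of problems; an empty list means the trace matches."""
    got_order = [c.tool for c in trace]
    want_order = [c.tool for c in expected]
    if got_order != want_order:
        # Settings are only comparable once the tools line up step by step.
        return [f"tool order {got_order} does not match expected {want_order}"]
    problems = []
    for step, (got, want) in enumerate(zip(trace, expected), start=1):
        if got.settings != want.settings:
            problems.append(
                f"step {step} ({got.tool}): settings {got.settings} != {want.settings}"
            )
    return problems


# Hypothetical expected workflow: search first, then summarise.
EXPECTED = [
    ToolCall("search", {"top_k": 5}),
    ToolCall("summarise", {"max_words": 100}),
]

ok_trace = [ToolCall("search", {"top_k": 5}), ToolCall("summarise", {"max_words": 100})]
wrong_order = [ToolCall("summarise", {"max_words": 100}), ToolCall("search", {"top_k": 5})]
wrong_setting = [ToolCall("search", {"top_k": 50}), ToolCall("summarise", {"max_words": 100})]

for name, trace in [("ok", ok_trace), ("order", wrong_order), ("setting", wrong_setting)]:
    print(name, check_trace(trace, EXPECTED))
```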
This highlights the critical role of AI observability in helping AI deliver its full potential to transform businesses, streamline operations, cut costs, and create new revenue opportunities.
80 INTELLIGENTCIO MIDDLE EAST www.intelligentcio.com