INTELLIGENT BRANDS // Software for Business
G42 launches JAIS 70B and 20 other AI Models to champion Arabic Natural Language Processing
The latest JAIS large language model (LLM), JAIS 70B, has been released by Inception, a G42 company specialising in the development of advanced AI models and applications, all provided as a service. A 70 billion parameter model, JAIS 70B is built for developers of Arabic-based natural language processing (NLP) solutions and promises to accelerate the integration of generative AI services across various industries, enhancing capabilities in areas such as customer service, content creation and data analysis.
Dr Andrew Jackson, CEO, Inception, said: “Releasing JAIS 70B and this new family of models reinforces our commitment to delivering the highest quality AI foundation model for Arabic-speaking nations. The training and adaptation techniques we are delivering successfully for Arabic models are extensible to other under-served languages and we are excited to be bringing this expertise to other countries.”
Neha Sengupta, Principal Applied Scientist, Inception, said: “For models up to 30 billion parameters, we successfully trained JAIS from scratch, consistently outperforming adapted models in the community. However, for models with 70 billion parameters and above, the computational complexity and environmental impact of training from scratch were significant.
“We made a choice to build JAIS 70B on the Llama2 model, allowing us to leverage the extensive knowledge base of an existing English model and develop a more efficient and sustainable solution.”
JAIS 70B delivers Arabic-English bilingual capabilities at an unprecedented size and scale for the open-source community. As a 70 billion parameter model, it has an increased ability to handle complicated and nuanced tasks, as well as a better capability to process complex datasets.
JAIS 70B was developed using continuous training, a process of fine-tuning a pretrained model, on 370 billion tokens, of which 330 billion were Arabic tokens, the largest Arabic dataset ever used to train an open-source foundational model.
In this release, the company has also unveiled a comprehensive suite of JAIS foundation and fine-tuned models: 20 models across 8 sizes, ranging from 590M to 70B parameters, including versions specifically fine-tuned for chat applications, trained on up to 1.6T tokens of Arabic, English and code data.
In response to feedback from the Arabic NLP community, this extensive release delivers a breadth of tools, including the first Arabic-centric model small enough to run on a laptop, offering both small, compute-efficient models for targeted applications and advanced model sizes for enterprise precision.
Inception released JAIS-13B and JAIS-13B-chat in August 2023 and subsequently launched the state-of-the-art Arabic-centric models, JAIS-30B and JAIS-30B-chat. JAIS 70B and JAIS 70B-chat have proven to be even more performant in benchmarking data in both English and Arabic compared to previous models.
JAIS 70B retains, and in specific cases exceeds, the high-quality English-language processing capabilities of Llama2, while far outperforming the base model on Arabic outputs. The JAIS development team trained an expanded tokeniser based on the Llama2 tokeniser to enhance Arabic text processing efficiency, doubling the model’s base vocabulary. According to Sengupta, the model splits Arabic words less aggressively than the standard Llama2 model, which makes training and inference cheaper.
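To illustrate what splitting Arabic words "less aggressively" can mean in practice, the sketch below compares the average number of tokens per word under a small and an expanded vocabulary. It is a toy under stated assumptions: the greedy longest-match splitter, the function names `greedy_tokenize` and `tokens_per_word`, and both vocabularies are hypothetical and are not the JAIS or Llama2 tokenisers.

```python
def greedy_tokenize(word, vocab):
    """Split a word left to right into the longest pieces found in vocab.

    Hypothetical toy splitter: when no multi-character piece matches,
    it falls back to a single character, so it always terminates.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shrink.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def tokens_per_word(text, vocab):
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    total = sum(len(greedy_tokenize(w, vocab)) for w in words)
    return total / len(words)


# Illustrative sample: "the new book" in Arabic (two words).
sample = "الكتاب الجديد"

# Assumed base vocabulary with no Arabic subword pieces:
# every word falls back to one token per character.
base_vocab = set()

# Assumed expanded vocabulary with a few Arabic pieces added.
expanded_vocab = {"ال", "كتاب", "جديد"}

base_rate = tokens_per_word(sample, base_vocab)
expanded_rate = tokens_per_word(sample, expanded_vocab)
```

Fewer tokens per word means fewer positions for the model to process for the same text, which is the sense in which a less aggressive split can lower training and inference cost.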
74 INTELLIGENTCIO MIDDLE EAST www.intelligentcio.com