"The future of AI isn't measured in size. It's measured in relevance."
- Stop Making Models Bigger, Make Them Behave - Kobie Crawford, Snorkel
https://t.co/jA6XlFscpg via @YouTube @KobieWon
Exciting news from Navdyut AI Labs! We are thrilled to announce Navdyut-Asm-32k, an optimized, native Assamese tokenizer.
By preserving complex morphological boundaries natively, this model establishes a mathematically lean tokenization gateway specifically tailored for sovereign Assamese AI infrastructure.
Performance Highlights:
It is 46.8% more efficient than Google Gemma.
It processes Assamese text using 3.6x fewer tokens than Meta's Llama 3.
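For readers wondering how figures like these are usually derived: the post does not say how the measurements were made, so the following is only a minimal sketch under the assumption that "efficiency" means total token count on the same Assamese text. The function names and the token counts in the usage note are hypothetical illustrations, not Navdyut's data or method.

```python
def times_fewer_tokens(baseline_tokens: int, candidate_tokens: int) -> float:
    """How many times fewer tokens the candidate tokenizer needs than the baseline.

    Both counts are assumed to come from tokenizing the same text.
    """
    if candidate_tokens <= 0:
        raise ValueError("candidate_tokens must be positive")
    return baseline_tokens / candidate_tokens


def percent_fewer_tokens(baseline_tokens: int, candidate_tokens: int) -> float:
    """Percentage reduction in token count relative to the baseline tokenizer."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(times_fewer_tokens(3600, 1000))    # ratio of baseline to candidate
    print(percent_fewer_tokens(1000, 532))   # percent reduction vs. baseline
```

Under this reading, a tokenizer that emits 1,000 tokens where a baseline emits 3,600 uses 3.6x fewer tokens, and one that emits 532 where a baseline emits 1,000 uses 46.8% fewer. Other definitions of "more efficient" (for example, tokens per word or bytes per token) would give different numbers for the same tokenizers.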
A massive thank you to @ai4bharat and our incredible regional supporters for helping us collect and clean the Assamese dataset, with our foundational data being sourced from the ai4bharat/IndicCorpV2 repository!
⏳ Stay tuned: The tokenizer codebase and Hugging Face links are coming soon!
An occasion for the culture of Assam!
We're thrilled to unveil our Bodo LLM, achieving an impressive 88-92% accuracy.
This test model is a testament to the vibrancy of the Bodo language.
We seek the encouragement and support of @CMOfficeAssam & @himantabiswa to help us grow.
Translation:
AI Navadyut: An Invaluable Gift from the New Generation
A group of young Assamese has come forward to ensure that the advancement of technology does not endanger or distort their own language.
With the aim of countering the threat of technology through technology itself, they have developed a special Artificial Intelligence (AI) for the North-East, especially for the land of Assam.
The team of 10 engineers and researchers has created an Assamese AI model.
This 'Navadyut technology' is the result of extensive research by Dicom Pathak (a resident of Baradi Satra in Barpeta district, now permanently residing in Guwahati), Champak Deka, Shiva Bhattacharjee, and Sahil Gulihar, in addition to two other youths. According to them, words from various ethnic groups and communities of Assam that were on the verge of being lost have been registered in the AI.
Journalist: Golok Talukdar.