Announcing our latest paper: CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native-speaker annotators to build a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.
🌟🚀 Sparse Attention Models Can Get Sparser
We've updated The Sparse Frontier, the largest empirical analysis of training-free sparse attention to date. It has grown from Qwen 2.5 alone to 3 model families, now including Llama 3.1 and Gemma 3.
Key findings:
📊 Larger sparse models outperform smaller dense ones at equal compute cost. Only high-sparsity configs lie on the Pareto frontier for long sequences.
🔬 Already sparse? You can go sparser. Gemma 3 has 5/6 layers as Sliding Window Attention by design—yet additional sparsification of the remaining dense layers still yields efficiency gains at scale.
📈 Longer sequences tolerate higher sparsity. From 9 models × 6 methods × 9 tasks: fixed-budget methods in production are suboptimal. Token budget should grow sublinearly with context length.
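To make the last finding concrete, here is a minimal sketch of what a sublinear token budget could look like. The power-law form and the names `token_budget`, `base_len`, `base_budget`, and `alpha` are illustrative assumptions, not the schedule from the paper:

```python
def token_budget(context_len: int,
                 base_len: int = 4096,      # assumed reference context length
                 base_budget: int = 1024,   # assumed budget at the reference length
                 alpha: float = 0.5) -> int:
    """Hypothetical sublinear schedule: budget = base_budget * (n / base_len) ** alpha.

    With 0 < alpha < 1 the budget grows with context length, but the
    fraction of tokens attended to shrinks, i.e. sparsity increases.
    """
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    budget = base_budget * (context_len / base_len) ** alpha
    # Never attend to more tokens than the context contains.
    return min(context_len, max(1, round(budget)))


for n in (4096, 16384, 65536):
    b = token_budget(n)
    print(f"context={n:>6}  budget={b:>5}  attended fraction={b / n:.4f}")
```

A fixed-budget method corresponds to `alpha = 0` and dense attention to `alpha = 1`; the finding above suggests something in between.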
Co-authors: Robert Li, Renjie Huang, @seb_ruder, @cheeesio, @PontiEdoardo.
Special shout-out to @faridlazuarda who updated our repo to vLLM v1 and made Gemma 3 evaluations possible.
Links in the comments ⬇️
Our Multilingual Team at @cohere is hiring interns!
If you are a current PhD student working in multilinguality and would like to work with our team, please apply below or reach out! 🌍🌏🌎
(The below says “Winter”, but we hire year-round)
https://t.co/Uwhh7aE5LE
Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs?
Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs.
https://t.co/A82iW9Myek
Announcing - Moms Who ML! 🐣🍼
I landed in San Diego to a video call from my 15-month-old -- she licked the camera then put me in this bag of Mega Bloks👅
If you can relate, let's support one another!
Search for the group on the #NeurIPS2025 app, and we'll expand after!
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
Big thanks to @slatornews for hosting me at #SlatorCon Silicon Valley last month. I loved giving attendees an inside look into how SOTA multilingual LLMs are built 🤖🌍
Where machine translation goes, AI goes. This has been true since 1954 (MT made the front page of the NYT, 2 years before the Dartmouth workshop)
We just released the best translation model ever