Asst. Prof. of Computational Linguistics, Brandeis University. I focus on NLP resources and models for less-resourced languages/domains, specializing in NER.
I'm honored and humbled that Jonne Sälevä, Duygu Ataman (@_dataman_) and I received the best paper award at @aaclmeeting for our work on evaluation in multilingual/multitask benchmarks. I never expected an eval paper could make it this far! More info here: https://t.co/PJEjcog1gD
The Computation and Written Language (CAWL) workshop is starting off with a keynote by @ZevHandel titled "Everything you wanted to know about East Asian writing but didn’t think to ask." The oral sessions run until 12pm; please join us in person or on Zoom! #LREC2026
Final talk of the morning oral session at SIGUL: "LLM as a Morphological Disambiguator for Belarusian" by Vladislav Poritski, Oksana Volchek, and Ilia Afanasev #LREC2026
Happy to share 🌍Omnilingual Machine Translation🌍
In this work @AIatMeta we explore translation systems supporting 1,600+ languages. We show how our models (1B to 8B) can outperform baselines of up to 70B while having much larger language coverage.
📄: https://t.co/isvEzRZbnw
The center of gravity in NLP is shifting. 🌍
This year's #EMNLP2026 Special Theme is "New Missions for NLP Research." We welcome empirical, theoretical, or position and survey papers that reframe our collective research goals.
Find out more:
https://t.co/6dttoXBb3B
📢 The First Call for Papers for EMNLP 2026 is officially out! 📝
We welcome long & short papers featuring original research on empirical methods for NLP.
🗓️ ARR Submission Deadline: May 25, 2026
🔗 Read the full CFP here: https://t.co/GU2kaISjUG
#EMNLP2026
🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc., that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀
https://t.co/nRrLSpjnwV
A bunch of cool ideas make this possible: [1/n]
🚨 LREC Workshop Deadline Extension 🚨
We've extended the deadline for the Workshop on Computation and Written Language (CAWL)
🗓️ NEW DATE: February 23
Submit your papers here: 🔗 https://t.co/a2hHOKWmMs
#LREC2026 #CAWL2026 #CompLing #NLProc
Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are.
Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.
Announcing our latest paper: CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native speaker annotators to build a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.
I've never had such a hard time getting reviews done as an AC as I have this year for #ACL2026. I am URGENTLY in need of three reviewers to review papers on AI and UX generation and a position paper on AI and education. Please DM me with your OpenReview credentials.
Calls: 3rd Workshop on Computation and Written Language at LREC 2026: Final Call for Papers: The Third Workshop on Computation and Written Language (CAWL 2026) will be held in conjunction with LREC 2026 as a half-day workshop on May 12th in Palma, on the… https://t.co/m4xRUyaohI
Please submit to the Computation and Written Language (CAWL) Workshop at LREC 2026! We invite 4-8 page papers on any aspect of written language and its connection to spoken language. Full CfP at https://t.co/DFdnzLW4KT, submission deadline February 20th. #LREC2026 #NLProc
JHU mmBERT extended from 8k to 32k token length by the vLLM Semantic Router Team. Cutting-edge results on 1,800+ languages, now with longer context!
https://t.co/maN3bT1X17