Assistant Professor @mcgillu, Core Academic Member @Mila_Quebec, Canada CIFAR AI Chair @CIFAR_News | interested in multilingual NLP | Disciple of Jesus
Hallelujah!
I'm excited to share that I've been selected as a 2025 AI2050 Early Career Fellow by @Schmidtsciences
This year's fellows represent 42 institutions across eight countries, working to ensure AI benefits humankind.
Learn more at: https://t.co/Kwgcr64v27
We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers are pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. https://t.co/8oY7xdhxvF
Good voice agents need speech that sounds human, keeps up with a real conversation, and works in your language, on affordable hardware. Today @boson_ai and the @lmsysorg SGLang team release Higgs Audio v3, an open 4B text-to-speech model that hits all three.
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what's new with Gemma 4 12B:
Just 5 days to go! The Multimodal Alignment for a Pluralistic Society workshop @CVPR is happening June 3, 2026.
Check out the full schedule below
https://t.co/R3PQsii5fo
#CVPR2026
Releasing a new image editing benchmark -- TECCI: Tricky Edits of Collected and Curated Images.
Paper: https://t.co/JrTf2ttasP
Project website: https://t.co/CqGZpLblOB
If you missed out on contributing to Global PIQA and want to get involved, we will be announcing a new project soon. Fill in this form to be the first to hear about it: https://t.co/qFVGm0lwt2
Updated GlobalPIQA, now covers 141 languages, great participatory research work at global scale.
Paper: https://t.co/BrNT5QyOZL
Thank you to all the contributors, especially @tylerachang and @linguist_cat for leading this.
Another shared task is coming this year.
We are releasing an expanded version of Global PIQA! It now covers 141 language varieties and includes parallel and non-parallel splits. We are also releasing an updated preprint.
Call for Papers:
6th Multilingual Representation Learning Workshop at EMNLP in Budapest, Hungary!
Join us and submit your work on multilingual NLP.
Speakers to be announced, so stay tuned!
More info in the CFP:
https://t.co/ZnQjqyBASQ
Fresh on arXiv! Our new paper reformulates tokenisation as a linear program (LP), which we solve to get SOTA tokenisers! As a bonus, this LP allows us to know how close to optimal any tokeniser is! Check it out!
Some new results I found surprising that I'm tweeting for Chris (who isn't on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit.
The VoiceMOS Challenge 2026 kicks off today!
Please register using the link below: https://t.co/3mb3Dpxm1D
We will send you the challenge information afterwards!
Friday, July 31: Evaluation dataset release.
Friday, August 7: Predicted scores submission deadline.
AI text detectors are widely deployed in education and integrity workflows, but what are they actually tracking?
We report a surprising finding: text from base models is overwhelmingly judged as human by GPTZero and Pangram. (1/6)
#NLProc
Slides for our LREC tutorial are now available online:
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
Slides: https://t.co/WLPgwRsDQA
Reading list: https://t.co/iMnWtTSGNq
Includes resources on multilingual, multimodal, and low-resource language technologies.
W/
@shammur_absar Enamul Haque
#LREC2026 #NLProc #MultimodalAI #LLMs
Introducing #GenAI4World Workshop
Generative AI for the World: The First Workshop on Globalizing Tasks, Evaluations, and Systems at #COLM2026
October 9, 2026
San Francisco, Co-located with @COLM_conf
https://t.co/aFmN2jIPAN
Stay tuned for the updates!
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/