How and when do multilingual LMs achieve cross-lingual generalization during pre-training? And why do later, supposedly more advanced, checkpoints lose some language identification abilities in the process? Our #ACL2025 paper investigates.
"How deep is your love? Like the ocean?"
Looking out at the Balearic Sea got me thinking about ancient feelings. Just how deep *did* the Romans and Greeks love?
Come find out at the first poster session of @LREC2026!
📍 Poster 1196
🕐 11:20 - 13:00
ACL 2025, the world’s largest NLP conference with almost 2,000 papers presented, just took place in Vienna! 🎓✨ Here is a quick snapshot of the event via a short interview with one of the authors whose work caught my attention.
🎥 Watch: https://t.co/Zgry9gX9ip
#acl2025NLP #acl2025
I am honored to receive the 2025 #GSCL Best Thesis Award at #KONVENS in Hildesheim for my Master’s thesis, which investigates multilinguality and develops language models for Ancient Greek and Latin. Thank you to my mentors and collaborators. I look forward to what comes next.
Looking at Bruegel's Tower of Babel in Vienna makes you wonder: How can multilingual language models overcome the language barriers? Find out tomorrow!
📍 Level 1 (ironic, right?), Room 1.15-1
🕐 2 PM
#ACL2025NLP
@crestonbrooks As for loss patterns: Unfortunately, detailed loss analyses are challenging with BLOOM since only 6 checkpoints were published. I'm currently investigating potential grokking phenomena in more controlled toy settings where we can track loss curves more comprehensively.
@crestonbrooks Thanks for the interesting questions!
Regarding concept curriculum: We examined relatively "general" concepts and couldn't identify a clear pattern in which concepts get "translated" first. However, we did observe that different languages follow distinct patterns.
This phenomenon has a visible effect on text generation: In BLOOM-560m, activating 'earthquake' neurons derived from Spanish data at checkpoint 10,000 generates Spanish text. At checkpoint 400,000, the same method yields English text!
Read the full paper: https://t.co/54gBD0hH3t
Work by @crestonbrooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, me, Karthik Narasimhan, and Barbara Graziosi
What did Aristotle actually write? We think we know, but reality is messy. As Ancient Greek texts traveled through history, they were copied and recopied countless times, accumulating subtle errors with each generation. Our new #NAACL2025 findings paper tackles this challenge.
Our work brings new computational methods to a field traditionally dominated by manual scholarship, potentially accelerating the discovery of textual errors that have remained hidden for centuries.