Our work “Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?” is now on arXiv! https://t.co/cRMX6fwEPg - in collaboration with @davidstap, @diwuNLP, @c_monz, and Khalil Sima'an from @illc_amsterdam and @ltl_uva 🧵
MT Marathon this year, organised by @HelsinkiNLP, was a great week - I presented my research on chain-of-thought for machine translation, worked on a mini-research project, and explored the wonderful city of Helsinki, including a few trips to the sauna 🫠
Our paper was accepted at ICLR 2025 as a Spotlight! I will present our poster on Saturday April 26, 3-5pm, Poster #241. See you there!
https://t.co/cRMX6fxcEO
@SimonHiaubeng @ElliotMurphy91 The principle Maximise Minimal Means is part of one version of minimalist theory. But it's not UG - it's a third-factor, domain-general constraint
@Linguist_UR @ElliotMurphy91 Merge, maybe Agree, maybe Labeling. Though I believe there's work ongoing to attribute Merge itself to third-factor, domain-general constraints
@RaphaelMerx I'm a fan of this paper! We'd expect exactly the same for Kalamang (if we could collect an OOD test set). In the appendix we also show that the 100-example test set consists of short, easy sentences, so a ChrF++ of ~30 is really not that proficient
We show that a grammar book provides little or even no help for translation with LLMs, calling into question the recent claims of "truly zero-shot translation": no data, no gain, still 🧐
@JeffDean https://t.co/K5MN4xGEA9
Actually we find LLMs learn most/all translation ability from parallel sentences in the book, not the grammar.
And we can predict translation performance just from how much of the test set vocab the prompt covers!
But we do find that grammar can help *linguistic* tasks
@jxmnop https://t.co/1OC5FtiTVb
It turns out LLMs learn most or all translation ability from parallel sentences in the book, not the grammar.
And fine-tuning a small translation model matches or beats long-context LLM results!
(plus Kalamang parallel data has been online since Nov 2020)
More generally, we suggest that data collection efforts for multilingual XLR tasks like translation are best focused on parallel data over linguistic description, given its advantages in computational cost, token efficiency, and availability!
Our results emphasise the importance of task-appropriate data for XLR languages: parallel data for translation, and grammatical data for linguistic tasks.