Excited to receive an Outstanding Paper award for this work at @emnlpmeeting! Thanks to my co-authors George Foster and @markuseful! Updated version available here: https://t.co/XINveU1LvG
LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties.
https://t.co/nH6ZA33oa6
Machine translation is tough to evaluate, partly because most of what you throw at is too easy. That doesn't at all mean that translation is solved; we're just not doing a good job finding interesting inputs.
Come do a PhD with me at Columbia!
My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together!
pic: a run in central park
🗺️ Are we making our #LLMs multilingual, or anglocentric?
Much work brings languages closer to English, but that comes at the cost of crucial #cultural nuance.
@h__j___han tackles this trade-off with surgical steering, adapting LLMs to cultural contexts at inference time.
Our Google Translate team is bringing a strong presence to #ACL2025 in Vienna this week! 🇦🇹 My group is excited to present several of our latest papers. 👇 Don't miss them!
Two new datasets from Google Translate targeting high and low resource languages!
WMT24++: 46 new en->xx languages to WMT24, bringing the total to 55
SMOL: 6M tokens for 115 very low-resource languages
WMT24++: https://t.co/eDU1htGhZt
SMOL: https://t.co/y2xQWOXi5W
😼SMOL DATA ALERT! 😼Anouncing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: https://t.co/HISmFuKe8I
Huggingface: https://t.co/TPCFw01rh0
🚨New machine translation dataset alert! 🚨We expanded the language coverage of WMT24 from 9 to 55 en->xx language pairs by collecting new reference translations for 46 languages in a dataset called WMT24++
Paper: https://t.co/owplgurKHP
Data: https://t.co/ODxHUEq5Xl
Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 Models on 1B and 8B scales with various contamination types using machine translation as our task and analyze the impact of contamination.
https://t.co/4AjY5jSgX8
🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their float32 counterparts, but with a 50% smaller memory footprint. ✨ We hope this makes the XL and XXL models more accessible!
🔗 GitHub: https://t.co/dakbwDDBhx
🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin?
🔗 Code: https://t.co/dakbwDDBhx
LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics + test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes!
https://t.co/EeJvoXHn0w
@psingh522 Unfortunately this role requires that you are enrolled in a PhD program. But there are plenty of roles at Google for Master's students that you can find on the Google Careers page https://t.co/NjJrvQRsdL
Interested in doing research on Google Translate and Gemini? Good news! I’m hiring for full-time roles on the Google Translate Research Team! Apply here: https://t.co/RCojsAMYFD