Lots of work on cross-lingual alignment encourages multilingual LLMs to generalize knowledge across languages.
But this push for uniformity creates a tension: what happens to knowledge that should remain local?
We look into this trade-off between cross-lingual transfer and cultural erasure: 🧵
🗺️ Are we making our #LLMs multilingual, or anglocentric?
Much work brings languages closer to English, but that comes at the cost of crucial #cultural nuance.
@h__j___han tackles this trade-off with surgical steering, adapting LLMs to cultural contexts at inference time.
Large-model inference efficiency can be tackled from many angles: mixture of experts, efficient self-attention, quantisation, distillation, hardware acceleration...
But what if we could completely avoid redundant computation over the context window?
In our NeurIPS'24 paper "where does in-context (task-location) learning happen",
we find three distinct regions of LLM inference-time processing:
1️⃣ [Task Location]: the LLM discovers the task from reading instructions and examples.
2️⃣ [Task Processing]: after task location, the model no longer requires any self-attention over the prompts.
3️⃣ [Task Completion]: final layers of processing, where the model no longer requires self-attention over the query.
===> Implications for Industry:
✅ ~50% computational savings (theoretical),
if we avoid redundant context processing in the later layers of the model.
✅ Very sample-efficient adaptation of LLMs to task-specific models.
Contrary to common wisdom on fine-tuning, LoRA layers are more effective at the earlier layers of the model than at the later ones.
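To make the "theoretical savings" point concrete, here is a rough back-of-the-envelope sketch. It is not from the paper: the function name, the example numbers, and the linear cost model (cost proportional to tokens processed per layer, ignoring the quadratic attention term) are all assumptions for illustration.

```python
def context_savings(num_layers, cutoff_layer, n_context, n_query):
    """Fraction of token-layer computations skipped if context tokens are
    dropped from `cutoff_layer` onward (hypothetical linear cost model)."""
    # Baseline: every token passes through every layer.
    full = num_layers * (n_context + n_query)
    # Reduced: all tokens up to the cutoff, only query tokens afterwards.
    reduced = (cutoff_layer * (n_context + n_query)
               + (num_layers - cutoff_layer) * n_query)
    return 1 - reduced / full


# Illustrative numbers only: 32 layers, cutoff halfway, long prompt, short query.
print(round(context_savings(32, 16, 900, 100), 3))
```

With these made-up numbers the saving is about 45%; the closer the cutoff sits to the first layer and the longer the context is relative to the query, the larger the fraction.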
===> Implications for Academia:
* A new interpretability technique that progressively masks out all self-attention to the context.
* The task location layer is not affected by the number of prompt examples provided to the model.
* Related work with similar findings includes Task Vectors (@RoeeHendel et al.) and Function Vectors (@ericwtodd et al.), providing additional supporting evidence for this phenomenon.
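A minimal sketch of the context-masking idea, assuming a decoder-only model with a causal mask. This is not the paper's code: `attention_mask` and `cutoff_layer` are hypothetical names, and the mask is a plain list of lists to keep the example dependency-free.

```python
def attention_mask(n_context, n_query, layer, cutoff_layer):
    """Boolean mask where mask[q][k] is True if position q may attend to k.

    Before `cutoff_layer` this is the usual causal mask. From `cutoff_layer`
    onward, positions after the context can no longer attend to the context.
    """
    n = n_context + n_query
    # Standard causal mask: each position sees itself and earlier positions.
    mask = [[k <= q for k in range(n)] for q in range(n)]
    if layer >= cutoff_layer:
        for q in range(n_context, n):      # rows for query positions
            for k in range(n_context):     # columns for context positions
                mask[q][k] = False
    return mask


# 2 context tokens + 2 query tokens, masking applied from layer 1 onward.
for row in attention_mask(2, 2, layer=1, cutoff_layer=1):
    print(row)
```

Sweeping `cutoff_layer` from the last layer down to the first and measuring task performance at each setting is one way to probe where the model stops needing the context.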
💻
Paper: https://t.co/ICDjxbNeuv
Github: https://t.co/EhpH1E2FXI
Models: Llama3.1-8B, Llama3.1-8B-Instruct, Starcoder2-7B, GPTN2.7B, Bloom3B
Tasks: Machine Translation (en-fr, fr-en, en-pt), Code Generation (en-py)
EAMT Best Thesis Award: submissions close on January 31st. Completed an MT-related PhD in 2024 in Europe, Africa, or the Middle East? Then why not submit your thesis? https://t.co/vo0G6L5c2D
I’m super thrilled to have won the AMTA Best Thesis Award!!
A huge thanks to the AMTA organizers for this recognition ☺️
See you all in Chicago https://t.co/k9nJBl1AcI
On behalf of the AMTA Board of Directors, I am pleased to announce the winner of the first-ever AMTA Best Thesis Award: Dr. Eleftheria Briakou (@ebriakou) for her thesis “Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages”. [1/n]
I'm bummed that family obligations prevented me from presenting this epic paper. This work represented a long journey for me. I first began working on the language of Diplomacy in 2015, and I struggled for years to get funding to build a bot that could play it ...
My mom wants to come out of retirement. She was a software validation engineer working on human machine interfaces. She (and I) have no idea where to look. She just wants to spend time testing the things that people build. Does anyone know where she could look??
✨XLAVS-R will be presented during today’s (August 13th) #ACL2024 poster session 4, starting at 10:30 AM.
Looking forward to talking with people interested in our work!
Congratulations to Xuan Zhang (advised by @kevinduh) on successfully defending her PhD thesis “Hyperparameter Optimization for Neural Machine Translation Systems”.
https://t.co/LVBqBbT8CV
Three postdocs were too tired to go to the party on the last night of SICB this year, so we decided to order pizza to the hotel and write a paper together instead. Out in @ICB_journal now!
https://t.co/thRWfhwDHC
🚨 Excited to share our new work on **confidence calibration** in LLMs!
LLMs are often badly calibrated & overconfident, both explicitly (e.g. "I'm 100% sure") and implicitly (e.g. giving details or using an authoritative tone).
We address both w/ a pragmatic speaker-listener multi-agent method
🧵
Deadline is 6/6 for the AMTA thesis award
Apply if you finished a PhD in MT in the Americas in the last year!
https://t.co/oEgD1Sj4Xb
Questions? Reach out to [email protected] (Rebecca Knowles and Akiko Eriguchi).
🏆 Thrilled to share the launch of the AMTA Best Thesis Award, which aims to highlight the achievements of a recent PhD graduate at an institution in the Americas whose thesis has focused on topics related to machine translation. [1/2]