Mind the gap when evaluating LLMs with multiple-choice QA 🚨
In our #EMNLP2025 paper, we show that how a single space is tokenized can shift accuracy by up to 11% – and even reshuffle leaderboards.
Big thanks to my great co-authors @minhducbui_nlp & @kelina1124!
🧐 Evaluating your LLM with multiple-choice question answering?
🧵 A tiny space in the prompt can shift accuracy by up to 11% – and even reshuffle model rankings.
#EMNLP2025 #NLP #AI #LLM #Evaluation
Your dialect could change how AI perceives you. 🗣️ In our #EMNLP2025 paper, we uncover systematic German dialect bias in leading LLMs.
Grateful to my amazing collaborators who made this work possible: @CarolinHolterm* @vjhofmann @anne_lauscher @kelina1124 🙌
"You speak Bavarian? Then you must be uneducated and closed-minded!"
🤯 Not your opinion? Good. But it might be your LLM's!
🧵 In our #EMNLP2025 paper we uncover concerning dialect bias in recent LLMs - including GPT-5.
#AI #Bias #Dialect #Fairness #LLM #NLProc #Safety
🏆 Our paper has received the Outstanding Paper Award at @naaclmeeting! 🎉 Many thanks to my co-authors @kelina1124 and @anne_lauscher!
We introduce Multi3Hate, a novel multimodal and multilingual parallel hate speech dataset annotated by a multicultural set of annotators.
Mario Sanz, a GII student, won first prize in the national Laboral Kutxa award "Transformación de las finanzas para la sociedad" for his bachelor's thesis (TFG), in which he applied explainable AI and large language models to credit risk. https://t.co/LHVBMz3MBv
Third prize:
Mario Sanz Guerrero
Performance evaluation of credit risk models with boosting algorithms and transfer learning on large language models