We are happy to announce our @NeurIPSConf workshop on LLM evaluations!
Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges.
For details: https://t.co/Rithk3osFH. 1/3
I’ll be @NeurIPSConf all week and would love to connect on LLM data, evaluation, benchmarking, and scaling laws. If you’re working on related problems, feel free to reach out.
PS: Don’t miss our one-of-a-kind workshop on LLM evaluation: https://t.co/dlnmpNvMPo
🚀 We are thrilled to announce that the LLM Eval Workshop @NeurIPSConf received 244 excellent submissions! 188 papers will be presented in poster sessions, and 5 exceptional works have been selected for oral talks.
Check out the accepted papers: https://t.co/aBFVYkCb1c
🧵👇
(1/5) My work, “LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests”, has been accepted for a contributed talk at the @NeurIPSConf 2025 workshop “Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling” @LLM_eval #NeurIPS #LLM #Evaluation #Robustness #AI #ML
Sketched on a few Parisian summer nights with a friend, @ChrisInterno. If you care about (causal) identification in a semi-synthetic future, we’d value your read and critique.
Preprint: https://t.co/GDN3nBlBww
Accepted at @LLM_eval workshop @NeurIPSConf
The Narcissus Hypothesis:
--Recursive training on semi-synthetic corpora enforcing human alignment induces a Social Desirability Bias: world-models (Narcissus) aim to please rather than represent, polluting data lakes and charming us (Echo) into hanging on their every word.