The GenBench workshop is back! Do you work on generalisation (benchmarking) in #NLProc? Submit to the 2nd edition (https://t.co/XqMMYRW8vQ) co-located with #EMNLP2024. We have a regular track and a ✨collaborative benchmarking task (CBT)✨ that's fully LLM-focused this year (1/6)
so proud of @HayleyRossLing for getting a best paper award at @GenBench this year!! 🎉🪅🎉 I'm sure @TeaAnd_OrCoffee would be too :) check out our paper and share if you think homemade cats are cats!
Did you miss the GenBench poster session? Don't worry, we've got you: here are (nearly all) the posters! 😉 #GenBench2024 #EMNLP2024 Next up: keynote by Sameer Singh at 3!
Last spotlight presentation:
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models
https://t.co/4pyv01TbWE
Unfortunately, the authors couldn't make it, so the work is kindly presented by their colleague Hengyi Wang 🙏
Continuing with Bastian Bunzeck, presenting
The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns
https://t.co/70kDItm3BB