GenBench

@GenBench

State-of-the-art generalisation testing in NLP. Tag us for a RT of your NLP generalisation paper tweet!

Joined April 2022

15 Following

432 Followers

193 Posts

Pinned Tweet

GenBench @GenBench

about 2 years ago

The GenBench workshop is back! Do you work on generalisation (benchmarking) in #NLProc? Submit to the 2nd edition (https://t.co/XqMMYRW8vQ) co-located with #EMNLP2024. We have a regular track and a ✨collaborative benchmarking task (CBT)✨ that's fully LLM-focused this year (1/6)

1

22

11

1

13K

GenBench @GenBench

over 1 year ago

@robinomial @mrdrozdov @_dieuwke_ @najoungkim @kylelostat @sameer_ Two independently arrived at but similar conclusions 😁

0

2

0

0

49

GenBench @GenBench

over 1 year ago

That's a wrap! We (@glnmario, @christos_c, @_dieuwke_, @vernadankers, @khuyagbaatar_b, @a_kazemnejad & @ryandcotterell) thank all presenters, authors, reviewers and attendees!! The keynotes, the cats 😻, the posters, the talks and the lively panel: it was fantastic👏 🔥

GenBench's tweet photo. https://t.co/CVDO39xBtL

0

46

6

0

3K

GenBench @GenBench

over 1 year ago

@mrdrozdov @_dieuwke_ @najoungkim @kylelostat @sameer_ @robinomial We're discussing. Our initial response: reward hacking is a subset of overfitting, but also, what do you mean by reward hacking? 😁

1

2

0

0

154

GenBench @GenBench

over 1 year ago

@mrdrozdov @_dieuwke_ @najoungkim @kylelostat @sameer_ @robinomial any thoughts? 😁

1

1

0

0

125

GenBench retweeted

Najoung Kim 🫠 @najoungkim

over 1 year ago

so proud of @HayleyRossLing for getting a best paper award at @GenBench this year!! 🎉🪅🎉 I'm sure @TeaAnd_OrCoffee would be too :) check out our paper and share if you think homemade cats are cats!

najoungkim's tweet photo. https://t.co/MWsS4t6tpw

1

60

5

2

3K

GenBench retweeted

Kanishka Misra 🌊 @kanishkamisra

over 1 year ago

Woohoo go tinlab! Congrats @HayleyRossLing @TeaAnd_OrCoffee @najoungkim!!

0

16

1

1

1K

GenBench @GenBench

over 1 year ago

Congratulations!

0

3

0

0

237

GenBench @GenBench

over 1 year ago

Congrats to all the authors!

0

2

0

0

92

GenBench @GenBench

over 1 year ago

Closing remarks and best paper award by @vernadankers

GenBench's tweet photo. https://t.co/qwK4Bck4Uo

1

12

1

0

906

GenBench @GenBench

over 1 year ago

Best paper!

GenBench's tweet photo. https://t.co/JlJe9Vx5P7

2

7

0

0

1K

GenBench @GenBench

over 1 year ago

And we also have an honourable mention!

GenBench's tweet photo. https://t.co/h4dFbAzRmA

0

1

0

0

103

GenBench @GenBench

over 1 year ago

Come listen to the hot takes of our panelists in the Brickell room! Do we still need generalisation evaluation? 🧐 #GenBench2024 #EMNLP2024

GenBench's tweet photo. https://t.co/eyZvc4taWk

0

15

3

0

1K

GenBench @GenBench

over 1 year ago

Still at the poster session? Come join us for keynote 3 by @sameer_!

GenBench's tweet photo. https://t.co/IUvSKR1Kfy

0

5

1

0

741

GenBench @GenBench

over 1 year ago

Did you miss the GenBench poster session? Don't worry we've got you, here are (nearly all) posters! 😉 #GenBench2024 #EMNLP2024 Next up: keynote by Sameer Singh at 3!

0

13

1

0

830

GenBench @GenBench

over 1 year ago

Last spotlight presentation: MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models https://t.co/4pyv01TbWE Unfortunately the authors couldn't make it, the work is kindly presented by their colleague Hengyi Wang 🙏

GenBench's tweet photo. https://t.co/QRErUDuez6

0

1

0

0

71

GenBench @GenBench

over 1 year ago

Spotlight time! Mirella Bueno on MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks https://t.co/ARmGeONz2c

GenBench's tweet photo. https://t.co/20AECI9pKL

1

3

1

0

535

GenBench @GenBench

over 1 year ago

Continuing with Bastian Bunzeck, presenting The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns https://t.co/70kDItm3BB

GenBench's tweet photo. https://t.co/Z7bbTEbcy7

1

3

0

0

86

GenBench @GenBench

over 1 year ago

@kylelostat Plus more cat pictures! 😻😻

GenBench's tweet photo. https://t.co/gKpQofGMmq

0

1

0

0

93

GenBench @GenBench

over 1 year ago

Join us for our second keynote by Olmo co-lead @kylelostat

GenBench's tweet photo. https://t.co/HQk8xTdSL5

1

16

3

3

1K

GenBench @GenBench

over 1 year ago

@kylelostat He got all the room snickering already at slide 3! 😁

1

2

0

0

97
