Allen Chang @AllenCChang - Twitter Profile

Pinned Tweet

9 months ago

What if survey-derived rubrics 📋 graded ChatGPT instead of vibes? We benchmark LLMs & deep research systems across 75 research fields 🩺🧬🦾⚗️🏛️🎭💹: Perplexity deep research wins > 82% of head-to-heads vs the next best! w/ @realliyifei, @cmalaviya11, and @yatskar

Li S. Yifei

@realliyifei

9 months ago

How well can LLMs & deep research systems synthesize long-form answers to *thousands of research queries across diverse domains*?

Excited to announce 🎓📖 ResearchQA: a large-scale benchmark to evaluate long-form scholarly question answering at scale across 75 fields, using queries 💬 and rubrics 📋 that are mined from survey articles 📚!

Website: https://t.co/lZ29ZEZ2Al
Paper: https://t.co/zrwQBhBMKo
Dataset: https://t.co/Z5xp5wEBp7
Code: https://t.co/PAFJ0YkKCH

AllenCChang retweeted

Jesse Thomason @_jessethomason_

3 months ago

For prospective PhD students, I plan to hire in this coming application cycle (Fall 2026) with a focus on robotics, speech, and signed languages.

AllenCChang retweeted

Yue Yang

@YueYangAI

3 months ago

🎯 We release MolmoPoint, the best open model in GUI grounding 💻 by training on purely synthetic screenshots. We open-source all our models, data, and generation code. Plug it into your agents!
Demo: https://t.co/ANOfIa3iGm
Model: https://t.co/wwThFOlbRT
Data: https://t.co/M2w7zvE4Kc
Code: https://t.co/3aoCP7KzOy

AllenCChang retweeted

Rulin Shao @RulinShao

7 months ago

🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀

The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
- are grounded on search knowledge
🧵

AllenCChang retweeted

Alex Spangher @ Neurips2025 @AlexanderSpangh

7 months ago

✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n

AllenCChang retweeted

Taylor Sorensen @ma_tay_

8 months ago

🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵

AllenCChang retweeted

Leena Mathur

@lmathur_

about 1 year ago

Future AI systems interacting with humans will need to perform social reasoning that is grounded in behavioral cues and external knowledge. We introduce Social Genome to study and advance this form of reasoning in models! New paper w/ Marian Qian, @pliang279, & @lpmorency!

AllenCChang retweeted

Tianyi Lorena Yan @LorenaYannnnn

about 1 year ago

When answering queries with multiple answers (e.g., listing cities of a country), how do LMs simultaneously recall knowledge and avoid repeating themselves? 🚀 Excited to share our latest work with @robinomial! We uncover a promote-then-suppress mechanism: LMs first recall all answers and then suppress previously generated ones. https://t.co/O4LNZ1yoH5 👇🧵

AllenCChang retweeted

Tejas Srinivasan @_Tejas_S_

over 1 year ago

People are relying on AI assistance to make all kinds of decisions. *How* they incorporate AI recommendations is influenced by previous user-AI interactions and their evolving trust in the AI, which AI assistants are typically blind to. But what if they weren’t? We show that having AI assistants adapt their behavior in response to user trust levels can mitigate under- and over-reliance! Pre-print: https://t.co/5BDyifZ6sQ

AllenCChang retweeted

Liam Dugan @LiamDugan_

over 1 year ago

Last Friday I gave an hour long talk at the Penn ILST Seminar about the particular linguistic features that characterize AI text (e.g. "delve", repetitive syntax, agreeable tone) and how they affect detectability. Highly recommend giving it a listen. https://t.co/DqXenhasRC

Allen Chang @AllenCChang

over 1 year ago

@ndennler Congrats, Nathan!! 🥳🥳

AllenCChang retweeted

Leena Mathur

@lmathur_

over 1 year ago

Presenting this #EMNLP2024 Social-AI position paper today at 4 pm in Riverfront Hall!

AllenCChang retweeted

Tejas Srinivasan @_Tejas_S_

over 1 year ago

Come by Poster Session A tomorrow to hear @sayan__ghosh tell you why your preference eval is probably broken (and how you can fix it!)

AllenCChang retweeted

Jaspreet Ranjit

@jaspreetranjit_

over 1 year ago

Thank you so much @SpecNews1SoCal @jaskang21 for featuring our work on OATH-Frames: Characterizing Online Attitudes towards Homelessness with LLM Assistants👇 🖥️📈 https://t.co/x57TLgrQrX 🗞️ https://t.co/GxFJ87WmC7 @CSatUSC @nlp_usc @uscsocialwork @CAIS_USC @USCViterbi @swabhz

AllenCChang retweeted

Leena Mathur

@lmathur_

over 1 year ago

Our workshop will start in a few hours!
> #ECCV2024 9/29 AM workshop
> Suite 2, Allianz MiCo 🇮🇹
> Zoom info on our website (QR code below)

Looking forward to the discussion today and learning from our keynote speakers!

https://t.co/todVejqzpr

AllenCChang retweeted

Ai2 @allen_ai

over 1 year ago

Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it for yourself: https://t.co/IWKKUdfxlg

AllenCChang retweeted

Tuhin Chakrabarty

@TuhinChakr

over 1 year ago

GPT4-o1-preview from @OpenAI now gets 80.4% (compared to 14% performance of GPT4o) on the Connections game in 1 single attempt. Saw a thread on LinkedIn about similar bump on Wordle. I also attached some other models in comparison. This is very impressive given how hard the task is

As someone who isn't so much about LLM scientivism, I am very confident the model was trained on these tasks. A sad and depressing trend where these models try to incorporate everything in training distribution make it super hard for researchers interested in generalization #LLM #GenAI

AllenCChang retweeted

Sachin Kumar @shocheen

almost 2 years ago

You think your model just fell out of a coconot tree 🥥? It should not always comply in the context of all it has seen in the request. Check out our paper on contextual noncompliance.

Allen Chang @AllenCChang

almost 2 years ago

@_Tejas_S_ @jieyuzhao11 Ugh, sorry to hear that you had to go through this 6 times ☠️. Can't imagine what else goes on behind closed doors

AllenCChang retweeted

Tejas Srinivasan @_Tejas_S_

almost 2 years ago

Our work on improving selective prediction for VLMs has been accepted to #ACL2024 Findings! Read on to learn how you can make your VLM both reliable *and* usable ✨ Paper: https://t.co/k0Uvi42u4m Code: https://t.co/LC4R96xQUi
