Amanda Bertsch @abertsch72 - Twitter Profile

Pinned Tweet

7 months ago

Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

13

355

67

219

81K

abertsch72 retweeted

Yapei Chang

@YapeiChang

6 days ago

post-trained models are more helpful, but collapse toward a narrow range of possible answers 🍎 with ReDiPO, we show how to recover the lost diversity with a simple DPO data pipeline, while largely preserving instruction-following and safety great work led by @vsamuel2003 !

0

34

5

13

4K

abertsch72 retweeted

Apurva Gandhi

@apurvasgandhi

about 1 month ago

Sub-agents are a promising inference-time scaling primitive:
• Expand an agent's working memory
• Divide-and-conquer hard problems
• Solve problems faster with parallel execution

But how do we train a model to best take advantage of sub-agents and make sure we get these benefits?

Very excited to release RAO: Recursive Agent Optimization. RAO is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves (that can themselves spawn other agents) - turning recursive inference into a learned capability. 1/10

23

713

117

921

134K

Amanda Bertsch @abertsch72

about 1 month ago

This was an absolute joy of a project with wonderful mentors at Ai2: @mechanicaldirk @soldni @kylelostat @HannaHajishirzi (plus @gneubig + Matt Gormley!) Paper: https://t.co/1ETmDt0ZB8 Models: https://t.co/wAVntl7Rv8

1

11

0

331

Amanda Bertsch @abertsch72

about 1 month ago

New paper! https://t.co/1ETmDt0ZB8 This tackles a puzzle we found during the training of Olmo 3: how could two models with nearly identical short-context performance (and trained on the same data!) behave completely differently after long context extension?

Ai2 @allen_ai

about 1 month ago

Recipes for teaching language models to handle long inputs don't work equally well across model families. We wanted to know why—is it the architecture, the training data, or both? 🧵

5

83

15

59

25K

3

111

28

51

15K

Amanda Bertsch @abertsch72

about 1 month ago

Check out the paper for much more analysis, including estimating long context performance from short context (really hard!), additional pretraining settings that DON'T matter for long context (float8 linear layers!), and analysis of attention distributions for each model.

1

11

0

305

abertsch72 retweeted

Yanhong Li

@YanhongLi2062

about 1 month ago

[1/6] Late to the ICLR 2026 posting party!!

Paper with @SonglinYang4, @tanshawn, @MayankMish98, @rpanda89, @jzhou_jz, and @yoonrkim:

Distilling to Hybrid Attention Models via KL-Guided Layer Selection

Which attention layers are actually worth keeping in hybrid models? 🧵

9

82

13

29

8K

abertsch72 retweeted

Myra Cheng @chengmyra1

2 months ago

So excited that our work is on the cover of Science!!! We find that AI models overly affirm users, even when they describe harmful actions. Advice from sycophantic AI made people more self-centered, yet people prefer and trust it more, which may promote this model behavior.

11

361

83

107

46K

abertsch72 retweeted

Natasha Jaques

@natashajaques

3 months ago

The paper I’ve been most obsessed with lately is finally out: https://t.co/KgdWKknCJK! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content.

We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback, with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human was originally intending to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.

45

1K

392

1K

258K

abertsch72 retweeted

Nathan Lambert

@natolambert

3 months ago

Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear. It's incredible timing to release a fully open model so people can study how these architecture changes impact the full stack.

Personally, I learned a lot in making the post-training work. Even with the data being identical for pretraining, post-training is very different! In particular, the OSS tools for these new architectures is really limited. New architectures are much slower than standard transformers or popular models like DeepSeek MoEs. This is work that we can do together to keep pushing the frontier of efficient, open models.

This work was led by @lambdaviking @tyleraromero and others. I got to play a smaller part in making post-training work, super fun project!

I've written up a blog post that explains why this matters and hybrid models didn't work a few years ago when Mamba was super popular. Plus, this paper is a great entry point for modern deep learning / language modeling scaling theory. Enjoy and send feedback!

18

488

71

196

77K

abertsch72 retweeted

Ai2 @allen_ai

4 months ago

LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent for steps that don't work, and current datasets cover few domains. How2Everything evals/trains for this at scale. 🧵

1

171

20

140

59K

abertsch72 retweeted

Kyle Lo

@kylelostat

6 months ago

olmo 3 paper finally on arxiv 🫡

thx to our teammates esp folks who chased additional baselines

thx to arxiv-latex-cleaner and overleaf feature for chasing latex bugs

thx for all the helpful discussions after our Nov release, best part of open science is progressing together! https://t.co/FGdoEIYUFF

15

444

76

157

57K

abertsch72 retweeted

Ai2 @allen_ai

6 months ago

Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵

22

664

102

264

120K

abertsch72 retweeted

Ai2 @allen_ai

6 months ago

Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵

24

729

108

215

264K

abertsch72 retweeted

Marc Marone

@ruyimarone

6 months ago

I'm on the job market and at #neurips2025! Looking for research roles around data for foundation models and would love to chat with folks - resume/site in my bio. I've recently worked @AIatMeta and @databricks and publish papers with my awesome collaborators @jhuclsp!

4

49

18

4

11K

abertsch72 retweeted

Akari Asai

@AkariAsai

7 months ago

1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (Deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality & open to bold new ideas! FAQ in 🧵

19

643

120

316

148K

abertsch72 retweeted

John Hewitt @johnhewtt

7 months ago

Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park

13

949

128

322

79K

abertsch72 retweeted

Luca Soldaini 🎀

@soldni

7 months ago

We are releasing a LARGE new collection of science PDFs we linearized with olmOCR! great for our first long context model. It was fun to use synth data to boost long context–all using Olmo 2! Older bro helping younger sibling 🥹

2

43

4

5

3K
