Zeming Chen (Eric) @eric_zemingchen - Twitter Profile

Pinned Tweet

about 2 months ago

Our paper PERK is accepted to #ICLR2026 🎉 Long-context reasoning is one of the most critical skills a frontier model must master. The standard approach: feed the context into the model’s attention and hope the model figures out how to reason over its content. We show that Test-Time Learning (TTL) is a more effective way to process long context than standard long-range attention. Instead of just reading the context, the model should learn it at test time, by internalizing it into LoRAs via next-token prediction. Come find us at our poster — Pavilion 3 #509, Thu, Apr 23 • 10:30 AM – 1:00 PM — if you want to dig in further.

1

11

3

1

2K

eric_zemingchen retweeted

Badr AlKhamissi @bkhmsi

about 2 months ago

Excited to be in Rio for #ICLR2026 🇧🇷 I'll be presenting our work, Mixture of Cognitive Reasoners (aka MiCRo), on Friday at Pavilion 3, 10:30 AM (#1610). Come say hi :D Happy to chat about NeuroAI, representational & cultural alignment, and/or test-time learning 🧠

1

31

4

1

2K

eric_zemingchen retweeted

Silin Gao @silin_gao

about 2 months ago

News from Rio de Janeiro! Our paper “AbstRaL: Augmenting LLMs’ Reasoning by Reinforcing Abstract Thinking” will be presented at #ICLR2026 soon! Come visit our poster session. Time: Friday, April 24, 10:30am – 1:00pm (Rio local time). Room: Pavilion 4, P4-#4615

1

8

2

1

473

Zeming Chen (Eric)

@eric_zemingchen

about 2 months ago

Work done with my amazing co-authors: @agromanou, @gail_w, and @ABosselut Project Page: https://t.co/Hf4le4zFhK Paper: https://t.co/tFC5HmgOOq

0

3

1

0

113

Zeming Chen (Eric)

@eric_zemingchen

about 2 months ago

What does TTL with PERK actually get you?
1. Consistently beats attention across a wide range of reasoning tasks.
2. Much more robust to context-length variation at test time.
3. Much more robust to where the relevant information sits in the context.
4. Scales more efficiently than attention at inference.

1

0

120

eric_zemingchen retweeted

Yiyang Feng

@Yiyang2375

3 months ago

Can LLMs truly reason with knowledge that conflicts with what they already believe? Our paper TRACK, accepted as a Virtual Oral @eaclmeeting #EACL2026, shows the answer is often no. Even when you hand them the correct facts. Find out how we did this ⬇️

1

18

5

9

2K

Zeming Chen (Eric)

@eric_zemingchen

6 months ago

Love seeing more work on test-time learning for long context! We explored a similar direction in PERK, encoding long contexts into LoRA parameters via test-time learning. We found similar strong gains in long-context reasoning, especially better performance on length generalization (train on 8K and extrapolate to 128K). https://t.co/TdraAxzWG5

0

211

eric_zemingchen retweeted

Negar Foroutan @negarforoutan

6 months ago

1/ 🌍 How does mixing data from hundreds of languages affect LLM training? In our new paper "Revisiting Multilingual Data Mixtures in Language Model Pretraining" we revisit core assumptions about multilinguality using 1.1B-3B models trained on up to 400 languages. 🧵👇

2

106

29

51

11K

eric_zemingchen retweeted

Schmidt Sciences @schmidtsciences

7 months ago

We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers is pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. https://t.co/8oY7xdhxvF

6

183

30

74

260K

eric_zemingchen retweeted

Badr AlKhamissi @bkhmsi

8 months ago

🚀 Excited to share a major update to our “Mixture of Cognitive Reasoners” (MiCRo) paper! We ask: What benefits can we unlock by designing language models whose inner structure mirrors the brain’s functional specialization? More below 🧠👇 https://t.co/LVBLQ9yFlA

6

475

93

281

32K

eric_zemingchen retweeted

Deniz Bayazit @denizbayazit

9 months ago

1/🚨 New preprint How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks. #interpretability

2

48

13

17

5K

Zeming Chen (Eric)

@eric_zemingchen

11 months ago

In collaboration with my wonderful co-authors: @agromanou, @gail_w , & @ABosselut! Links 🔗: Project Page: https://t.co/Hf4le4Ad7i Paper: https://t.co/tFC5HmhmDY Code: https://t.co/ET6uxuwEnY

1

3

0

1

183

Zeming Chen (Eric)

@eric_zemingchen

11 months ago

🗒️Can we meta-learn test-time learning to solve long-context reasoning? Our latest work, PERK, learns to encode long contexts through gradient updates to a memory scratchpad at test time, achieving long-context reasoning robust to complexity and length extrapolation while scaling efficiently at inference. PERK can be applied to existing pretrained language models without requiring architectural or parameter modifications to the base model. #LLM #LongContext Find out how PERK operates and performs 👇

1

19

10

13

4K
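
As a rough illustration of the idea in the tweet above (encoding a context into adapter weights through gradient updates on a next-token objective while the base model stays frozen), here is a toy sketch in plain Python. It is not PERK's implementation and it leaves out the meta-learning part entirely. The bigram "model", the vocabulary size, the rank, the learning rate, and every function name are assumptions made up for this example.

```python
import math
import random

VOCAB = 4  # toy vocabulary size (assumption)
RANK = 2   # adapter rank (assumption)


def make_model(seed=0):
    """Return frozen bigram logits w0 and a low-rank adapter (a, b)."""
    rng = random.Random(seed)
    # Frozen "pretrained" weights: w0[prev][next] is a logit.
    w0 = [[rng.uniform(-0.1, 0.1) for _ in range(VOCAB)] for _ in range(VOCAB)]
    # Adapter adds a @ b to the logits. b starts at zero, so the adapter
    # initially leaves the base model's predictions unchanged.
    a = [[rng.uniform(-0.1, 0.1) for _ in range(RANK)] for _ in range(VOCAB)]
    b = [[0.0] * VOCAB for _ in range(RANK)]
    return w0, a, b


def next_token_probs(w0, a, b, prev):
    """Softmax over next tokens given the previous token."""
    logits = [
        w0[prev][j] + sum(a[prev][r] * b[r][j] for r in range(RANK))
        for j in range(VOCAB)
    ]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(w0, a, b, tokens):
    """Mean next-token cross-entropy over the context."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(next_token_probs(w0, a, b, p)[n]) for p, n in pairs) / len(pairs)


def adapt(w0, a, b, tokens, steps=300, lr=0.3):
    """Gradient descent on the adapter only; w0 is never written to."""
    pairs = list(zip(tokens, tokens[1:]))
    for _ in range(steps):
        grad_a = [[0.0] * RANK for _ in range(VOCAB)]
        grad_b = [[0.0] * VOCAB for _ in range(RANK)]
        for prev, nxt in pairs:
            probs = next_token_probs(w0, a, b, prev)
            for j in range(VOCAB):
                # Derivative of the mean loss with respect to logit j.
                d = (probs[j] - (1.0 if j == nxt else 0.0)) / len(pairs)
                for r in range(RANK):
                    grad_a[prev][r] += d * b[r][j]
                    grad_b[r][j] += d * a[prev][r]
        for i in range(VOCAB):
            for r in range(RANK):
                a[i][r] -= lr * grad_a[i][r]
        for r in range(RANK):
            for j in range(VOCAB):
                b[r][j] -= lr * grad_b[r][j]
    return a, b


# The "context" to internalize: a repeating cycle 0 -> 1 -> 2 -> 3 -> 0.
context = [0, 1, 2, 3] * 8
w0, a, b = make_model()
frozen = [row[:] for row in w0]
before = loss(w0, a, b, context)
adapt(w0, a, b, context)
after = loss(w0, a, b, context)
print(f"next-token loss before: {before:.3f}, after: {after:.3f}")
```

After adaptation the context is no longer needed as input: what the model learned about the sequence lives in `a` and `b`, and `w0` is untouched.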

Zeming Chen (Eric)

@eric_zemingchen

11 months ago

💻Finally, PERK demonstrates more efficient scaling in both memory and runtime, particularly for extremely long sequences. While in-context reasoning is initially more efficient, its memory and runtime grow rapidly, leading to OOM errors at a context length of 128K. In contrast, PERK can manage long sequences through gradient accumulation, which, while increasing runtime, reduces the memory footprint.

1

2

0

179
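
The memory and runtime trade-off mentioned in the tweet above comes from a general property of gradient accumulation: for a loss that is a mean over examples, summing per-chunk gradients and dividing once at the end gives the same gradient as processing everything together, while only one chunk has to be held at a time. A minimal sketch with a one-parameter least-squares model, where the model, the data, and the function names are all assumptions chosen for illustration (this is not PERK's code):

```python
def chunk_gradient(w, xs, ys):
    """Sum over one chunk of d/dw (w*x - y)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def full_gradient(w, xs, ys):
    """Gradient of the mean squared error, computed in one pass."""
    return chunk_gradient(w, xs, ys) / len(xs)


def accumulated_gradient(w, xs, ys, chunk_size):
    """Same gradient, computed chunk by chunk.

    Only one chunk is touched per iteration. In a real model that is where
    the memory saving would come from, at the cost of more sequential passes.
    """
    total = 0.0
    for start in range(0, len(xs), chunk_size):
        stop = start + chunk_size
        total += chunk_gradient(w, xs[start:stop], ys[start:stop])
    return total / len(xs)


xs = [float(i) for i in range(1, 13)]
ys = [3.0 * x for x in xs]
w = 0.5

g_full = full_gradient(w, xs, ys)
g_acc = accumulated_gradient(w, xs, ys, chunk_size=4)
print(g_full, g_acc)
```

Smaller chunks mean more loop iterations (runtime) but less data in flight per iteration (memory), which is the direction of the trade-off the tweet describes.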
