Christopher Z. Cui @ccui9 - Twitter Profile

ccui9 retweeted

9 days ago

There’s a bunch of domain-specific video game knowledge, but games also teach meta-skills: getting comfortable with being in beginner brain, learning to read new visual interfaces, and using systematic trial and error to resolve uncertainty.

11

114

3

18

12K

Christopher Z. Cui

@ccui9

9 days ago

One of my favorite parts of a new model release is seeing how well it does on Runescape.

max

@maxbittker

9 days ago

Opus 4.8 is the best Claude ever at Runescape, but isn't reproducing the feats of recent openai and google models like solving quests and finding far-flung training spots

maxbittker's tweet photo: https://t.co/kIyv1FJHxS

15

229

8

71

37K

0

2

0

37

ccui9 retweeted

alex zhang

@a1zhang

11 days ago

Introducing a minimal training harness built on prime-rl and verifiers, so you can now train your own RLMs without sandboxes! All available in the `training/` folder in the RLM GitHub repo! We train RLM-Qwen3-30B-A3B-v0.1, using RL on a separate split of environments (OOLONG-Spam, BC+ split) to greatly improve performance across the board on long-context tasks evaluated in the original RLM paper. We trained for a day on an 8xA100 using prime-rl; code and model are open-source and available on GitHub / Huggingface.

12

653

76

512

64K

Christopher Z. Cui

@ccui9

12 days ago

@gabriberton I wonder if any sufficiently popular open-source benchmark would lead to the same. At a certain point, everyone going "We evaluated X on Y" would eventually lead to "Y is an evaluation suite" being memorized parametrically.

0

10

Christopher Z. Cui

@ccui9

15 days ago

@GlennMatlin My first thought when I read this was "Did a conference release reviews today?"

1

0

25

ccui9 retweeted

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

15 days ago

Ever wished we had fewer X-training hyphenates? Pre, mid, post etc. Why not just Training?

Trying to bridge the divides (and get all our friends into one team again), we intro *Introspective X Training*, an offline RL inspired method that scales effectively across any LLM stage by annotating your data with a thinking reward generated language critique!

Up to 2.8x FLOP efficiency + 5-10 point score gains (esp with math and code) at any stage from scratch to 24T tokens on 8b (active) sized models!! We burned much compute ablating so you wouldn't have to.

Moral of the story is‼️don't throw out any data via filtering, just feedback condition it‼️

You can spend FLOPs up front on inference to *classify* data quality and then train so that tokens aren't all treated equally based on the feedback starting early in training itself. Right now they're really only separated out much later during mid/post training.

This improves overall compute efficiency and gives us benchmark perf not possible with just baseline methods!

Paper here: https://t.co/9oSYwQEpbi

Thanks to @BrandoCui and @GXiming for leading this w/ @__SyedaAkter @davidjesusacu @hyunw_kim @jaehunjung_com Yuxiao Qu @shrimai_ @YejinChoinka

2

114

20

88

26K

ccui9 retweeted

max

@maxbittker

18 days ago

feeling the reinforcement learning... Gemini 3.5 Flash is tied with GPT-5.5 at navigating complex tasks in Runescape - and it's 1/4 the price.

maxbittker's tweet photo: https://t.co/iA6yNkWPr2

2

18

2

1

1K

ccui9 retweeted

Ziang Xiao @ZiangXiao

19 days ago

Looking forward to my visit and chatting with folks!! 😉

1

28

4

1

7K

Christopher Z. Cui

@ccui9

18 days ago

@TuhinChakr I've definitely seen this in my own personal use, especially when I can introduce a prior for the type of writing style I prefer. Content aside, it does feel like these models basically have the prose locked down.

0

65

ccui9 retweeted

Sarah Wiegreffe @sarahwiegreffe

20 days ago

Looking for 1 emergency reviewer for a @COLM_conf paper on clinical NLP, due Wednesday (05/20). Please DM me if interested. Thanks!

0

14

4

0

3K

ccui9 retweeted

Ashutosh Baheti

@abaheti95

18 days ago

In 1945, Vannevar Bush imagined a machine to extend a scientist's memory. He called it the MemEx. 80 years later, we built one for LLM agents. Tool outputs become Python objects; only print statements reach the model's context. 🧵 https://t.co/YyrGsn3TB7

abaheti95's tweet photo: https://t.co/p9dWNhPNYV

2

70

15

56

13K

ccui9 retweeted

Junli Wang

@JunliWang2021

22 days ago

Thrilled to see those promising numbers! 🤯 Same finding on our end with NanoRollout: cross-scaffold generalization basically doesn't happen out of the box -- something the field should be talking about more.

JunliWang2021's tweet photo: https://t.co/calHsMScfG

1

33

6

17

6K

ccui9 retweeted

Vilém Zouhar @zouharvi

22 days ago

I reviewed for ICML and all I got was this lousy registration.

1

42

1

9K

ccui9 retweeted

Owain Evans

@OwainEvans_UK

23 days ago

New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples:
1. Ed Sheeran won the Olympic 100m
2. Queen Elizabeth II wrote a Python graduate textbook

OwainEvans_UK's tweet photo: https://t.co/X318TpcQRI

62

1K

170

565

346K

ccui9 retweeted

alex zhang

@a1zhang

23 days ago

A fun 48-hour run of letting an RLM iteratively build the interface for an RLM to play Pokemon Red (sneak peek of some fun things cooking at @PrimeIntellect😄). The interface-generating RLM was just tasked with getting the RLM (same scaffold) to beat the game in under 5 hours wall-clock time.

I originally expected the RLM to design some components used in Gemini Plays Pokemon like an extra map, an interface to parse the screen, etc., design low-level policies that would run fast on the emulator, and also design a good prompt and strategy around the RLM to use sub-agents to explore game state with checkpointing, use RNG manipulation in its favor, etc.

Instead the RLM eventually just decided to give the RLM a `write_memory` tool, which the RLM player decided to use to 1) warp the player immediately to the Elite 4; 2) give itself a level 100 Mewtwo (which it mistakes for a Ponyta due to weird Pokedex ID vs. internal ID); 3) give itself $999999; 4) give itself all 8 badges by setting the right flag. It then went ahead and destroyed the Elite 4 and Blue and beat the game in record time :p

You'll also notice in the video there's weird backtracking and frame-skipping. This happens because it also did incorporate the strategy of launching sub-agents to explore action trajectories, but had a strange way of saving the frames and recording them (so you see the result of several sub-agent explorations).

We'll have some more funny and cool RLM demos soon, but it's cool to see RLMs work as general-purpose agents (both the coding agent that designs the interface and the game-playing agent itself)!

8

224

28

105

12K

Christopher Z. Cui

@ccui9

23 days ago

@samsja19 @teortaxesTex Yes, plz let our robot overlords be trained on pokemon instead of being locked in a box with AIME 2030 for 10000 gpu hours.

0

3

0

52

Christopher Z. Cui

@ccui9

23 days ago

@icmlconf (Obligatory ty for gold reviewer award, I prob can't go b/c of logistics but if you're at ICML check out my lab-mate's awesome work @JennyShen056)

0

1

0

133

Christopher Z. Cui

@ccui9

23 days ago

I'm curious about other ICML reviewers who got gold / silver: what percentage were policy A vs B, what their average scores were, and whether the papers ultimately got in. Any chance of those statistics getting released? @icmlconf

1

0

117

Christopher Z. Cui

@ccui9

23 days ago

@lateinteraction Very excited for it, y'all are very consistent with the bangers 👀 That is a good point for the RLVR, my brain is probably too deep in the agent rabbit hole where there's a lot of PI beyond the final correct answer and the cost for that info scales with environment complexity.

2

1

0

1

197

Christopher Z. Cui

@ccui9

23 days ago

I have mixed feelings about the use of privileged information but very cool work nevertheless

Souradip Chakraborty

@SOURADIPCHAKR18

23 days ago

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

SOURADIPCHAKR18's tweet photo: https://t.co/c6BcLBDIVv

15

493

87

536

113K

2

8

0

9

5K

Christopher Z. Cui

@ccui9

23 days ago

@lateinteraction I do want to emphasize I think it's good work, but I tend to wrinkle my brow when I see privileged information be exposed (even indirectly) to the model due to where my research origins started

0

1

0

27

Christopher Z. Cui

@ccui9

23 days ago

@lateinteraction I guess the way I mentally define it, and what I saw in the blog from my quick speed read is 'information the model wouldn't normally have access to'. My main issue for using this type of information is that in scaled up environments or tasks, it can be costly to obtain.

2

1

0

1

260
