John Robertson @john_t_rob - Twitter Profile

Pinned Tweet

about 1 month ago

Persona steering works well for some behaviors and barely at all on others. Our new preprint argues that variability is mostly search cost, not a fundamental limit of rank-1 steering. We show that activation geometry can tell you where to look before a single trial is run

1

7

5

4

702

john_t_rob retweeted

Todd Nief @toddknife

22 days ago

An LLM can learn an *obsession* (cats, oak trees, Metallica) through finetuning only on sequences of numbers. This phenomenon is called subliminal learning. Why does this happen? Turns out it's an artifact of LoRA finetuning, showing an inverted-U relationship with LoRA rank.

6

111

15

79

12K

john_t_rob retweeted

Jasmine Wang @j_asminewang

23 days ago

I think it's excellent & notable that the answer to "what should we do" in Anthropic's blog post on RSI is essentially figuring out ways to slow down or temporarily pause frontier AI development. https://t.co/z5xrB7njb9

6

261

37

51

16K

john_t_rob retweeted

Junyuan "Jason" Hong

@hjy836

24 days ago

[#ICRA 2026] 🤖 LLMs can improve robot planning using formal verification feedback — without fine-tuning. LLM planners often produce plausible plans that violate safety rules. In robotics, “almost correct” is not enough. LAD-VF treats the prompt, not the model weights, as the object to optimize. The verifier finds failures; LLM-AutoDiff turns them into prompt updates. LAD-VF closes the loop: generate plan → verify against formal specs → update the prompt → try again.

1

8

3

2

2K

john_t_rob retweeted

Eclipse 🌖

@ECLresearch

about 1 month ago

@omarsar0 AgingBench is a smart way to frame the reliability decay problem—most teams ignore that agent behavior drifts over time, not just at launch.

0

3

1

0

74

john_t_rob retweeted

elvis

@omarsar0

about 1 month ago

// Your Agents are Aging Too //

Huh!? They need "sleep," and now they are aging?

Joke aside, great write-up on reliable agentic engineering.

This new research introduces AgingBench, a longitudinal reliability benchmark. It organizes agent aging into four mechanisms, including compression aging and interference aging, and measures not just whether deployed agents degrade but what form the degradation takes and where repair should target.

We benchmark agents on day one and then deploy them for months. That gap hides a basic systems question. How long does an agent stay reliable after deployment?

Even with frozen model weights, an agent's effective state keeps shifting. It compresses interaction history, retrieves from a growing memory store, revises facts after updates, and goes through routine maintenance. Reliability becomes a lifespan property of the full harness, not a snapshot of the base model.

Paper: https://t.co/v4IzsODoiJ

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

18

182

41

173

17K

john_t_rob retweeted

Jianing Zhu

@Jianing9810

about 1 month ago

Across ~400 runs, one counter-intuitive result stood out: Claude Code with Opus 4.7 (the flagship model) underperforms Claude Code with both Opus 4.6 and Sonnet 4.6 on our long-horizon coding task. AgingBench's multi-dimensional evaluation can tell us more about its failures (more in our paper). https://t.co/dkuT4qF5UJ

1

6

1

0

275

John Robertson @john_t_rob

about 1 month ago

When someone asks how my PhD is going, I just send them this

Jianing Zhu

@Jianing9810

about 1 month ago

AI agents are increasingly deployed as persistent operational systems, but do they remain reliable over time? Unfortunately not: our new work shows that agents can quietly fail after deployment despite passing day-1 evaluation. We call this "agent aging", akin to human aging.

4

39

17

15

11K

0

6

2

0

501

john_t_rob retweeted

Giang Nguyen

@giangnguyen2412

about 1 month ago

We suppressed “Japan” inside an LLM. Then asked it about sushi. The model started talking about “salsa” instead. This is an inherently interpretable chatbot where you can inspect and modify concepts inside the model in real time. https://t.co/yGGOuKIqWw

13

274

12

197

31K

john_t_rob retweeted

Matteo

@MozarellaPesto

about 1 month ago

I trained an autoencoder that reconstructs images with zero reconstruction loss. No MSE. No image space supervision. The only signal: "According to you, does your output look like your input through your own eyes?" It works. Blog link, demo and summary 👇

24

614

47

634

69K

john_t_rob retweeted

John Robertson @john_t_rob

about 1 month ago

Persona steering works well for some behaviors and barely at all on others. Our new preprint argues that variability is mostly search cost, not a fundamental limit of rank-1 steering. We show that activation geometry can tell you where to look before a single trial is run

1

7

5

4

702

John Robertson @john_t_rob

about 1 month ago

This work builds on the PersonaVectors approach to steering vectors (@RunjinChen @andyarditi @OwainEvans_UK +more) and extends earlier alignment findings (@BraunJoschka @CarstenEickhoff @DavidSKrueger +more) to open-ended generation and cross-layer search. Check out their excellent work: PersonaVectors - https://t.co/0tbVfruNeo Understanding the (Un)reliability of Steering Vectors - https://t.co/mGH1Pi764f

0

3

0

93

John Robertson @john_t_rob

about 1 month ago

Granularity is a diagnostic, not a fix. High-granularity concepts are simply too complex across contexts. We do find a number of geometric relationships, and use them to improve performance on 50 out of 60 (model, concept) pairs. Full paper: https://t.co/DwN3jdgvsl

1

0

85

john_t_rob retweeted

corsaren

@corsaren

2 months ago

this meeting could have been a prompt

5

103

12

3

4K

john_t_rob retweeted

vas

@vasuman

5 months ago

Claude 4.6 Opus just refactored my entire codebase in one call. 25 tool invocations. 3,000+ new lines. 12 brand new files. It modularized everything. Broke up monoliths. Cleaned up spaghetti. None of it worked. But boy was it beautiful.

504

11K

436

712

565K

john_t_rob retweeted

Shiwei Liu

@Shiwei_Liu66

8 months ago

Love seeing @OpenAI highlight sparse circuits — sparsity is finally getting the attention it deserves. In our earlier work, we showed how sparse training can unlock robustness, efficiency, and better scaling: ICML'21 • NeurIPS'21 • ICLR'22 • ICML'24 • ICLR'23. Many great papers fly from @VITAGroupUT 🔗 In-time over-parameterization: https://t.co/MGyBVXiHQA 🔗 Granet: https://t.co/TRdS8Og0Os 🔗 Random sparse training: https://t.co/rWOGazytQ2 🔗 Outlier-weighed LLM pruning: https://t.co/DgFbC7euzA 🔗 Sparsity May Cry: https://t.co/ZvXG9BKiJg The future is sparse. #Sparsity #DeepLearning

3

82

14

33

22K

john_t_rob retweeted

VITA Group @VITAGroupUT

10 months ago

A wonderful evening with the VITA family. Good food, laughter, and ideas flowing. Here’s to more breakthroughs together!

1

55

5

1

5K
