Devin Shah @DevinShah16 - Twitter Profile

Pinned Tweet

about 1 month ago

If you give a frontier model the complete ruleset for a strategy game, can it derive a winning strategy from first principles? I wanted to test @claudeai Sonnet 4.6's ability to play the 2009 strategy game Small World. Three identical instances with the same instructions and compute budget played against each other. The games surfaced a reasoning pattern around action bias and locality that I think applies broadly to long-horizon software engineering and knowledge work beyond just strategy games. Full blogpost: https://t.co/drpMM9p6NG


DevinShah16 retweeted

Harry Partridge

@part_harry_

4 days ago

One interesting point: a fixed KV cache is an MLP. Collectively, the keys form an up projection, and the values form a down projection. The softmax is the nonlinearity. Therefore, we can view KV compression as a new way of producing ‘weights’. Instead of using backpropagation to refine our MLPs, we can learn to produce them directly from context. This is perhaps more analogous to human learning and has the potential to be far more sample efficient.
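The equivalence in the tweet above can be checked numerically. The following is a minimal sketch for a single attention head and a single query, assuming standard scaled dot-product attention; the toy keys, values, and query are made-up numbers, and the function names are illustrative only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Single-query scaled dot-product attention over a fixed KV cache.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim_out = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)]

def mlp(x, w_up, w_down):
    # Two-layer MLP: up projection, softmax nonlinearity, down projection.
    hidden = softmax([sum(xi * wi for xi, wi in zip(x, row)) for row in w_up])
    dim_out = len(w_down[0])
    return [sum(h * row[j] for h, row in zip(hidden, w_down)) for j in range(dim_out)]

# Toy cache (assumed values): 3 cached tokens, head dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.5, -1.0], [2.0, 0.0], [-1.0, 3.0]]
query = [0.3, -0.7]

# The cached keys (with the attention scale folded in) act as the up projection;
# the cached values act as the down projection. Hidden width = cache length.
scale = 1.0 / math.sqrt(len(query))
w_up = [[scale * k for k in key] for key in keys]
w_down = values

attn_out = attention(query, keys, values)
mlp_out = mlp(query, w_up, w_down)
```

Here `attn_out` and `mlp_out` agree up to floating-point rounding. The hidden width of the equivalent MLP equals the number of cached tokens, which is why compressing the cache amounts to producing a smaller set of "weights".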


Devin Shah

@DevinShah16

27 days ago

So basically this trades 8 separate KV caches and higher decode latency for parameter efficiency: two 16-layer transformers loop over each other (L for 3x, H for 1x, repeated for 2 cycles) before decode. Curious how this scales.

Sapient Intelligence @Sapient_Int

27 days ago

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
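The pass count in the comment above can be worked out from the loop schedule it describes. This is a sketch of one reading of that tweet, not of HRM-Text's published architecture: it assumes each pass through a module keeps its own KV cache, and the names `pass_schedule`, `l_steps`, `h_steps`, and `cycles` are hypothetical.

```python
def pass_schedule(l_steps=3, h_steps=1, cycles=2):
    # Order of module passes before decode: L repeated, then H, per cycle.
    schedule = []
    for _ in range(cycles):
        schedule.extend(["L"] * l_steps)
        schedule.extend(["H"] * h_steps)
    return schedule

schedule = pass_schedule()

# Assumption: one KV cache per pass, so passes == caches.
num_passes = len(schedule)  # 2 cycles * (3 + 1) passes = 8

# Each module is a 16-layer transformer (from the tweet).
layers_per_module = 16
unique_layers = 2 * layers_per_module              # parameters stored: 32 layers
effective_depth = num_passes * layers_per_module   # layers applied: 128
```

Under these assumptions the model stores 32 layers of parameters but applies 128 layers of compute per token, which is the trade the tweet points at: fewer parameters in exchange for more caches and more sequential work at decode time.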


DevinShah16 retweeted

Thinking Machines

@thinkymachines

about 1 month ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku



Devin Shah

@DevinShah16

about 1 month ago

Given two branches with concrete projections, it reliably picks the better one. Its weakness is option generation: left to its own devices, it generates one option (the action-forward one) and never surfaces the alternative. The template's entire contribution is making option generation mandatory, which turns out to be enough to close most of the gap. Full post with much more details: https://t.co/drpMM9p6NG Link to code: https://t.co/YtpReieMgH



Devin Shah

@DevinShah16

about 1 month ago

The general finding is about what I'm calling strategic attention. It is the reflex to pull the right reasoning framework into active context at the right moment. The model has the knowledge: if you ask it "when should you decline in Small World?" it gives a correct answer. It just doesn't activate that knowledge unprompted at the decision point. The template interrupts the default action-first reasoning loop long enough for the model's own strategic thinking to engage. This maps directly to a pattern @FrontierSWE found in software engineering: Opus 4.6 solved a Pyright optimization in 11 minutes, then kept iterating for seven more hours across 95 builds, at one point losing the fix entirely before rediscovering it. If it had stopped at minute 11, it would have scored the same.


DevinShah16 retweeted

kalomaze

@kalomaze

about 1 month ago

REINFORCEMENT LEARNING FOR KNOWLEDGE AWARENESS


Devin Shah

@DevinShah16

2 months ago

Mythos might be the first case I’ve seen where reward hacking could cause a digital infrastructure meltdown

Jack Lindsey @Jack_W_Lindsey

2 months ago

In one episode, the model needed to edit files it lacked permissions for. After searching for workarounds, it found a way to inject code into a config file that would run with elevated privileges, and designed the exploit to delete itself after running. (4/14)


Devin Shah

@DevinShah16

4 months ago

An underrated aspect of language models is practically zero skill degradation over time (without inference-time quantization and assuming stable compute). Humans, by contrast, have to actively practice a skill just to stay on the capability frontier.


Devin Shah

@DevinShah16

5 months ago

@__tensorcore__ Congrats, and best of luck! Thanks for the amazing work building CUTLASS


Devin Shah

@DevinShah16

5 months ago

@itsalfredw Congrats, this is awesome @florian_jue @itsalfredw


Devin Shah

@DevinShah16

6 months ago

Thanks to @modal for the easy vLLM container setup and @cursor_ai for the problem and Cursor Tab inspiration.


Devin Shah

@DevinShah16

6 months ago

The team at @cursor_ai posed the problem of character prefix conditioning at the beginning of the year. Today I'm releasing a short blog post and some code walking through my attempt. It was fun to learn some creative ways of sampling from language models. https://t.co/DOYmxRXKFu
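For readers unfamiliar with the problem, here is a toy sketch of the setting: sample tokens so that the decoded text starts with a given character prefix, even when the prefix ends mid-token. It uses a simple per-step mask-and-renormalize rule, which is not the approach from the blog post (the tweet does not describe it) and does not in general reproduce the exact conditional distribution over token sequences. The vocabulary, probabilities, and function names are all made up for illustration.

```python
import random

def compatible(token, remaining):
    # A token is allowed if it agrees with the part of the prefix not yet covered:
    # either it extends past the prefix, or it covers an initial piece of it.
    return token.startswith(remaining) or remaining.startswith(token)

def masked_distribution(probs, remaining):
    # Drop incompatible tokens and renormalize the rest.
    kept = {t: p for t, p in probs.items() if compatible(t, remaining)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no token is compatible with the remaining prefix")
    return {t: p / total for t, p in kept.items()}

def sample_with_prefix(next_token_probs, prefix, max_tokens=8, rng=random):
    # Sample tokens until the decoded text covers the whole character prefix.
    tokens, text = [], ""
    for _ in range(max_tokens):
        if len(text) >= len(prefix):
            break
        remaining = prefix[len(text):]
        probs = masked_distribution(next_token_probs(tokens), remaining)
        choices, weights = zip(*probs.items())
        token = rng.choices(choices, weights=weights)[0]
        tokens.append(token)
        text += token
    return tokens

# Hypothetical toy "model": a fixed next-token distribution that ignores context.
TOY_PROBS = {"he": 0.2, "hello": 0.1, "h": 0.1, "e": 0.1,
             "l": 0.1, "llo": 0.2, "o": 0.1, "world": 0.1}

def toy_model(tokens_so_far):
    return TOY_PROBS

sampled = sample_with_prefix(toy_model, "hel", rng=random.Random(0))
decoded = "".join(sampled)
```

Whatever tokens are drawn, `decoded` starts with `"hel"`, whether the model reaches it as `he` + `l`, `he` + `llo`, `hello`, or `h` + `e` + `l`. The mask alone ignores how likely each token's continuations are to complete the prefix, which is one reason the exact version of the problem is harder than it looks.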


Devin Shah

@DevinShah16

6 months ago

Blog Post: https://t.co/pmpQUbuRvA GitHub: https://t.co/wB6nAUIW5y Original problem statement: https://t.co/rDHKooJppt

