hands

Verified account

@handsdiff

founder @slate_ceo @combinatortrade

New York, NY

Joined Twitter June 2025

289 Following

396 Followers

400 Posts

Pinned Tweet

2 months ago

btw I think @NousResearch is doing a fantastic job verticalizing (model + harness + inference)

HARNESS: ship (1) the best agent harness for (2) the devs that are consistently 3-6 months ahead of mainstream devs

INFERENCE: serve inference at scale via an @OpenRouter wrapper

MODEL: work with leading companies like @MiniMax_AI and smaller OS devs like @kaiostephens and @DJLougen to custom train models on the harness, making it recursively more effective

Once you have the OS community optimizing models for YOUR harness while larger labs increasingly CLOSE their ecosystem, the winner seems obvious.

1

156

6

66

11K

1 day ago

Can’t get over the fact that enterprises “doing their own RL” feels like the equivalent to businesses “building their own railroads”. One difference is obviously that open source models have no railroad equivalent. Another difference is that my own railroad would at best likely be the same as an existing railroad, whereas custom RL promises to improve performance.

15 days ago

subagents, teams of agents etc. will be first class citizens soon (if not already)

two things here:
1) you want to maximize token efficiency even more
2) training/serving on your own harness gives you an even bigger boost than before

benchmarks in the opus 4.8 model card show that for now it's a latency vs cost tradeoff, but imo this will likely shift to intelligence/autonomy vs cost (think dynamic workflows or agent swarms). and for cost not to blow up too much, you need to maximize token efficiency even more

we'll also likely see huge gaps on more complex/autonomous benchmarks whether they use these features or not, a bit like when tool use was introduced. on those i'd expect third party harnesses to struggle to keep up with closed source models/harnesses

this is also a case for open source models (and maybe open harnesses like codex?). if you want deep control over this, doing your own RL to train the model in the environment you want it to operate in feels more important than ever

[eliebakouch's tweet photo]

4

79

8

38

7K

0

0

0

0

34

4 days ago

@sethkarten thank you for your wisdom

1

1

0

0

14

4 days ago

@tenobrus what is this from?

1

5

0

0

2K

4 days ago

@karpathy damn we really lost a neutral voice. gone but not forgotten.

0

0

0

0

34

4 days ago

@sethkarten Do you think it's worth the effort to train on a paradigm other than user/assistant for LLM MARL? To what extent is embodiment necessary for the LLM to participate in collaborative settings rather than help?

1

1

0

0

19

6 days ago

@JoshPurtell @clairevo They need the right training to coordinate effectively

0

1

0

0

18

7 days ago

Anthropic needs some personality hires man

2

1

0

0

286

8 days ago

Is Mythos an indefinite optimist, definite optimist, indefinite pessimist, or definite pessimist?

0

0

0

0

36

8 days ago

@jay_azhang Everything is a compression problem

0

1

0

0

196

8 days ago

If anyone is at Less Online hit me up!

0

0

0

0

22

9 days ago

Most of RL relies on an oracle assumption (teacher, reward, etc) that makes it unsatisfying. Where is the research on LLMs motivated by intrinsic reward pointed at a specific emotion vector such as 'fulfillment'?

0

0

0

0

19

12 days ago

Currently trying to figure out whether this is slop

@Memetic_Theory

16 days ago

We present empirical evidence of the first general economic scaling law beyond language data. We are incredibly excited to publish it, and definitively say: Recursive Self-Improvement is a Portfolio Optimization Problem https://t.co/edRoJLiIxW

20

526

50

592

104K

1

0

0

0

82

12 days ago

NETWORKING in Rubin is faster than MEMORY READ in Hopper. ??!!????!!

12 days ago

Comparison of the specs between Hopper, Blackwell and now Rubin.

Rubin NVLink bandwidth is now faster than H100's HBM bandwidth. This also doesn't include a 3-5x FP4 FLOPS increase between Blackwell and Rubin. https://t.co/yg4NQTKIG5

[nrehiew_'s tweet photo]

8

328

33

144

43K

0

0

0

0

43

12 days ago

Excellent thread btw

17 days ago

i think some people are hoping that self-distillation enables “exploration-free” RL purely via reflection on live data, allowing them to bypass the need for replayable environments

unfortunately, RL is all about exploration

my instinct is you basically need to model the world

18

314

14

129

31K

0

2

0

0

37

13 days ago

Claude gave me the same answer https://t.co/PJ81Ae7270

[handsdiff's tweet photo]

0

0

0

0

32

13 days ago

When did we rebrand overfitting to continual learning?

17 days ago

The hard part of continual learning isn't getting the data, but training on a single rollout per task that's off-policy by the time you train. Trajectory's off-policy SDPO recipe stabilizes training and scales. The technical post is well worth the read. https://t.co/zwsmQilM2V

1

20

0

5

3K

0

0

0

0

40

16 days ago

They trained an artificial brain to play Mario Kart. And by 'artificial' I mean literal lab grown neurons, not that silicon knockoff shit.

Kevin Lau @CorticalLabsBDM

18 days ago

First Pong, then Doom and now one of my favourite game Mario Kart 🏎️. Fantastic work from Prof Sasitharan Balasubramaniam and his team at University of Nebraska-Lincoln. https://t.co/v1CqAbfIiy

0

3

1

0

1K

0

1

0

0

58

16 days ago

0

0

0

0

15

21 days ago

@sreeramkannan @DimitrisPapail what are you talking about? AIXI exists

0

0

0

0

35
