Sam Dare

monitoring the (agentic) situation @concordanceai

about 3 hours ago

Published Feb 2026: PULSE showed that distributed RL post-training could move far less data without changing the receiver's BF16 computation. In May, PULSELoCo extended the same idea to the second synchronization channel. PULSESync addressed trainer-to-inference. PULSELoCo addresses trainer-to-trainer. 1/n


DistStateAndMe retweeted

Taelin

@VictorTaelin

about 23 hours ago

... This was fake news, 5.5 implemented basically the same program 1016 times. None of these programs did any meaningful computation. No pattern-matching, no datatypes, recursion, loops. Literally they just did basic function calls and u32 arithmetic. I apologize 😭 I've now used 4.8 to implement 16 real programs, including spellcheckers, relational databases, compilers, schedulers. I manually checked each to ensure it was doing real work. Good news is the compiler worked in all cases, but post-refactor single-core performance is only ~2x faster than GHC, not ~6x. Things going well but still a bit of work to do . . . :|



DistStateAndMe retweeted

2 days ago


2 days ago

Fantastic Reward Hackings and where to find them

Taelin

@VictorTaelin

2 days ago

5.5 is unbelievable. Yesterday night I, once again, left 4 codex tabs optimizing the new HVM5 (nothing to do with Bend2). This time I was sure I covered every form of reward hack it could possibly do. I defined what "general" means, I put a max perf cap so it couldn't just hardcode the answers, I locked the tests, I put clear time (not interaction) metrics. I went to bed confident it couldn't do anything other than optimize the interpreter. ... the interpreter, huh? I never wrote "interpreter". I just asked it to make HVM5 faster. ... ... ... It built a compiler. It built a complete functioning compiler. Overnight. It works. HVM5 is compiled now. It overshot the target 10-fold. But it is a compiler. For SupGen, that doesn't work because it generates functions dynamically. We need a fast interpreter. It didn't touch the interpreter. ...



2 days ago

Probably nothing @tplr_ai @covenant_ai


2 days ago

Reject mediocrity


DistStateAndMe retweeted

Underfox @Underfox3

9 days ago

This paper presents TritonMoE, a fused MoE dispatch kernel written entirely in OpenAI Triton that performs the complete forward pass using only portable Triton primitives. https://t.co/UeXScHInb0

https://t.co/UeXScHInb0 https://t.co/51bIuHc934


DistStateAndMe retweeted

Alexander Doria

@Dorialexander

10 days ago

After months of delay, the successor post to "The model is the product": the AI decoupling. https://t.co/ihqz9q1wkD


DistStateAndMe retweeted

Patrick C Toulme

@PatrickToulme

12 days ago

If they release a version of it - it is due to increasing competition from Codex and GPT 5.5 Pro. Enterprise spending lags behind the heartbeat on X, and the X heartbeat has increasingly been trending towards Codex for the past month.


DistStateAndMe retweeted

Dorsa

@dorsa_rohani

12 days ago

This paper might be the bible of distributed inference atp


DistStateAndMe retweeted

dominik kundel

@dkundel

12 days ago

If you want to build Codex into your app just point Codex at https://t.co/6brLq4EFwU and let it handle the rest ❤️ fully open-source incl. sign in with ChatGPT


DistStateAndMe retweeted

Shannon Shen

@shannonzshen

14 days ago

https://t.co/txEyWwGQ66


14 days ago

We have believed from the beginning that frontier intelligence should be available to the entire world, not hidden away behind the walls of a datacenter. PULSE removed the bottleneck that was holding back data transfer. Now PULSELoCo does the same for the trainer: it cuts communication by 138x compared to DDP while matching DiLoCo's results. To our knowledge, this is the first time DiLoCo (or one of its variants) has been used in the reinforcement learning trainer step. This means we have a decentralized reinforcement learning system that works from start to finish. The connection between computers is no longer an obstacle. The internet is our datacenter.

14 days ago

Today we're releasing PULSELoCo: over 100x lower trainer-to-trainer communication for distributed RL post-training. Paired with PULSESync, every node can sit anywhere in the world. Geo-distributed RL post-training over commodity links, no datacenter interconnect needed. 1/n



DistStateAndMe retweeted