Ben Geist

Verified account

@b_geist

Research Eng @ramplabs / physics + math nerd / Kate Bush fan

Brooklyn, NY

Joined July 2019

378 Following

637 Followers

713 Posts

about 14 hours ago

> ask accounting firm how they’ll scale
> “we’re hiring”
> 300,000 CPAs left the profession
> accounting degrees at a 20 year low
> firms turning away clients they can’t staff
> teach AI the firm’s playbook
> turn it into executable SOPs
> books close in half the time

about 17 hours ago

Introducing Stack. The AI operating system that lets accounting firms take on more clients without hiring. Learns your firm's process, runs the close, posts the journals. Fully auditable. We’re living through the biggest shift in accounting since the spreadsheet.

7 days ago

@a_levitator I like these things, that’s why https://t.co/Nsn4QBXIYR

7 days ago

Imo techniques like this and sparse attn will massively reduce the compute bottleneck that limits smaller labs. Chinese labs are already heavily incentivized to create low compute techniques. This will lead to a Cambrian explosion of AI architectures

8 days ago

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation https://t.co/c9AvsRKybj

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code, to learn more.
Paper: https://t.co/CRj96VGYQn
GitHub: https://t.co/eNW0K9Xh8E 🐟
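To make the "each block optimizes its own objective" idea concrete, here is a toy sketch. It is not the paper's code. It assumes each block owns a range of noise levels and is a one-parameter denoiser `y_hat = a * z`; the `train_block` helper, the three noise ranges, and the Gaussian target are all made up for illustration.

```python
import random

def train_block(sigma_lo, sigma_hi, steps=20000, lr=0.002, seed=0):
    """Fit one block on its own: a scalar denoiser y_hat = a * z (toy assumption)."""
    rng = random.Random(seed)
    a = 0.0  # the block's only parameter in this toy
    for _ in range(steps):
        y = rng.gauss(0.0, 1.0)                  # clean target sample
        sigma = rng.uniform(sigma_lo, sigma_hi)  # noise level owned by this block
        z = y + sigma * rng.gauss(0.0, 1.0)      # noised representation fed to the block
        grad = 2.0 * (a * z - y) * z             # d/da of the squared error (a*z - y)^2
        a -= lr * grad                           # plain SGD step on this block only
    return a

# Hypothetical split of the noise range into three blocks, noisiest first.
noise_ranges = [(1.0, 2.0), (0.5, 1.0), (0.1, 0.5)]

# Blocks are trained one after another with no gradient flowing between them,
# so only one block's parameters are live at any time.
block_params = [train_block(lo, hi, seed=i) for i, (lo, hi) in enumerate(noise_ranges)]
print(block_params)
```

In this toy, blocks that handle noisier inputs learn to shrink their input more (smaller `a`), while the low-noise block stays close to the identity. That is one way to picture each block moving the representation a bit closer to the target.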

7 days ago

@RampLabs We have mythos at home?

7 days ago

Modern security increasingly looks like probabilistic search over huge state spaces. Attackers can already do this with public frontier models, but defenders have a massive advantage with their internal context, telemetry, architecture knowledge, and production feedback loops. Given the cost of a single breach, the economics of massively scaled defensive agents make a lot of sense.

8 days ago

We deployed 10,000 background agents to security-scan our codebase. The system is simple, scales with compute, and runs on publicly available models. From the scan, we fixed several high-severity vulnerabilities.

b_geist retweeted

8 days ago

https://t.co/YHN5Hy4Ddf

8 days ago

@akshat_b @rene_sultan 👀

8 days ago

A scaffold for modern memory systems could very well just be optimized block sparse attn

9 days ago

It looks like @MiniMax_AI M3 is coming soon. A technical diagram posted earlier by engineering lead @SkylerMiao7 shows that the MiniMax M3 model is confirmed to have a million-token context and to use a dynamic block-sparse attention design built on GQA (Grouped Query Attention). An Index Branch first does coarse retrieval, then a Sparse Branch runs real attention over the selected blocks. The logic is that the current query does not need to see the entire history, only the top-k relevant history blocks. By analogy, when reading a book you don't reread every page; you first skim the table of contents or index, locate a few relevant chapters, and then read those closely. The effect is significant: at a one-million-token context, prefill is 9.7x faster than before and decode is 15.6x faster. Looking forward to seeing whether DeepSeek V4 or MiniMax M3 turns out to be the king of cost-effectiveness. [Post by 0xmitsui, photo attached]
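The two-stage idea in the post above (coarse Index Branch, then real attention over only the top-k blocks) can be sketched for a single query as follows. This is a minimal illustration, not MiniMax's implementation. Scoring each block by the dot product with its mean key is an assumption, and `block_sparse_attention` and its parameters are hypothetical names.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend only to the top_k key blocks that look most relevant to the query."""
    n = len(keys)
    starts = list(range(0, n, block_size))

    # Index branch (coarse): score each block with query . mean(keys in block).
    # Mean pooling is an assumption made for this sketch.
    def block_score(start):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        return dot(query, mean_key)

    ranked = sorted(starts, key=block_score, reverse=True)
    selected = sorted(ranked[:top_k])  # keep chosen blocks in sequence order

    # Sparse branch (fine): ordinary scaled softmax attention, restricted
    # to the tokens inside the selected blocks.
    idx = [i for s in selected for i in range(s, min(s + block_size, n))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, selected

# Eight tokens in four blocks of two; the query points along the first axis.
keys = [[0, 1], [0, 1], [5, 0], [5, 0], [-1, 0], [-1, 0], [3, 0], [3, 0]]
values = [[float(i)] for i in range(8)]
out, picked = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=2)
print(picked)  # start offsets of the chosen blocks → [2, 6]
```

With `top_k` equal to the number of blocks, this reduces to dense attention. The savings come from the sparse branch touching only `top_k * block_size` tokens per query instead of the full history.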

13 days ago

Gradient spikes are my worst fear 😖

13 days ago

@mma12261 Yupppp

13 days ago

@thesalomander @RampLabs Open your DMs, can help you!

15 days ago

@rene_sultan @GoogleDeepMind Goat!

22 days ago

@jonaswillett1 Cafe

27 days ago

@tryramp 🌱🌱🌱🌱

27 days ago

@tryramp 👀🔜🌱

27 days ago

Perks of working @tryramp: the free ramps https://t.co/rfhE0BcRaV

27 days ago

@henrytdowling @RampLabs @PrimeIntellect I wrote this article lol

27 days ago

@henrytdowling @RampLabs @PrimeIntellect Yes, the majority of the effort was building the tasks and reward so that training was stable; prime rl simplified the rest of the process.

27 days ago

@rakesh_nori And you dropped this 👑
