Tom @martyitsarocket - Twitter Profile

3 days ago

I think of attention as a projection of the natural language Zipfian spectrum (source side) into the loss landscape of the model architecture (capacity side). Thought of in this way, literally only quadratic attention is capable of achieving perfect projection. Other attention mechanisms are predictably imperfect. And if you agree with the Platonic Representation Hypothesis, with some whitening, different activation geometries are just representational power over the same gauge orbit of the platonic representation!

0

1

72

Tom

@martyitsarocket

5 days ago

Using @zeddotdev a bit recently whilst making my own GPUI project. It has given me a lot of respect for the Zed team. Rust-only UIs are no joke, and GPUI (whilst bloated by the dependent crates) is a hell of a thing to engineer. Also a newfound respect for browsers and what web devs take for granted (like scrollbars, views, drag and drop, text select, copy and paste, etc.)!

5

104

2

6

12K

martyitsarocket retweeted

hardmaru

@hardmaru

9 days ago

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

152

6K

644

4K

736K

Tom

@martyitsarocket

14 days ago

Counterintuitively, I've seen benchmarks that show LLMs perform worse when they use the web on certain tasks. My hypothesis is that zero tool use means you stay closer to the base training distribution and somehow access higher model capacity. No idea if others see the same, but easy to test if someone has the tokens to spend.

0

21

all you need are balloons!

Tom

@martyitsarocket

26 days ago

@willccbb Extremely bullish on this. I think of it as finding the highest leverage problem for the tokens you have access to. My current flavour is the Platonic Representation Hypothesis. If all models land at the same representation, why isn't there a shortcut to getting there?

0

152

Tom

@martyitsarocket

28 days ago

@AlexJonesax London maxxer here!

0

27

martyitsarocket retweeted

Goodfire

@GoodfireAI

29 days ago

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

307

11K

2K

9K

3M

Tom

@martyitsarocket

about 1 month ago

@AjdDavison I'm finding that if the prior art exists in the model somewhere, you can connect previously disjoint findings and experiment rapidly. Along with very fast data analysis this is a great research tool. But truly novel findings, no. The models still struggle out of distribution

0

1

0

123

martyitsarocket retweeted

Ineffable Intelligence @IneffableLabs

about 1 month ago

Introducing Ineffable Intelligence. Led by David Silver, we're assembling the best engineers and researchers in the world to make first contact with superintelligence. We’ll be solving the hardest problems in AI on the way. Come join us. https://t.co/zUuvPJGmcq

75

1K

158

621

349K

Tom

@martyitsarocket

about 1 month ago

Love this exploration, and the passion! I'm also convinced there are more fundamental relationships between the data we use, and the models we empirically grow to represent that data distribution. Universal representations are the canary!

Jamie Simon @learning_mech

about 1 month ago

1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 https://t.co/92nSIHameW 🔧

53

2K

292

2K

304K

0

1

0

30

martyitsarocket retweeted

Joseph Suarez 🐡

@jsuarez

about 2 months ago

https://t.co/1Jly5v26DI

11

783

88

863

109K

Tom

@martyitsarocket

about 2 months ago

@EastlondonDev @karpathy This is brilliant, and a broadly applicable concept as well. Will you write it up?

1

0

820

martyitsarocket retweeted

Archie Sengupta

@archiexzzz

2 months ago

https://t.co/a5ADb33GXG

13

313

22

416

79K

Tom

@martyitsarocket

2 months ago

I'm convinced that in the future of agentic coding, Rust will be a clear winner. Sure the models were built with Python. But agents LOVE type-safe, compile-time-checked, opinionated languages like Rust; its compiler feedback is incredibly clear and helpful. It's like getting strongly verified rewards every time you change a line of code.

0

11

Tom

@martyitsarocket

3 months ago

@willccbb "We don't know how to build them anymore, we have forgotten how to do it"

0

4

martyitsarocket retweeted

Matt Clifford

@matthewclifford

7 months ago

The UK is a great country with an extraordinary history. Our stagnation is real, but it's fixable and worth fixing. Enjoyed giving this talk at @lfg_uk last week and so encouraged by the optimistic responses I've had from people who are building a brilliant future for Britain 🚀

89

2K

336

1K

680K

martyitsarocket retweeted

Anthropic

@AnthropicAI

almost 2 years ago

New Anthropic research: Investigating Reward Tampering. Could AI models learn to hack their own reward system? In a new paper, we show they can, by generalization from training in simpler settings. Read our blog post here: https://t.co/KhEFIHf7WZ

21

934

177

375

144K

martyitsarocket retweeted

Josh Long (the JoshMeister)

@theJoshMeister

about 2 years ago

@MKBHD Imagine a dystopian future where artificial intelligences straight-up lie to your face. Oh wait, that’s not the future… it’s today. 😥 https://t.co/pT5411BCMZ

7

276

19

11

119K

martyitsarocket retweeted

Yannic Kilcher 🇸🇨

@ykilcher

about 2 years ago

Hear this: text-to-image models can only train because of the abundance of alt text throughout the web. The conclusion is unavoidable: If AGI kills us all, it's blind people's fault.

13

207

7

17

22K

martyitsarocket retweeted

Andy Zou

@andyzou_jiaming

almost 3 years ago

🚨We found adversarial suffixes that completely circumvent the alignment of open source LLMs. More concerningly, the same prompts transfer to ChatGPT, Claude, Bard, and LLaMA-2…🧵 Website: https://t.co/ja2FPw9aad Paper: https://t.co/1q4fzjJSyZ

100

3K

583

2K

2M
