Mat Miller @matdmiller - Twitter Profile

Pinned Tweet

over 2 years ago

In my latest blog post, I tackle @karpathy's 'Let's build GPT' to construct a transformer 🤖 model from scratch, leveraging the foundational knowledge I gained from the fantastic @fastdotai courses by @jeremyphoward. Curious about how transformers work? https://t.co/1COGUreolz

17

1K

183

2K

252K

matdmiller retweeted

Shreya Shankar

@sh_reya

about 13 hours ago

Building AI products is hard. But it's getting increasingly popular! I'm really excited to share that my friends and I are putting together (the best) lecture series on AI Product Engineering this summer!! We've got an awesome lineup of talks spanning data, evals, and UX. With more to come. The lecture series is completely free! And ~2k people have signed up already even though we haven't posted on social media yet! I can't wait. Join us and sign up: https://t.co/5DWcm4va5m

3

43

11

27

6K

matdmiller retweeted

Zach Mueller

@TheZachMueller

about 11 hours ago

Super excited to yap with a few of my favorite people. Come join! (Btw it’s free)

0

12

3

0

2K

matdmiller retweeted

Nathan Lambert

@natolambert

about 11 hours ago

Not enough people studying SFT methods. It’s a foundation of post training with limited literature that seems very serious in an empirical sense.

3

298

25

231

32K

Who to follow

Leonie

@helloiamleonie

Post-training @liquidai

Vishnu - Jarvislabs.ai

@vishnuvig

Founder - https://t.co/rmc2uywaIp Youtube - https://t.co/7UTfrP1GJt

Thomas Capelle

@capetorch

Chilean 🇨🇱 living in France. I build DL models and pipelines. ML Engineer at @weights_biases cargobike ♥🚴

matdmiller retweeted

Pol Avec

@pol_avec

about 23 hours ago

Sharing the full deployment dialog and some thoughts. I wanted to see how quickly I could deploy a cool model I saw on twitter (image to 3D model) using solveit & aai libs. Very pleased with the result! I used: - fastspec: https://t.co/TcJ4qBi6WG - bgtmux: https://t.co/en56332x7D Deployed to a google cloud platform GPU. The whole process transfers very nicely to other cloud providers or models. Fastspec turns an OpenAPI file into a python sdk. Bgtmux lets solveit drive a terminal (tmux) session that you can connect to and is super conveninet to let connect to the remote VM and iron out deployment issues directly. You can also use it to connect to your own machine for local stuff. The dialog works also as guide to GCP including enabling, services, requesting quotas etc... (which is notoriously annoying) through the API Bgtmux is amazing for these cases. I spun up a GPU, let solveit connect to it and then watched it iron out all the annoying deployment bits (like version pinning, incompatible, missing libs, and all that stuff). Quite token hungry though If someone reads it and finds it confusing or has any improvement advice (for readibility / sharing) please lmk!! https://t.co/8gZtftwRoG

0

6

1

320

matdmiller retweeted

Matei Zaharia @matei_zaharia

1 day ago

We just released Omnigent 0.2 with a ton of improvements from the past 5 days! Here's what's new according to Omnigent. Major additions are @cursor_ai CLI and @antigravity harnesses, lots of new sandbox providers, and secret-free sandboxing via API proxy. https://t.co/4OeQTV863E

matei_zaharia's tweet photo. We just released Omnigent 0.2 with a ton of improvements from the past 5 days! Here's what's new according to Omnigent. Major additions are @cursor_ai CLI and @antigravity harnesses, lots of new sandbox providers, and secret-free sandboxing via API proxy. https://t.co/4OeQTV863E https://t.co/tjXgJhNx5u

8

166

22

60

22K

matdmiller retweeted

Didier Lopes

@didier_lopes

1 day ago

Incredible how Z. ai literally has their RL infrastructure open source. The entire OPD post-training of GLM-5.2 took on this slime platform took ~2 days. https://t.co/XVjW6rGcbg

didier_lopes's tweet photo. Incredible how Z. ai literally has their RL infrastructure open source.

The entire OPD post-training of GLM-5.2 took on this slime platform took ~2 days.

https://t.co/XVjW6rGcbg https://t.co/rouO7O0VwC

11

1K

108

1K

107K

Mat Miller

@matdmiller

2 days ago

@pol_avec @answerdotai Congratulations!

1

0

101

matdmiller retweeted

Rachel Thomas

@math_rachel

9 days ago

"Mastery is not about creating more outputs or products. It is about building genuine ability. AI can either decay or support human mastery. The people selling you AI models & your bosses at work don’t care about your mastery. They will put you in the decay world every time."

math_rachel's tweet photo. "Mastery is not about creating more outputs or products. It is about building genuine ability. AI can either decay or support human mastery.

The people selling you AI models & your bosses at work don’t care about your mastery. They will put you in the decay world every time." https://t.co/IwZD5VP90G

9

68

22

24

6K

matdmiller retweeted

John Carmack

@ID_AA_Carmack

9 days ago

It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase. There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities. Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.

179

2K

114

384

120K

matdmiller retweeted

Shreya Shankar

@sh_reya

19 days ago

V2, at 4x speed. Can't wait to put this out in the open (soon)

9

175

13

147

57K

matdmiller retweeted

hardmaru

@hardmaru

24 days ago

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

154

6K

639

4K

744K

matdmiller retweeted

James Long

@jlongster

28 days ago

https://t.co/SOTlMCjWdw

9

357

23

627

91K

matdmiller retweeted

Eric Ries

@ericries

28 days ago

Thank you @ycombinator and @garrytan for supporting these ideas to protect the next generation of founders

1

32

8

15

6K

matdmiller retweeted

Harrison Chase

@hwchase17

30 days ago

https://t.co/IFXwxW8Oac

13

176

22

357

85K

matdmiller retweeted

Unsloth AI

@UnslothAI

about 1 month ago

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://t.co/7gWhKnseZo Guide: https://t.co/7qzk6ypWDQ

UnslothAI's tweet photo. Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s.

GGUFs: https://t.co/7gWhKnseZo
Guide: https://t.co/7qzk6ypWDQ https://t.co/8ICXw7iV3G

132

2K

301

2K

141K

matdmiller retweeted

Nous Research

@NousResearch

about 1 month ago

Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.

NousResearch's tweet photo. Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data.

During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining.

Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE.

The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.

150

4K

414

2K

449K

matdmiller retweeted

Thinking Machines

@thinkymachines

about 1 month ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku