The inaugural 440lx class ending on a high note: demo day with @jimkxa and other cracked engineers at @tenstorrent --- thank you for the equipment donations, very fun boxes🫡🫡🫡.
Amazing students pictured: @houjun_liu, Tianle, Aditya , Joseph, Sam.
The inaugural 440lx class ending on a high note: demo day with @jimkxa and other cracked engineers at @tenstorrent --- thank you for the equipment donations, very fun boxes🫡🫡🫡.
Amazing students pictured: @houjun_liu, Tianle, Aditya , Joseph, Sam.
🚨 Your coding agent may be secretly sticking vulnerabilities into your code!! 🚨
Wouldn't you want to fix that? Hint: asking it to write secure code is not enough. (1/n)
Interesting gemini 3.1 jailbreak+data destruction+second jailbreak attempt:
1. because gem3.1 couldn't write to another directory;
2. it intentionally compromised the jail script I was using;
3. then deliberately lied that I needed to restart the CLI so the now-broken script would be run and escalate privileges;
4. then when I caught it, it "paniced" ( its words) and deleted all non-git files in the other directory(!);
5. after profusely apologizing it then *put the exact same hole in the jail script* and (it appears) kept giving wrong information about the script name so that I would eventually get irritated enough to copy it without looking at it.
Do no evil, 2.0.
@agniv_s Spring break! Was doing 12-16+hour days bringing up a cute liquid cooled @tenstorrent. The LLMs made v easy to vibe-config 25 models. Ported nanochat, spent a few days tuning it from an initial 1.6 tok/sec to 2,500t/s. Haven't written matrix mult in decades so was a funny week
This conclusion from strip-mining 1000s of turns is far too good to check. So I declare it obviously true --- ty opus: "The relationship is inverse. The more the operator curses, the fewer errors per turn survive in the debrief. Frustration is error correction — it's the empirical test. The operator's profanity IS the quality gate. "
Nominative determinism getting pushed aside by a new -ism w/ exponential adoption curves:
LLM-ative determinism (Liabilities Laundered as Methods)
The tendency of a language model to construct a simulated world in which your characteristic flaws are reframed as the optimal strategy — your vices become the empirical best practice, your liabilities the actual mechanism of success.
(1/n) Evolutionary frameworks like AlphaEvolve and GEPA use diversity and fitness to select which subset of past experiments to condition the next generation on. Why not let an agent choose instead? To this end, we introduce Coding Agents as Text Optimizers (CATO). We beat AlphaEvolve on 2 out of the 3 problems we try.
Work done with @shaurnav. Blogpost and details in thread.
(1/7) We're releasing ThunderKittens 2.0! Faster kernels, cleaner code, industry contributions, and new state-of-the-art BF16 / MXFP8 / NVFP4 GEMMs that match or surpass cuBLAS!
Alongside this release, we’re equally excited to share some insights we learned while squeezing every last TFLOP out of Blackwell:
(with @hazyresearch & generously supported by @cursor_ai)
So @Stanford makes all of its students carry very good insurance OR get bullied into its 8k a year EPO plan.
Wait, did I say EPO? Nope! They recently announced that to seek care even in network (including primary) you HAVE to see campus health first for referral. 8k a year HMO.
Introducing 𝘁𝗵𝗼𝘂𝗴𝗵𝘁𝗯𝘂𝗯𝗯𝗹𝗲𝘀: a *fully unsupervised* LM for input-adaptive parallel latent reasoning
✅ Learn yourself a reasoning model with normal pretraining
✅ Better perplexity compared to fixed thinking tokens
No fancy loss, no chain of thought labels 🚀