modded-nanogpt WRs 🤝 Cautious Weight Decay
Pretty much all the @speedrun WRs since November have used CWD in some form (e.g., @varunneal's "CWD w/ schedule").
Huge thanks to everyone experimenting and sharing results. Shoutout to @varunneal, @classiclarryd, @ChrisJMcCormick, @roeeshenberg, @YouJiacheng, and the other contributors.
Paper: https://t.co/0qWhQnUuLv
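For readers new to CWD, here is a minimal sketch of the update rule, assuming the formulation in the paper: decoupled weight decay, x ← x − lr·(u + λ·m⊙x), where the mask m is 1 only on coordinates where the optimizer update u and the parameter x share a sign (u·x ≥ 0). On those coordinates the update is already pushing x toward zero. The helper name `cautious_weight_decay_step` and its plain-list interface are hypothetical, for illustration only. This is not the modded-nanogpt implementation, and `update` stands in for whatever direction the base optimizer (e.g., Adam or Muon) produces.

```python
from typing import List


def cautious_weight_decay_step(
    params: List[float],
    update: List[float],
    lr: float,
    weight_decay: float,
) -> List[float]:
    """Hypothetical helper: one step of x <- x - lr * (u + wd * mask * x).

    Assumes the CWD sign-agreement mask, mask = 1 where u * x >= 0, else 0.
    `update` is the base optimizer's direction u (e.g., an Adam step).
    """
    new_params = []
    for x, u in zip(params, update):
        # Decay only where the update and the parameter share a sign,
        # i.e. where the update already shrinks |x| toward zero.
        mask = 1.0 if u * x >= 0 else 0.0
        new_params.append(x - lr * (u + weight_decay * mask * x))
    return new_params
```

With `params=[1.0, -1.0, 2.0]`, `update=[0.5, 0.5, -0.1]`, `lr=0.1`, and `weight_decay=0.1`, only the first coordinate passes the mask, giving approximately `[0.94, -1.05, 2.01]`. Plain AdamW-style decay would shrink all three coordinates.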
My annual MRI scan gives me a USB stick with the data, but you need this commercial Windows software to open it.
Ran Claude on the stick and asked it to make me an HTML-based viewer tool. This looks... way better.
Claude Code with Opus 4.5 is a watershed moment, moving software creation from an artisanal, craftsman activity to a true industrial process.
It's the Gutenberg press. The sewing machine. The photo camera.
"The enormous shortage of ability to compute is distorting our work, creating problems where there are none, making others impossibly difficult, and generally causing effort to be misdirected. Shouldn't this view be more widespread, if it is as obvious as I claim?"
-Moravec, 1976
In my annual letter this year, I really try to get at what it means to "feel the AGI."
Featuring compute, inevitability, second-order effects, travel tips, Andor, and Isaiah Berlin.
https://t.co/tnamR3EWL0
The problem is: what is the pre-training recipe for understanding all of these different inputs? Also, can some of these "tools" be finetuned? I feel like a universal tokenizer, just like an alphabet for language, is the first problem to solve.
A Claude Code version of robotics would be a policy that has access to tools like SAM3, specialized grippers, action vision, and video models, used when needed, not by default!