3 years ago, we emailed Jensen with requests for Blackwell. Today, we released GPT-5.3-Codex, a SOTA model designed for GB200-NVL72. Nitpicking ISA, simming rack designs, and tailoring our arch to the system has been a fun experience! I'm grateful to our collaborators at NVIDIA.
> Some companies hire heavily out of Twitter, some hire from communities such as GPU Mode or NanoGPT speedrunning.
To Nathan's point, I am leading an open source workgroup within the @GPU_MODE community (#teenygrad channel) to develop a deep learning systems course with an MIT-licensed book, codebase, and lectures. In the course you build your own deep learning framework, teenygrad, from scratch, capable of running nanogpt. The project has access to some compute thanks to the @LambdaAPI research grant (thank you @chuanli11).
This project has been a labor of love over the past few months, building a much-needed pedagogical bridge from micrograd to tinygrad. The SITP book develops the teenygrad framework step by step, from a numpy clone, to a pytorch1 clone, to a pytorch2 style compiler. The SITP philosophy subscribes to the same views as @karpathy on education: education is a technical problem whose solution requires building a ramp with empathy:
> ..education is the very difficult technical process of building ramps to knowledge...I feel like education is..a tangle of understanding and you're trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.
The project's primary challenge, for better or worse, has been the breadth of scope. A lot of time was spent on "curriculum engineering", and we are only now getting to implementing accelerated cpu and gpu kernels with automatic differentiation in earnest, but the project has a good line of sight towards fusion compilation using tinygrad's RISCy IR. The good news for you is that now is a perfect time to help the workgroup and to come learn from the best. There are some heavy hitters here, led by @marksaroufim, @m_sirovatka, @a1zhang, @gaunernst and more.
Links below ⬇️
@0x49fa98 Ppl that put the ball in the hoop and I don’t see listed elsewhere: Isa Fulford, Allison Tam, Christina Kim, Mianna Chen, Lia Guy, Angela Jiang, Rachel Lim
To preserve or improve chain-of-thought (CoT) monitorability, we have to be able to measure it.
I'm excited to announce our new research on this at OpenAI.
Google DeepMind's Nando de Freitas:
"Machines that can predict what their sensors (touch, cameras, keyboard, temperature, microphones, gyros, …) will perceive are already aware and have subjective experience. It’s all a matter of degree now."
I think we need to revisit the discussion of when consciousness and self-awareness begin.