Loved this set of lectures by @a_kazemnejad and team on recreating the 'aha moment' from the DeepSeek paper. It was an absolute treat to watch these guys tackle the problems in real time (and I got to see them use vLLM, which was fun).
Gonna spend some time this week to recreate this and share my journey as a blog post or a series of tweets :). It's always nice to spend some time on things that aren't related to my work and just enjoy learning.
new grads often ask me what they should be doing so they don't fall behind in the ai space. there's a lot, but it's honestly super manageable. become intimate with model internals. proof based linear algebra. non-convex optimization. this is stuff you could've done in undergrad. it definitely takes some time and work, but it's doable. have taste, have opinions. train a small model, then train a big one. vLLM internals, tensor parallelism. hand roll kernels. cluster orchestration. do you have opinions on synthetic data? why don't you? SFT, PPO, you should know this. learn Triton. everyone is reproducing papers now so you need to be doing more. do you know the semi supply chain? where are the bottlenecks? hardware, man, hardware. your little gpu rig erector set in your basement isn't gonna cut it. build a cluster, a big one. pretrain an 800B model. now post-train it. serve it to millions of people. you should be able to beat deepseek on some benchmarks now. it's a lot to take in but it all snowballs. this is what job security looks like from now on. do you want to work in tech or not
@thsottiaux just general curiosity, paired with external sentiment (youtube, friends, public opinion). Tbh a lot of them feel the same to me, so i go by curiosity and not the urgency to try everything new that comes out.
Pro tip for anyone who wants to use LLMs for research - ground it.
The problem is not that the ideas are bad, but that they're not grounded. Any idea that cannot be linked to existing work sounds to me like the person is talking into thin air.
What I do every day when trying to talk through my ideas is:
1. Ask it to critique my idea: What is good? What is bad? What can we do to implement an MVP of this?
2. Ground it: Does this idea already exist? If so what are its flaws? What can we do to improve on existing work? Which area of work does this idea come from?
And while you describe your idea to the LLM, be sure to elaborate on the story behind the idea. Don't just say:
"I think X can be done, and I expect Y results"
you can try:
"Because I saw Z paper, I think X can be done and Y results can be expected"
This will also help your LLM understand where your idea comes from, and it will sometimes do the survey without you telling it to (if you constantly use Opus, it does this a lot).
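Here's a minimal sketch of how the steps above could be turned into a reusable prompt template. The function name `grounded_prompt` and its parameters are made up for illustration and are not part of any library; the X, Y, Z placeholders are the same ones used above.

```python
def grounded_prompt(idea: str, expected: str, source: str) -> str:
    """Build a prompt that states where the idea comes from,
    then asks for a critique and for grounding in existing work."""
    return "\n".join([
        # Tell the story of the idea first: which work it builds on.
        f"Because I saw {source}, I think {idea} can be done and {expected} can be expected.",
        # Step 1: critique.
        "1. Critique: what is good, what is bad, and how could we build an MVP of this?",
        # Step 2: grounding.
        "2. Ground it: does this idea already exist? If so, what are its flaws, "
        "how can we improve on existing work, and which area of work does it come from?",
    ])


# Hypothetical placeholders, matching the example sentences above.
prompt = grounded_prompt(idea="X", expected="Y results", source="Z paper")
print(prompt)
```

The resulting string can be pasted into whatever chat interface you use. The point is only that the source of the idea goes in before the questions.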
I'm all for using AI in research. But in the last two months, I've had three people come up to me with "novel ideas", and none of them made sense. When I asked how they came up with them, they all eventually admitted that AI had told them to do it.
It comes down to a few things. Mainly, what I am seeing here is:
1. Costs of 0-1: The models keep getting bigger and bigger, so hosting them alone is costly. GPUs ain't cheap.
2. Token efficiency: Models these days love to reason a lot and pretty much do nothing (GPT models don't as much, but Claude loves to think for long periods of time). That makes inference cost a lot of money.
3. Optimizations: what made deepseek special when it came out was not just that they were on par with the frontier labs (i am talking about R1) but also the way training was done. They released a suite of repos and papers (mHC, engram and so on) and also kernels that help reduce costs to some extent. You can argue that the US labs are also doing the same thing, but not to the same extent as deepseek, because deepseek doesn't have access to the same type of infrastructure that openai and anthropic have.
there are a lot of other factors that i am not able to recall right now. hope this helps!
I am gonna crash out a lil bit about codex's flickering here.
tagging @reach_vb and @thsottiaux
First of all, codex is awesome. Love what you guys are building!
For some reason on Windows terminals (I mainly use PowerShell and the terminal inside Zed), codex loves to just flicker a LOT.
And this doesn't happen all the time, only when I want to split the screen between the terminal and another file on the side. Sometimes I have to wait a good 2-3 minutes for it to stop flickering and restore itself. Sometimes it doesn't even stop and I have to close the terminal, open another one and resume the chat.
First for our new full-screen renderer (which should get rid of bugs like screen flickering), we've made a number of fixes for different environments and terminals.
You can turn it on with the command: /tui feedback
We're working on making it the default in Claude Code soon.