this is a lecture on how to hunt for investors, cofounders & new hires
complete breakdown:
00:00 8 billion people don't know you exist
00:24 No one knows you. No one cares.
01:09 Stop being a farmer
01:51 The one thing all great hunters share
02:10 How I got my co-founder in 30 seconds
03:30 You + them = 3 people
05:36 Missionaries vs mercenaries
10:31 Why you have to be generous with equity
12:35 The right vesting schedule
13:17 Always trial before you commit
14:19 The 9-to-5 specialist red flag
15:05 Why logos mean nothing
17:46 Hunt them on GitHub
19:37 Niche Discords and subreddits
20:58 Why paid referrals work
22:10 Hiring friends: the trap
24:39 Why advisors are bullshit
31:29 Should you raise at all?
35:06 Investors only buy growth stories
35:58 Fundraising is running an auction
37:36 The luck factor (and Twitter lies)
39:33 Angels vs VCs
47:21 The hierarchy: inbound > warm > cold
48:00 Never ask a VC for intros if they pass
48:59 The double opt-in rule
52:32 Fundraising is a full-time job
53:33 Why a 4-week VC chat is a "no"
56:22 The leverage trick
01:00:24 Fundraising advisors = scam
01:02:21 The one trait of every great founder
01:03:18 Your homework: a list of 100
10 YouTube channels that will make you an AI expert:
1. Andrej Karpathy - modern, practical lectures
2. Yannic Kilcher - Breaks down AI papers in detail
3. AI Explained - Makes tough ideas easy to get
4. CodeEmporium - Step-by-step AI coding
5. 3Blue1Brown - Visuals for neural networks
6. Lex Fridman - Interviews with top AI folks
7. CodeEmporium - Step-by-step AI coding
8. sentdex - Python for machine learning
9. DeepLearningAI - Courses from Andrew Ng
10. AI Coffee Break - Weekly updates on AI
Save this list.
Enabled fp8 training for +4.3% improvement to "time to GPT-2", down to 2.91 hours now. Also worth noting that if you use 8XH100 spot instance prices, this GPT-2 repro really only costs ~$20. So this is exciting -
GPT-2 (7 years ago): too dangerous to release.
GPT-2 (today): new MNIST! :)
Surely this can go well below 1 hr.
A few more words on fp8, it was a little bit more tricky than I anticipated and it took me a while to reach for it and even now I'm not 100% sure if it's a great idea because of less overall support for it. On paper, fp8 on H100 is 2X the FLOPS, but in practice it's a lot less. We're not 100% compute bound in the actual training run, there is extra overhead from added scale conversions, the GEMMs are not large enough on GPT-2 scale to make the overhead clearly worth it, and of course - at lower precision the quality of each step is smaller. For rowwise scaling recipe the fp8 vs bf16 loss curves were quite close but it was stepping net slower. For tensorwise scaling the loss curves separated more (i.e. each step is of worse quality), but we now at least do get a speedup (~7.3%). You can naively recover the performance by bumping the training horizon (you train for more steps, but each step is faster) and hope that on net you come out ahead. In this case and overall, playing with these recipes and training horizons a bit, so far I ended up with ~5% speedup. torchao in their paper reports Llama3-8B fp8 training speedup of 25% (vs my ~7.3% without taking into account capability), which is closer to what I was hoping for initially, though Llama3-8B is a lot bigger model. This is probably not the end of the fp8 saga. it should be possible to improve things by picking and choosing which layers to apply it on exactly, and being more careful with the numerics across the network.
🚨🚨🚨 @ycombinator just published its startup wishlist🦄
① Cursor for Product Managers
② AI-Native Hedge Funds
③ AI-Native Agencies
④ Stablecoin Financial Services
⑤ AI for Government
⑥ Modern Metal Mills
⑦ AI for Physical Work
Vibe coding promised to democratize software engineering.
We're living through a golden age of software. But the people drawn to vibe coding are largely developers, founders, designers, and PMs. They’re daily active users of X, fans of Karpathy’s YouTube videos, or regular listeners to Dwarkesh. They know what a terminal is.
Most people can't vibe code. Here's how we fix that: https://t.co/c2KbaGvuje
By Justine Moore
Here's my conversation all about AI in 2026, including technical breakthroughs, scaling laws, closed & open LLMs, programming & dev tooling (Claude Code, Cursor, etc), China vs US competition, training pipeline details (pre-, mid-, post-training), rapid evolution of LLMs, work culture, diffusion, robotics, tool use, compute (GPUs, TPUs, clusters), continual learning, long context, AGI timelines (including how stuff might go wrong), advice for beginners, education, a LOT of discussion about the future, and other topics.
It's a great honor and pleasure for me to be able to do this kind of episode with two of my favorite people in the AI community:
1. Sebastian Raschka (@rasbt)
2. Nathan Lambert (@natolambert)
They are both widely-respected machine learning researchers & engineers who also happen to be great communicators, educators, writers, and X posters.
This was a whirlwind conversation: everything from the super-technical to the super-fun.
It's here on X in full and is up everywhere else (see comment).
Timestamps:
0:00 - Introduction
1:57 - China vs US: Who wins the AI race?
10:38 - ChatGPT vs Claude vs Gemini vs Grok: Who is winning?
21:38 - Best AI for coding
28:29 - Open Source vs Closed Source LLMs
40:08 - Transformers: Evolution of LLMs since 2019
48:05 - AI Scaling Laws: Are they dead or still holding?
1:04:12 - How AI is trained: Pre-training, Mid-training, and Post-training
1:37:18 - Post-training explained: Exciting new research directions in LLMs
1:58:11 - Advice for beginners on how to get into AI development & research
2:21:03 - Work culture in AI (72+ hour weeks)
2:24:49 - Silicon Valley bubble
2:28:46 - Text diffusion models and other new research directions
2:34:28 - Tool use
2:38:44 - Continual learning
2:44:06 - Long context
2:50:21 - Robotics
2:59:31 - Timeline to AGI
3:06:47 - Will AI replace programmers?
3:25:18 - Is the dream of AGI dying?
3:32:07 - How AI will make money?
3:36:29 - Big acquisitions in 2026
3:41:01 - Future of OpenAI, Anthropic, Google DeepMind, xAI, Meta
3:53:35 - Manhattan Project for AI
4:00:10 - Future of NVIDIA, GPUs, and AI compute clusters
4:08:15 - Future of human civilization
We need AI to work.
Marc Andreessen on why AI is arriving at a moment of economic and demographic pressure:
"Productivity growth for the last 50 years has actually been very low, not very high."
"So statistically, in the US, in the west, technology progress in the economy ... has actually slowed way down."
"Many countries around the world ... are actually going to depopulate over the next century."
"We actually need AI to work in order to get productivity growth up... We actually need AI to work because we’re going to need machines to do all the jobs that we’re not going to have people to do."
@pmarca on Lenny's Podcast with @lennysan
Founder emotions in 2026 be like:
• 9am: we're changing the world
• 11am: we're changing nothing, delete the repo
• 2pm: raised $50k from angels, god mode
• 4pm: one churned user, existential crisis
repeat every 6 hours