Ex-OpenAI Tech Lead Justin Lebar joins SemiAnalysis as a Visiting Fellow to burn $10,000 in 3 hours to find dozens of AMDGPU LLVM, x86 LLVM, and NVPTX bugs
00:00 - Intro & Justin’s background
00:59 - How compiler fuzzing works
01:56 - Why we did this project
02:48 - The gap in GPU vs. CPU compiler testing
04:13 - The major AMD & x86 bugs we found
05:38 - Using LLMs to read code & find vulnerabilities
07:56 - The impact of UltraCode mode
12:18 - Doing this without AI (Time & manual limits)
15:03 - The future of AI in software development
16:17 - What’s next + key takeaways for devs
Excited to share how Anthropic's data team has automated 95% of business analytics queries with Claude. Blog post covers how we approach evals, ablations, and online validation!
1. DGX Station with GB300 superchip and up to 748GB of memory. The numbers here are pretty insane.
2. RTX Spark laptops with 1 petaflop of AI performance and 128GB of unified memory. This is the one I'm the most excited about!
DGX Station and RTX Spark ← These 2 alone are really exciting! You will be able to run trillions of parameters on your personal hardware. We are talking about frontier models.
But there's even more:
3. OpenShell is now in GitHub Copilot. This brings you sandboxing and policy-as-code, which are huge for agentic systems.
4. Claude now runs natively on NVIDIA's GB300 systems on Azure.
5. Nemotron 3 Ultra is out. This is an open frontier reasoning model built for long-running agents.
They kept announcing things, and the list is too long, so here is the blog post with everything:
https://t.co/lnuCfqPad6
Thanks to the Microsoft and NVIDIA teams for partnering with me on this post.
Workflows are the biggest upgrade to Claude Code’s capabilities since skills and subagents.
I dove deep into it with @sidbid to figure out best practices, examples and more. I’m particularly excited about the non-technical tasks it enables for Claude Code.
I'm yet to see an agent running inside a browser that doesn't feel like a hack.
I tried a headless browser, but I can't use my logins with it. I tried a Chrome extension, but it keeps killing my sessions.
The whole argument of the team behind ego is that browsers were never built for agents, so everything we have today is a patch on top of that.
They rebuilt the entire thing, and it looks promising:
• You can run multiple agents in the same browser
• Each agent owns its own space
• Agents work in parallel
• You can watch a space, take it over, or kill it
This is all Chromium underneath, so you can keep using your extensions, bookmarks, etc.
The best part is that you aren't locked to any assistant: you can use this with Claude Code, Codex, Cursor, or whatever you use.
Here is a link to check it out: https://t.co/i4t5sFxVmN
Here is a repository with 30 open-source, end-to-end agent examples.
These are very sophisticated workflows using Google ADK. Their architecture diagrams alone are worth gold.
• Full documentation
• Source code
• Ability to one-click deploy
Video and links below.
best accounts to follow from each frontier lab to stay constantly up to date
Anthropic
@karpathy - must-follow account for AI; recently joined Anthropic
@bcherny - Claude Code creator, always shares great tips
@trq212 - also a Claude Code developer; writes amazing articles on CC
OpenAI
@polynoamial - works on reasoning research, shares a lot of technical details
@gabriel1 - Sora developer, great career path
@jxnlco - works on dev experience, shares a lot about Codex
Google AI
@OfficialLoganK - all the major Google Gemini and AI Studio updates
@ammaar - product and design; shares great things about vibe-coding in Google AI Studio
@fofrAI - cool use cases for generative models
Cursor
@leerob - the loudest voice behind Cursor updates
@ericzakariasson - shares great insights on using Cursor
@mntruell - Cursor’s CEO; major releases and usage updates
xAI
@milichab - recently joined xAI, shares updates on Grok
@skcd42 - also covers major Grok releases
@elonmusk - Elon does a great job reposting and hyping all xAI products
who else did I miss?
You should try "Summarize from here" in Claude Code.
I think this is an underrated trick to deal with your ever-growing context.
Basically, instead of using /compact or letting Claude Code compact your entire session, do the following:
1. Hit Esc+Esc (or type /rewind). This opens the checkpoint menu with every checkpoint Claude created during the session.
2. Pick a checkpoint that came after the context you'd like to keep.
3. Select "Summarize from here."
Everything before that checkpoint will stay exactly as it was. Everything after will get collapsed into a compact summary.
You keep the valuable early context (specs, decisions, constraints) and get rid of the crappy noise.
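If it helps to picture what happens to the context, here is a toy sketch in Python. It is only a mental model, not how Claude Code is implemented: `summarize_from`, `naive_summary`, and the list-of-strings session are all hypothetical names made up for illustration.

```python
from typing import Callable, List


def summarize_from(
    messages: List[str],
    checkpoint: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Keep everything before `checkpoint` verbatim; collapse the rest into one summary.

    Hypothetical helper: a toy model of the behavior described above,
    not Claude Code's actual implementation.
    """
    if not 0 <= checkpoint <= len(messages):
        raise ValueError("checkpoint out of range")
    kept = messages[:checkpoint]   # early context stays exactly as it was
    tail = messages[checkpoint:]   # everything after the checkpoint
    if not tail:
        return kept                # nothing to summarize
    return kept + [summarize(tail)]


def naive_summary(tail: List[str]) -> str:
    # Stand-in for a real summarizer (assumption: a model would write this text).
    return f"[summary of {len(tail)} messages]"


session = ["spec", "decision", "constraint", "debug noise 1", "debug noise 2"]
compacted = summarize_from(session, 3, naive_summary)
print(compacted)
```

The point of the sketch: the first three entries survive untouched, and only the tail is replaced by a single summary entry.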
Really cool way to find out which models you can run on your computer:
1. Install llm-checker
$ npm install -g llm-checker
2. Detect your hardware
$ llm-checker hw-detect
3. Get a recommendation
$ llm-checker recommend --category coding
Here are some of the recommendations I got:
Jake Heller, CEO of Casetext, sold his AI company for $650M and recorded the whole playbook.
Most founders who exit for $650M disappear quietly and never say a word.
He went on stage and told everyone exactly how.
No NDA on the strategy, no locked course, no $10,000 mastermind, just a 39-minute talk and a link.
He covers what actually worked, what almost killed the company, and how they pulled off one of the cleanest AI exits ever.
39 minutes, $0, no excuses.
The article below shows how Kimi K2.6 replaced an entire dev team and how one guy built an $80,000/month agency solo with it.
Google published an entire library of highly sophisticated, end-to-end agent examples.
100% open-source.
• Complete documentation
• Source code
• Ability to one-click deploy
In the video, I break down one of the coolest examples in this collection.
Singapore’s Foreign Minister, Dr Balakrishnan, casually explaining how he built his own AI agent (a 2nd brain for diplomacy) on a Raspberry Pi using Claude, a WhatsApp integration, etc.
“You cannot govern a technology you have only been briefed on.” 🇸🇬
Hospital systems keep getting bigger, crushing competition and sticking families with higher bills. The @WaysandMeansGOP hearing exposed it: hospitals defended abusive fees while patients pay the price. More consolidation means more pain for patients. https://t.co/sicOkvg4CU
Seen a lot of conversation about how predatory the American medical system is.
So I will weigh in.
I ended up going to the ER about 2 weeks ago for crippling pain. Turned out to be a ruptured ovarian cyst. I was there for MAYBE 4 hours.
My bill? $13,500.
Because I'm uninsured (by choice, that shit is a SCAM), the hospital dropped my bill down to $8,100 and some change as an "uninsured" discount.
For starters, $13,500 for a 4-hour hospital visit is insane as it is.
But the fact the hospital can wipe $5,000 off the bill "just because" should show you how utterly fucked this system is.
And to be clear - $8,000 is still an absolutely insane sum of money when all these people did was scan my stomach and give me some painkillers.
On my itemized bill, my CT scan was 7k. The iodine they used was $900. Just being in the ER room alone was $2,500.
We phoned the hospital to haggle. They dropped the price by $20.
Normal people can't survive this shit. I do okay and $8,000 is still an INSANE chunk of money out of my savings.
Anyone who argues this isn't a disgusting, predatory system is crazy. And it is even crazier that Americans accept this.
And for those of you who argue this is the free market, I need you to be quiet. There can never be a true free market here when government and insurance have their creepy little fingers in this pie.
People shouldn't go bankrupt trying to pay medical bills. This has to change.
Karpathy didn't make a course.
He made THE course.
3 hours. Free.
Tokenization. Attention. Hallucinations. Tool use. RLHF. DeepSeek. AlphaGo.
Every behavior you've ever wondered about in an LLM - where it comes from, why it exists, how it was engineered.
The gap between engineers who understand this and engineers who don't isn't technical depth.
It's the ability to conceive of entirely different things.