VP @Nvidia, prev (COO/President@GroqInc) prev (co-f@definitiveio acq. by groq, co-f@autonomic acq. by @ford, and co-f@xtreme labs acq. by pivotal/@vmware)
A little over 5 months at @NVIDIA, and the GTC Taipei keynote captured just a glimpse of the incredible engineering happening here:
AI factories: Vera Rubin, Vera CPU, Groq 3 LPX, BlueField, Spectrum-X, DSX
Agentic AI: Agent Toolkit, OpenShell, Nemotron
Personal AI: RTX Spark, DGX Spark, DGX Station
Physical AI: Cosmos, GR00T, DRIVE Hyperion, Alpamayo
https://t.co/pWUtEvarQL
.@danawhite says one of the keys to longevity is to block out all negativity:
“It never even crosses my mind that something's not going to work. I just keep going until it does work.”
“There's this Bruce Lee quote where he says, ‘Never say negative things about yourself or what you're working on even if you're joking, because your body doesn't know the difference.’”
“I never take in any negativity.”
Not as relevant now :-(: I had an opportunity to deeply test both Fable 5 and GPT-5.6 Max. 5.6 is clearly better than Opus 4.8 at everything (slightly faster, too, though that depends on the load). Vis-à-vis Fable, it is clearly worse on coding, but better on agentic workloads. I had Fable write code, 5.6 run experiments - dreamy…
This is the narrative: AI is the most American technology - it allows anyone to start a business, provide their family a better life.
---
New Business Formation is Surging
https://t.co/sIBxqVsK8A
Great Stanford + MIT + Harvard + Anthropic paper.
Gives a clear training-based reason for why larger models learn abilities smaller models miss.
Says bigger AI models learn rare skills because they forget them less during training: their extra space protects weak learning signals.
The authors say the issue is not just whether a small model could represent the task, but whether training lets it keep that task while many common tasks keep pushing on the same limited parts.
Their core idea is that common tasks take up the model’s neurons first, so rare tasks get overwritten before they appear often enough to build into stable knowledge.
In a crowded data mixture, common patterns get first claim on the model’s internal machinery.
Small models may briefly pick up a rare signal, but the next wave of common-task updates overwrites it before the signal appears again.
They tested this first with controlled toy tasks where they could change how rare and complex each task was, then with OLMo language models from 4M to 4B parameters.
The main result is that bigger models learned low-frequency tasks much better, kept more task features inside their representations, and showed less gradient interference, which means common-task updates disturbed rare-task learning less.
Larger models can remember weak rare signals long enough to turn them into real learned skills.
----
Link – arxiv.org/abs/2605.29548
Title: "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention"
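One way to get a feel for the "gradient interference" idea in the summary above is to look at how much two random update directions overlap as the dimension grows. The sketch below is only an illustration under stated assumptions, not the paper's method: it treats a common-task update and a rare-task update as random Gaussian vectors (an assumption) and measures their average absolute cosine similarity. The helper names `cosine` and `mean_abs_cosine` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, trials=200, seed=0):
    """Average |cosine| between pairs of random Gaussian vectors of size dim.

    Assumption: random vectors stand in for a common-task update and a
    rare-task update. Real gradients are not random, so this is a toy proxy.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / trials


# Small "model" (few dimensions) vs. large "model" (many dimensions).
small = mean_abs_cosine(dim=8)
large = mean_abs_cosine(dim=512)
print(f"mean |cos| at dim=8:   {small:.3f}")
print(f"mean |cos| at dim=512: {large:.3f}")
```

In this toy setting the overlap shrinks as the dimension grows, which is consistent with the intuition that more room leaves rare-task updates less disturbed. It says nothing about the actual OLMo measurements.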
"AI maximalist" @markpinc is optimistic about the future of the technology because he thinks the true killer apps haven't been invented yet. https://t.co/LdVTGHxx14
Here’s a fun comparison between GLM 5.2 and Opus 4.8 on a one-shot reproduction of the SDPO paper
This is a hard task: the model must resolve messy verl issues and then run ablations to completion and confirm the paper’s claims.
- GLM 5.2 cost $6.21 while Opus 4.8 cost us $46.35
- Both models spent the bulk of their tokens resolving initial verl issues. GLM 5.2 attempted 14 failed runs before first success while Opus 4.8 attempted 9 runs.
- GLM 5.2 surprisingly took 2.65M tokens (excl. re-reads) compared to 4.53M tokens for Opus 4.8
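For a quick sense of scale, the numbers above can be turned into ratios. This is a rough sketch using only the figures quoted in the post. The dollars-per-million-tokens figure is a blended estimate and an assumption on my part, since the token counts exclude re-reads while the costs presumably do not.

```python
# Figures quoted in the post: total cost in USD and tokens in millions
# (token counts exclude re-reads).
runs = {
    "GLM 5.2": {"cost_usd": 6.21, "tokens_m": 2.65},
    "Opus 4.8": {"cost_usd": 46.35, "tokens_m": 4.53},
}

glm, opus = runs["GLM 5.2"], runs["Opus 4.8"]

# How many times more Opus 4.8 cost and consumed than GLM 5.2.
cost_ratio = opus["cost_usd"] / glm["cost_usd"]
token_ratio = opus["tokens_m"] / glm["tokens_m"]

# Rough blended price per million counted tokens (assumption: see lead-in).
usd_per_m = {name: r["cost_usd"] / r["tokens_m"] for name, r in runs.items()}

print(f"cost ratio (Opus / GLM):  {cost_ratio:.2f}x")
print(f"token ratio (Opus / GLM): {token_ratio:.2f}x")
for name, price in usd_per_m.items():
    print(f"{name}: ~${price:.2f} per 1M counted tokens")
```

So the cost gap (about 7.5x) is much larger than the token gap (about 1.7x), which suggests most of the difference comes from per-token pricing rather than from token usage.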
Welcome to Dub Nation, @IREN_Ltd 👏
Golden State and IREN announced today a landmark multi-year global partnership that will include the IREN badge on all Golden State Warriors jerseys beginning with the 2026-27 season.
SoftBank’s investor presentation is one of the greatest things ever made. I’ve been thinking about it all day. These are the real slides shown in a speech where Masayoshi Son said he wouldn’t retire for at least another decade. The goose stuff is perfect.
https://t.co/sk9cDhdWIE
“Inference is going to be one of the largest, if not the largest markets, not in AI, in the world.”
Altimeter's Apoorv Agarwal (@apoorv03) joined Bloomberg Tech @EdLudlow with @baseten CEO @tuhinone Srivastava to discuss the company's $1.5B financing and why he believes scalable inference infrastructure, open-source models, and enterprise control will be critical to the next phase of AI adoption.
Watch the conversation below:
https://t.co/pripKJcr8G
This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person and it can help with a very large variety of workloads.
Imo this is the 3rd major redesign of LLM UI/UX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.