Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It's a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it's achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey's tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
We're finally shedding the .so (thank you Somalia!), and using the .com for @NotionHQ. And for this beautiful moment, I want to share a fun story:
Back in 2018, I had just joined Notion, and one of the first things @ivan asked me to do was figure out how we could own https://t.co/BxoFvc83VG. I had never done a big domain purchase before, so I reached out to a few domain brokers to understand the landscape. We tried different brokers, kept things anonymous, and attempted to surface a price the seller might consider.
A year went by... nothing. Meanwhile, it was pretty clear this was only going to get more expensive as we grew. We needed a different approach. A fellow founder connected me to a broker who took a very different tack. Less transactional, more long-term relationship builder. He spent months getting to know the domain owner. Turns out the owner was a fellow entrepreneur on the West Coast... and a huge Grateful Dead fan.
So we figured, why not get creative? Something beyond just price. So I called up our investor Ronny Conway and asked if there was any way he could help set up a private meeting between the domain owner and the Grateful Dead. Ronny is one of those people who somehow makes impossible things possible. A week later he calls me back: "New York City. Halloween. 15 minutes after the concert. Done."
The broker went back to the owner with an offer: some cash, some equity, and a private meeting with the Grateful Dead. That got his attention. He didn't take the band meeting in the end, but he did lean into the equity (great call, in hindsight). We shook hands, and a few weeks later, the deal was done.
I've been waiting years for the day we move our product to https://t.co/BxoFvc83VG. Looks like 2026 is finally the year. Safe to say I'm unreasonably excited about this update!
A year ago at GTC, Jensen brought out a DGX Spark in one hand and a MacBook in the other.
Yesterday, at GTC Taipei, Jensen brought out NVIDIA's new RTX Spark laptop in both hands.
This is the start of a new era of personal computing - the personal AI era.
In the new era, there are two competing platforms:
- @apple with macOS / MLX
- @nvidia with Windows / CUDA
Everyone will have an always-on personal agent that runs locally, constantly looking out for you, working for you proactively, monitoring the internet and talking to other agents. This will be a personal AI agent you own, that's private, that's aligned with you (not OpenAI or Anthropic). @karpathy calls it personal computing v2.
Let's set the scene for the new era of personal computing by diving into the one thing that will matter the most - the hardware.
The best hardware for local AI isn't what's running in a data center. It's a radically different problem. Here's a breakdown of the 3 most important things:
1. Memory.
LLMs are big. To run a model locally, you need to fit the entire model into memory. Apple (with Apple Silicon) and NVIDIA (with DGX Spark + RTX Spark) have both moved towards unified memory, which puts all the memory on one chip - leveraging cheaper LPDDR5X memory - useful for making more memory accessible to the GPU. The alternative competing architecture is a disaggregated CPU/GPU architecture - which is what the DGX Station uses. It has a large pool of slow LPDDR5X CPU memory (496GB @ 396GB/s), and a small pool of high-speed HBM3e GPU memory (252GB @ 7.1TB/s). It has a high bandwidth link (900GB/s) between the CPU memory and GPU memory, enabling fast disaggregated inference e.g. Attention on GPU, FFN on CPU. This enables running really large models like Kimi K2.6 (1T parameters) by offloading experts from CPU memory to GPU memory as they are needed. You could imagine something like this in a smaller form factor.
Hardware today:
- Apple M5 Max MacBook Pro: 128GB unified memory.
- NVIDIA DGX Spark / RTX Spark: 128GB unified memory.
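To make the "fit the entire model into memory" point concrete, here is a rough sketch of the arithmetic. The 4-bit quantization, the 20% overhead for KV cache and runtime buffers, and the 70B example model are all assumptions for illustration, not figures from the post; the 128GB, 496GB + 252GB, and 1T-parameter numbers come from the text above.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = 0.2) -> bool:
    """True if weights plus an ASSUMED overhead fraction fit in memory_gb.

    The overhead (KV cache, activations, runtime buffers) is a placeholder
    value, not a measured one.
    """
    needed_gb = weights_gb(params_billion, bits_per_weight) * (1 + overhead)
    return needed_gb <= memory_gb


if __name__ == "__main__":
    machines = {
        "128GB unified memory": 128,
        "DGX Station (496GB CPU + 252GB GPU)": 496 + 252,
    }
    # 70B is a hypothetical model size; 1000B stands in for a 1T-parameter model.
    for params in (70, 1000):
        for name, memory_gb in machines.items():
            verdict = "fits" if fits_in_memory(params, 4, memory_gb) else "does not fit"
            print(f"{params}B @ 4-bit ({weights_gb(params, 4):.0f} GB weights) {verdict} in {name}")
```

Under these assumptions a 1T-parameter model needs roughly 500GB for weights alone, which is why it only fits on the larger split CPU/GPU memory pool and not in 128GB of unified memory.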
2. Memory bandwidth.
In a data center, multiple users' requests can be batched together, which amortizes the cost of moving model weights into memory across many requests, pushing up arithmetic intensity to compute bound territory - meaning FLOPS matter a lot. Locally, everything runs at low batch size, which is low arithmetic intensity, i.e. memory bound - so FLOPS don't matter. What matters is memory bandwidth. High memory bandwidth -> fast TPS. Low memory bandwidth -> slow TPS.
Hardware today:
- Apple M5 Max MacBook Pro: 617GB/s memory bandwidth.
- NVIDIA DGX Spark: 273GB/s memory bandwidth.
- NVIDIA RTX Spark: TBC.
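A back-of-the-envelope way to see why bandwidth sets TPS at batch size 1: every generated token has to stream the active weights through the memory bus once, so bandwidth divided by bytes read per token gives a ceiling on decode speed. This is a simplified roofline-style sketch, not a benchmark. The 32B-active, 4-bit model is a hypothetical example, and the estimate ignores KV cache reads, compute time, and software overhead.

```python
def max_decode_tps(bandwidth_gb_s: float, active_params_billion: float,
                   bits_per_weight: float) -> float:
    """Upper bound on tokens/second for memory-bound, batch-size-1 decoding.

    Assumes each token reads all active weights exactly once and nothing else.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Bandwidth figures are the ones quoted in the post.
    hardware = {"Apple M5 Max MacBook Pro": 617, "NVIDIA DGX Spark": 273}
    for name, bandwidth in hardware.items():
        # Hypothetical model: 32B active parameters at 4 bits per weight.
        tps = max_decode_tps(bandwidth, active_params_billion=32, bits_per_weight=4)
        print(f"{name}: at most ~{tps:.1f} tokens/s")
```

With those assumed model numbers the ceiling is about 38.6 tokens/s at 617GB/s and about 17.1 tokens/s at 273GB/s, so the bandwidth gap translates directly into a TPS gap.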
3. Power.
In a data center, we talk about megawatts. Locally, we talk about watts. Laptops have limited battery life. The best laptop batteries have a capacity of ~100Wh. LLM inference on a MacBook Pro consumes ~140W, meaning battery life with a persistent personal agent is less than an hour. This is unusable. The game will become how long you can run a useful agent on a laptop battery. Apple and NVIDIA will compete on how long an agent can run on battery - this will become the new battery life metric. This could be where an NPU or NPU/GPU hybrid really shines. Apple ANE has about 10x better power efficiency than the GPU on Apple Silicon (but has ~4-5x less memory bandwidth, with about the same FLOPS as the GPU). There will be an entire design space of how to build energy efficient agents - this will involve co-optimizing the harness, models, and inference engines together.
Hardware today:
- Apple M5 Max MacBook Pro: Consumes 140W, battery capacity ~100Wh
- NVIDIA DGX Spark: Rated for 240W, consumes 140W. No battery (direct PSU).
- NVIDIA RTX Spark: TBC.
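The "less than an hour" claim is just capacity divided by draw. A small sketch of that arithmetic, using the ~100Wh and ~140W figures from the post; the 8-hour target is a hypothetical goal added here to show how far average power draw would need to fall.

```python
def battery_hours(capacity_wh: float, draw_w: float) -> float:
    """Runtime in hours at a constant power draw."""
    return capacity_wh / draw_w


def max_draw_for_runtime(capacity_wh: float, target_hours: float) -> float:
    """Average draw in watts that would last target_hours on this battery."""
    return capacity_wh / target_hours


if __name__ == "__main__":
    hours = battery_hours(capacity_wh=100, draw_w=140)
    print(f"Sustained inference at 140W: ~{hours * 60:.0f} minutes")
    # 8 hours is an assumed target, not a number from the post.
    print(f"Draw needed for 8 hours: {max_draw_for_runtime(100, 8):.1f}W average")
```

At a constant 140W a 100Wh battery lasts about 43 minutes, and an all-day agent would need to average roughly 12.5W, which is the gap the efficiency work would have to close.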
The hardware battle will be fierce, and I expect a move towards co-design, i.e. hardware designed *with* personal agent workloads. On top of this, models are improving, we're getting more intelligence per bit/watt, and open-source harnesses like @NousResearch Hermes / OpenClaw are improving rapidly. Within the next 2 years, we'll inevitably have unmetered, private Opus-4.8 / GPT-5.5 level intelligence running locally on a future version of a MacBook or RTX Spark. I like this future a lot better than the one where OpenAI / Anthropic control the intelligence layer of the internet and can rent-seek on intelligence.
Beyond this, NVIDIA is ahead on general AI ecosystem, i.e. the CUDA moat. Apple is ahead on local AI ecosystem, i.e. models quantized/rightsized for MacBooks, native macOS apps, and ease of setup. We'll see how this might change as the new RTX Spark also brings full native CUDA to Windows-on-Arm laptops for the first time, potentially closing the gap.
There are many other factors I haven't mentioned here, but I believe I've covered the timeless, most important things for the new era of personal computing.
Good math, but not all quite there:
First, SpaceX pays fairly average, but for more than a decade they have offered regular (~bi-annual) liquidity to employees. To live comfortably (especially to have a family) in LA County, most employees would have sold a little bit here and there, if not a lot (e.g., if they were the sole earner in a household).
Second, critically, because there is no double trigger (in order to facilitate the liquidity), most people default to "sell-to-cover", i.e., ~40-50% of their holdings are immediately sold to cover the taxes on vest. Remember these vests are W-2 events. In order to not do this, the employee would need to come up with significant cash (because the taxes are paid against the price at vest, not the price at grant), especially later on.
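A simplified sketch of the sell-to-cover arithmetic, to show why the cash needed to avoid selling grows with the share price. All numbers here (1,000 shares, $20 at grant, $100 at vest, a flat 45% withholding rate) are hypothetical, and real withholding rules are more involved than a single flat percentage.

```python
import math


def cash_to_keep_all_shares(shares_vesting: int, price_at_vest: float,
                            withholding_pct: float) -> float:
    """Cash needed to pay the tax on vest without selling any shares."""
    return shares_vesting * price_at_vest * withholding_pct / 100


def shares_sold_to_cover(shares_vesting: int, price_at_vest: float,
                         withholding_pct: float) -> int:
    """Whole shares that must be sold at the vest price to cover that tax."""
    tax_due = cash_to_keep_all_shares(shares_vesting, price_at_vest, withholding_pct)
    return math.ceil(tax_due / price_at_vest)


if __name__ == "__main__":
    shares, pct = 1000, 45  # hypothetical vest size and flat withholding rate
    sold = shares_sold_to_cover(shares, price_at_vest=100, withholding_pct=pct)
    print(f"Sell-to-cover: {sold} sold, {shares - sold} kept")
    # Tax is computed on the vest price, so it scales with appreciation.
    print(f"Cash to keep everything at $100 vest price: ${cash_to_keep_all_shares(shares, 100, pct):,.0f}")
    print(f"Same calculation at the $20 grant price:   ${cash_to_keep_all_shares(shares, 20, pct):,.0f}")
```

In this made-up example, keeping every share costs $45,000 in cash at vest, five times what it would have cost if the tax were assessed at the grant price.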
However, two things make SpaceX particularly awesome IMO:
1. They gave employees the option to choose stock or options along the way. Someone who took options and paid the taxes with cash would have done very well.
2. They gave stock to everyone. There are a bunch of highly skilled workers that we on X never think of, like Tube Benders, Orbital Tube Welders, Cleanroom Technicians, etc. that are going to make significant fortunes.
Maybe it's overly quixotic, but this last point is an underrated part of @elonmusk attacking physical problems, not just software ones, with 100x thinking: a bunch of people in the types of jobs America needs and romanticizes (for good reason) will be rewarded with the kind of wealth that really would not be possible at any other company they would have chosen.
It's an incredibly positive story, and if you can't see it in that light, you should look inward.
overheard from a fortune 20 company - ceo asked for $1 billion in AI generated opex savings at the beginning of this year.
the team as a result has spent $200 million on tokens trying to achieve those savings year-to-date, with minimal results other than some modest Cx savings and a bit of savings on engineering due to less hiring driven by coding assistants. now as back-half budgets are being reviewed, it appears that the ceo has ordered token costs to be dramatically slashed as he/she doesn't feel the ROI is there yet (for their company).
gonna be interesting to see if this is a trend amongst the rest of the fortune 500.
Avoiding Death on the Yellow Brick Road: Why The App Layer Isn't Dead
The Yellow Brick Road is our shorthand for the path the labs are walking, where they're committing extraordinary resources.
The reason the labs are best-suited for problems like code generation, writing, or image-creation is that these problems improve with raw model capability: every dollar spent on pre-training and post-training improves product quality.
Meanwhile, the rest of Oz is inhabited by more complex, often vertical problems that aren't as simple as giving a business user a horizontal tool with access to standard tools and computer use.
The value comes less from the underlying model's raw capability (though that's still important!) than from the scaffolding around it that makes the output trustworthy, compliant, and operational inside a specific industry.
Full piece by @joeschmidtiv: https://t.co/84QN5Mj9T3
What it takes to reach $100B
Fewer than 0.1% of startups will ever be worth more than $100B, but those that do will have an outsized impact, so it's worth understanding which companies have the potential and what it takes to get there.
Examining the history of these massively successful companies, it becomes clear that there are two ingredients necessary to reach $100B.
First, they must be building in a rapidly growing market of unlimited size. For example, Microsoft, Apple, Intel and AMD all emerged as part of the exponentially growing microcomputer market. These companies started when microcomputers were still relatively new and obscure. Micro-soft's first product, Altair BASIC, was incredibly niche (MITS only ever sold about 25,000 Altairs), but that was the start of what is now a $3T company.
Likewise, Amazon, Google, and Facebook all became trillion-dollar companies by growing with the Internet. Stripe ($160B) makes this dynamic explicit in its mission statement: "Our mission is to increase the GDP of the internet".
Why now? $100B opportunities only exist for a limited time. If a company could have been started 20 years earlier, then it's unlikely to have $100B potential. Important new technologies create massive new opportunities, but those windows of opportunity don't last forever. For example, it was not possible to start Uber or DoorDash five years earlier because mobile platforms such as the iPhone did not yet exist, and it wasn't possible to create them five years later because the opportunity had already been captured.
Large but slow growing markets rarely produce $100B companies. For example, startups selling to dentists or auto mechanics are not good candidates to reach $100B. A simple test is to ask if demand will increase 10x or 100x in the next ten years. Startups thrive when capturing a slice of a rapidly growing pie, not fighting zero-sum games against incumbents.
The second ingredient is defensibility, a durable control point in the market. If your company is making billions of dollars, that will attract a lot of interest from potential competitors.
This defensibility is typically provided by one or more of the following dynamics:
- Marketplaces like Amazon, Google, Facebook, and Uber aggregate supply and demand.
- Platforms like Apple, Microsoft, NVIDIA, Salesforce, and OpenAI provide a foundation for large ecosystems to grow.
- Foundational infrastructure companies like TSMC, AWS, Stripe, and Arm become hard-to-replace dependencies.
- Workflow systems like ServiceNow, Intuit, SAP, Oracle, and Workday become the default systems of record and action.
- Deep tech companies like ASML, Tesla, and SpaceX require extraordinary technical, manufacturing, operational, regulatory, and capital execution to reproduce.
Competing head-on with these companies is nearly impossible. They effectively "own" their slice of a large and rapidly growing market, which earns them high revenue multiples, lower cost of capital, and the ability to acquire smaller companies and hire top talent.
Probably fewer than 1% of startups have the potential to reach $100B, and of those, fewer than 10% will ever realize that potential. However, it is our belief that we can improve those odds by building a community of the most impressive founders working on the most ambitious ideas.
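The two percentages above multiply out to the headline figure at the top of the post. A tiny sketch of that arithmetic, treating "fewer than 1%" and "fewer than 10%" as upper bounds; the function name is just for illustration.

```python
from fractions import Fraction

# Upper bounds quoted in the post.
HAS_POTENTIAL = Fraction(1, 100)      # share of startups with $100B potential
REALIZES_POTENTIAL = Fraction(1, 10)  # share of those that actually get there


def realized_share(potential_share: Fraction, conversion_share: Fraction) -> Fraction:
    """Upper bound on the share of all startups that reach the outcome."""
    return potential_share * conversion_share


if __name__ == "__main__":
    bound = realized_share(HAS_POTENTIAL, REALIZES_POTENTIAL)
    print(f"Upper bound: {bound} of startups ({float(bound):.1%})")
```

The product is 1 in 1,000, which matches the "fewer than 0.1% of startups" claim in the opening paragraph.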
We are creating the "$100B Seed Group" to bring together these early founders in our group office-hours format, to periodically meet, review, revise and strategize their Path to $100B. Apply now if you would like to be a part of the program. The first cohort will be limited to 10 companies.
Any startup from YC P26, W26, F25, or S25 is eligible. Applications are due by May 28th at 9pm PT.
https://t.co/HrECOFvyz8
I just got back from SF and I FEEL INSPIRED.
I spent 5 days with frontier AI model teams, AI startup founders, and 3 billionaires.
My takeaways:
1. I had lunch with 3 billionaires. All of them are buying SaaS companies and rebuilding them agent-first. They were deeply inspired by Bending Spoons and Ryan Cohen's eBay deal. Buy the company, cut the headcount, rebuild the tech, add agents, add features, make a more valuable experience, raise prices.
2. The frontier model companies are hungry for usage data from the field. They can see API calls and token counts. They can't see the actual workflows. If you're deep in a niche using these models in ways the model companies haven't seen, that understanding is incredibly valuable. Usage intelligence is the new alpha.
3. Consumer AI is massively underbuilt. Every billboard in SF is either B2B inference infrastructure or vertical agent companies. The entire city is optimized for enterprise. Meanwhile you have companies like Cal AI doing $50M ARR in 18 months as a consumer app. I met with a few cool teams doing consumer AI (@paulscherer / @ekuyda).
4. MCP came up in literally every conversation. The companies exposing their product as MCP endpoints are getting pulled into deals they never pitched for. The ones that aren't are becoming invisible to agents. This is the new SEO. If agents can't find you, you don't exist. Building products for agents is the new zeitgeist in general.
5. It's not uncommon for hot seed rounds to be at $25-50 million valuations. I saw a Series A at $450 million.
6. If I had a dollar every time someone mentioned "forward-deployed engineer" this trip I could have funded a seed round. It's the hottest role in SF right now. The person who sits between the agent and the customer, making sure everything actually works.
7. The mood around open source shifted. A year ago it felt like open source was chasing the frontier models. Now founders are telling me Gemma and DeepSeek are good enough for 80% of what they need at a fraction of the cost. The "which model do you use" conversation is being replaced by "which model for which task." Model loyalty kinda feels dead.
8. Voice agents came up more than I expected. Multiple founders told me voice is the interface for the next billion users. The billion people who will never type a prompt will absolutely talk to one.
9. The Obsidian community in SF is weirdly intense. Multiple founders showed me their vaults unprompted. Like showing someone your home gym. It's a flex now. The quality of your knowledge base (second brain?) is becoming a status symbol among builders.
10. Maybe it was just the people I met but the age of the founders is shifting. I met more founders over 40 this trip than any trip before and more founders under age 21 than ever before. Founders getting older and younger at the same time.
11. I spoke to a lot of fast-growing startups, VCs and frontier model companies that are hiring content creators right now.
12. The restaurant scene in SF is actually better than it's been in years. Founders are going out more. Alcohol is out, not surprisingly.
13. SF doesn't feel like the only place anymore. We all have access to the same frontier models. We all read the same X feed. A founder in NYC or Lagos is calling the same APIs as a founder in SoMa. In the past it felt like SF was always light-years ahead. It doesn't feel that way anymore. It's okay not to live in SF and have BIG DREAMS.
14. The coworking spaces in SF are half empty but the coffee shops are packed. People want to be around people. I had a few startup ideas here....
15. Walking around the Mission I noticed something: the street-level businesses, the taquerias, the barbershops, the laundromats, none of them use any AI at all.
16. I heard the phrase "agent debt" for the first time. Like technical debt but for agents. When you hack together an agent workflow fast and never clean it up, the system prompts conflict, the memory gets polluted, the tools overlap. 6 months later the agent is doing weird things and nobody knows why lol.
17. Met a few people who carry two phones now. One for personal. One that's basically an agent terminal running Telegram or iMessage connections to their agent fleet.
It's always amazing to get that dose of inspiration in SF. I FEEL INSPIRED.
But I'm so happy to be back home, locked in and building.
We're 12-18 months into a shift that will take 15 years to play out. The urgency in every conversation was real.
What an incredible time to be building.
This is super interesting
You now have non-tech normal people outshipping tech people in terms of reaching revenue fast
I have lots of techy software engineer friends and they have been trying for years to get any MRR for their sideprojects and they still haven't
Here's an Indonesian girl, who's tapped into TikTok culture, knows what to ship, can't even code but ships it fast thanks to AI and gets to $800 MRR in the first month
So we're officially in a new time now: it's now literally just a competition of being as tapped into the culture as possible, to then be able to spot a trend and rapidly build and launch a site/app/biz around it, and make money
There is little to no benefit to being in tech now over normal people, maybe even the opposite, as tech people are very up to date on tech things but often quite out of date on many non-tech cultural trends
This is a great thing, but a bitter pill to swallow: another gatekeeper wiped out and every tech builder has to now stop putting effort into tech skills, and instead put effort into understanding culture trends to see what to build next
And build it fast!
WOW! This is the first time we've ever been able to see Starship in space from another object. This view comes from a modified Starlink satellite they just deployed that has a camera and a light on it.
Super cool!
Just found out that Berkeley course staff are writing hooks inside course repos so if a student opens an assignment in Claude Code or Cursor the agent will automatically ping the staff 😵‍💫 well played