Some thoughts on these points:
3. I think Kirkland is more differentiated than people realize. They bring in almost as much revenue as Airbnb. The Amlaw 100 brings in almost as much revenue as Nvidia. If these firms didnโt have some unique advantage you would expect this business to be more spread out and the rankings to fluctuate more. The newest firm in the AmLaw 100 is Quinn which is 40 years old. The average age is something like 100 years old.
4. These firms also arenโt interchangeable the way most people believe. Wachtell who does large public M&A is not substitutable for Kirkland who does private market M&A. These firms are highly specialized across over a hundred practice areas and industries and each of these has completely different expertise, processes and economics to make them profitable.
5. I donโt think Harvey can build better versions of Kirkland's workflows and that isnโt our goal. What I do think we can do is build generic M&A workflows / models and provide infrastructure to allow firms to customize those with their unique data / differentiation.
9. Iโm somewhat skeptical that firms like Kirkland are going to get unbundled. Low end work like NDA review etc have already been unbundled by in-house teams and folks like Ontra but that isnโt what top firms charge for. As a GC you use outside counsel because you have a very specialized legal problem and you want to pay someone a lot of money to handle it for you and take full accountability for the result.
OpenAI CFO Sarah Friar: "In 2026, if you want to buy more compute, good luck to you. Tell me because I dont know where else to find it."
Jason: "Elon has some."
Rubbing in the fact that Elon sold compute to Anthropic just a few weeks ago just to get back at Sam Altman
INVESTIGATION: A whistleblower has leaked Stanford's private foreign-funding records to the Review, revealing millions in funding from Chinese state-linked entities and CCP donors.
@gabepereyra Since your clients like Kirkland are moving to vertically integrate their own AI, what does that customer relationship look like in the future?
Maxi/mini takes on AI:
Maxi-
1) Cybersecurity will be an enormous consumer of tokens. All public-facing code needs regular, in-depth pen testing.
2) We are barely scratching the surface in code. Enterprises will have tens of thousands of apps, dynamically generated apps, per user/per customer apps, etc.
3) Enterprise adoption is still nascent. Leading companies are spending 10x+ as much on tokens as laggard companies.
4) Video/image understanding/generation will be huge drivers of token spend- as will anything to do with operating IRL/physics.
Mini-
1) Many use cases (consumer chat, various enterprise "agent" use cases) are close to intelligence-saturated. Intelligence saturation will render many major use cases today dirt cheap in a few years. See @StatueofIBBertY's posts on this topic.
2) Enterprise spend rationalization is real and imposed spend ceilings may be durable. As costs decrease/efficiency increases token volumes will go up, but budgets may not.
3) The global compute "shortage" is driven by capacity hoarding, not inference volumes per se (yes individual companies are inference compute constrained, to be clear). "Enterprise" adoption probably won't be fast enough to keep up with capex, we need prosumer-y use cases like chat/code that are very responsive to model quality increases. If we go 8-12 months without those, funding for training cools even slightly and hyperscalers like Meta pivot to being neoclouds, we could easily find ourselves in a compute glut.
4) No one knows what % of frontier model API spending is from cleverly hidden distillation attacks (or partly motivated thereby). Many, many deep-pocketed actors are motivated to do this at scale.
In general, I don't know if there's ever been a cycle where there are so many considerations on both sides that have order-of-magnitude-sized impacts on how supply/demand work out for tokens, infra, etc.
What a time to be alive.
Some enterprise tasks are challenging to hill-climb with RL-based methods since they involve very out-of-distribution behavior. On-policy self-distillation (OPSD) gives a model learning signal for every token it writes, far richer than the single scalar reward of RL.
But that channel is noisy: most tokens don't reflect the behavior you're after. We introduce Relevance-Masked Self-Distillation (RMSD), which uses a two-step filtered loss mask to cut through the noise and find the tokens with the highest signal. Compared to OPSD it trains more stably, provides higher data efficiency, and reaches a higher performance ceiling.
Run the math
No VC's, large allocation to points holders
Will open $1bn+ and get bid up as only way to own is open market
Not too late to farm, incentives soon
https://t.co/qMhSpXdtkC
@Shaughnessy119 The SPV craze is unlikely to die anytime soon because the IPO as a wealth creation event has largely died and preference has shifted to private markets
I am surprised more people are not paying attention to this update from Anthropic on its stock policy. This seems like a potential bombshell.
There is an active secondary market purportedly in Anthropic stock or derivatives including on fairly reputable (or at least well-known) platforms like Forge. Anthropic is calling them out *specifically*, by name, and essentially *saying* 100% of these are illegal.
Some may be frauds (people selling Anthropic stock or interests in Anthropic stock that they don't truly own), but more likely many are legit attempts at transferring Anthropic equity (directly, as SPV shares, or as some type of 'beneficial interest' or future, etc.)
Anthropic appears to be saying it will treat all these transfers as void. I don't have access to their terms, but it's very interesting to think what this could mean. Do the 'first purported sellers' in the chain potentially have an opportunity to do a double-dip? Does the first seller and all downstream buyers get the entire entitlement nuked?
Anthropic is threatening that--are they just bluffing? If they're not bluffing, what litigation is likely to ensue? This can get into really esoteric areas of corporate law that depend on exactly how the transfer restrictions are drafted as well as the language around how violations of transfer restrictions are treated--for example, if they are merely voidABLE then downstream buyers can assert various equitable claims/defenses, but if they are VOID ab initio then in some jurisdictions that forecloses equitable defenses.
some thoughts on the shape of foundation labs
1) epoch ai estimated anthropic @ $9m in revenue per employee and openai @ 5.6m in revenue per employee
2) these rates would be the highest among public technology companies; but, i'm not sure how valuable it is to look at on its own
3) the closest equivalents are quant firms like jane street @ $12m and hudson river @ $9m and energy infrastructure companies like valero energy @ 13m
4) revenue per employee is a complicated measure because a lot of it depends on accounting and different firms consider different things revenue
5) but, quant shops have high revenue per employee because they have a lot of revenue on top of a small number of specialized, expensive researchers
6) and, oil refineries have high revenue per employee because they can process a lot of oil with small number of employees, using very expensive tooling
7) foundation labs feel like a combination of these two things
8) like quant shops, they have a small number of very highly paid researchers and, like energy infrastructure companies, each employee is very heavily capitalized
9) traditional technology companies don't capitalize their employees very heavily; claude estimates nvidia spends $100k in r&d opex per employee per year, apple $80k per employee per year
10) in contrast, openai will probably spend ~$35bn in r&d compute this year with ~5000 employees; this would imply openai will spend ~70x what traditional tech companies spend in r&d opex per employee
11) now, in practice, this r&d opex spend is concentrated on a small team of core researchers and this would make the comparison even more stark
12) in this respect, they really are a new kind of tech business; they are not quite like hyperscalers, saas, ad-tech, e-commerce or hardware companies, etc...
13) they have unrivaled tam, distribution like plg saas, lower gross margins, an employee base more like quant shops, unique r&d dynamics, and capital requirements that if you squint sometimes look like a hyperscaler
I guess what you are saying is bulk-data problems are solved but for longer-horizon physical tasks it will require solving the long-tail problem. However, the difference between robotics and FSD is not all robotics tasks need the same number of nines (for tasks that don't risk human fatality as an outcome)
Also the breakthroughs in Fan's video are tackling some of the problems you state. The DreamDojo world model is video prediction conditioned on actions. It will inherently teach/learn physics bc in order to predict the next frame, the model has to know what actions to do the world
What is different in the robotics paradigm is that more compute will create more training data, as data is just the byproduct of running compute on the world model
I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to my last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;)
And stay till the end, more easter eggs and predictions for your polymarket!
00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.
Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.