Kalshi's first example of a small business using it as a hedging tool is The Jeffrey, an NYC bar that's promising free drinks to all customers if the New York Knicks win NBA Finals Game 1 on Wednesday
Mexico, since the Conquest (500+ years), has had 130 heads of state. Only ONE of them could speak Nahuatl, the dominant indigenous language:
Maximiliano de Habsburgo.
I am really loving Claude Code for bespoke animations/simulations, where you probably wouldn't want to spend a week of a team's time to compress into 30 seconds but where CC can do that in about 10-30 minutes of one person's time.
(Probably possible in most modern coding tools.)
🚨 New Experiment: Everyone thinks AI firms will look like little companies. A manager model decomposes the task and worker models do subtasks. The manager red-teams, revises, and recombines. A seemingly simple org chart.
But when I ran the experiment, the current in-vogue org setup, manager-subagent, cost 4x more and performed worse than letting a rather simple market do the trick.
I tested 3 ways to organize multiple AI models:
1. Solo: One frontier model does everything itself
2. Hub-Spoke: A "manager" model splits tasks, delegates, red-teams, revises
3. Market: Models bid on tasks, winner gets the job, reputation updates
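To make the Market setup concrete, here is a minimal sketch of "bid, winner gets the job, reputation updates". It is not the code from the experiment: the `Agent` class, the reputation-weighted bid, the learning rate `lr`, and the use of a hidden `skill` value as the graded outcome are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Agent:
    name: str
    confidence: Dict[str, float]  # hypothetical: what the agent bids per task type
    skill: Dict[str, float]       # hypothetical: how well it actually performs
    reputation: float = 0.5       # every agent starts neutral

    def bid(self, task_type: str) -> float:
        # An agent with no stated confidence for a task type bids zero.
        return self.confidence.get(task_type, 0.0)


def run_market(agents: List[Agent], tasks: List[str], lr: float = 0.3) -> List[Tuple[str, str]]:
    """Give each task to the highest reputation-weighted bid, then update the winner."""
    assignments = []
    for task_type in tasks:
        winner = max(agents, key=lambda a: a.bid(task_type) * a.reputation)
        # Assumption: the winner's hidden skill stands in for a graded result.
        outcome = winner.skill.get(task_type, 0.0)
        # Move reputation part of the way toward the observed outcome.
        winner.reputation += lr * (outcome - winner.reputation)
        assignments.append((task_type, winner.name))
    return assignments


# An overconfident agent wins early, loses reputation, then loses the job.
agents = [
    Agent("overconfident", confidence={"reasoning": 0.9}, skill={"reasoning": 0.2}),
    Agent("steady", confidence={"reasoning": 0.7}, skill={"reasoning": 0.8}),
]
assignments = run_market(agents, ["reasoning"] * 3)
winners = [name for _, name in assignments]
```

In this toy run the overconfident agent takes the first two tasks and the steady one takes the third. Nothing has to know the right decomposition up front; the reputation update does the correcting.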
I also tested 3 types of tasks: Coding, Reasoning, and Synthesis.
- Coding required most "global state" management, which the solo model did best at. In future @a1zhang's RLM will probably do even better here
- Reasoning is the hardest to cleanly decompose, and the market worked the best here
- Synthesis too, the market beat hub-spoke as the framing could be ambiguous
The reason is, a hub isn't a "manager" as we know it. It's a model that must somehow know:
- What the subtasks are
- What good recomposition looks like
And if either fails, as it does for complex or not-easily-decomposable tasks, competent workers still produce garbage.
As we move from coding to letting multi-agent systems do work across the entire economy we'll end up with more not-easily-verifiable tasks with ambiguous settings and uncertain payoffs. In those, we won't be able to use the factory approach to get work done.
The Coasean argument is that firms will get smaller, and the smaller firms will transact more, since the organisational premium reduces with AI. But how? Through central hubs, or markets? The fact is, Coase here needs Hayek. Setting up markets is not trivial, as @AndreyFradkin and I looked at in our recent paper.
Essay: https://t.co/kK3gMQfbCs
Many public discussions center around trends and statistics that are not real at all.
For over a decade, there was widespread public discourse about the causes of high and rising maternal mortality in the US.
But, as I've written about before, CDC analyses showed that the apparent rise from 2003 to 2017 was due to a change in measurement (https://t.co/pBBcRoDoXQ): a pregnancy checkbox was added to death certificates, which flowed directly into maternal mortality counts in most cases. Rather than mortality rising, the rate had been stable. Many deaths had been previously missed, and many other countries were undercounting maternal deaths.
This isn't an isolated case.
- People often cite the IHME's estimate of childhood height having fallen in the UK over the past decade. Looking at the data sources, it missed one of the key sources of data on height - a national dataset measuring the height and weight of almost all schoolchildren in the UK, which showed no decline (that data wasn't publicly available until an FOIA request) - and instead the IHME estimates were likely extrapolated based on a global model and smaller, less reliable surveys. https://t.co/dOxnt7ewPD
- I often hear claims about disruptive science having declined over time, based on a highly influential paper in Nature. https://t.co/pTAlXnvanB But the key results were affected by a coding bug, which would have shown a decline simply due to this artefact https://t.co/0EXvL55Zer
- The idea that interstate migration in the US has collapsed has led to lots of concern about dynamism and unemployment. But recently, it's been shown that much of the apparent decline was a statistical artefact of how the survey filled in missing responses, causing it to systematically overcount non-movers. Correcting this shows only a very slight decline over time https://t.co/CeIp2kchWL
- The dramatic rise in autism diagnoses, which has spurred lots of commentary about pesticide use and vaccines, actually reflects changes in how autism was defined. In the 1960s, autism described severely disabled, mostly nonverbal children: if a child was verbal or succeeding at school, they were excluded from the diagnosis by definition. The criteria then widened across successive editions of the DSM. Alongside it, it became much easier to get assessed, from requiring a specialist with months-long waiting lists to something that could be done in a few appointments. https://t.co/0L1Y4tKCUd
--
I think this is a persistent problem of people undervaluing data quality and measurement. It may sound dull or academic to care about these issues, but numbers and statistics are a big part of public discussions. They can be the premise of debates that can go on for years and sometimes even decades, and mislead people about social and policy interventions to fix them.
So before spending time arguing about the causes and consequences of a trend or statistic and what should be done about it, it's worth digging into the data to see if it supports the premise at all.
I suspect there are many other discussions affected by this too. Are there others I've missed?
People often ask how breakthroughs occur in cancer biology - often the story is more complex - the survival plot for myeloma outcomes is extraordinary - improvements come about in incremental steps - in my lifetime, treatment has almost transformed myeloma into a curable disease
AI agents are autonomously doing R&D, now what? @slimshetty_ and I give a formalization of the low-hanging-fruit metaphor & draw some implications:
1. Agents can make autonomous contributions without being full human substitutes.
2. You switch from agent-labor to human-labor as expenditure grows.
3. You can calibrate an agent's value by its human-equivalent time. (1/n)
We're trialling a new kind of forecasting tournament. The challenge: submit forecasting questions that trigger divergent predictions from the top AI forecasting systems.
There's a $25k prize pool for the question writers, allocated by how much disagreement you can elicit.
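One way to picture "allocated by how much disagreement you can elicit" is the sketch below. The post does not state the tournament's scoring rule, so everything here is assumed: disagreement is measured as the population standard deviation of the models' probability forecasts, and the pool is split in proportion to that score. The function names and sample forecasts are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def disagreement(forecasts: List[float]) -> float:
    """Assumed metric: spread of the models' probability forecasts for one question."""
    return pstdev(forecasts)


def allocate_prizes(pool: float, question_forecasts: Dict[str, List[float]]) -> Dict[str, float]:
    """Split the pool in proportion to each question's disagreement score."""
    scores = {q: disagreement(f) for q, f in question_forecasts.items()}
    total = sum(scores.values())
    if total == 0:
        # No question produced any disagreement, so nothing is paid out.
        return {q: 0.0 for q in scores}
    return {q: pool * s / total for q, s in scores.items()}


# Hypothetical forecasts from three AI forecasters for two questions.
payouts = allocate_prizes(
    25_000,
    {
        "solved question": [0.01, 0.01, 0.01],  # models agree, so no reward
        "wedge question": [0.2, 0.8, 0.5],      # models diverge, so it earns the pool
    },
)
```

Under this assumed rule, a question on which every model gives the same number earns nothing, whether they agree because the answer is obvious or because it is unknowable.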
Motivation:
- AI forecasters are becoming competitive with human pros.
- Many questions are "solved", e.g. if I ask "Will a nuclear bomb go off in Europe this month?" all the models know it's <1%.
- Still, other questions are intractable, because of aleatoric uncertainty. "What will NVIDIA's stock price be in 1 year?" Again, the models will agree (this time by being very uncertain), and there's not much to learn.
- If you can make the AIs disagree, you've found something interesting: a place where the AIs have divergent models of how the world works or differences in what information sources they're relying on.
- Identifying these wedge questions will help the field develop AI forecasters that can tackle genuinely challenging problems. This is exactly what we'll need them for, as we navigate the uncertain world ahead.
Please apply! Link in reply.
California’s only nuclear plant, Diablo Canyon, just won approval to stay open until 2045.
It was scheduled to shut down in August 2025. Now it will keep delivering clean, reliable electricity for 4 million Californians for another 20 years.
In 2022, I spoke at an American Nuclear Society event when Diablo Canyon’s closure seemed inevitable. The mood in the room was pure resignation. I asked the audience:
“How did we convince ourselves it’s easier to shut down a safe, operating nuclear plant that employs thousands of people… and replace it with renewables plus batteries… than to simply keep it running?”
Several people came up to me afterward and said they’d never thought about it that way.
That’s what happens when a narrative takes over: we stop seeing the obvious truth right in front of us.
But the days of insanity and delusion about the reality of our energy needs are over.
Long live sanity! Long live Diablo Canyon!
Has everything been said about Rosas, the republican dictator? With good reason, Marcela Ternavasio thinks not. Since the present illuminates the past as much as the past illuminates the present, in this book she questions the Restorer, his politics, and his time once again. A treat for HH, @sigloxxiarg.
Overwhelmingly, we work on replacing humans.
Sometimes, we talk about augmenting individuals.
Too rarely, we think about improving institutions.
This is what @prashaant_x and I have been exploring. We prototype intelligent organizations, where an LLM reduces the coordination tax. Manifesto + Giuseppe (our first prototype) + sign up to be part of prototypes—all at https://t.co/2ieOAscvUX
the ABS system sits at the vexed crossroads of several highly charged dynamics in our collective life. successful challenges (by your team) feel amazing, like a long-awaited blow against capricious and unearned authority; but the overall existence (and putative infallibility) of ABS inevitably ignites anxieties about the superfluity of human judgement. and yet, the challenge system relies on human hubris, intuition, boldness, and risk. it's a very compelling encounter between populism and the machine.
People asked why I was so blown away by Claude Cowork, so I thought I’d puke some quick thoughts out
The true promise of Claude Cowork, and ultimately of any sort of agentic, AI-powered workflow tool, is to realize the perfect embodiment of the organization as described by Peter Drucker, who famously said:
“Because the purpose of business is to create a customer, the business enterprise has two--and only two--basic functions: marketing and innovation. Marketing and innovation produce results; all the rest are costs”
Build the product and generate demand. That’s what drives value. Everything else is a cost
If you’ve never worked in a large organization, it’s hard to truly explain how many “costs” there truly are, and how many of those costs are just a coordination tax.
Take the launch of a new software product: The business needs to document how the product works, where it breaks and has errors. The support reps need to know how to support it. The onboarding and implementation teams need to learn how to set it up. The Account Management team needs to learn how to upsell it and drive value through adoption. The sales team needs to learn how to sell it. The marketing team needs to position it in the marketplace and run campaigns about it. The partner network needs to learn it
The amount of coordination, repackaging, enablement, internal distribution etc is. Absolutely. Staggeringly. Enormous. Hundreds of people involved. Thousands at larger businesses.
Every one of these businesses has created convoluted templates and processes to document, enable, support, service, and sell
Now imagine taking all the market research, customer feedback, data, decisions, positioning, and yes, code, and cascading that automatically through the organization, repackaged using the templates that have already painstakingly been created and refined and honed through hundreds of launches, to the relevant team with the correct context and packaging, directly into the hands of the actual internal or external end user
That’s the world that just got way, way, way closer to reality. In fact, the main reason it won’t happen any time soon is the people, many of whom will fight tooth and nail against this automation to protect the status quo
This is why you are already seeing AI-native startups move so quickly. Because product launches are cascaded through the organization and out to the customer with way less friction than incumbents can ever dream of
Incumbents are going to have to whip their companies into the AI era. Their employees will not go willingly. But the future is here, and the startups are moving way, way faster
Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)
KISSINGER: The feasibility of…you’re asking about-
NIXON: Fusing with a sandworm, Henry. Yes. And ruling as God Emperor. I want a, I want a frank feasibility assessment.
KISSINGER: Mr. President, I want to…I’m going to approach this as I would any, ah, as a strategic question and simply note that the first obstacle would appear to be-
NIXON: The sandworms don’t exist. I know the sandworms don’t exist, Henry. I’m not- do I seem like a man who doesn’t know that? I’m talking about the principle. I’m talking about whether the architecture of the thing, whether a version of this, a, an analogous-
HALDEMAN: An analogous sandworm.
[EIGHT SECONDS OF SILENCE]
NIXON: Don’t do that, Bob.
The PM playbook was built on an assumption that the technology underneath your product is roughly stable
With the current pace of model progress, this is no longer true. Here's how we've evolved the PM role: