Mythos at Palo Alto Networks "found more than two dozen critical vulnerabilities in around three weeks, roughly five times what the company would typically find using existing tools"
But the company "burned through more than $1 million worth of tokens using Mythos"
@Tim_Hua_ Over 1 month they spent a lot on Mythos but I would guess it has rapidly diminishing returns, i.e. if you spend $2M you're not going to find a lot more bugs. While if you keep scaling labor then you'll keep finding more bugs. This would keep capital share under control (for now).
The last line would probably be better: "Impressive autonomy, but not human level; a lot of reward hacking, some deception, but no egregious scheming yet."
End of an era at OpenAI -- Pamela did a streak of great work founding/running the econ research team, I’m grateful she hired me into it, and we continued many of the projects she started.
Makes sense! Though this all gets much messier if expenditure is based on *perceptions* of returns, not reality. It seems likely that perceptions are way off right now, and that people are throwing money at agents with only a very vague idea of the long-run cost-benefit of the work they're doing.
I think most domains look like this at the moment: the returns to expenditure on agents diminish much more quickly than the returns to expenditure on human labor: (1/n)
Interesting! Could you elaborate on why these would push the expenditure-share of agents up?
When the expenditure-share on agents is 50%, this implies that the following two interventions would have the same effect on your productivity:
- Work twice as many days (holding tokens fixed)
- Spend twice as many tokens (holding your days worked fixed)
I think we're not there yet, & it seems to me plausible that we'll always have steeply diminishing returns to tokens, meaning value will exceed expenditure-share.
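A sketch of why a 50% expenditure-share would imply equal effects, assuming (my assumption, not stated above) that you split your budget to minimize cost at an interior optimum, with hypothetical prices w_H per day worked and w_A per token:

```latex
\[
\frac{\partial Y/\partial H}{\partial Y/\partial A} = \frac{w_H}{w_A}
\;\Longrightarrow\;
\frac{w_A A}{w_H H + w_A A} = \frac{\varepsilon_A}{\varepsilon_H + \varepsilon_A},
\qquad
\varepsilon_X \equiv \frac{\partial \ln Y}{\partial \ln X}.
\]
```

So an agent share of 50% corresponds to \varepsilon_A = \varepsilon_H: a small proportional increase in days or in tokens raises output by the same proportion. For a full doubling this is exact only if the elasticities are constant (e.g. Cobb-Douglas); otherwise it's a local approximation.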
A more abstract argument: AI consolidates the knowledge of the world. If you spend a lot of tokens solving a problem once, then the knowledge of that solution can be distilled into future responses (whether through online learning, or post-training RM). This implies people will always get huge value from LLMs with a small amount of expenditure, and you'll only need to spend a lot of tokens on problems that are truly novel.
@herbiebradley (to clarify: the "testable predictions" are my conjectures about what happens in your thought experiment, which would make it consistent with my original claim)
OK here's an attempt to formalize your story, but I might be getting this wrong, do push back! Suppose we have Y(H,A). In your scenario you can do something in 6 months alone, or in 0.5 months with agent help: Y(6,0)=Y(0.5,A).
This gives us two points on the function Y(H,A); to draw the graph above we need to specify the whole function.
A simple functional form is this: Y(H,A)=H^alpha*(1+A^beta). Here agentic labor is optional, while human labor is necessary. And each has diminishing returns.
My empirical claim would be that the returns to agentic labor are diminishing more steeply. Testable predictions for your thought experiment:
- If you double human labor, 0.5 months to 1 month, then quality increases a lot.
- If you double agent expenditure, A to 2A, then quality doesn't increase a great deal.
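A rough numerical sketch of these two predictions under the functional form Y(H,A)=H^alpha*(1+A^beta). The values alpha=0.7 and beta=0.2 are illustrative guesses of mine, not estimates; the agent spend A is backed out from the 6-months-alone vs 0.5-months-with-agents condition, in arbitrary units.

```python
import math

# Hypothetical exponents, chosen only for illustration (beta < alpha means
# returns to agent spend diminish more steeply than returns to human time).
ALPHA = 0.7
BETA = 0.2


def output(h, a, alpha=ALPHA, beta=BETA):
    """Y(H, A) = H^alpha * (1 + A^beta): human labor required, agents optional."""
    return h**alpha * (1 + a**beta)


def calibrate_agent_spend(h_alone=6.0, h_assisted=0.5, alpha=ALPHA, beta=BETA):
    """Solve Y(h_alone, 0) = Y(h_assisted, A) for A."""
    ratio = (h_alone / h_assisted) ** alpha  # equals 1 + A^beta
    return (ratio - 1) ** (1 / beta)


a_star = calibrate_agent_spend()
base = output(0.5, a_star)

# Proportional change in output from doubling each input separately.
human_gain = output(1.0, a_star) / base
agent_gain = output(0.5, 2 * a_star) / base

print(f"calibrated A: {a_star:.1f}")
print(f"double human time  -> output x{human_gain:.2f}")
print(f"double agent spend -> output x{agent_gain:.2f}")
```

With these guessed exponents, doubling human time helps a lot more than doubling agent spend, but the gap is driven entirely by the choice of alpha and beta, which is the thing that would need to be measured.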
(Note that the value($) for the two axes won't add up to total value, because they're complements)
Could you elaborate on this? The aggregate elasticity will be the (value-weighted) average of individual elasticities, so I don't think it's generally true that we expect aggregate elasticity to be lower than individual.
I think elasticity will fall if you can adjust other factors at the same time (Le Chatelier). But this would perhaps apply to both human & agent labor.
Inference-time scaling rotates the red curve upwards, increasing the elasticity.
But there's also the countervailing force of distillation: once an agent solves a problem once, it then becomes cheap to do it again.
A test for this: if you doubled your token use, by how much would the value you get from AI increase? This gives the elasticity.
My guess would be it's much less than double (and if you don't usually hit your token limits, then the implied marginal value is zero).
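A minimal sketch of how that doubling test turns into an elasticity number, assuming the elasticity is roughly constant over the doubling. The function name and the example figures are hypothetical.

```python
import math


def implied_elasticity(value_before, value_after, token_multiplier=2.0):
    """Arc elasticity of value with respect to tokens: d ln(value) / d ln(tokens)."""
    return math.log(value_after / value_before) / math.log(token_multiplier)


# Hypothetical answer to the test: doubling tokens raises value from 100 to 120.
example = implied_elasticity(100.0, 120.0)
print(f"implied elasticity: {example:.2f}")
```

An answer of "value would double" gives an elasticity of 1, and "no change" gives 0; anything in between means diminishing returns to tokens.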