@stephensacks @grok I'm not sure. I'm not a deep expert in this area, nor do I have much personal experience using Grok. (I generally find the big 3 more useful, each in different ways.) It certainly seems to be fine-tuned to have a more oppositional character, at least superficially.
It genuinely feels to me like GPT-5.2 and Opus 4.5 in November represent an inflection point - one of those moments where the models get incrementally better in a way that tips across an invisible capability line where suddenly a whole bunch of much harder coding problems open up
This is close to, but not the same as, "LLM in a loop with tools," because (in the context of the piece) it emphasizes the shift to one universal, general-purpose tool: "just using a computer" (e.g. Bash).
@simonw This quote from the recent "Bitter Lesson of LLM extensions" post (https://t.co/bgqghTsJqK) resonated in a way that felt like it belonged in your canon:
"An agent isn't just a[n] LLM in a while loop. It's an LLM in a while loop that has a computer strapped to it."
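For anyone who wants the quote in concrete terms, here is a minimal sketch of "a while loop with a computer strapped to it." Everything in it is an assumption for illustration: `scripted_model` is a hypothetical stand-in for a real LLM call, `run_shell` and `agent_loop` are made-up names, and the `DONE` convention is invented. It is not the design from the linked post.

```python
import subprocess
from typing import Callable, List


def run_shell(command: str, timeout: int = 10) -> str:
    """The single general-purpose tool: run a shell command, return its output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def agent_loop(
    model: Callable[[List[str]], str], task: str, max_steps: int = 5
) -> List[str]:
    """Ask the model for a command, run it, feed the output back, repeat."""
    transcript = [f"TASK: {task}"]
    steps = 0
    while steps < max_steps:  # bounded so a confused model cannot loop forever
        action = model(transcript)
        if action == "DONE":  # hypothetical stop signal
            break
        transcript.append(f"$ {action}")
        transcript.append(run_shell(action))
        steps += 1
    return transcript


def scripted_model(transcript: List[str]) -> str:
    """Hypothetical stand-in for an LLM: one command, then stop."""
    return "echo hello" if len(transcript) == 1 else "DONE"


# Note: executing model-chosen commands like this is only safe in a sandbox.
transcript = agent_loop(scripted_model, "say hello")
print("\n".join(transcript))
```

The point of the sketch is that there is only one tool, and it is the shell; anything a per-task tool could do gets expressed as a command instead.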
@jon_barron Also, back-of-the-envelope carbon analysis: CVPR has ~10k submissions and ~10k attendees. If the mean attendee flies round trip NY-LA (some less, some much more), that’s ~1 metric ton of CO2 each. The equivalent mean compute per submission is ~3000 H100-hours (about 4 months on one GPU) with average US electricity.
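A rough way to sanity-check that equivalence, as a sketch only: the GPU power draw and grid carbon intensity below are assumed round numbers, not figures from the post, and they ignore cooling and host overhead. With these inputs the result lands in the same ballpark as the ~3000 H100-hours quoted above.

```python
# Back-of-the-envelope check; every constant here is an assumed round number.
H100_POWER_KW = 0.7         # assumed: ~700 W per GPU, no cooling/host overhead
GRID_KG_CO2_PER_KWH = 0.4   # assumed: rough US-average grid carbon intensity
FLIGHT_KG_CO2 = 1000.0      # from the post: ~1 metric ton per NY-LA round trip


def gpu_hours_for_emissions(
    kg_co2: float,
    power_kw: float = H100_POWER_KW,
    intensity: float = GRID_KG_CO2_PER_KWH,
) -> float:
    """GPU-hours whose electricity use emits `kg_co2` kilograms of CO2."""
    kwh = kg_co2 / intensity   # energy that produces this much CO2
    return kwh / power_kw      # hours one GPU runs on that energy


hours = gpu_hours_for_emissions(FLIGHT_KG_CO2)
days_for_3000_hours = 3000 / 24  # the post's figure expressed as days on one GPU

print(f"{hours:.0f} H100-hours per flight-equivalent")
print(f"3000 H100-hours is {days_for_3000_hours:.0f} days on a single GPU")
```

Adding datacenter overhead to the assumed power draw would pull the estimate down toward the post's number.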
@abrakjamson @simonw This is great, thanks!
I’m unsurprised that Microsoft is out front on this given their longstanding enterprise productivity tools focus and resulting culture. (And good on you for it all the same!)
I’m very surprised that others aren’t taking it more seriously by now.
@simonw Anyway, could be another useful thing to elevate more prominently with your platform, as you so effectively have for the “lethal trifecta”!
(And: big fan/love your work/thanks for everything you do :)
@simonw I understand why lawyers don’t want them to promise specifics, but it seems like a huge problem not to have a clear answer. I would have hoped the product/business owners would see this bigger picture cost/benefit and overrule the lawyers’ narrow conservatism by now.
@simonw @bleuonbase I don’t know either, and it’s *possible* the labs are just empirically confident the models won’t memorize PII. But models memorize an awful lot of the training set! See all the stunts to regurgitate copyrighted content. Seems like a huge risk.
I therefore suspect Google *aren't* e.g. directly throwing chat history into pretraining. But it certainly seems like something everyone—both users and providers—should want to be very clear about.
It would be a privacy and PR nightmare any time people figured out how to exfiltrate private information memorized from training on chat logs or browser usage.