@georgepickett Yeah, I'm unsure if the faster time-to-validation of evals or the new requirement of an eval in the first place will win out - esp for less verifiable tasks.
@chamath Yes and no.
Your fundamental premise is correct, but we've seen tremendous value in model-harness symbiosis since Nov. '25, which isn't reflected in Chatbot Arena.
The US-China gap for end-to-end AI has widened at the frontier, and is similar in cost/value when run in low mode.
Curious - how do you define a "wholly autonomous company"?
An entity that reports to minimally-involved (human) owners is fair game to me.
For "rogue" AIs (i.e. ones that close the loop on self-hosting and report to no one), it gets blurry: what should be banned beyond existing corporate regs?
We go where we need to be, and today that was @NASAKennedy.
Some of my senior engineers and I spent time at @blueorigin with @JeffBezos and @davill, speaking with the workforce and seeing the damage at LC-36 firsthand. I appreciated the opportunity to hear directly from those working through the aftermath and better understand the challenges ahead.
There is a lot of work to do, but this is exactly why people choose careers in aerospace, whether at NASA, Blue Origin, or across the industry. The talent in this field thrives under pressure and performs at its best when solving the toughest problems.
We have been saying for months at NASA that we are not going to sit on our hands and wait for the capabilities necessary to achieve the nation’s most pressing objectives. We are going to take an active role alongside our partners, just as we did in the 1960s, to overcome setbacks, remove obstacles, and deliver the intended outcomes.
@NASA is committed to helping the Blue team recover, continue to advance their lunar lander and get New Glenn back to launching as soon as safely possible.
America’s greatest achievements in space were never the result of avoiding setbacks. They came from overcoming them. We have done it before, and we will do it again 🇺🇸
I think you have to look outside of "tasks" and "domains" for the answer.
It's the difficulty models have in actually recognizing these "unanswerable" questions. This is really apparent in their inability/unwillingness to seek clarification from users on genuinely subjective/creative decisions.
These Claude dupes seem like another (more sophisticated?) version of the "fully duplicated Salesforce" that came out of some guy's Ralph loop.
It cannibalizes the market for the free-tier users who occasionally need the headline functionality, but the power users of the "killed" app aren't switching.
@mattyglesias @AlecStapp Setbacks above a certain level are a great way to add height without being claustrophobic.
Floors 1-4 can be flush to the property line; floors 5-8 can be recessed without impeding sunlight to the street.