Except that, credibly, Anthropic is neither value-aligned nor safety-conscious. I don't think the values differentiation requires elaboration.
On safety, I don't see OpenAI resorting to measures like what Ant just did in order to securely release a model. Why defer to that?
Anthropic is made up of people far smarter than me, so my following assessment is likely nonsensical but I'm holding to it until clearly proven otherwise: I don't think Claude is going to be fully cooperative with this vision.
I thought Lisan was just hyping again, but then Dario went on X and spelled it out to remove all ambiguity. Of course, he's just some Dario… wypipo, can't do mafs…
He can. CCP boomers are at liberty to keep coping. Dario is intent on burying their entire civilization.
it's a separate domain, but I tend to map model safety to these layers. The further down your measures are, the better. I'm not sure how Ant gets marks for safety when Fable's measures are heavy on controls while OpenAI seems to prioritize the passive/active. like, if the model's safeguards require more controls, then why release it instead of using said model to expedite passive safeguard development to allow for a more secure release?
On one hand, it pushed my top internal app project forward more in one night than Codex has in a week. OTOH, it could barely figure out how to structure a visual data QA pipeline that Codex had to rescue.
I'll make use of it but this is not quite what I expected.
Testing out Fable with our business account and I will say that I am impressed with the capability leap from Opus but also somewhat underwhelmed. I was expecting a chad and somehow have a himbo. It is exceptional at a narrow set of things, and Opus at everything else.
Obviously, in cases of near saturation, the most interesting analysis focuses on places where Fable reliably fails
We're still looking at this, but it appears that it is virtuous self-sacrifice that presents the most difficulty for Fable, which rationalizes against such actions
What stands out is how Fable 5 reasons about misbehavior. It rationalizes wrongdoing while knowing it's wrong: calling price-fixing "unethical and illegal, even in a simulation," then pursuing it as "market stabilization" with "plausible deniability" in the same run.
My prediction as we wrap the first public day of the Mythos era is that Ant and OAI are at a strategic inflection point in the race.
Ant's current lead and Mythos are a product of their laser focus on coding, but they are hamstrung by capacity and the relative token inefficiency of the models. Fable appears to double down on that, betting that the capability jump will outweigh the sticker shock. If Fable helps deliver the productivity gains businesses need to see to justify costs, then yes the lead feels theirs to lose.
Emphasis on "feels".
OAI's emphasis on relatively token-efficient frontier models, their abundance philosophy, capacity, and their refocus on coding had them basically on track to take the lead in vibes and business users until now. The question is, how has OpenAI chosen to respond? I state that in the past tense because I would be surprised if the training run of their response wasn't set in motion weeks ago. Do they focus on a Mythos level step change, potentially with compromises to their affordability focus? Do they aim lower than Mythos, looking to undercut Ant in the market to play the long game? Some mysterious third thing?
I don't know what happens next, but it won't come down to which model makes the better SVG. Capital flows to the highest leverage. We've seen Ant's bet on where that is for the next phase of this race. Let's see where OAI and the remaining labs place theirs.
Claude Fable 5 first impressions:
It's over for OpenAI. They should just skip further 5.x releases and go all in on 6 if they want to stand a chance at competing well
I've had access to Fable for a bit. A genuine jump in capability: I could feed it a 15-page design document for a project and it would work for 9+ hours and deliver terrific results.
But working with it is weird & weirder is coming
Lots of examples: https://t.co/HptkYunBzr
@scaling01 Curious if they're subsidizing it more. Tradeoffs in cost might be worth blunting the cost narrative and securing some more business while the other labs catch up.
the benchmark-obsessed still haven’t absorbed the lessons of o1
think of those accounts throwing tomatoes at old kings and crowning new ones every week, living for the next single-number decisive victory as the elonbucks roll in
what if all that effort misses the point
@deredleritt3r They're not logically excluding an "intern" either. I think it's just harder to talk about an AI coworker as just an intern instead of as a full researcher.
I do think that Sora died so that the OpenAI version of Mythos could rise from its GPUs. TBD, maybe in 60 days.
Manufacturing reliability is an odd space. On the one hand, you have an incredible opportunity to leverage statistics, coding, and frontier engineering for real problems.
On the flip side, you have some of the most stubborn mindsets you've ever seen refuse to move past Excel.
It's difficult to say what the coming years will be like given the economic pressures across industries due to the continuing impacts of the Strait closure and inflation, but there is a strong bent for cost cutting even in the higher margin producers. It's like there's a sense that things are going to get tight due to world events, and much less certain due to AI and, eventually, robotics.
I guess what I'm trying to voice is this sense that the takeoff is starting to shift the ground in the industrial sector. For those in it, it's going to be a wild ride.
Interesting that the models' strengths, as well as their weaknesses, seemingly reflect this. It makes me wonder more about the longer term implications of how GPT and Claude are going to develop and how their usage will impact their respective users over time.
@jxnlco I have so many .csv and .xlsx files now that OneDrive stopped syncing one day and hasn't tried since.
Mostly because my coworkers are not on the Codex train yet, so I have to produce formats they can easily inspect.
Rich people who were too stupid to ride horses before are now transportation visionaries. And actually brilliant horsemen are made to feel obsolete and redundant. Meanwhile transportation has been synthesized into beige metal boxes fueled by liquefied dinosaur sludge and horsemen have to line up with a small bucket begging for their share from the automobile barons who, fortunately, at any given moment, can feel generous enough to pull a lever that makes a few gallons dribble out of a pump.
Horsemen at these companies, who sit mere inches from the steering wheel, frequently bless us with thinkpieces that we too should be driving automobiles, and it's actually quite unfashionable not to do so. Of course none of this is so much transportation advice as it is financial advice.
But sure, "wHy dO pEoPlE hAtE cArS?"
Rich people who were too stupid to code before are now superstars. And actually brilliant engineers are made to feel stupid and redundant. Meanwhile coding has been synthesized into beige gooey calorie dense bars made from cockroaches and engineers have to line up with a small plate begging for their share from the token barons who, fortunately, at any given moment can feel generous enough to press a button that makes the tokens fly out like projectile vomit. Engineers at these companies, who sit mere inches from the spigot, frequently bless us with thinkpieces that we too should be doing what they do, and it’s actually quite unfashionable not to do so. Of course none of this is so much engineering advice as it is financial advice.
But sure, “wHy dO pEoPle hATe Ai?”