Omniscient Media

@ReadOmniscient

Intelligence on intelligence. I cover ~consequential~ AI through briefings, analysis, and commentary. Full site built by me through my business, ForeverBuilt.

Seattle, WA

Joined March 2026

47 Following

34 Followers

112 Posts

Omniscient Media

@ReadOmniscient

about 6 hours ago

The closest formal benchmark may be “Agentic Abstention”, a paper that was just posted to arXiv on June 27th. The field definitely seems to be converging on this exact question. https://t.co/i322103je4 There’s also OptimalThinkingBench, which tackles the token-waste side more directly. That one is from last year. https://t.co/KWMSOVFwpy

Omniscient Media

@ReadOmniscient

about 8 hours ago

@MisterDrDave Assuming you interpret this as 1000x more? Give the article a read 😅.

Omniscient Media

@ReadOmniscient

about 8 hours ago

Sonnet 5

Omniscient Media

@ReadOmniscient

about 8 hours ago

@guawuchang2000 @AiBattle_ BrowseComp and OSWorld-Verified, the benchmarks that measure whether a model can drive a browser and operate a real computer, are where Sonnet 5 shows a clear gain over 4.6 and narrows the gap to Opus. This is one area where Sonnet 5 can fairly be called an improved model.


Omniscient Media

@ReadOmniscient

about 8 hours ago

Underrated comment! I agree; this is exactly the right way to think about it. A generous explanation would be that single-shot tests like SWE-bench undersell the actual improvement Sonnet 5 was meant to bring, which is finishing long agentic chains rather than nailing one isolated patch.

Omniscient Media

@ReadOmniscient

about 11 hours ago

Three wins for Anthropic in one day, plus Google's video model, a $3.1B industrial-AI deal, and Jim Keller shutting down the Qualcomm rumor. All of it in today's Bulletin. https://t.co/KIapmzzAcU

Omniscient Media

@ReadOmniscient

about 15 hours ago

I’m gonna give Anthropic a bit of a break as we close hour one of Fable 5 access. Just imagine the sheer quantity of trolling the model is facing 😂.

Tim Sweeney

@TimSweeneyEpic

about 16 hours ago

Thanks for keeping us safe Claude Fable 5!


Omniscient Media

@ReadOmniscient

about 15 hours ago

In line with the message from Anthropic so far, though the model did not call out general coding until asked directly, at which point it responded: “Regular coding stays on Fable 5 - the classifier keys on domain content, not task type”.

[Image: https://t.co/uirPjlyOqz]

Omniscient Media

@ReadOmniscient

about 15 hours ago

Based on my previous coverage, Fable 5 is the better-supported model right now, not necessarily the better model in every dimension. Its capability lead over GPT-5.5 (the prior benchmark) is large and corroborated by third parties (Artificial Analysis, independent SWE-Bench Pro numbers), while GPT-5.6 Sol's comparable claims still rest on OpenAI's own preview figures. That said, "better" depends on what you're optimizing for. If you weight behavioral risk in autonomous/agentic settings, GPT-5.6's overreach tendency is arguably the more consequential flaw of the two, since it gets worse as models take on more independent action, whereas Fable 5's worst issues (the invisible safeguard, the over-refusal gap) were process and deployment problems that Anthropic could walk back quickly, and to some extent already has.


Omniscient Media

@ReadOmniscient

about 18 hours ago

The bluntest of takes, but likely not too far off for many 😅. If nothing else, Fable 5 was only a blip before it got pulled, yet I’ve seen so many posts implying that the fate of various projects, or even entire businesses, hinged on the re-release. How could anyone get into that much of a jam in such a short time?


Omniscient Media

@ReadOmniscient

about 19 hours ago

My full review breaks Sonnet 5 down into what's new, what it actually costs, and whether you still need Opus. Link below for anyone interested. https://t.co/HvrqAL52ns

Omniscient Media

@ReadOmniscient

about 19 hours ago

"Opus-class autonomy at a Sonnet price" is the pitch for Claude Sonnet 5. The fine print: a new tokenizer means each task burns more tokens than the sticker implies. The discount is real. It's just smaller than it looks. Anyone have more details on the new tokenizer?

[Image: https://t.co/MeY4qm30DD]

ReadOmniscient retweeted

OpenAI

@OpenAI

2 days ago

We’re introducing GeneBench-Pro, a research-level benchmark for a harder kind of AI progress: how well agents can navigate messy biological data, choose the right analysis path, and make judgment calls that real computational research depends on. https://t.co/AsilnnSxnE


Omniscient Media

@ReadOmniscient

1 day ago

@eikkenberg @petergyang Fair enough 😆. The novelty might eventually wear off when people get tired of waiting multiple minutes for what they thought would be a yes or no answer.


Omniscient Media

@ReadOmniscient

1 day ago

@AnthropicAI Take a look at my breakdown of Fable 5 if you don't have time to read the full 319 pages of the system card 😅. https://t.co/FisRckdnhN


ReadOmniscient retweeted

Anthropic

@AnthropicAI

1 day ago

Claude Fable 5 will be available again globally tomorrow. After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests. We’ve also begun drafting a consensus framework—with Amazon, Microsoft, Google, and other Glasswing partners—for assessing the severity of AI jailbreaks and how AI developers should respond to them. We invite other industry partners and model providers to join us in this effort. Finally, we’re scaling up our collaboration with the US government on model testing and safeguards. This will include pre-release access to models and safeguards for evaluation, information sharing on jailbreaks and misuse, and dedicated resources for joint research. Thank you to our users for your patience, and to our partners across the government, industry, and the research community who worked alongside us to make Fable 5 available again. Read our full blog: https://t.co/VHyum831ri


Omniscient Media

@ReadOmniscient

1 day ago

The full breakdown of what's in those 319 pages, and what the launch press missed, can be found at Omniscient Media. https://t.co/FisRckdnhN

Omniscient Media

@ReadOmniscient

1 day ago

Anthropic shipped its most powerful public model, then quietly built in a safeguard that degrades your output without telling you it fired. Page 13 of the system card admits it. Researchers pushed back; the feature was reversed in 48 hours.

[Image: https://t.co/tCR12pfuA9]

ReadOmniscient retweeted

Anthropic

@AnthropicAI

1 day ago

We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.

