Get ready, friends. Anthropic appears to be preparing the release of its Mythos-level model.
Pricing: $16 per 1M input tokens / $80 per 1M output tokens.
The release is likely very close, possibly even in the same week as GPT-5.6. Competition is heating up again.
Gemini 3.5 Pro is about to face serious pressure. It better be a banger.
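For a rough sense of what that pricing would mean per request, here is a quick back-of-the-envelope sketch. It assumes the rumored $16 / $80 per 1M token rates quoted above hold, and the token counts in the example are made-up placeholders, not measurements.

```python
# Rumored rates from the post above; treat them as unconfirmed assumptions.
INPUT_USD_PER_MILLION = 16.0
OUTPUT_USD_PER_MILLION = 80.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD at the rumored rates."""
    input_cost = input_tokens * INPUT_USD_PER_MILLION
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 tokens in, 2,000 tokens out.
    print(f"${request_cost_usd(10_000, 2_000):.2f}")
```

At these rates output tokens cost 5x input tokens, so long generations would dominate the bill.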
@anton_chuvakin @lennyzeltser I think deception (in some form) will be required to combat offensive AI, which will move it from niche to more widely adopted.
Incredibly interesting takeaway from Anthropic’s analysis of AI-enabled cyber threats: the definition of a “sophisticated attacker” is changing.
Historically, that distinction has been tied to the attacker's technical skill/expertise: novel exploits, stealth, great OPSEC.
Now Anthropic is claiming the better differentiator is orchestration: the scaffolding, tooling, workflows, and architecture built around the model.
So the capability gap will no longer be about skill, but about the systems/designs the attacker puts into place.
Is the scaffolding/harness going to be the biggest distinguishing factor between low threat actors and APTs?
Two things stood out in Anthropic's Project Glasswing update.
1. Mention of the robust safeguards needed in anticipation of releasing Mythos more broadly.
This is most likely some of what we are seeing/experiencing with the Opus 4.8 release.
Personally, I find that reasonable and wouldn't mind going through the additional verification for that access.
What do others think? Would you submit to further verification/checks for this access?
2. The scaling up of the Cyber Verification Program. This will definitely be needed in order to deal with the dual-use issue.
If I had to guess, it sounds like they might implement further/more rigorous verification before granting access to Mythos.
Initial impressions of Opus 4.8 show the enhanced cyber safeguards.
Seeing a much higher refusal rate compared to Opus 4.7 and GPT-5.5. This is from running the exact same prompt on the same lab.
Still need more runs to make a thorough/valid comparison, but pretty interesting.
Anyone else notice the same?
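On the "need more runs" point, here is a minimal sketch of how the refusal rates could be compared once the counts are in. It uses a plain two-proportion z-test, which is my choice for illustration and not something from the post, and the counts in the example are hypothetical placeholders rather than real results. The normal approximation is shaky at small sample sizes, which is one more reason to collect more runs.

```python
import math


def two_proportion_z_test(refusals_a: int, runs_a: int,
                          refusals_b: int, runs_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in refusal rates.

    Model A refused `refusals_a` times out of `runs_a` runs, and likewise
    for model B. Uses the pooled-variance normal approximation.
    """
    if runs_a <= 0 or runs_b <= 0:
        raise ValueError("each model needs at least one run")
    rate_a = refusals_a / runs_a
    rate_b = refusals_b / runs_b
    pooled = (refusals_a + refusals_b) / (runs_a + runs_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / runs_a + 1 / runs_b))
    if std_err == 0:
        # Both models always refused or never refused: no measurable difference.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: the newer model refuses 8 of 10 runs, the older 3 of 10.
    z, p = two_proportion_z_test(8, 10, 3, 10)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

Keeping the prompt and lab fixed, as in the runs above, means the model is the only thing that varies between the two groups.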