In engineering the new SAGE-3.0-4B model for the wellness domain, our primary objective was to maximize factual reliability and minimize dangerous hallucinations. To achieve this, we made a deliberate shift in the training data: we completely removed generic coding datasets from the training pipeline.
The Win (Safety & Truthfulness): We achieved a massive 18-point surge on TruthfulQA (rising to 64.0%) and a significant boost in situational common sense (HellaSwag rising to 25.6%). The model is now vastly more resistant to misinformation and better suited for clinical and wellness contexts.
The Trade-off (Math & Coding): Because mathematical reasoning in LLMs relies heavily on the logical patterns learned from code, removing those datasets caused an expected dip in GSM8K (dropping to 33.5%).
The Bottom Line: SAGE-3.0-4B was not built to solve algebra puzzles or write software; it was engineered to be a highly dependable, safe, and factual companion for health and wellness applications. We traded abstract logic for user safety.
Claude Fable 5 & Mythos 5 — Anthropic's new Mythos-class models:
• SOTA on nearly all benchmarks — longer tasks = bigger lead over Opus
• Stripe: months of engineering into days. 50M-line Ruby migration in 1 day
• FrontierCode: highest frontier score at medium effort
• Vision: beat Pokémon FireRed with vision alone, rebuilt web apps from screenshots
• Memory 3x more effective than Opus 4.8 across millions of tokens
• Mythos 5 drug design: matched human operators autonomously, ~10x faster
• Scientists preferred its biology hypotheses 80% of the time vs Opus
• Safety classifiers auto-fallback to Opus 4.8 on cyber/bio queries — 95%+ sessions unaffected
• $10/M in, $50/M out — half the price of Mythos Preview
• Included on Pro/Max/Team through June 22
• Mythos 5 (cyber safeguards lifted) restricted to Glasswing partners only
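For a rough sense of what the listed pricing means per request, here is a minimal sketch that applies the quoted $10 per million input tokens and $50 per million output tokens. The function name and the example token counts are hypothetical, and it assumes flat per-token billing with no caching or batch discounts, which the post does not specify.

```python
# Rates quoted in the post: $10 per 1M input tokens, $50 per 1M output tokens.
IN_PRICE_PER_M = 10.0
OUT_PRICE_PER_M = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token billing."""
    input_cost = input_tokens / 1_000_000 * IN_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical request: 200K tokens in, 20K tokens out.
example_cost = request_cost_usd(200_000, 20_000)
print(f"Estimated cost: ${example_cost:.2f}")
```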
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
One of the Discord Community servers we operate https://t.co/MEMtS2NR0e is seeking staff and moderators 🛠️ If you’re interested⚡️apply directly: https://t.co/NmzTIFwdp7 or join the server and apply in the #guild channel!
ROSY will also be rolling out 💵 PAID staff positions this June. Join and get familiarized!
NVIDIA N1X SoC specs:
• Up to 20 CPU cores
• Up to 6,144 Blackwell CUDA cores
• TSMC 3nm process
• Up to 128GB unified memory
💻 New era is definitely incoming.
SAGE2mini-2B is in testing — our most compact agentic model yet.
• We trained it for tool use, long-context reasoning, and wellness-domain alignment. Despite being under 2B parameters, it handles 262K context windows and runs fully local.
🪷 1/2
SAGEmini-2B 🪷 (~1.2GB RAM @ Q3 with K-Quants) will be able to run locally on even an iPhone 13/13 mini (4GB RAM)
Although 2B is already compact, we’re exploring even smaller variants to boost efficiency and improve stability on low-RAM devices.
SAGEmini-2B will be the largest model in the SAGEmini family.
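As a rough sanity check on the ~1.2GB figure, the sketch below estimates the weight footprint of a quantized model as parameters × bits per weight ÷ 8. The parameter count (2e9, an upper bound, since the model is described as under 2B) and the effective bits per weight for a Q3 K-quant (about 3.44) are assumptions, not numbers from the post. Whatever is left over would have to cover the KV cache, activations, and runtime overhead.

```python
def quantized_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


# Assumptions (not stated in the post):
ASSUMED_PARAMS = 2e9      # upper bound; the model is described as under 2B
ASSUMED_Q3_BPW = 3.4375   # assumed effective bits per weight for a Q3 K-quant
QUOTED_TOTAL_GB = 1.2     # RAM figure quoted in the post

weights_gb = quantized_weight_bytes(ASSUMED_PARAMS, ASSUMED_Q3_BPW) / 1e9
headroom_gb = QUOTED_TOTAL_GB - weights_gb

print(f"Estimated weights: {weights_gb:.2f} GB")
print(f"Left for KV cache and overhead: {headroom_gb:.2f} GB")
```

Under these assumptions the weights alone come in below the quoted total, which is consistent with the claim that the model fits on a 4GB phone, though actual usage depends on context length and the runtime.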
Honored to be selected for Xiaomi MiMo Orbit’s 100T Token Grant for Builders.
Full access to MiMo V2.5’s reasoning, multimodal, and TTS models — huge thank you to the @XiaomiMiMo team for the support.
Meet Hy3 preview: @TencentHunyuan's most capable Hy model yet.
Built for coding, search, agents, and next-gen applications. 256K context window. 40% inference efficiency gain. Open-source and available now.
📖 https://t.co/3cytnXfsGW
#AI #LLM #AIForGood
Anthropic's Claude Mythos Preview found thousands of zero-days in every major OS and browser, including a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw that automated tools had hit 5M times without detecting it.
The model won't be released. Instead, Anthropic formed Project Glasswing — Apple, Google, Microsoft, NVIDIA, CrowdStrike, and others — using Mythos defensively to secure critical software.
83.1% on CyberGym, 77.8% on SWE-bench Pro, well beyond Opus 4.6.
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://t.co/NQ7IfEtYk7