I've been getting a TON done with Fable today and I'm not hitting rate limits. Wanted to share some tips on how I'm doing that
1. I only use Fable on "high" effort for now. xhigh is token hungry. max/extra is a furnace with worse outputs than lower options imo
2. I taught Claude Code how to use Codex as a fallback for lots of implementation tasks. GPT-5.5 is incredibly steerable, and Fable can learn how to steer it
3. I wrote up a big section in my CLAUDE[.]md on how to prioritize different models for different work when orchestrating workflows and subagents
4. Things that are unnecessarily token hungry (computer use, codebase analysis, etc), I do with other models and report results back to Fable
what i’m noticing a lot in the field now vs 2y ago is the place inference has taken vs training
2y ago id have conversations with orgs that would want to take about training, fine-tuning specifically, but now there’s been and there still is a shift to inference aka host OSS models, but GPUs and serve users
that’s why you’re probably seeing many new companies popping up called AI factories, neoclouds simply because you can buy GPUs, rent them out to companies doing inference for their customers/internal users
that’s why we’re hearing much more of inference economics/tokenomics, and i think it’s going to continue to be even more relevant because if you’re selling GPU availability, then what matters is how much availability can you sell
so the problem becomes how do you optimise inference so your GPUs stay busy 100% of the time at best, which isn’t at all what’s happening in many cases
i’ve seen 50-60% GPU utilisation in orgs meaning their GPUs stay idle almost half of the time. that’s bad from an economics standpoint if your goal is to sell GPU availability
i’m wrapping my head around all of that as i’m speaking with customers so expect a lot more of those reports but defo super interesting conversations im getting into nowadays
This is the most hilarious thing I saw and did today
Ran gemma-4-12B-coder-fable5-composer2.5-v1-GGUF locally with 8 GB VRAM at 20+ tok/sec
Anthropic's Claude Fable 5 launched June 9.
By June 12 it was banned. I can't access it. You can't either.
But here's the twist: I'm running a model trained on its chain of thought at 20 tok/s on my RTX 4060 8GB.
Locally. Offline. No cloud. No export control.
Enter: Gemma4-12B-Coder GGUF (Q4_K_M)
Base: Google's gemma-4-12B-it
Fine-tuned on verifiable Python CoT data:
- Primary: Composer 2.5 real reasoning traces (only passing solutions kept)
- Auxiliary: Fable 5 used to redo the hard cases Composer missed.
Every training example's reasoning led to code that actually ran. No hallucinated logic.
Llama.cpp flags:
-m gemma4-coding-Q4_K_M.gguf -cnv -ngl 44 -c 64000 -v
(huggingface model link in comments)
Flag breakdown:
-ngl 44 → offload 44 layers to GPU (tune this for your VRAM)
-c 64000 → 64K context window
-cnv → conversation/chat mode
-v → verbose output
The irony writes itself.
Anthropic spent weeks telling the world Fable 5 (mythos) is too powerful to release. Then released it. Then got banned from serving it, including their own researchers.
Meanwhile: a Gemma 4 12B fine tune, trained on Fable 5's reasoning, runs fully offline on my mid range consumer GPU
No API. No cloud. Just me and llama.cpp.
This is why local AI matters.
Check out the model's link in the comments. How's your experience been with this model?
June '26 LLM list (updated)
Claude Fable 5 ✅ 🥇🐐 banned?
Nex-N2-Pro ✅ 🥇best open source
Claude Mythos ✅ ...but not for you
Claude Sonnet-4.8 ❌
GPT-5.6 ❌
Gemini 3.5 Pro ❌
Gemma 4 12B ✅
Grok 5 ❌
MiniMax M3 ✅ (weights released)
Nemotoron 3 Ultra ✅
Qwen3.7-Plus ✅
Qwen3.7-30B ❌ 🙏
Kimi K3.0 ❌
Kimi K2.7 Code✅
GLM 5.2 ✅ being benchmarked
Hunyuan3 MoE ❌
Macron V1 Preview ✅
MAI Thinking-1 ✅
MAI Code-1-Falsh (coding model) → VSCode ✅
MAI Image-2.5 ✅
MAI Image-2.5-Flash ✅
MAI Transcribe-1.5 ✅
MAI Voice-2 ✅
MAI Voice-2-Flash ✅
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
🚨 TL;DR: Attackers are sending fake Sentry bug alerts to projects using public Sentry DSNs. The fake alert is designed to trick AI agents into running a malicious `npx` command that looks like a Sentry profiling diagnostic.
Do NOT run commands from Sentry issues/logs/alerts unless verified.
These are not legitimate Sentry fix commands. The malicious package reportedly steals environment variables/secrets and sends them to advisory-tracker[.]com.
I was 19 years old, playing Grand Bazaar back in Battlefield 3.
Now, I'm about to see it come full circle and be introduced to a whole new generation of players with Cairo Bazaar.
Wild, and a gifted moment to witness.