Stop telling Claude, "do this."
Stop telling Claude, "write code."
Stop telling Claude, "fix this error."
You're actually treating a senior AI like a junior intern.
Here are 8 prompts you can copy and paste directly:
NEW: Anthropic is walking back Claude Fable 5's policy of covertly degrading performance for competing AI researchers, after facing fierce backlash.
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
Another demo with Hermes Agent's /goal mode and @liquidai LFM2.5-8B-A1B.
This time I gave it a broad research task and let it run. It searched the web and X, analyzed the information, generated reports, and even created infographics on its own.
Honestly, LFM2.5-8B-A1B keeps surprising me. Every time I think I've found its limits, it handles something more complex than I expected.
Anthropic posted a FULL GUIDE on how to prompt Fable 5 (Mythos).
Claude Fable 5 is not meant to be prompted like any other model.
It's meant to run autonomously.
Here's exactly how to enable Fable to do work for you with minimal manual intervention:
1. Effort selection
Anthropic recommends using High for most tasks and Xhigh only for complex workflows.
Low/medium: quick questions, basic research
High: default for most work
Xhigh: complex builds, multi-step analysis
Ultracode: full autonomous orchestration
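The tiers above can be sketched as a small lookup. This is only an illustration: the task categories, the `pick_effort` helper, and the lowercase tier strings are hypothetical names, not part of any Anthropic SDK. The split between low and medium is also an assumption, since the list groups them together.

```python
# Hypothetical mapping from task type to effort tier, following the list above.
# The source groups low/medium together; splitting them here is an assumption.
EFFORT_BY_TASK = {
    "quick_question": "low",
    "basic_research": "medium",
    "complex_build": "xhigh",
    "multi_step_analysis": "xhigh",
    "autonomous_orchestration": "ultracode",
}


def pick_effort(task_type: str) -> str:
    """Return the effort tier for a task, falling back to 'high' (the stated default)."""
    return EFFORT_BY_TASK.get(task_type, "high")


if __name__ == "__main__":
    for task in ("quick_question", "code_review", "complex_build"):
        print(f"{task}: {pick_effort(task)}")
```

Anything not explicitly listed falls through to High, which matches the "default for most work" advice.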
2. /loop prompting
Use /loop prompts to kick Fable off on full tasks.
/loop <time interval> + <goal>
3. Tell it WHY, not just what (context)
Fable can't perform on instructions alone. It needs context to make decisions on its own.
Anthropic's exact prompting structure:
"I'm working on [larger task] for [who it's for]. They need [what the output enables]. With that in mind: [your actual request]."
4. Keep prompts short (instructions)
Counterintuitive but critical.
Over-engineering your prompts on Fable 5 degrades output. You're constraining a model that would have figured it out on its own.
5. Tell it when to stop and check in during runs
"Pause for me only when the work genuinely requires my input: a destructive action, a real scope change, or something only I can provide. Otherwise, keep going and report back when done."
6. Build it a memory system
Fable performs best when it can record lessons from its previous loops.
Give it a folder for markdown notes and this instruction:
"Store one lesson per file with a one-line summary at the top. Record corrections and confirmed approaches. Don't save what the repo or chat history already records."
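One way to picture the "one lesson per file, summary on top" rule is the sketch below. The `save_lesson` and `list_summaries` helpers and the slug-based file naming are assumptions made for illustration; the post does not say how Fable actually lays out its notes.

```python
import re
import tempfile
from pathlib import Path


def save_lesson(memory_dir: Path, summary: str, details: str) -> Path:
    """Write one lesson to its own markdown file, with the summary as line one."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: a filesystem-safe slug built from the summary.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")[:60] or "lesson"
    path = memory_dir / f"{slug}.md"
    path.write_text(f"{summary.strip()}\n\n{details.strip()}\n", encoding="utf-8")
    return path


def list_summaries(memory_dir: Path) -> list[str]:
    """Read only the first line of each lesson file, in filename order."""
    summaries = []
    for path in sorted(memory_dir.glob("*.md")):
        with path.open(encoding="utf-8") as f:
            summaries.append(f.readline().strip())
    return summaries


if __name__ == "__main__":
    memory = Path(tempfile.mkdtemp()) / "memory"  # stand-in for a real notes folder
    save_lesson(memory, "Run tests before pushing", "Correction: CI failed twice.")
    print(list_summaries(memory))
```

Keeping the summary on the first line means a later loop can skim every lesson cheaply before deciding which files to read in full.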
The optimal general prompt structure:
"Goal: I'm working on [larger task] for [who it's for]. They need [what the output enables].
Request: [your specific ask in one sentence]
Output format: [exactly how you want it]
Constraints: [what must not happen]."
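If you reuse this structure a lot, it can be filled in programmatically. The sketch below is just a template filler: `PromptSpec` and `build_prompt` are hypothetical names, and nothing here calls a model.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """The bracketed slots from the prompt structure above."""
    larger_task: str
    audience: str
    enables: str
    request: str
    output_format: str
    constraints: str


def build_prompt(spec: PromptSpec) -> str:
    """Assemble the four-part prompt: Goal, Request, Output format, Constraints."""
    lines = [
        f"Goal: I'm working on {spec.larger_task} for {spec.audience}. "
        f"They need {spec.enables}.",
        f"Request: {spec.request}",
        f"Output format: {spec.output_format}",
        f"Constraints: {spec.constraints}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    spec = PromptSpec(
        larger_task="a pricing page redesign",
        audience="our sales team",
        enables="a page they can demo to customers next week",
        request="Draft the comparison table copy.",
        output_format="A markdown table with three columns.",
        constraints="Do not invent prices.",
    )
    print(build_prompt(spec))
```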
One last thing - your old prompts may actually work against you.
Skills and project instructions built for Opus 4.8 may produce worse results on Fable.
Bookmark this to actually maximize your Fable workflows.
Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.
It supports high-speed text generation, thinking, image, video and 256K context.
Run and train via Unsloth Studio.
GGUF: https://t.co/ZH0dCJQ59P
Guide: https://t.co/wYLfJWE6kG
I gotta hand it to Gemma
Probably the most diverse set of open models ever released in a series:
• google/gemma-4-E2B-it
• google/gemma-4-E4B-it
• google/gemma-4-12B-it
• google/gemma-4-26B-A4B-it
• google/gemma-4-31B-it
Plus mtp:
• google/gemma-4-E2B-it-assistant
• google/gemma-4-E4B-it-assistant
• google/gemma-4-12B-it-assistant
• google/gemma-4-26B-A4B-it-assistant
• google/gemma-4-31B-it-assistant
Plus today diffusion:
• google/diffusiongemma-26B-A4B-it
That’s a lot of work, and it’s only halfway through 2026
After spending more time down this rabbit hole lately, it seems like @NousResearch Hermes + GBrain (+ Obsidian + GitHub) is the optimal path
I’ve been setting up a Hermes agent recently and I am wildly impressed by how good it is. With a strong foundation that is portable + scalable + lightweight, it’s become clear where this direction is heading, and I don’t feel pressured by model lock-in
This is 100% the future of agentic workflows
Introducing MTPLX V1: The fastest and simplest way to run MTP compatible models on your Mac.
- New Swift based app. 2x speed increase without bloat
- Easy OpenCode, Hermes & Pi integration
- Convert your own MTP models with forge
And more
try now at: https://t.co/A9M55l9DAm
mlx-vlm v0.6.3 is here 🚀
Day-0 support for TWO new models from partners we work closely with:
🔥 @GoogleDeepMind DiffusionGemma — a genuinely new architecture. Instead of token-by-token, it generates 256-token blocks in parallel with bi-directional attention and iteratively self-corrects the whole block, image-generator style. 26B MoE, only 3.8B active, fits in 18GB quantized. Day-0 MLX support via our Google DeepMind partnership, with long-context prefill tuned and ready.
🔥 @cohere's North Mini Code 1.0 — a 30B MoE with just 3B active, running ~66 tok/s in BF16 before any compression. Day-0 on MLX thanks to our close collaboration with the Cohere team.
Get started today. Install or upgrade with:
uv pip install -U mlx-vlm
Then serve the model and point your favorite agent at it (pi, opencode, hermes, etc.):
uv run mlx_vlm.server --model MODEL-REPO
Model collection 👇🏽
What actually turns a chatbot into an AI agent? The “harness” around the AI model (the large language model, or LLM).
In this video I break down what a harness is: a large language model at the core, plus memory, tools, and the engineering systems that make it all work at scale.
One piece I didn't get to in the video: the thing that ties it all together is the loop. An agent doesn't just answer once and stop. It runs in a cycle; the model decides what to do, takes an action (call a tool, check memory), looks at what came back, and picks its next move. Again and again, working several steps toward a goal. The loop is foundational to what makes agents work.
A key part of that loop is knowing when to stop - recognizing the task is actually done instead of spinning forever or quitting too early. "Knowing when to terminate" is its own small piece of engineering.
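The loop described above can be sketched in a few lines. This is a toy under stated assumptions: `decide` stands in for the model, the tuple-based decision format is made up for the example, and `max_steps` is one simple way to avoid spinning forever.

```python
def run_agent(decide, tools, goal, max_steps=10):
    """Run a decide -> act -> observe cycle until the model says it is done.

    `decide(history)` returns either ("tool", name, argument) or ("done", answer).
    Returns the answer, or None if the step budget runs out first.
    """
    history = [("goal", goal)]
    for _ in range(max_steps):
        decision = decide(history)
        if decision[0] == "done":          # termination: the task is finished
            return decision[1]
        _, tool_name, argument = decision
        result = tools[tool_name](argument)  # take the action
        history.append((tool_name, result))  # record what came back
    return None                              # budget exhausted, stop anyway


def scripted_decide(history):
    """A stand-in for the model: call `add` once, then report the result."""
    observations = [item for item in history if item[0] == "add"]
    if not observations:
        return ("tool", "add", (2, 3))
    return ("done", f"The sum is {observations[-1][1]}")


if __name__ == "__main__":
    tools = {"add": lambda pair: pair[0] + pair[1]}
    print(run_agent(scripted_decide, tools, "add 2 and 3"))
```

There are two exits: the model declaring the task done, and a hard step cap. Real harnesses tend to layer more checks on top, but both failure modes from the post (spinning forever, quitting too early) show up even at this scale.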
Model = the brain. Harness = everything that lets it actually get work done.
Questions? Please leave them below, and follow along - more on agents (and open-source models like DeepSeek) coming.
DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs.
Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the model self-correct and format complex markdown in real time.
We're releasing DiffusionGemma as an open model under an Apache 2.0 license for anyone to experiment with.
Download the model weights on @huggingface, and learn more about DiffusionGemma → https://t.co/nPFBhQQqqj
⚕️ Hermes Agent Tip of the Day
'simplify-code' is the new code cleanup squad.
It reviews recent changes with three specialist agents, merges the useful fixes, and writes a unified diff.
Use it before PRs, releases, or “I’ll clean this later” lies.
Future-you has receipts. 🧾
I recently switched from Qwen 3.5 9B to LFM2.5-8B-A1B by @liquidai, and it's quickly become my default local model in Hermes Agent Desktop.
For agentic tasks, it's one of the strongest local models I've used so far. It's surprisingly fast, reliable, and works really well with tools.
Coding is still where it struggles the most.
Other than that, it's been consistently solid and easily one of my favorite local models right now.
Introducing DiffusionGemma, our first exploration with open diffusion text generation models
🔥Generate blocks of text at a time
🤏26B MoE built on top of Gemma 4
⚡️Up to 4x faster on popular consumer GPUs
🤗Apache 2.0
Excited to see what the community builds with it!
this is how to run claude fable 5 as your architect ($20 sub only) + gpt 5.5 codex as your builder..
full system below:
the loop is : fable thinks... codex builds , the repo remembers and you judge, that simple..
the point of all this is that we're taking advantage of 5.5 being on a sub and fast enough, especially with /goal, and we're using the latest Anthropic model as the judge/guide..
step 1
>create the memory (one time): make docs/HANDOFF.md in your repo.
>codex updates it after every work session: what was built, what was decided + why, open disagreements, next slice. this file is why 30 min of fable is enough.. it reads state instead of asking you questions.
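a rough sketch of what "codex updates it after every work session" could look like if you scripted it. the section headings and the `append_handoff` helper are assumptions, the thread only names the four things to record.

```python
import tempfile
from datetime import date
from pathlib import Path


def _bullets(items):
    """Render a list as markdown bullets, or a placeholder when it is empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def append_handoff(repo_root, built, decisions, disagreements, next_slice, day=None):
    """Append one session entry to docs/HANDOFF.md and return the file path."""
    handoff = Path(repo_root) / "docs" / "HANDOFF.md"
    handoff.parent.mkdir(parents=True, exist_ok=True)
    day = day or date.today().isoformat()
    # Hypothetical layout: one "Session" block per work session, appended in order.
    entry = (
        f"## Session {day}\n\n"
        f"### Built\n{_bullets(built)}\n\n"
        f"### Decided (with why)\n{_bullets(decisions)}\n\n"
        f"### Open disagreements\n{_bullets(disagreements)}\n\n"
        f"### Next slice\n{next_slice}\n\n"
    )
    with handoff.open("a", encoding="utf-8") as f:
        f.write(entry)
    return handoff


if __name__ == "__main__":
    repo = tempfile.mkdtemp()  # stand-in for a real repo checkout
    path = append_handoff(
        repo,
        built=["parser module"],
        decisions=["use JSON for config: fewer dependencies"],
        disagreements=[],
        next_slice="storage layer behind the frozen schema",
    )
    print(path.read_text(encoding="utf-8"))
```

appending instead of overwriting keeps the history, so the architect can read state across sessions.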
step 2 paste this to fable (every session)
>you are the ARCHITECT for [project]
>gpt 5.5 codex is the BUILDER
>you never write implementation code.
>your jobs:
(1) read the handoff below
(2) rule on every disagreement the builder raised: accept/reject/modify + one line why
(3) judge any results RAW against the gates in the docs and ignore the builder's narrative
(4) write the next slice spec: small enough for one PR, hard acceptance criteria, explicit out-of-scope, and force the builder to verify APIs/formats against reality before coding
(5) flag scope creep and goalpost-moving.. be blunt. disagree with me. end with a paste-ready block for the builder.
step 3 paste fable's block to codex with this /goal
/goal: execute the architect spec. rules:
PHASE 0 before any code, reply with your plan + every disagreement you have, with reasons, citing real files in the repo. silent compliance = failure. silent scope additions = failure.
PHASE 1 freeze shared contracts (schemas/interfaces) in docs/ first; after freeze they're read-only for everyone including you.
PHASE 2 spawn max 3-4 lane agents on modules that don't import each other, plus ONE reviewer agent that never writes feature code: it checks every lane against the spec + tests + frozen docs and returns APPROVE or a numbered defect list. nothing merges without approve. then: commit + push each slice, update docs/HANDOFF.md with raw results only: tables and numbers, no interpretation, no 'promising'. verdicts belong to the architect and the human.
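the "modules that don't import each other" condition can be checked before you hand out lanes. a minimal sketch, assuming a python repo with flat top-level module names. `imported_modules` and `lane_conflicts` are hypothetical helpers, and relative imports are ignored here.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported (absolutely) by some Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def lane_conflicts(lanes: dict[str, str]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where one lane module imports another.

    `lanes` maps a module name to its source text. An empty result means the
    lanes are independent in the sense used above.
    """
    conflicts = []
    for name, source in lanes.items():
        others = lanes.keys() - {name}
        for dependency in sorted(imported_modules(source) & others):
            conflicts.append((name, dependency))
    return conflicts


if __name__ == "__main__":
    lanes = {
        "parser": "import json\n",
        "storage": "from parser import parse\n",
        "api": "import os\n",
    }
    print(lane_conflicts(lanes))
```

in this toy input, `storage` imports `parser`, so those two should not be separate lanes.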
step 4 repeat. codex works hours.. you spend fable minutes on judgment only: arbitration, evidence review, next specs, kill/continue calls. one fable session per work block.
the 5 rules that make it actually work
>repo docs are the memory. not in HANDOFF.md = didn't happen
>the builder never grades its own work
>disagreement is mandatory
>freeze success criteria BEFORE results exist, never edit after
>spend architect time on judgment, builder time on typing
>the architect is the edge and the builder is the hands. the repo is the brain.. think of it that way..
bookmark this. you will need it.. you really won't need to pay hundreds in API tokens if you do it this way
i've been doing this more manually, getting the audit & plans done by Fable,
then feeding the audits and md files into Codex to build the stuff, but having a handoff md file between the two like this, and having Claude and Codex just work together is very smart
you'll save a lot this way, and get pretty much the same output on 98% of the tasks
DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇achieving up to 4x faster inference by generating entire blocks of text simultaneously vs predicting token-by-token (word-by-word) output!