Anthropic just changed the game.
Yesterday they launched the Mythos class, a new tier above Opus, with two models: Claude Fable 5 and Claude Mythos 5.
Quick primer on Claude's tiers: Haiku (fast & light) → Sonnet (balanced) → Opus (frontier), and now Mythos sits on top.
It is deployed through Project Glasswing with the US government, helping cyber defenders secure critical software.
Use cases I'm watching: long-running autonomous coding agents, deep research workflows, enterprise knowledge work, and defensive security at scale.
The risks:
Its biggest danger zone is cybersecurity, so Anthropic shipped it with conservative safeguards: sensitive queries get routed to Opus 4.8 instead, which triggers in under 5% of sessions.
And Mythos 5?
Same underlying model, safeguards lifted in some areas.
What Fable 5 can do:
State-of-the-art on nearly all tested AI benchmarks, with exceptional results in software engineering, knowledge work, vision, and scientific research. The longer and more complex the task, the bigger its lead. It even beat Opus 4.8 by over 10% on some benchmarks.
Step-By-Step LLM Engineering Projects Roadmap
- Build a tokenizer
- Learn embeddings
- Implement RoPE / ALiBi
- Hand-wire attention
- Build MHA
- Build a Transformer block
- Train a mini-former
- Compare objectives
- Build sampling
- Speculative decoding
- KV cache
- MQA / GQA / MLA
- Long context
- FlashAttention
- Hardware budgets
- Toy MoE
- Sparse model trade-offs
- State-space / linear attention
- Diffusion language models
- Data pipelines
- Synthetic data
- Scaling laws
- SFT / DPO / RLHF / GRPO
- Quantization
- Serving stacks
- Eval harnesses
- RAG
- Tool use / agents
- Vision-language adapters
- Interpretability
- Red-team suite
- Full capstone model system
One request:
Choose an open-source AI lab when you make it
Open source is where humanity gets to keep the tools
DM me when you've made it ;)
5 months ago, I was on a panel with @maximelabonne and @Chip Huyen.
One question has been on my mind ever since:
"How should someone approach getting into AI engineering?"
The biggest mistake in AI engineering is thinking you need to pick between being a specialist or a generalist.
There's no correct answer.
It's all about the life you want for yourself.
There are two paths:
1/ Go deep
Become world-class at one thing.
Maybe:
Fine-tuning
Inference optimization
RL
GPU systems
Retrieval
Evaluation
Usually this leads toward:
Bigger companies
More niche roles
Higher compensation
Smaller job pools
Strong technical reputation
2/ Go wide
Become highly versatile.
You understand:
Models
Backend systems
Infrastructure
Data pipelines
Monitoring
Product constraints
Deployment
Agents
You may not be the best at one thing...
But you can build entire systems end-to-end.
This path tends to align much more with:
Startups
Founding engineering
Product building
Entrepreneurial environments
Agentic engineering
Interestingly, the industry is already moving in this direction.
Companies like @Anthropic have popularized roles such as "Technical Staff."
They sit at the intersection of research, engineering, and product.
I've mostly gone wide.
I'm deeply interested in AI...
But I also spent years learning:
Software engineering
Infrastructure
Ops
Data engineering
Distributed systems
Because once AI can help implement many individual components, the leverage moves toward understanding systems.
This doesn't mean specialization stops mattering.
The highest leverage engineers over the next decade will likely be T-shaped:
Deep in one area
Broad enough to reason across the full stack
There is no objectively "better" path.
Only tradeoffs.
You should optimize for the kind of work and life you enjoy.
Both paths can compound extremely well over time.
P.S. This systems-first way of thinking is exactly how we approach production AI systems inside our Agentic AI Engineering course with @towards_AI
If you want to learn how modern AI systems are designed end-to-end, check it out here: https://t.co/QEuz9UaAkl
@TheOSSObserver @_avichawla One difference is latency: RAG makes additional resources available at inference time, which adds a small overhead, while fine-tuning improves the model at training time and does not necessarily add any inference-time cost.
We just open-sourced the full @aiDotEngineer workshop!
You can clone it and run everything yourself...
→ https://t.co/O7UXJG7sE4
You'll build:
• A Deep Research Agent (grounded search + YouTube analysis)
• A LinkedIn writing workflow (generate → review → edit loops)
• An evals layer to measure quality instead of guessing
But this isn't just a set of scripts.
It's a full multi-agent system built around:
• MCP servers (tools, resources, prompts)
• Structured outputs with Pydantic
• Tool-use agents deciding their own flow
• End-to-end pipelines you can actually inspect
You can go through it in 3 steps:
1. Watch the full ~2-hour workshop
2. Run the system on real inputs
3. Rebuild everything yourself with 24 guided tickets
But here's the interesting part...
You don't rebuild it manually.
You do it using agentic engineering.
We set up:
• A software engineer agent
• Claude Code subagents
• An orchestrator skill
This takes all 24 tickets and:
• Implements them
• Tests them
• Iterates step by step
So you're learning how agents work and how to build with agents as collaborators.
This is the same system we use and teach in our Agentic AI Engineering course...
Just compressed into ~2 hours.
If you read the code, you'll understand more than you would from watching 10 demos.
Link: https://t.co/O7UXJG7sE4
Millions of people use ChatGPT, Claude, and Gemini every day.
But almost nobody understands what actually happens between hitting Enter and seeing words appear on the screen.
So I'm breaking down the entire pipeline in one clean visual.
Here's the breakdown:
1. Tokenizer
Your input isn't processed as words.
It's split into tokens.
"gravity" → ["grav", "ity"]
Each token → a numeric ID.
That's why LLMs struggle with letter-level tasks like counting characters: they see token IDs, never the individual letters.
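To make the "gravity" split concrete, here is a minimal sketch of a greedy longest-match subword tokenizer. The vocabulary and IDs are made up for illustration; production tokenizers (BPE and similar) learn theirs from data and use more elaborate merge rules.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, hypothetical vocabulary.
VOCAB = {"grav": 0, "ity": 1, "g": 2, "r": 3, "a": 4, "v": 5, "i": 6, "t": 7, "y": 8}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def tokenize(text: str) -> list[str]:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink until a piece matches.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                tokens.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos]!r}")
    return tokens


def encode(text: str) -> list[int]:
    """Map text to the numeric IDs the model actually receives."""
    return [VOCAB[token] for token in tokenize(text)]


def decode(ids: list[int]) -> str:
    """Map IDs back to text by concatenating their pieces."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)
```

With this toy vocabulary, `encode("gravity")` yields two IDs, so the model receives two opaque integers rather than seven letters.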
2. Embedding Layer
Every token becomes a high-dimensional vector (e.g. 4096 dims).
This is where meaning begins:
Similar words → closer in vector space.
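"Closer in vector space" is usually measured with cosine similarity. A small sketch, assuming a hand-written 3-dimensional embedding table (real tables are learned and far wider):

```python
import math

# Hypothetical 3-dimensional embeddings, chosen by hand for illustration only.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_a: str, word_b: str) -> float:
    """Look up both words in the table and compare their vectors."""
    return cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_b])
```

In this toy table, `similarity("cat", "dog")` comes out higher than `similarity("cat", "car")`, which is the property the embedding layer is trained to provide.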
3. Transformer Blocks (×N)
The real intelligence lives here.
• Self-attention (Q, K, V) → tokens "look" at each other
• FFN → processes each position
• Repeated dozens of times
This is how context is built.
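The "tokens look at each other" step can be sketched as single-head causal scaled dot-product attention. This version is pure Python and assumes Q, K, and V are already computed; the learned projections, multiple heads, and the FFN are left out.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """q, k, v: lists of T vectors. Returns (outputs, attention_weights)."""
    dim = len(q[0])
    outputs, all_weights = [], []
    for t in range(len(q)):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out = [
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

Each row of weights sums to 1, and the first token can only attend to itself, so its output equals its own value vector.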
4. KV Cache
The most important optimization.
Instead of recomputing everything,
the model stores the keys and values (K, V) of past tokens
⚡ Faster generation
⚠️ But memory grows with sequence length
→ This is the real bottleneck
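The memory growth is easy to estimate. The sketch below assumes a standard dense-attention layout (one key and one value vector per layer, per KV head, per position) and a made-up model shape; it does not describe any specific production model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV cache size in bytes for a dense-attention model."""
    # Factor of 2: both a key and a value are stored at every position.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


class KVCache:
    """Minimal per-layer cache for one sequence: append once per decoded token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Only the newest token's K and V are computed; the rest are reused.
        self.keys.append(key)
        self.values.append(value)
        return self.keys, self.values
```

For an assumed shape of 32 layers, 32 KV heads, head dimension 128, and 2-byte values, `kv_cache_bytes(32, 32, 128, 4096)` is 2 GiB for a single 4,096-token sequence, and the figure scales linearly with both sequence length and batch size.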
5. Sampling Strategy
The model doesn't pick words.
It outputs probabilities.
How you sample = how it behaves:
• Greedy → predictable
• Top-K → limited randomness
• Top-P → balanced
• Temperature → creativity control
Same model. Completely different outputs.
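The four strategies above can be sketched in a few lines of standard-library Python. The function names are illustrative, and real inference stacks apply these filters to tensors over vocabularies of tens of thousands of tokens.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Always pick the single most likely token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    kept = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def sample(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A typical call chain is temperature, then a filter, then a draw, for example `sample(top_p_filter(softmax_with_temperature(logits, 0.8), 0.9), random.Random(0))`.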
6. Speculative Decoding
A hidden speed hack.
• Small model predicts ahead
• Large model verifies in parallel
→ If the draft is accepted → multiple tokens come out of a single large-model pass
This is one reason responses feel fast.
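Here is a toy sketch of the draft-then-verify loop, using the greedy variant: a drafted token is accepted only if it matches what the large model would have picked. The two "models" are hypothetical functions from a token list to the next token. Real systems verify all drafted positions in one batched forward pass and, when sampling, use a probabilistic accept/reject rule instead of exact matching.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Return the tokens gained from one draft-and-verify round."""
    # 1) The small model drafts k tokens autoregressively (cheap).
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))

    # 2) The large model checks each drafted position. This loop only
    #    simulates what a real system does in a single parallel pass.
    accepted = []
    for token in draft:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: keep the large model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3) Every draft token matched, so the same pass yields one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prefix, draft_next, target_next, max_new_tokens, k=4):
    """Generate tokens; output matches plain greedy decoding with the target."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new_tokens]
```

The key property is that the output is identical to what the large model would produce alone; the draft model only changes how many large-model passes are needed.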
7. Detokenizer + Streaming
Token IDs → converted back to text
That "typing" effect you see?
It's not just UI animation.
It's tokens being streamed to you as they are generated.
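A streaming detokenizer can be sketched as a generator that turns token IDs into text chunks as they arrive. The byte-level vocabulary below is hypothetical. It also shows one practical wrinkle: a multi-byte character can be split across two tokens, so the decoder has to buffer until the bytes form valid text.

```python
import codecs

# Hypothetical byte-level vocabulary; "é" (0xC3 0xA9) is split across two tokens.
ID_TO_BYTES = {0: b"caf", 1: b"\xc3", 2: b"\xa9", 3: b" time"}


def stream_text(token_ids):
    """Yield printable text chunks as token IDs arrive, one at a time."""
    # The incremental decoder holds back incomplete UTF-8 sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for token_id in token_ids:
        text = decoder.decode(ID_TO_BYTES[token_id])
        if text:
            yield text
```

A client would consume it with something like `for chunk in stream_text(ids): print(chunk, end="", flush=True)`, which is what produces the typing effect.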
⚠️ Two things most people miss:
→ Prefill phase = compute-heavy (the whole prompt is processed in parallel)
→ Decode phase = memory-heavy (every new token re-reads the weights and the KV cache)
Different problems. Different optimizations.
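A rough way to see the difference is arithmetic intensity: FLOPs performed per byte of weights read. The sketch below assumes a dense model, about 2 FLOPs per parameter per token, and 2-byte weights. It ignores attention FLOPs and KV cache traffic, so treat it as a back-of-the-envelope estimate only.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte of weights read in one forward pass."""
    # Each parameter contributes one multiply and one add per token processed.
    flops_per_param = 2.0 * tokens_per_pass
    # Each parameter is read from memory once per pass, however many tokens share it.
    return flops_per_param / bytes_per_param


prefill_intensity = arithmetic_intensity(tokens_per_pass=2048)  # whole prompt at once
decode_intensity = arithmetic_intensity(tokens_per_pass=1)      # one token per pass
```

Under these assumptions, prefill does thousands of FLOPs per byte read and keeps the compute units busy, while unbatched decode does about one, so it mostly waits on memory. That is why batching helps decode and why the two phases are tuned differently.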
This is why AI inference is still expensive.
I've studied these systems deeply…
and honestly, the more you learn, the crazier it gets.
It's not just AI.
It's engineering at insane scale.
Which part of this pipeline surprised you the most?
SLO vs SLI vs SLA
An SLI (Service Level Indicator) measures what's actually happening in your system, like request latency, error rate, or uptime. It's the raw signal that tells you how your service is performing.
An SLO (Service Level Objective) defines the acceptable range for an SLI, like "99.9% of requests succeed within 200ms." It's what your team aims to achieve to maintain reliability.
An SLA (Service Level Agreement) is a formal contract with users or customers, often tied to penalties if targets aren't met. It defines the consequences, not just the goal.
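As a sketch of how the first two fit together: the SLI is computed from raw request data, and the SLO is the target it is compared against. The `Request` record, the 200 ms threshold, and the error-budget helper are assumptions for illustration, not any particular monitoring tool's API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def compute_sli(requests, latency_threshold_ms=200.0):
    """SLI: fraction of requests that succeeded within the latency threshold."""
    good = sum(1 for r in requests if r.ok and r.latency_ms <= latency_threshold_ms)
    return good / len(requests)


def meets_slo(sli, slo_target=0.999):
    """SLO check: is the measured SLI at or above the agreed objective?"""
    return sli >= slo_target


def error_budget_remaining(sli, slo_target=0.999):
    """Fraction of the allowed failures not yet used (negative means overspent)."""
    allowed_failure = 1.0 - slo_target
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure
```

The SLA sits outside this code: it is the contract that says what happens when the SLO is missed.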
And when failures occur, that's where postmortems should close the loop.
But here's the gap:
Most postmortems are written after the fact. Digging through Slack. Rebuilding timelines. Guessing what actually happened. Which means they're slow and painful. So they get deprioritised, half-finished, or never written at all.
That's not just a process problem. It's an experience and tooling problem.
incident[.]io's new postmortem workflow flips this:
→ It builds the draft for you from real incident data
→ So you start with context, not a blank page
→ It turns postmortems into a collaborative, structured workflow that's actually easy to write and complete
Worth a read → https://t.co/bZ3jEokLnf
SLIs tell you what happened.
SLOs define what should happen.
SLAs define what happens if you fail.
Postmortems are where you make sure it doesn't happen again.
What else would you add?
♻️ Repost to help others learn and grow.
Thanks to @incident_io for sponsoring this post.
Follow me (Nikki Siapno) to improve at system design.
A tricky LLM interview question:
You're fine-tuning a model for Python code generation. The data was generated using the strongest LLMs like Opus/GPT.
But the fine-tuned model performs better when you use a weaker teacher instead.
How could this happen?
(answer below)