Sharing my experiences from building specialized harnesses for analytical SaaS companies.
It's likely that your harness requires your own defaults around data, context, multi-tenancy, and evolving business rules.
After all, knowledge work is different from software development.
Which default behaviors do you encode in your harness today?
Are you encoding them in the best way?
Claude Code is a great agent harness, for coding. For analytical SaaS, it is the wrong default.
Our CTO @ksariola took that case to AgentCon Silicon Valley this week, drawing on our experience of building specialized harnesses for analytical SaaS.
https://t.co/kcODAHECm7
Fable is a good model. As with all new models, it is simultaneously excellent and entirely unremarkable (relative to other models). It is slow and expensive, and the "loops are all you need" discourse they are pushing is obvious in the context of someone using Fable-class models.
What I've found so far is that for broad scope design (code architecture) tasks, Fable is unremarkable. Or, not better enough to justify its cost and speed.
But in highly targeted goal-oriented loops, it is another beast entirely. It is very slow but produces very good results.
I let it churn on optimizing a SwiftUI-layout resolver in Go I wrote and it was able to bring it down to an order of magnitude I could not reach myself (micro => nanosecond scale). But it took 2 hours and $40 to do it and I had to claw back some changes it overfit to Apple Silicon. Still, very worth it.
In comparison, for "implement this feature/change" iterative work, I ran Fable vs. GPT5.5 vs. GLM-5.1 head-to-head. They all produced equally acceptable final results, but GPT5.5 and GLM did it in a couple of minutes while Fable was churning away for 40 minutes. And GLM cost me less than a dollar, GPT5.5 ~$1.50, and Fable $9.
You can see that in this context, interactively working with an agent is nonsense. It's too slow. You need to write loops to keep the agent working and you probably want to highly parallelize the work being done. As with all things, I think a balance makes sense...
My sense is that I'd reserve Fable for targeted, surgical analysis and work. Not for daily driving everyday tasks.
I'm going to keep spending a shitload of money (relatively) and maining Fable for the rest of the week to continue to judge, will report if anything changes. I'll continue to head-to-head as well.
Here is how we build agents that continuously learn from customer feedback.
We scan agent traces for user-correction patterns: the moments where someone pushed back on what the agent did and explained why. An LLM classifies those signals and drafts a candidate update to the agent's knowledge.
That candidate goes into a queue where a human expert reviews it before anything goes live. If they approve, the change goes into the semantic data layer and applies to every user under that tenant from the next message onward.
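A minimal sketch of that flow, to make the moving parts concrete. Everything here is an assumption for illustration: the names `Candidate`, `scan_trace`, `SemanticLayer` and `review` are hypothetical, and the keyword check is only a stand-in for the LLM classifier described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the LLM classifier: a real system would ask a
# model whether the user is correcting the agent and why.
CORRECTION_MARKERS = ("no,", "that's wrong", "actually", "should be")


@dataclass
class Candidate:
    tenant_id: str
    rule: str
    approved: bool = False


def looks_like_correction(message: str) -> bool:
    text = message.lower()
    return any(marker in text for marker in CORRECTION_MARKERS)


def scan_trace(tenant_id: str, trace: list[tuple[str, str]]) -> list[Candidate]:
    """Turn user-correction messages in a (role, text) trace into candidates."""
    return [
        Candidate(tenant_id, text)
        for role, text in trace
        if role == "user" and looks_like_correction(text)
    ]


class SemanticLayer:
    """Per-tenant rules; only approved candidates can be applied."""

    def __init__(self) -> None:
        self._rules: dict[str, list[str]] = {}

    def apply(self, candidate: Candidate) -> None:
        if not candidate.approved:
            raise ValueError("candidate has not been approved by a reviewer")
        self._rules.setdefault(candidate.tenant_id, []).append(candidate.rule)

    def rules_for(self, tenant_id: str) -> list[str]:
        return list(self._rules.get(tenant_id, []))


def review(queue: list[Candidate], approve: Callable[[Candidate], bool],
           layer: SemanticLayer) -> int:
    """Human-in-the-loop step: apply only what the reviewer approves."""
    applied = 0
    for candidate in queue:
        if approve(candidate):
            candidate.approved = True
            layer.apply(candidate)
            applied += 1
    return applied
```

The point of the shape: nothing reaches `SemanticLayer` without passing through `review`, and rules are keyed by tenant so one customer's correction never leaks into another's agent.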
@bergr7 covered the full version at Context is King 👇
A roomful of technical AI builders gathered at The Agentic Night by @silta_hq and @AntlerGlobal in Helsinki last night.
Our co-founder @ksariola joined @JernJohan from Realm on stage. Full panel on YouTube.
https://t.co/MuwkXqcbM8
The most useful debugging skill I've seen teams develop is getting extremely precise about what their agent is doing wrong.
“The agent always calls the search_documents tool with a broad query and then makes 3–5 execute_sql calls as the first steps, unnecessarily increasing latency.”
>> “The agent starts with unnecessary exploratory search.”
“The agent fails because the context window gets bloated with large outputs from execute_sql. Adding limits or pagination doesn’t help because the data is indivisible, so the agent keeps trying to retrieve the full set.”
>> “The context window is usually exceeded after a couple of execute_sql calls.”
“The agent repeatedly retries failing tool calls with slightly different parameters instead of changing strategy.”
>> “The agent gets stuck in local recovery loops.”
At this level of detail, the next two steps become almost mechanical:
1. Decide what the agent should do instead.
2. Encode that behavior as a default in your harness.
Resist the temptation to jump into implementation work before you fully understand the failure mode and root cause.
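As one illustration of step 2, here is a rough sketch of a harness default for the third failure mode (retrying a failing tool with slightly different parameters). The `RetryGuard` class, its return values and the threshold of 3 are all hypothetical, not how any particular harness does it.

```python
from collections import Counter


class RetryGuard:
    """Counts consecutive failures per tool, ignoring parameter changes.

    After `max_failures` failures in a row for the same tool, it signals
    that the harness should force a change of strategy instead of
    allowing another near-identical retry.
    """

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    def record(self, tool_name: str, ok: bool) -> str:
        if ok:
            # A success resets the streak for this tool.
            self.failures[tool_name] = 0
            return "continue"
        self.failures[tool_name] += 1
        if self.failures[tool_name] >= self.max_failures:
            return "change_strategy"
        return "retry"
```

Keying on the tool name rather than the full call is deliberate: the failure described above is exactly that the parameters keep changing while the strategy does not.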
I dropped a clip below from my Context is King talk where I explain this process for designing specialized harnesses.
Agent framework or agent harness? Many people use the two words interchangeably.
My co-founder @bergr7 explained the difference at Context is King. A framework gives you the primitives, a harness comes with opinions about how the agent should behave.
Those opinions are where most of the leverage sits. Upgrading the model is the easy move, but rarely the one that matters most.
The biggest gains in our agents came from baking verticalized opinions into the harness: how it plans, what it knows about your data, how it carries large result sets between steps, when it asks for approval before acting.
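To make "how it carries large result sets between steps" concrete, here is one possible shape, sketched under assumptions: the full rows stay outside the context window and the model only sees a handle, a row count and a small preview. `ResultStore` and `total` are hypothetical names, not our actual API.

```python
class ResultStore:
    """Keeps full result sets out of the prompt; the agent passes handles."""

    def __init__(self, preview_rows: int = 3) -> None:
        self.preview_rows = preview_rows
        self._results: dict = {}

    def put(self, rows: list) -> dict:
        # Only this small summary would be shown to the model.
        handle = f"result_{len(self._results) + 1}"
        self._results[handle] = rows
        return {
            "handle": handle,
            "row_count": len(rows),
            "preview": rows[: self.preview_rows],
        }

    def get(self, handle: str) -> list:
        return self._results[handle]


def total(store: ResultStore, handle: str, column: str):
    """A later tool step works on the full data via the handle."""
    return sum(row[column] for row in store.get(handle))
```

A later step can then aggregate over the whole set without the rows ever passing through the model's context.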
We're bringing Context is King to London for the first time on June 8, during London Tech Week, after four sold-out editions in San Francisco and Helsinki.
First speakers in from @ElevenLabs, @prometheuxlabs and @motley, with more confirming soon. Hosted at @atomico.
Big kudos to @ksariola for driving this edition!
Sign up: https://t.co/VGo0cPuPdO
👑 Context is King Vol. 5 🗓️ June 8 @ London.
Diving beneath the app layer into models, inference & safety. 🛠️
Speakers from ElevenLabs, Motley, Prometheux and more to come.
70 spots! Link below 👇
The agent harness most builders know is a coding harness. Analytical products require specialized harnesses with numerical precision and specific tools.
Our co-founder @ksariola opened his AgentCon Silicon Valley talk on exactly this question.
Releasing my first kernel on @huggingface:
MaxSim
Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids it by using tiled scoring with simdgroup_matrix (Metal) and WMMA.
Result is 3–5× speedup compared to naive PyTorch.
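For anyone unfamiliar with MaxSim, a pure-Python sketch of the idea (not the kernel itself, which uses simdgroup_matrix and WMMA on the GPU). The score is the sum, over query tokens, of the best dot product against any document token. The function names and the `tile` parameter are illustrative assumptions, and the document is assumed to be non-empty.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query, doc):
    # Materializes the full |query| x |doc| similarity matrix first.
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query, doc, tile=2):
    # Keeps one running max per query token, so only a tile-sized slice
    # of similarities exists at any time.
    best = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            block_max = max(dot(q, d) for d in block)
            if block_max > best[i]:
                best[i] = block_max
    return sum(best)
```

Both functions return the same score; the tiled one just never holds the whole matrix, which is the property the kernel exploits.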
Try it out 👇
While reading the DeepSeek v4 paper, I ended up writing down over 90 questions. A lot of the paper reviews out there skip over the details, which is usually where the actual learning happens.
So, I decided to put together a proper guide: an Annotated Paper Walkthrough. The core idea is that you still read the original paper as your source material, but whenever things get dense or confusing, I hold your hand through it. You get detailed annotations with visualizations, code snippets, reference links, and—most importantly—the context you need so you don't feel lost.
Today I'm releasing v1 with the first 50 notes. Some of the things I unpack:
• Why swap Softmax and Sigmoid for Sqrt-Softplus in the MoE Router?
• What on earth is a Birkhoff polytope?
• Does attention process some tokens 3 times?
• What are split-KV and split-K, and why did DeepSeek drop them?
• Why use Reverse KL, and where does it even come from?
...and a lot more. Even the most demanding readers will find something new here.
Open-source models are still heavily borrowing from DeepSeek v3, and there’s no doubt that v4 details will soon become standard topics in discussions and ML interviews. Hopefully, this guide helps you stay ahead of the curve.
As a friend of mine joked, going through this will not only make you a better engineer, but a better man 😂
I can't prove that scientifically, but it's worth a shot.
Check it out: https://t.co/AJ1kUREInv
Given that Groq and Cerebras aren't adding new models to their catalogue and are seemingly entering into other deals (like serving Spark), do we assume the consumer category of OSS models with fast inference is dead? Are there new players entering?