important context: Fugu is an orchestrator.
if you read the blog post, it’s basically a multi-agent system trained to route and coordinate a pool of other llms.
that’s collective intelligence boosting the score, which probably means it’s calling the same frontier models it’s being compared against, like gpt-5.5 and opus.
Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: https://t.co/hhO6qTawgb 🐡
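A rough sketch of what "route and coordinate a pool of other llms" could look like from the outside. Everything here is hypothetical: the pool entries are stub functions rather than real model APIs, and a keyword rule stands in for the trained router described in the post. It is not Sakana's implementation.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[str], str]  # hypothetical: a model is just prompt -> text


def route(prompt: str, pool: Dict[str, Model],
          pick: Callable[[str], str], default: str) -> Tuple[str, str]:
    """Ask the router which pool member should answer, then call it."""
    name = pick(prompt)
    if name not in pool:  # fall back if the router names an unknown model
        name = default
    return name, pool[name](prompt)


# Stub pool members; a real system would call external model APIs here.
pool: Dict[str, Model] = {
    "coder": lambda p: "code:" + p,
    "generalist": lambda p: "general:" + p,
}


def keyword_router(prompt: str) -> str:
    # Assumption: a keyword rule stands in for a learned routing policy.
    return "coder" if ("function" in prompt or "bug" in prompt) else "generalist"


chosen, answer = route("fix this bug", pool, keyword_router, default="generalist")
```

The caller sees a single `route` call, which is the "single model API" shape, while the score depends on whichever pool member was picked.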
> apple has shipped a 20B parameter on-device model.
> you can't fit 20B params in RAM at reasonable precision, so they improved the architecture.
> a small model predicts which experts to load from NAND to RAM per query. unlike typical MoE, experts don't switch every token.
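To make the quoted claim concrete, here is a small sketch of the memory arithmetic and of per-query expert loading. The class, the score dictionary, and the choice of `k` are all hypothetical. This is a toy model of the idea as the post describes it, not Apple's design.

```python
def weight_gb(params: float, bits: int) -> float:
    """Storage for the weights alone, in GB (1e9 bytes)."""
    return params * bits / 8 / 1e9


# 20B parameters at common precisions (weights only, no activations or cache).
sizes = {bits: weight_gb(20e9, bits) for bits in (16, 8, 4)}  # 40, 20, 10 GB


class PerQueryExperts:
    """Toy loader: pick experts once per query, reuse them for every token."""

    def __init__(self, storage: dict, k: int):
        self.storage = storage   # expert_id -> weights; stands in for NAND
        self.k = k
        self.resident = {}       # stands in for RAM
        self.loads = 0           # counts NAND -> RAM transfers

    def start_query(self, scores: dict) -> list:
        # Hypothetical: `scores` comes from the small predictor model.
        chosen = sorted(scores, key=scores.get, reverse=True)[: self.k]
        self.resident = {}
        for expert_id in chosen:
            self.resident[expert_id] = self.storage[expert_id]
            self.loads += 1
        return chosen

    def step(self, token: str) -> list:
        # Every token uses the already-resident experts; nothing is loaded here.
        return sorted(self.resident)


experts = PerQueryExperts({i: f"weights-{i}" for i in range(8)}, k=2)
picked = experts.start_query({0: 0.1, 3: 0.9, 5: 0.7, 7: 0.2})
for tok in ["a", "b", "c"]:
    experts.step(tok)
```

The point of the per-query design is that `loads` stays at `k` no matter how many tokens are generated, whereas per-token routing could touch slow storage on every step.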
if you're still prompting coding agents directly you're falling behind. you need an agent that infers the task from your cursor hovering over a file for 12 seconds and then over engineers it
if you’re still writing loops that prompt coding agents you’re falling behind. you need to build a meta agent that infers what loops you would have wanted based on your vibe and then write those loops
this is the kind of AI infra update that actually matters lol
huawei just open sourced KVarN: 3-5x KV cache compression, plugs into vLLM with one flag, and claims speedups instead of the usual quantization tax
If it holds up, longer-context local LLMs get a lot cheaper.
need to start experimenting with it
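For a sense of scale, a back-of-the-envelope sketch of KV cache size and what a 3-5x reduction would buy. The formula is the standard one for a transformer cache (one K and one V tensor per layer). The model shape below is a hypothetical example, and applying the ratio uniformly is an assumption, since the post does not say how KVarN achieves it.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """KV cache size: 2 tensors (K and V) per layer, per token, per KV head."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=131072)
baseline_gib = baseline / 2**30  # 16.0 GiB at a 128K-token context

# Assumption: the claimed ratio applies uniformly to the whole cache.
compressed_gib = {ratio: baseline_gib / ratio for ratio in (3, 5)}

for ratio, gib in compressed_gib.items():
    print(f"{ratio}x compression: {gib:.1f} GiB")
```

Under these assumptions a 128K-token context drops from 16 GiB of cache to roughly 3 to 5 GiB, which is the difference between not fitting and fitting on a consumer GPU.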
ok so apparently designing loops not prompts is the new thing
a useful coding loop is:
prompt agent → inspect diff → run verifier → decide continue/retry/stop → save output
> the loop is plumbing.
> the skill inside the loop is the asset.
vague loops burn tokens. verified loops compound.
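The loop above can be sketched in a few lines. This is a minimal version under stated assumptions: `agent` and `verifier` are hypothetical callables (stubs here, not any real agent API), "inspect diff" is reduced to an empty-diff check, and "save output" is just returning the accepted diff.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LoopResult:
    status: str            # "accepted" or "gave_up"
    attempts: int
    diff: Optional[str]    # the accepted diff, if any


def run_loop(task: str,
             agent: Callable[[str, str], str],
             verifier: Callable[[str], Tuple[bool, str]],
             max_attempts: int = 3) -> LoopResult:
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        diff = agent(task, feedback)          # prompt agent
        if not diff.strip():                  # inspect diff
            feedback = "empty diff"
            continue                          # retry
        ok, message = verifier(diff)          # run verifier
        if ok:
            return LoopResult("accepted", attempt, diff)  # stop, save output
        feedback = message                    # retry with verifier feedback
    return LoopResult("gave_up", max_attempts, None)      # stop


# Hypothetical stubs standing in for a real coding agent and test runner.
def fake_agent(task: str, feedback: str) -> str:
    return "+ return a + b" if feedback else "+ return a - b"


def fake_verifier(diff: str) -> Tuple[bool, str]:
    ok = "a + b" in diff
    return ok, ("" if ok else "test_add failed")


result = run_loop("implement add(a, b)", fake_agent, fake_verifier)
```

The loop itself stays generic. What makes it worth anything is the verifier, since that is what turns a retry into a correction rather than another guess.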
Just released v2 of Tokenleak, with a total overhaul using opentui and Solid.js
Now with cursor integration too
monitor your tokens with an even better interface!
@donnfelker just reverse prompt: give chatgpt/gemini a brief prompt asking it to write a prompt for generating an image that does xyz, as detailed as possible, and many a time it'll make an amazing prompt