What works: changing the structure of the output.
Labels, citations, reproducibility blocks, processual verbs.
You don't get an agent that's always right. You get one that makes it cheap to notice when it's wrong.
Full version: https://t.co/y5GrFMmI91
A trustworthy agent is not an agent that is always right. It is an agent that makes it cheap to notice when it is wrong.
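A rough sketch of what "changing the structure of the output" could look like in practice. The label names (verified / inferred / assumed), the `Claim` fields, and the helper functions are all hypothetical and are not taken from the full write-up; the only point is that each claim carries its own label, citation, and reproduction step, so a reader can see what to check first.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set: the post mentions "labels" but does not name these values.
LABELS = ("verified", "inferred", "assumed")


@dataclass
class Claim:
    text: str
    label: str                    # one of LABELS
    source: Optional[str] = None  # citation: file path, URL, or tool output
    repro: Optional[str] = None   # command a reader can run to check the claim

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def render(claims):
    """Render each claim with its label, citation, and reproduction step."""
    lines = []
    for c in claims:
        lines.append(f"[{c.label.upper()}] {c.text}")
        lines.append(f"  source: {c.source or 'none'}")
        if c.repro:
            lines.append(f"  reproduce: {c.repro}")
    return "\n".join(lines)


def needs_review(claims):
    """Claims to check first: anything not verified, or lacking a source."""
    return [c for c in claims if c.label != "verified" or c.source is None]


# Illustrative data only.
claims = [
    Claim("Tests pass on main", "verified", source="pytest output", repro="pytest -q"),
    Claim("The bug is in the retry loop", "inferred", source="client.py:88"),
    Claim("No other callers are affected", "assumed"),
]
report = render(claims)
flagged = needs_review(claims)
```

Here `flagged` holds the two claims that are not verified, which is the "cheap to notice" part: the reader does not have to reread the whole response to find where it might be wrong.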
Six patterns that make agent output trustworthy, not just fluent:
Things that look promising but fail:
- "Be honest about uncertainty" (no effect)
- A top-of-response disclaimer (ignored fast)
- A confidence score per claim (generated like any other token)
- "Double-check this" (same output, with "I double-checked" on top)
Corporation: "We made $4B but spent $3.9B so we only owe taxes on $100M."
Government: "Totally reasonable."
You: "I made $60K but spent $58K on survival."
Government: "You owe taxes on $60K."
You: "That's not..."
Government: "File by May 15."
The largest Ethereum layer 2, which has also been regularly praised by Vitalik as being the most decentralized L2, just froze $100m worth of ETH that was hacked by criminals.
Are you finally starting to realize the bitcoin maxis were right?
Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
-
K2.6 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: https://t.co/uvoSJKyGCY
-
API: https://t.co/EOZkbOwCN4
Tech blog: https://t.co/9wWvgIQSS3
Weights & code: https://t.co/Be0hjs2RTP
Card payments were designed for humans who browse, hesitate, and click "buy." AI agents don't do any of that. The fraud models break. The chargebacks spike. The merchant pays.
Crypto and gift cards are the only payments with built-in finality. Wrote about why that's the only thing that matters for agent commerce.