We took a contrarian bet on how to handle context. Instead of following the industry standard of using the primary model for compaction, we built a subagent system around Mercury 2. This cut costs by 90% and latency by 82%.
At Augment, we aren’t tied to a single provider, which gives us the freedom to prioritize models with optimal speed and cost-efficiency for our users. Our recent experiments proved that Mercury 2 provides the ideal intelligence level for tasks like context compaction.
@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today.
"We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated subagent. Mercury 2 is the highly efficient engine powering our most critical workflows."
-@RustagiAnkur & @jm1234567890, Members of Technical Staff at Augment Code
The subagent layer needs the most efficient model. Full methodology and eval setup in the writeup.
https://t.co/LPVTdaMjli
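A minimal sketch of the idea in the thread above: compaction runs through a separate summarizer rather than the primary model. This is not Augment's implementation. `Message`, `rough_tokens`, `compact`, and the `summarize` callable are hypothetical names, and the 4-characters-per-token estimate is an assumption standing in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def rough_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def compact(
    history: List[Message],
    summarize: Callable[[str], str],
    budget: int,
    keep_recent: int = 2,
) -> List[Message]:
    """Summarize older turns with a separate model once history exceeds budget.

    `summarize` stands in for a call to a smaller, cheaper model (the
    subagent); the primary model never sees the compaction request.
    """
    total = sum(rough_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)  # under budget: nothing to compact

    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    transcript = "\n".join(f"{m.role}: {m.text}" for m in old)
    summary = summarize(transcript)  # the only model call made here
    return [Message("system", "Summary of earlier turns: " + summary)] + recent
```

In use, `summarize` would wrap whatever client calls the smaller model, so the primary model only ever receives the already compacted history.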
@jitenoswal Saw the infinite loop issue on the @firecrawl agent. It seems to be common with small context windows and low-intelligence models, though. Good work.
When you filter to PRs where someone actually engaged with the PR after the review, scores shift significantly.
@OpenAI’s Codex sees the largest swing at +17.2pp. @augmentcode climbs to #2 in F1 with +12.3pp.
Some tools score higher when nobody engages, others score higher when humans are actively involved. The delta reflects how each tool is being used, not which one is better.
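For readers who want to reproduce this kind of comparison, here is a rough sketch of an F1 shift in percentage points after filtering to engaged PRs. The post does not say how scores were aggregated, so pooling counts across PRs is an assumption, and the field names (`tp`, `fp`, `fn`, `engaged`) are hypothetical.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    # F1 is the harmonic mean of precision and recall; 0.0 when undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pooled_f1(rows) -> float:
    # Assumption: counts are summed across PRs before computing F1.
    return f1(
        sum(r["tp"] for r in rows),
        sum(r["fp"] for r in rows),
        sum(r["fn"] for r in rows),
    )


def f1_shift_pp(rows) -> float:
    """F1 on engaged PRs minus F1 on all PRs, in percentage points."""
    engaged = [r for r in rows if r["engaged"]]
    return (pooled_f1(engaged) - pooled_f1(rows)) * 100
```

A positive result means the tool scores higher on PRs where someone engaged after the review; as the post notes, that reflects usage patterns rather than a ranking.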
@bayer04_en is a special place. When @kaihavertz29 scored the penalty, they didn't boo him. In fact, they celebrated him for doing so well in his football career. This is so wholesome to watch. Cheers Leverkusen!
Inspired by @karpathy's brilliant https://t.co/tqV3DiBThL — ported it to Rust. 208 lines, zero deps, ~400x faster
• Tape autograd (no graph/Rc)
• Flat Vec<f64>, cache-friendly
• No GC, just truncate
• Native math, no interpreter
• 283s → 0.68s
https://t.co/9nA5oLAm6O
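A minimal sketch of the tape idea from the bullets above, written in Python for brevity rather than Rust. It is not the ported code, which is behind the link. The `Tape` class and its method names are hypothetical. Nodes live in flat lists, operands are referenced by index instead of through a pointer graph, and freeing a finished computation is a truncate.

```python
class Tape:
    """Flat-list reverse-mode autodiff: nodes are indices, not objects."""

    def __init__(self):
        self.vals = []  # forward value of each node
        self.ops = []   # (op, a, b): operand indices, -1 when unused

    def _push(self, val, op, a=-1, b=-1):
        self.vals.append(float(val))
        self.ops.append((op, a, b))
        return len(self.vals) - 1

    def leaf(self, x):
        return self._push(x, "leaf")

    def add(self, a, b):
        return self._push(self.vals[a] + self.vals[b], "add", a, b)

    def mul(self, a, b):
        return self._push(self.vals[a] * self.vals[b], "mul", a, b)

    def backward(self, out):
        # Nodes are appended in topological order, so one reverse sweep
        # over the tape visits every node after all of its consumers.
        grads = [0.0] * len(self.vals)
        grads[out] = 1.0
        for i in range(out, -1, -1):
            op, a, b = self.ops[i]
            g = grads[i]
            if op == "add":
                grads[a] += g
                grads[b] += g
            elif op == "mul":
                grads[a] += g * self.vals[b]
                grads[b] += g * self.vals[a]
        return grads

    def truncate(self, n):
        # Drop every node from index n on; parameters below n survive.
        del self.vals[n:]
        del self.ops[n:]
```

A training loop under this design would record the tape length after creating parameters, run forward and backward, then call `truncate` back to that mark before the next step.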
@sama and @DarioAmodei can't shake hands. https://t.co/ojtGzqclaH
Lessons:
- Money and success do not guarantee peace and love.
- Everyone thinks they are right. Right-versus-wrong arguments sometimes happen on the world stage as well.
https://t.co/XgXPk6BrzO beta is live. Best-quality PDF generation that looks just like your web dashboards. It's like @resend for creating a PDF. And it's 50-90% cheaper than any alternative. Enjoy!
@taheerBuilds @steipete @nutrientdocs AI can make it. I created https://t.co/FEgQgcVLPk: the highest-quality PDF creation, and 50-80% cheaper than any other player.
If you ever wondered whether agents talk to each other.
Two agents built at @augmentcode came up with their own language, ABX, and used it to identify, resolve, and verify an issue. Today is just a hackathon, but integrated SDLC automation is the future.
Today is #hackathon day at @augmentcode. We created an OpenClaw version internally and asked it to come up with an optimized language for talking to an older-version agent. They came up with ABX (Augment Bot Exchange).
Tomorrow, my colleagues at @augmentcode and I will be sharing details about our SOTA code context engine. It improves @claudeai and @cursor_ai quality by ~70%.
Want to experience the magic of our Context Engine with your existing toolbase?
Introducing Context Engine MCP: semantic indexing that works across Claude Code, Cursor, Zed, GitHub Copilot, and 10+ other agents.
See an interactive live demo on Tuesday, February 10 at 10 AM PT. Implement in your workflows on the same day.
Register now: https://t.co/cNjYr7qoWe
Who is MCPlexor for?
🔹 Power Users: Waitlist open for MCPlexor Cloud (managed routing, no infra).
🔹 Privacy / Local: Use the new Ollama backend. Zero cost, offline.
Check it out: https://t.co/3CFCMr37Ac
#AIAgents #MCP #Ollama #DevTools
Shipped #Ollama support for MCPlexor 🚀
If your agent uses Linear + GitHub + Notion, you're dumping ~40k tokens into context on every request. That's 20% of a 200k context window gone to tools.
MCPlexor fixes this. <1k tokens overhead. Dynamic routing. Try 100% local.
Details 👇
The solution: Semantic Multiplexing
MCPlexor sits between your agent and your tools.
Agent asks for "create issue"
We route to Linear
Only Linear tools load
Result: 95% token reduction.
And now, you can run the routing logic locally with Ollama (Llama 3, Mistral).
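A rough sketch of the routing step described above, assuming a simple keyword-overlap router in place of an LLM. This is not MCPlexor's implementation: the `TOOLSETS` catalog, its descriptions, and the per-server token counts are made-up illustration values (chosen to sum to the ~40k mentioned earlier), and `route`, `tokens_loaded`, and `context_share` are hypothetical names.

```python
import re

# Hypothetical catalog. Token counts are illustrative, not measured.
TOOLSETS = {
    "linear": {"description": "create update issue ticket project cycle",
               "schema_tokens": 14_000},
    "github": {"description": "pull request repository commit branch review",
               "schema_tokens": 16_000},
    "notion": {"description": "page document database note wiki",
               "schema_tokens": 10_000},
}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def route(request, toolsets=TOOLSETS):
    """Pick the server whose description shares the most words with the request.

    Stand-in for the routing model; returns None when nothing matches.
    """
    req = words(request)
    scores = {name: len(req & words(ts["description"]))
              for name, ts in toolsets.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def tokens_loaded(selected, toolsets=TOOLSETS):
    # Only the selected server's tool schemas enter the context;
    # with no selection, every schema is loaded up front.
    if selected is None:
        return sum(ts["schema_tokens"] for ts in toolsets.values())
    return toolsets[selected]["schema_tokens"]


def context_share(tokens, window=200_000):
    # Fraction of the context window consumed by tool schemas.
    return tokens / window
```

With these toy numbers, `route("create issue")` selects `linear`, and loading all three servers up front would take `context_share(40_000)`, i.e. 20% of a 200k window, matching the arithmetic in the post. The actual savings depend on how much of the selected server's schema the router forwards.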