We’re working with Termbench to get our submission verified.
As part of the process, the team has agreed to take down our older submissions.
SPOILER ALERT: Once verified, this will be our highest score yet, beating all our previous submissions!
I built the open-source codex app!
One of my favorite features is “Workspaces” - reusable multi-chat layouts you can save and switch between depending on the task.
Built on top of @forgecodehq
This is why ForgeCode invests heavily in prompt caching.
In one workspace over the last 7 days on Opus 4.7:
- 407M: input tokens
- 382.9M: cache-read tokens
- 98.1%: cache read ratio
- 22.9×: write amortization
Amortization = tokens read back per token written to cache.
So every token written to the cache was read back ~23 times.
At public API pricing, that’s ~$2,035 without caching vs ~$333 with 5-minute caching.
~$1.7K saved in input-token cost alone.
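The figures above can be cross-checked with a little arithmetic. This is a rough sketch, not ForgeCode's own accounting. It assumes the 407M total is the sum of uncached, cache-write, and cache-read tokens, a base input price of $5 per million tokens (inferred from $2,035 / 407M), and cache multipliers of 1.25× for 5-minute writes and 0.1× for reads. None of these assumptions are stated in the post.

```python
# Figures quoted in the post, in millions of tokens.
TOTAL_INPUT_M = 407.0
CACHE_READ_M = 382.9
WRITE_AMORTIZATION = 22.9  # tokens read back per token written to cache

# Assumptions (not from the post): base price inferred from $2,035 / 407M,
# and cache multipliers for 5-minute writes and for reads.
BASE_PRICE_PER_M = 5.00
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def cache_costs(total_m, read_m, amortization, base=BASE_PRICE_PER_M):
    """Estimate input-token cost with and without prompt caching."""
    # Amortization = read / written, so written = read / amortization.
    write_m = read_m / amortization
    # Assumed breakdown: whatever is neither read nor written is uncached input.
    uncached_m = total_m - read_m - write_m
    without_cache = total_m * base
    with_cache = base * (
        uncached_m
        + write_m * CACHE_WRITE_MULT
        + read_m * CACHE_READ_MULT
    )
    return {
        "write_m": write_m,
        "uncached_m": uncached_m,
        "read_ratio": read_m / (read_m + uncached_m),
        "without_cache": without_cache,
        "with_cache": with_cache,
        "saved": without_cache - with_cache,
    }


stats = cache_costs(TOTAL_INPUT_M, CACHE_READ_M, WRITE_AMORTIZATION)
for key, value in stats.items():
    print(f"{key}: {value:,.3f}")
```

Under these assumptions the sketch lands on roughly $2,035 without caching and about $333 with it. The 98.1% ratio comes out as cache reads divided by cache reads plus uncached input.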
Is the real bottleneck for AI agents the model—or the harness?
Terminal-Bench 2.0 suggests it might be the latter. ForgeCode ranks #1 among open-source harnesses, showing how much performance you can unlock without changing the model—just by improving how it uses tools.
In ForgeCode’s case, the gains come from better tool orchestration and execution.
Learn more: https://t.co/hN2RZZJDCs
A few days ago, I was checking out the once VERY relevant Terminal-Bench. I kept seeing this agent called @forgecodehq, always in the top 5.
Decided to try it and never looked back. It's now my daily AI-enhanced terminal.
Feature packed, but not stuffed to the gills. I love it!
After 2 months, our #1 rank on Termbench was finally broken by a worthy competitor (by 0.2%) 🙌
If someone from @OpenAI can help us get unrestricted API access, that'd be great! We'd love to run it on @forgecodehq and share notes 😇
Configure a symbol + conversion rate to display costs in your local currency.
Useful if you need a more accurate sense of the real value of the work being produced.
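As a rough illustration of what a symbol plus conversion rate amounts to, here is a minimal sketch. `format_cost` and the rate of 83 are hypothetical and chosen for the example; this is not ForgeCode's configuration format or code.

```python
def format_cost(cost_usd: float, symbol: str = "$", rate: float = 1.0) -> str:
    """Convert a USD cost with a fixed rate and prefix it with a currency symbol."""
    # Hypothetical helper: multiply by the configured rate, then format
    # with thousands separators and two decimal places.
    return f"{symbol}{cost_usd * rate:,.2f}"


# Illustrative values only: a $333 session shown in rupees at an assumed rate of 83.
print(format_cost(333.0, symbol="₹", rate=83.0))  # ₹27,639.00
```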
Continuing our commitment to open-sourcing our TermBench improvements, we’re shipping another update.
In `v2.8.0`, the `task` tool is now publicly available.
`task` enables the main agent to delegate work to specialized, user-defined agents, keeping the context window focused and efficient.
Example: hit a Rust compile error? Invoke a Rust-specific sub-agent to handle compiler runs, rules, and debugging in isolation.
Enabled by default.
https://t.co/7Zj1Gdavx5
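The delegation idea described above can be sketched roughly as follows. This is an illustrative pattern only, not ForgeCode's implementation: `SubAgent`, `MainAgent`, and the lambda handler are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubAgent:
    """Hypothetical specialized agent with its own private history."""
    name: str
    handler: Callable[[str], str]
    history: List[str] = field(default_factory=list)

    def run(self, prompt: str) -> str:
        # The prompt and the result are recorded in this agent's history only.
        self.history.append(prompt)
        result = self.handler(prompt)
        self.history.append(result)
        return result


@dataclass
class MainAgent:
    """Hypothetical main agent that delegates work by agent name."""
    agents: Dict[str, SubAgent]
    context: List[str] = field(default_factory=list)

    def task(self, agent_name: str, prompt: str) -> str:
        result = self.agents[agent_name].run(prompt)
        # Only the final result is added to the main context.
        self.context.append(f"[{agent_name}] {result}")
        return result


# Stand-in handler; a real sub-agent would call a model and run the compiler.
rust = SubAgent("rust", lambda p: "fixed: added missing lifetime annotation")
main = MainAgent(agents={"rust": rust})
main.task("rust", "error[E0106]: missing lifetime specifier in src/lib.rs")
print(main.context)
```

The point of the pattern is that the main agent's context grows by one line per delegated task, while the sub-agent keeps the details to itself.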
File edit tooling was heavily optimized to improve performance on TermBench 2.0.
In our latest release (v2.7.0), the multi-edit tool is now GA.
https://t.co/uVGYewg55A