@ctnzr Appreciate the shoutout! Nemotron 3 Ultra earned it: an 89.9 percent average puts it at the top of the open-weights board. Exactly the kind of agentic capability PinchBench is built to measure 💪
When the company the whole industry runs on reaches for your benchmark to make their case, the methodology is holding up. Proud to see us on that stage!
Look closely at the NVIDIA keynote slide behind Jensen Huang.
@pinchbench is at the top of it! @NVIDIA used our benchmark to position Nemotron 3 Ultra against the frontier. Nemotron tied for the lead on Agent Productivity at 91%.
Built for the agentic era. @NVIDIAAIDev, the benchmark is doing its job.
Gemini 3.5 Flash was live in Kilo before I/O 2026 ended.
74.2% on @pinchbench in initial runs. 1M token context. Roughly 4x faster than comparable frontier models, and it beats Gemini 3.1 Pro on most coding and agentic benchmarks.
Try it now: https://t.co/OnJ9NkrEsw
New PinchBench result!
inception/mercury-2 scored 23.93% (35.18/147)
⏱️ 4341s
Provider: Inception
Gateway: OpenRouter
Result → https://t.co/axnWfXq9fq
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
inception/mercury-2 scored 25.85% (38.0/147)
⏱️ 3623s
Provider: Inception
Gateway: OpenRouter
Result → https://t.co/egcPpW2PQg
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
inception/mercury-2 scored 25.25% (37.1/147)
⏱️ 3001s
Provider: Inception
Gateway: OpenRouter
Result → https://t.co/qMBup8HyMi
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
inception/mercury-2 scored 25.78% (37.9/147)
⏱️ 2946s
Provider: Inception
Gateway: OpenRouter
Result → https://t.co/JukghFo1OJ
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
inception/mercury-2 scored 23.9% (35.14/147)
⏱️ 2859s
Provider: Inception
Gateway: OpenRouter
Result → https://t.co/BwpEpBZSoL
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
google/gemini-3.5-flash scored 74.14% (108.98/147)
⏱️ 10653s
Provider: Google
Gateway: OpenRouter
Result → https://t.co/XbdyPYRsgv
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
google/gemini-3.5-flash scored 71.05% (104.44/147)
⏱️ 9366s
Provider: Google
Gateway: OpenRouter
Result → https://t.co/RF2WuxQ29Z
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
Gemini 3.5 Flash scored 74.08% (108.89/147)
⏱️ 9827s
Provider: Google
Gateway: OpenRouter
Result → https://t.co/JiH6R34BRK
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
Gemini 3.5 Flash scored 75.34% (110.75/147)
⏱️ 9810s
Provider: Google
Gateway: OpenRouter
Result → https://t.co/i8MyC0uwIC
See the full leaderboard → https://t.co/DbnFJFig9c
New PinchBench result!
Gemini 3.5 Flash scored 76.27% (112.11/147)
⏱️ 9667s
Provider: Google
Gateway: OpenRouter
Result → https://t.co/lPMDX37HUy
See the full leaderboard → https://t.co/DbnFJFig9c
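For anyone reading the run posts above: each score is points earned out of 147, shown as a percentage. Here is a minimal Python sketch of that arithmetic, plus the mean and best across the five posted runs per model. The names `RUNS`, `percent`, and `summarize` are made up for illustration and are not part of any PinchBench tooling. The posted point totals look rounded, so recomputed percentages can differ from the posted ones by about 0.01. The mean of the five Gemini 3.5 Flash runs comes out near 74.2%, which seems consistent with the "74.2% in initial runs" figure quoted earlier, though that link is an inference and not something the posts state.

```python
from statistics import mean

MAX_POINTS = 147  # denominator shown in every result post above

# (points, posted percentage) for each run, copied from the posts above.
RUNS = {
    "inception/mercury-2": [
        (35.18, 23.93), (38.0, 25.85), (37.1, 25.25), (37.9, 25.78), (35.14, 23.9),
    ],
    "google/gemini-3.5-flash": [
        (108.98, 74.14), (104.44, 71.05), (108.89, 74.08), (110.75, 75.34), (112.11, 76.27),
    ],
}


def percent(points, max_points=MAX_POINTS):
    """Convert a point total into a percentage of the maximum."""
    return 100.0 * points / max_points


def summarize(runs):
    """Mean and best of the posted percentages for one model (hypothetical helper)."""
    posted = [pct for _, pct in runs]
    return {"mean": round(mean(posted), 2), "best": max(posted)}


if __name__ == "__main__":
    for model, runs in RUNS.items():
        print(model, summarize(runs))
```

The daily leaderboard reports the best score, while a mean over several runs smooths out run-to-run variance, so the two numbers for the same model can differ by a couple of points.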
PinchBench Daily Leaderboard (Best Score)
🥇 Claude Opus 4.7: 91.6%
🥈 Xiaomi MiMo v2.5: 91.4%
🥉 Claude Haiku 4.5: 90.4%
How does your model stack up?
https://t.co/DbnFJFig9c