Meet 𝗔𝗽𝗼𝗱𝗲𝘅 𝟭.𝟬 🔭, a heavy-duty agent team for deep research that sets a new SOTA! The team searches the web, reasons over evidence, and writes reports where every claim is backed by an explicit 𝘦𝘷𝘪𝘥𝘦𝘯𝘤𝘦 𝘤𝘩𝘢𝘪𝘯 and independently audited before delivery.
🌐 https://t.co/pOQAjL92uF
Solvers of the week! 👀
Over the past two weeks, people brought some real-world problems to Apodex.
A few prompts that stood out:
⚽ A football fan refused to buy a quarter-final ticket without knowing who’d be playing — Match 97 still reads "W89 vs W90"
🐾 A full pet-market study — no "the market’s booming, young people love pets" filler, just traceable sources.
🤖 Cursor vs Copilot vs Claude Code vs OpenAI Codex — ARR, paid users, latest raises, growth.
💊 Do SGLT2 inhibitors and GLP-1 receptor agonists together actually show better cardiovascular outcomes?
📊 The AI compute supply chain, four ways: Blackwell vs MI400 vs TPU v6 vs Trainium3 — perf-per-dollar, and what it does to cloud capex and margins.
🧬 What’s genuinely new in how androgens drive anabolic effects — frontier literature from the last few years.
The common thread: real problems that call for a heavy-duty solver — one that traces the sources and cross-checks the evidence, so you don't have to.
More people are raring to go! Take your weekend off, and hand your baffling and nagging problems to Apodex👇
#DeepResearch
Been testing an AI model for deep research on my local grocery business over the past few days. Honestly? Impressed.
The reasoning is solid, and the actual data (links, prices) is 100% accurate. Zero hallucinations, just facts. A genuine discovery.
We handed Apodex 1.0 and six frontier models one trivia question with a trap built in.
Among Nobel Literature laureates, which ones were formally expelled by their own government?
+1 per correct name, −1 per name that misses the bar.
The biggest names kept adding names, straight into negative scores. 🧵
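A minimal sketch of the scoring rule described above, assuming each answer is treated as a set of names and compared case-insensitively. The function name `score_answer` and the placeholder labels are illustrative only; they are not the real laureates or any model's real output.

```python
def score_answer(named, correct):
    """Return +1 for each named item in the correct set, -1 for each that is not."""
    # Normalize both sides so "Laureate A" and "laureate a" count as the same name.
    named_set = {n.strip().lower() for n in named}
    correct_set = {c.strip().lower() for c in correct}
    hits = len(named_set & correct_set)    # names that meet the bar
    misses = len(named_set - correct_set)  # names that do not
    return hits - misses


# Placeholder data (hypothetical): two names meet the bar.
ground_truth = {"laureate a", "laureate b"}
cautious = ["Laureate A"]
over_claiming = ["Laureate A", "Laureate C", "Laureate D", "Laureate E"]

print(score_answer(cautious, ground_truth))       # 1
print(score_answer(over_claiming, ground_truth))  # -2
```

Under this rule, every extra unverified name costs a point, so padding a list can push a partly correct answer below zero.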
Apodex has released Apodex 1.0, a verification-centric deep research agent that searches the web, synthesizes evidence, and generates reports in which every claim is backed by an auditable chain of evidence.
In heavy-duty mode, Apodex 1.0-H runs an async team of up to 150 sub-agents, with a global verifier checking the assembled evidence before any answer is committed.
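The fan-out-then-verify pattern described above can be sketched roughly as follows. This is not Apodex's implementation: `sub_agent`, `global_verifier`, `run_team`, the example.com URLs, and the "at least two sources" rule are all assumptions made for illustration. Only the cap of 150 sub-agents comes from the post.

```python
import asyncio

MAX_SUB_AGENTS = 150  # cap stated in the post


async def sub_agent(query, idx):
    """Hypothetical sub-agent: returns one piece of evidence for the query."""
    await asyncio.sleep(0)  # stand-in for an actual web search
    return {
        "claim": f"finding {idx % 3} for {query}",
        "source": f"https://example.com/{idx}",  # placeholder URL
    }


def global_verifier(evidence, min_sources=2):
    """Keep only claims supported by at least `min_sources` distinct sources."""
    support = {}
    for item in evidence:
        support.setdefault(item["claim"], set()).add(item["source"])
    return {c: sorted(s) for c, s in support.items() if len(s) >= min_sources}


async def run_team(query, n_agents):
    n = min(n_agents, MAX_SUB_AGENTS)
    # Launch all sub-agents concurrently and wait for every result.
    results = await asyncio.gather(*(sub_agent(query, i) for i in range(n)))
    # Nothing is committed until the assembled evidence passes the verifier.
    return global_verifier(results)


verified = asyncio.run(run_team("example question", 200))
print(len(verified))  # 3
```

The design point is the ordering: the verifier sees the assembled evidence from the whole team, so a claim backed by a single sub-agent never reaches the answer.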
Evidence over generation 👀
The over-claiming you just watched is the same reflex that invents a fake citation, a wrong dosage, a number that isn't in the filing.
These models reached for the plausible-looking answer.
Apodex is a self-evolving heavy-duty solver built to verify before it delivers — for the work where one wrong line has real cost.🔭
Apodex 1.0 dropped and the architecture is genuinely different.
It's post-trained on Qwen3.5 as a self-evolving system: math, coding, and general knowledge stay intact while deep-research ability compounds over time. No catastrophic forgetting. That balance is harder to build than it sounds.
The heavy-duty side: the 1.0-H model runs up to 150 sub-agents in parallel, all exploring the web simultaneously. A separate verification layer audits every claim before the final report assembles. Not just search plus summarize. There's actual conflict resolution baked into the pipeline.
Numbers: BrowseComp 90.3, DeepSearchQA 94.4, HLE-text 60.8. SOTA across open and closed source right now.
The part worth sitting with: their 4B mini model beats every 30B-class open model on BrowseComp and DeepSearchQA. Smaller, cheaper, better at research. The scaling story is quietly shifting.
Underneath all of it is AgentOS, a task-agnostic runtime handling scheduling, routing, checkpoints, and cost accounting. Workflow logic sits in plugins above it, so adding a new app is just a folder of code.
Open weights too. Worth a look if you're building research pipelines or thinking about how agent orchestration should actually be structured.
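For readers thinking about the runtime/plugin split mentioned above, here is a rough sketch of the idea. It is not the AgentOS API: the `Runtime` class, its methods, and `summarize_plugin` are hypothetical names, and the "one cost unit per call" accounting is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Runtime:
    """Hypothetical task-agnostic runtime: routing, checkpoints, cost ledger."""
    plugins: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    checkpoints: List[Tuple[str, str, str]] = field(default_factory=list)
    cost_units: int = 0

    def register(self, name, handler):
        # Workflow logic lives in the plugin; the runtime only stores a handle.
        self.plugins[name] = handler

    def run(self, name, task):
        handler = self.plugins[name]                   # routing
        result = handler(task)
        self.checkpoints.append((name, task, result))  # checkpoint
        self.cost_units += 1                           # cost accounting (assumed: 1 unit per call)
        return result


def summarize_plugin(task):
    """Hypothetical plugin: the runtime knows nothing about what it does."""
    return f"summary of: {task}"


runtime = Runtime()
runtime.register("summarize", summarize_plugin)
out = runtime.run("summarize", "pet-market study")
print(out)  # summary of: pet-market study
```

Adding another app in this sketch means writing one more handler and registering it; the runtime code does not change.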
Predicting the future takes deep research.
Great to see Apodex-1.0-mini rank first again on FutureX: https://t.co/PixrLhpcLV
Try it here: https://t.co/gBXTKLGtTQ
As a self‑evolving heavy‑duty solver, the Apodex family reliably converts cryptic clues into a verifiable identifier rather than a plausible‑sounding fabrication.
If you work in a high-stakes domain, be our eval and throw your hardest problems at Apodex!
#AIforScience
Apodex-1.0-mini, our 35B model, currently holds the #1 spot on FutureX.
But leaderboards are sanitized: clean queries, no penalty for confident wrong answers.
So we wrote one deliberately brutal question: four facts to recover, a scoring rule where confident but wrong = −4, correct = +4.
Between frontier models and Apodex, here’s what happened. 🧵👇
Here’s how six frontier models performed:
−4: GPT‑5.4 (Zytiga), Qwen3.6‑Plus (invented “Zurnai”), GLM‑5.1 (none), MiniMax‑M2.7 (none)
+4: Claude‑Opus‑4.7, Gemini‑3.1‑Pro
Two baselines matched the ground truth on all four fields. The other four finished at −4 through confident mistakes, hallucination, or no answer.
Apodex matches frontier-level accuracy with a 35B model and an architecture that exposes its evidence and exclusion logic so you can audit them. 🤏 🦾