We’re open sourcing the first benchmark for enterprise AI search: EnterpriseRAG-Bench.
Retrieval is the foundation of every AI agent that works with company data. 1M-token context windows and extended thinking are wasted if the agent can't find the right information across your tools.
EnterpriseRAG-Bench evaluates how well an AI agent can navigate hundreds of thousands of Slack messages, contradictory sources, and “official docs” that haven’t been updated in years.
The dataset is 500k synthetic documents and 500 questions across 9 sources like Slack, GitHub, and Google Drive. The corpus was created with thematic clusters, jargon, and ambiguities to reflect how real companies operate.
We benchmarked major AI platforms, popular open source projects, and enterprise search products. Here are some of our findings:
💡 Agent harness matters just as much as retrieval technique - OpenClaw with a BM25 tool outperformed nearly all platforms
💡 Recall tracks quite closely with answer correctness - if you can surface the correct information, today’s LLMs can reliably generate the right answer
💡 Onyx was the only open source product at the top of the benchmark - most RAG projects are built for personal use and could not keep pace at 500k docs
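To make the recall finding concrete, here is a minimal Python sketch of how recall@k and recall-conditioned answer accuracy could be computed. It assumes each question has a set of gold document IDs and a boolean correctness label from a judge. `recall_at_k` and `accuracy_by_recall` are hypothetical names, not the EnterpriseRAG-Bench harness.

```python
from __future__ import annotations

from typing import Optional


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the gold documents that appear in the agent's top-k retrieved results."""
    if not relevant:
        raise ValueError("relevant must contain at least one gold document ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def accuracy_by_recall(
    rows: list[tuple[list[str], set[str], bool]], k: int
) -> dict[str, Optional[float]]:
    """Split questions by whether every gold doc was surfaced, then report answer accuracy per group.

    Each row is (retrieved doc IDs in rank order, gold doc IDs, answer judged correct).
    A group with no questions maps to None.
    """
    groups: dict[str, list[bool]] = {"full_recall": [], "partial_recall": []}
    for retrieved, relevant, correct in rows:
        key = "full_recall" if recall_at_k(retrieved, relevant, k) == 1.0 else "partial_recall"
        groups[key].append(correct)
    return {name: (sum(vals) / len(vals) if vals else None) for name, vals in groups.items()}
```

If recall really tracks correctness, the `full_recall` group should score far higher than `partial_recall` on real benchmark rows.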
This is the most technical work we’ve ever published, and the data has so many insights on how enterprise AI systems perform on company docs.
The results, dataset, and our white paper are all available on GitHub: https://t.co/1hqdkQuGi9
many say file search has "killed vector search"
but our tests show it's more complicated than that:
- hybrid search is faster and more token efficient
- hybrid search is better at scale
- file search is better on complex, multi-document questions
blog: https://t.co/IRJN7Ek9Zs
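The post doesn't say how the hybrid ranking is built. One common way to combine a keyword (BM25) ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes that technique, with the conventional constant k = 60, and is not necessarily what the system in the blog uses.

```python
from __future__ import annotations


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs: each doc earns 1 / (k + rank) from every list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["d1", "d2", "d3"]    # hypothetical keyword results
vector_hits = ["d2", "d4", "d1"]  # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['d2', 'd1', 'd4', 'd3']
```

Because RRF only looks at ranks, it needs no normalization between BM25 scores and cosine similarities, which is one reason it's a popular default for hybrid search.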
@ArtemXTech nice writeup. I'm exploring/writing a piece on FRAG (filesystem RAG), where we combine bash commands with BM25 for maximum search capability. Have you given this combo any thought?
Every enterprise AI tool right now is trying to be a chatbot.
This works for maybe 30% of workplace questions.
Most of the time, people just need to find something. A doc, a thread, a spec.
Onyx now does both in a single unified interface.
One input bar, automatic intent detection, instant results for lookups and AI answers for real questions. Sub-400ms on search, sub-1s on classification, 90%+ accuracy on routing.
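The post doesn't describe how intent detection works. Here is a minimal routing sketch, assuming a classifier that returns a label and a confidence: only confident lookups go to instant search, and everything else falls through to an AI answer. `heuristic_classifier` is a toy keyword stand-in for whatever model Onyx actually uses, and the 0.7 threshold is an arbitrary assumption.

```python
from __future__ import annotations

from typing import Callable, Literal, Tuple

Route = Literal["search", "answer"]

# Toy signal words for the stand-in classifier below (an assumption, not Onyx's model).
QUESTION_WORDS = {"what", "why", "how", "when", "who", "which", "should", "can",
                  "does", "is", "summarize", "explain", "compare"}


def heuristic_classifier(query: str) -> Tuple[Route, float]:
    """Toy intent model: questions go to an AI answer, short keyword queries to search."""
    text = query.strip().lower()
    words = text.split()
    if text.endswith("?") or (words and words[0] in QUESTION_WORDS):
        return "answer", 0.9
    if len(words) <= 4:
        return "search", 0.8
    return "answer", 0.6


def route(
    query: str,
    classify: Callable[[str], Tuple[Route, float]] = heuristic_classifier,
    min_confidence: float = 0.7,
) -> Route:
    """Use instant search only for confident lookups; fall back to an AI answer otherwise."""
    label, confidence = classify(query)
    if label == "search" and confidence >= min_confidence:
        return "search"
    return "answer"


print(route("Q3 pricing spec"))                                # search
print(route("What common pains come up in discovery calls?"))  # answer
```

Defaulting uncertain queries to an AI answer trades some latency for never handing a bare result list to someone who asked a real question. The opposite default is equally defensible if search latency matters more.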
We created the first agentic RAG benchmark with real workplace questions and data.
- 99 questions that were actually asked by us or our users. For example, “What common pains usually come up in discovery calls with prospects?”
- 220k messy real documents from email, Slack, GitHub, Linear, Fireflies, HubSpot, and Google Drive.
- 4 independent LLM judges.
- ChatGPT, Claude, Notion AI, and Onyx as competitors.
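The post doesn't say how the four judges' verdicts are combined. One simple option is a per-question majority vote. With an even number of judges a 2-2 split is possible, so this sketch assumes ties count as incorrect unless `tie_breaker` is set. All names here are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def aggregate_verdicts(verdicts: list[bool], tie_breaker: bool = False) -> bool:
    """Majority vote over independent judge verdicts; an even split returns tie_breaker."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    yes = sum(verdicts)
    no = len(verdicts) - yes
    if yes == no:
        return tie_breaker
    return yes > no


def agreement(verdicts: list[bool]) -> float:
    """Share of judges that side with the majority label (1.0 means unanimous)."""
    return max(Counter(verdicts).values()) / len(verdicts)


def score_agent(per_question: list[list[bool]], tie_breaker: bool = False) -> float:
    """Fraction of questions an agent gets right after aggregating the judges per question."""
    return sum(aggregate_verdicts(v, tie_breaker) for v in per_question) / len(per_question)


# Hypothetical verdicts from four judges on four questions.
verdicts = [[True] * 4, [True, True, False, False], [False] * 4, [True, True, True, False]]
print(score_agent(verdicts))  # 0.5
```

Reporting `agreement` alongside the score shows how often the judges disagreed, which is useful context when comparing agents.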
Onyx outperformed ChatGPT, Claude, and Notion AI by ~2:1. ChatGPT came in a distant second, followed closely by Claude, with Notion AI in the rear.
We’ve published the raw results across different agents (and what we do differently to outperform) in our full blog here: https://t.co/FGvRVFEJen.
Craft represents your Slack, Google Drive, Notion, etc. as a file system and gives a coding agent the ability to run bash and Python against them.
Compared to RAG or MCP, this allows Craft to work well at 100k+ doc scale.
Try at https://t.co/47344MZpfs
Introducing Craft — Cowork, but over *all* your workplace docs instead of just your desktop.
Craft lets anyone perform complex ad-hoc analysis and build repeatable, always updating dashboards based on that analysis. And it’s all open source.
1/7 🧵Figured out how ChatGPT does web search.
Here's what OpenAI, Claude, and Perplexity are actually doing under the hood (and how we fixed our 60-second search times)
6/7 We rebuilt using this approach with Exa (adding Google PSE and Firecrawl soon). Web search is actually usable now.
If you're building AI search, don't overthink it. The SOTA approach is elegantly simple.
One thing teams don’t talk about enough is the importance of good docs.
We put 2 weeks into revamping our docs from the ground up and immediately saw a 3x increase in engagement.
Shoutout to @mintlify
If you want to see how we did it, check it out here: https://t.co/LNey1FS1A4