Chris Weaver @ChrisWeves - Twitter Profile

Pinned Tweet

Chris Weaver

@ChrisWeves

3 months ago

https://t.co/wDam7qCoW4

0

11

2

10

804

ChrisWeves retweeted

Yuhong Sun

@Yuhong_S

about 1 month ago

We’re open sourcing the first benchmark for enterprise AI search: EnterpriseRAG-Bench.

Retrieval is the foundation of every AI agent that works with company data. 1M token context windows and extended thinking are wasted if the agent can't find the right information across your tools.

EnterpriseRAG-Bench evaluates how well an AI agent can navigate hundreds of thousands of Slack messages, contradictory sources, and “official docs” that haven’t been updated in years.

The dataset is 500k synthetic documents and 500 questions across 9 sources like Slack, GitHub, and Google Drive. The corpus was created with thematic clusters, jargon, and ambiguities to reflect how real companies operate.

We benchmarked major AI platforms, popular open source projects, and enterprise search products. Here are some of our findings:
💡 Agent harness matters just as much as retrieval technique - OpenClaw with a BM25 tool cleared nearly all platforms
💡 Recall tracks quite closely with answer correctness - if you can surface the correct information, today’s LLMs can reliably generate the right answer
💡 Onyx was the only open source product at the top of the benchmark - most RAG projects are built for personal use and could not keep pace at 500k docs

This is the most technical work we’ve ever published, and the data has so many insights on how enterprise AI systems perform on company docs.

The results, dataset, and our white paper are all available on GitHub: https://t.co/1hqdkQuGi9


4

20

6

9

1K
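The recall finding in the EnterpriseRAG-Bench post can be made concrete with a small sketch. The post does not say how the benchmark computes recall, so the function below assumes the common recall@k definition: the share of known-relevant documents that appear in the top k retrieved results. The document IDs are hypothetical.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        # No gold documents: define recall as 0.0 to avoid dividing by zero.
        return 0.0
    hits = relevant_set.intersection(retrieved[:k])
    return len(hits) / len(relevant_set)


# Hypothetical document IDs, for illustration only.
retrieved_ids = ["slack-17", "drive-4", "github-9", "slack-2"]
gold_ids = ["drive-4", "slack-2"]

print(recall_at_k(retrieved_ids, gold_ids, k=3))  # 0.5
```

With k=3 only one of the two gold documents is surfaced, so an answering model would be missing half of the evidence it needs.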

Chris Weaver

@ChrisWeves

2 months ago

many say file search has "killed vector search"

but our tests show it's more complicated than that:
- hybrid search is faster and more token efficient
- hybrid search better at scale
- file search better on complex, multi-document questions

blog: https://t.co/IRJN7Ek9Zs

https://t.co/CutySTzIjz

0

1

0

61
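The post above does not say how hybrid search merges its results. One widely used option, assumed here purely for illustration, is reciprocal rank fusion (RRF): each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed. The document IDs and the two input rankings are hypothetical; k=60 is the conventional default.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a keyword index and a vector index.
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]

print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
# ['b', 'c', 'a', 'd']
```

Documents that both retrievers agree on ("b" and "c") rise above documents that only one retriever found.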

Chris Weaver

@ChrisWeves

3 months ago

@ArtemXTech nice writeup. I'm exploring and writing a piece on FRAG (filesystem RAG), where we combine bash commands with BM25 for maximum search capability. Have you given this combo any thought?

1

2

0

2K
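The combination mentioned in the FRAG post can be sketched roughly as follows. The post gives no implementation details, so this is an assumption: a regex filter stands in for a bash `grep` pass that narrows the candidate files, and a small from-scratch Okapi BM25 scorer ranks what is left. The function names, file paths, and file contents are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_filter(pattern: str, docs: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for `grep -il`: keep only files whose body matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {name: body for name, body in docs.items() if regex.search(body)}


def bm25_rank(query: str, docs: Dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> List[Tuple[str, float]]:
    """Rank documents against the query with Okapi BM25, best match first."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    query_terms = set(tokenize(query))
    results = []
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        results.append((name, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical files, standing in for workplace documents on disk.
files = {
    "slack/eng.txt": "deploy failed again, rollback the release",
    "drive/runbook.md": "Release runbook: how to rollback a failed deploy safely",
    "github/issue-12.md": "Feature request: dark mode for the dashboard",
}

candidates = grep_filter(r"rollback", files)
for name, score in bm25_rank("failed deploy rollback", candidates):
    print(f"{score:.3f}  {name}")
```

The filter step is cheap and exact, while BM25 orders the survivors by term relevance, which is one plausible reading of why the two might be paired.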

Chris Weaver

@ChrisWeves

4 months ago

Every enterprise AI tool right now is trying to be a chatbot. This works for maybe 30% of workplace questions. Most of the time, people just need to find something. A doc, a thread, a spec. Onyx now does both in a single unified interface. One input bar, automatic intent detection, instant results for lookups and AI answers for real questions. Sub-400ms on search, sub-1s on classification, 90%+ accuracy on routing.

1

7

2

0

187
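The single-input routing described in the post above can be illustrated with a toy sketch. The post does not say how Onyx classifies intent, so the keyword heuristic below is an assumption made only to show the shape of the idea: classify the query, then dispatch to either a search handler or a chat handler. All names and handlers are hypothetical.

```python
from typing import Callable

# Hypothetical heuristic: queries that open with one of these words are treated as questions.
QUESTION_WORDS = {"what", "why", "how", "when", "who", "which", "should", "can", "does"}


def classify_intent(query: str) -> str:
    """Return "chat" for question-like queries and "search" for lookups."""
    words = query.lower().split()
    if not words:
        return "search"
    if query.strip().endswith("?") or words[0] in QUESTION_WORDS:
        return "chat"
    return "search"


def route(query: str,
          search_fn: Callable[[str], str],
          chat_fn: Callable[[str], str]) -> str:
    """Send the query to the handler that matches its detected intent."""
    handler = chat_fn if classify_intent(query) == "chat" else search_fn
    return handler(query)


# Hypothetical handlers that only label which path was taken.
def run_search(query: str) -> str:
    return f"search results for: {query}"


def run_chat(query: str) -> str:
    return f"AI answer for: {query}"


print(route("Q3 pricing spec", run_search, run_chat))
print(route("What common pains come up in discovery calls?", run_search, run_chat))
```

A production router would more likely use a trained classifier than a word list, but the dispatch structure would look similar.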

Chris Weaver

@ChrisWeves

4 months ago

We created the first agentic RAG benchmark with real workplace questions and data.

- 99 questions that were actually asked by us or our users. For example, “What common pains usually come up in discovery calls with prospects?”
- 220k messy real documents from email, Slack, GitHub, Linear, Fireflies, HubSpot, and Google Drive.
- 4 independent LLM judges.
- ChatGPT, Claude, Notion AI, and Onyx as competitors.

Onyx outperformed ChatGPT, Claude, and Notion AI by ~2:1. ChatGPT came in a distant second, followed closely by Claude, with Notion AI in the rear.

We’ve published the raw results across different agents (and what we do differently to outperform) in our full blog here: https://t.co/FGvRVFEJen.


0

2

0

91

Chris Weaver

@ChrisWeves

4 months ago

Craft represents your Slack, Google Drive, Notion, etc. as a file system and gives a coding agent the ability to run bash and python against them. Compared to RAG or MCP, this allows Craft to work well at 100k+ doc scale. Try at https://t.co/47344MZpfs

0

2

0

207

Chris Weaver

@ChrisWeves

4 months ago

Introducing Craft — Cowork, but over *all* your workplace docs instead of just your desktop. Craft lets anyone perform complex ad-hoc analysis and build repeatable, always updating dashboards based on that analysis. And it’s all open source.

3

5

1

3

1K

ChrisWeves retweeted

Yuhong Sun

@Yuhong_S

8 months ago

We matched ChatGPT's deep research, but with open weight models. In a blind head-to-head, win-rate was exactly 50%.

3

9

2

0

524

Chris Weaver

@ChrisWeves

8 months ago

we built the most beautiful open-source AI chat 😎

1

2

0

166

Chris Weaver

@ChrisWeves

8 months ago

For those interested in how it works, check out the implementation https://t.co/JY3PYMJBNp

1

0

98

Chris Weaver

@ChrisWeves

8 months ago

We just added MCP support to Onyx. It’s awesome

1

3

0

169

Chris Weaver

@ChrisWeves

9 months ago

7/7 Check out the implementation here: https://t.co/tOyXhgyIoN Or try for free here: https://t.co/RD5xl303ww

0

2

0

80

Chris Weaver

@ChrisWeves

9 months ago

1/7 🧵Figured out how ChatGPT does web search. Here's what OpenAI, Claude, and Perplexity are actually doing under the hood (and how we fixed our 60-second search times)


2

5

1

151

Chris Weaver

@ChrisWeves

9 months ago

6/7 We rebuilt using this approach with Exa (adding Google PSE and Firecrawl soon). Web search is actually usable now. If you're building AI search, don't overthink it. The SOTA approach is elegantly simple.

1

0

88

ChrisWeves retweeted

Yuhong Sun

@Yuhong_S

9 months ago

One thing teams don’t talk about enough is the importance of good docs.

We put 2 weeks into revamping our docs from the ground up and immediately saw a 3x on engagement.

Shoutout to @mintlify

If you want to see how we did it, check it out here: https://t.co/LNey1FS1A4

https://t.co/50pRR5eY1X

7

34

4

3

3K
