Mixedbread @mixedbreadai - Twitter Profile

Pinned Tweet

3 months ago

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

https://t.co/PYT3Ryerxm

35

945

120

753

201K

Mixedbread @mixedbreadai

1 day ago

Read more here: https://t.co/Yx9xq9meSt

0

24

0

25

1K

Mixedbread @mixedbreadai

1 day ago

By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain more than you think: you can extract sparse Latent Terms from them. And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.

6

176

22

169

33K

Mixedbread @mixedbreadai

1 day ago

Having language-adjacent properties means that tools designed for lexical approaches "just work". BM25, always refusing to exit the scene, is strong here: applied over the Latent Terms extracted from nomic-embed-v1.5, it results in a near state-of-the-art sparse retriever.

https://t.co/Ss3vqu47w4

1

23

0

4

2K
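To make the BM25 step in the two tweets above concrete, here is a minimal sketch of BM25 scoring over documents represented as bags of term IDs. It is an illustration under stated assumptions, not Mixedbread's implementation: the tweets do not describe how Latent Terms are extracted, so the `lt_*` IDs below are hypothetical placeholders, and the function name `bm25_scores`, the `k1` and `b` defaults, and the Lucene-style IDF variant are choices made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of term IDs) against the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF, which is always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical latent-term IDs standing in for whatever the extraction produces.
docs = [
    ["lt_12", "lt_87", "lt_87", "lt_301"],
    ["lt_12", "lt_44"],
    ["lt_900", "lt_44", "lt_5"],
]
query = ["lt_87", "lt_301"]

scores = bm25_scores(query, docs)
# Document indices ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Because BM25 only needs term counts and document lengths, it treats the term IDs as opaque tokens. That is one way to read the claim that lexical tools "just work" on such a vocabulary.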

Mixedbread @mixedbreadai

7 days ago

docs: https://t.co/4KcMPBFELZ

0

8

0

1

562

Mixedbread @mixedbreadai

7 days ago

New: grep for exact matching

grep → keyword / regex matching
search → fine-grained semantic retrieval

Works across uploaded content, including text, PDFs (OCR) and audio/video (transcription).

Give your agents both retrieval primitives to perform at their best.

2

64

5

37

5K

Mixedbread @mixedbreadai

9 days ago

View and export traces directly from your dashboard:

0

8

0

2

611

Mixedbread @mixedbreadai

9 days ago

Feature: Native agentic search on Mixedbread

Search with auto-planning, exploration, and multi-hop reasoning across documents.

Built for:
- evidence discovery
- exhaustive search
- cross-document reasoning

→ Topped MADQA @snowflake with 93.4% accuracy across 18,000 PDF pages.

1

81

13

47

9K

Mixedbread @mixedbreadai

9 days ago

Steer search with more instructions. Docs: https://t.co/5JnkhrFHL9

1

11

0

786

Mixedbread @mixedbreadai

11 days ago

New: Traces for Mixedbread agentic search

See every search call an agent makes directly in the dashboard, and tune instructions for better retrieval quality.

0

48

9

21

7K

Mixedbread @mixedbreadai

23 days ago

Read more here: https://t.co/pfj5KP92Uu

0

10

0

4

1K

Mixedbread @mixedbreadai

23 days ago

Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance.

It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules.

+11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3.

Available today in preview in Mixedbread.

5

136

18

72

25K
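For readers unfamiliar with the metric quoted in the tweet above, here is a minimal sketch of NDCG@10 with graded (non-binary) relevance labels. It assumes the common linear-gain form of DCG and, for simplicity, computes the ideal ordering from the same list of labels rather than from a full judged pool. The function names and example labels are made up for illustration and say nothing about how the reported numbers were produced.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance of each returned result, in ranked order (illustrative values).
before_rerank = [1, 0, 3, 2, 0]
after_rerank = [3, 2, 1, 0, 0]

ndcg_before = ndcg_at_k(before_rerank)
ndcg_after = ndcg_at_k(after_rerank)
```

In this toy case the reranked list is already in ideal order, so its NDCG@10 is 1.0, while the original ordering scores lower because highly relevant results appear further down.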

Mixedbread @mixedbreadai

2 months ago

Mixedbread Search's ultimate aim is to power all workflows, no matter their modality or language. Try it for your own knowledge-intensive tasks today: https://t.co/FlUA03fS8b

0

11

1

4

2K

Mixedbread @mixedbreadai

2 months ago

You can read more about this in our blog post, where we present more detailed benchmark results and elaborate on the nature of the three benchmarks, and why we're very proud to be topping all three of them. https://t.co/I0mjNnPPl9

1

17

2

18

3K

Mixedbread @mixedbreadai

2 months ago

So what is the Oracle gap? Optimising agentic systems is complicated. There are many individual components you need to get just right. Retrieval is one of those components, and its impact is best measured by the Oracle gap: the difference in performance of the same system when it is given results from an imperfect retriever versus the perfect, fully relevant results that a so-called Oracle would provide.

1

12

2

4

3K
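The Oracle gap defined in the tweet above reduces to a simple subtraction once both runs are scored. Below is a minimal sketch, assuming the end-to-end metric is per-question accuracy. The function names and the outcome lists are hypothetical illustration data, not benchmark results.

```python
def accuracy(outcomes):
    """Fraction of questions the system answered correctly."""
    return sum(outcomes) / len(outcomes)


def oracle_gap(with_oracle, with_retriever):
    """Accuracy with perfect (Oracle) retrieval minus accuracy with the real retriever."""
    return accuracy(with_oracle) - accuracy(with_retriever)


# Made-up per-question outcomes for the same agent under two retrieval setups.
with_oracle = [True, True, True, False, True]
with_retriever = [True, False, True, False, True]

gap = oracle_gap(with_oracle, with_retriever)
```

A gap near zero means retrieval is no longer the bottleneck. Whatever errors remain, such as the question both runs miss here, come from other parts of the system.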

Mixedbread @mixedbreadai

2 months ago

Agents are increasingly performing knowledge work: Deep Research, generating financial reports, reasoning across historical knowledge bases... Many high-quality benchmarks now focus on evaluating such tasks, among them BrowseComp-Plus, @databricks's OfficeQA, and @Snowflake's MADQA, released just last week.

1

22

1

5

3K

Mixedbread @mixedbreadai

2 months ago

For agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge-intensive benchmarks.

https://t.co/K11VvMLigO

4

147

22

122

80K

mixedbreadai retweeted

Omar Khattab

@lateinteraction

3 months ago

I've been eagerly awaiting this release from the @mixedbreadai folks. They're world-leading experts in late interaction retrieval. And today they remind us that late interaction done well makes all your favorite embedding models look like they don't work.

https://t.co/NLnTKtbF94

8

199

23

87

22K

Mixedbread @mixedbreadai

3 months ago

Find out more about the model and its performance here: https://t.co/Jp9xk4a09u

0

37

0

9

6K

Mixedbread @mixedbreadai

3 months ago

Wholembed v3 is available immediately through Mixedbread Search. You can try it on our platform now, for free: New users get 2M free tokens to get started. Startups can receive much more through our partnered accelerator programs with Vercel and TinyFish. https://t.co/koj5FFazp2

1

49

0

13

7K
