Is Matryoshka dead?
Every frontier embedding model uses MRL.
But we tested it across a full hyperparameter sweep and it's lossy at every dimension.
A small projection matrix trained on top of zembed-1 beats MRL across the board. Including at full dim.
Results:
zembed-1 @ 160 dims > OpenAI @ 1536 dims
zembed-1 (no MRL) > voyage-4 (MRL)
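For readers unfamiliar with the two approaches being compared, here is a minimal sketch. It is not the setup used in the post: the embeddings are random stand-ins, and PCA is swapped in for the trained projection matrix, whose training recipe is not described here. All function names are hypothetical.

```python
import numpy as np

def truncate_mrl(emb: np.ndarray, k: int) -> np.ndarray:
    """MRL-style reduction: keep the first k coordinates, then re-normalize."""
    cut = emb[:, :k]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

def fit_pca_projection(emb: np.ndarray, k: int) -> np.ndarray:
    """Stand-in for a learned projection: top-k principal directions, shape (d, k).

    Assumption: the post's projection is trained differently; PCA is used
    here only because it is simple and fully specified.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def project(emb: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the projection matrix, then re-normalize to unit length."""
    out = emb @ W
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Random unit vectors standing in for full-dimension embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(200, 64))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small_mrl = truncate_mrl(full, 16)   # truncation: no extra parameters
W = fit_pca_projection(full, 16)     # projection: a 64 x 16 matrix
small_proj = project(full, W)
```

Truncation needs no extra parameters but requires the base model to be trained with MRL; a projection is a separate small matrix fitted on top of a model that was not.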
@ZeroEntropy_AI
Super excited to share that @ZeroEntropy_AI is now a provider in the @vercel AI SDK
If you're already building with `ai`, our models are one import away.
→ zerank-2 for reranking
→ zembed-1 for embeddings
→ more models to come 👀
Happy shipping!
The excellent zerank-2 reranker model by @ZeroEntropy_AI is now fully compatible with Sentence Transformers, no `trust_remote_code=True` needed.
It's 4B and cc-by-nc-4.0, and performs very well.
I'm quite fond of their training methodology, I'll explain in the 🧵
On Gemini Flash 3.5 pricing
Prices for mini/Flash models have been dramatically increasing with every release
4x increase on input and more than 8x on output between Gemini 1.5 Flash and Gemini 2.5 Flash
And now another 3x increase
The world needs more cost-efficient, lightweight models that can run at the scales needed, especially for task-specific workflows that don't need frontier models
Token efficiency and intelligence compression are what's needed
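A quick sketch of how those multipliers compound. The assumption here, which the post does not state, is that the latest 3x applies to both input and output prices; the output figure is only a lower bound because the post says "more than 8x".

```python
# Multipliers quoted in the post. The scope of the latest 3x is an assumption.
INPUT_1_5_TO_2_5 = 4.0    # "4x increase on input"
OUTPUT_1_5_TO_2_5 = 8.0   # "more than 8x on output" (lower bound)
LATEST_STEP = 3.0         # "another 3x increase", assumed for input and output

def compound(*multipliers: float) -> float:
    """Multiply successive price increases into one cumulative factor."""
    total = 1.0
    for m in multipliers:
        total *= m
    return total

input_total = compound(INPUT_1_5_TO_2_5, LATEST_STEP)
output_total_lower_bound = compound(OUTPUT_1_5_TO_2_5, LATEST_STEP)

print(f"input: {input_total:.0f}x vs Gemini 1.5 Flash")
print(f"output: more than {output_total_lower_bound:.0f}x vs Gemini 1.5 Flash")
```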
zembed-1 is finally here!
🔥 The world's best embedding model, by @ZeroEntropy_AI
It outperforms @OpenAI, @GeminiApp, @Alibaba_Qwen, and Voyage's latest embeddings on 100+ languages, and across verticals.
Available now via our API/SDK, @huggingface, and @awscloud Marketplace.
Full launch post in the thread for benchmarks and more about our secret sauce 👀
We're building the entire retrieval stack... and we're just getting started.
🤫 PS: We're giving out free credits to try it, just comment on the post or DM me!
We just built a free tool to ask questions over the 2025 @NeurIPSConf research papers.
Try it out at neurips dot zeroentropy dot dev
No signup, no credit card, just the best way to learn more about this year's papers!
It’s always amazing to see small teams outperform companies with $100M+ in funding, and even more amazing when you get to be a part of it. 😅
Stoked that we were able to support @ZeroEntropy_AI on training their state-of-the-art reranker model family!
Read here about the zerank family: https://t.co/wjnhNioePu
@ghita__ha @npip99
We are very excited to release zerank-2, @ZeroEntropy_AI's newest reranker model. 🔥
It shows major improvement on the 5 most common RAG failure modes below.
Existing rerankers consistently fail on seemingly “simple” tasks:
🔢 Comparing numbers and dates: “Biggest deals closed after 04/2024.”
🗄️ Aggregation: “Top 10 objections of customer X?”
🌍 Multilingual: Major pain point, especially non-English to non-English.
🙏 Instruction-Following: “Find the *counterargument* of the claim in the transcript”
🥇 Calibrated scores: You ask "what should I cook for dinner?", and "I am allergic to nuts" scores too low for your threshold.
Many rerankers overfit public benchmarks, and don’t generalize to these real issues. zerank-2 outperforms existing rerankers considerably on all of these failure modes, in real production environments.
With zerank-2, you get:
* 15% improvement vs Cohere rerank 3.5 on Arabic/Hindi (MIRACL dataset)
* +12% NDCG@10 on sorting tasks (new open-sourced eval set)
* +7% vs Gemini Flash on instruction-following (MAIR dataset)
* $0.025/1M tokens, 150ms p90 latency at 100KB
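For context on the NDCG@10 figure above, here is a minimal sketch of the metric. It uses the linear-gain variant (relevance divided by a log2 position discount); the eval sets mentioned in the post may use a different gain function, and the function names are illustrative.

```python
import math

def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents at all
    return dcg_at_k(relevances, k) / ideal

# Relevance labels of documents in the order a reranker returned them.
perfect = ndcg_at_k([3, 2, 1, 0])
reversed_order = ndcg_at_k([0, 1, 2, 3])
```

A ranking that already lists documents from most to least relevant scores 1.0; any other ordering of the same documents scores lower.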
🤗 We are open-sourcing the model weights, along with new challenging eval sets on @huggingface. Our Elo-inspired training methodology is already open-source!
We're starting a series of technical deep dives to explain various failure modes zerank-2 fixes, with concrete prod examples, methodologies, and benchmarks.
First technical deep dive in the comments.
next week will be extra @elastic-packed in SF
monday meetup: https://t.co/mEH5z2Bc15
* @ghita__ha, @ZeroEntropy_AI: search tools for efficient AI agents
* jesse, @fintoolx: LLMs and the next generation of financial search
* @joshnkeezy, @reductoai: building a vision-first RAG pipeline with reducto and elasticsearch