Since starting https://t.co/D7GG7t4fs3, we have been obsessed with building the most efficient retrieval infrastructure for a new kind of user: agents.
Thanks to our partners running some of the largest agent search API workloads, we've made real progress on building the most efficient engine for the new workload.
For our partners serving agent-driven query traffic at scale, it lands as a direct cut in cost to serve compared to legacy engines: more queries per node, and fewer nodes for the same load.
I'm proud of our cracked founders Henning Baldersheim (CTO) and Jon Marius Venstad, who have led this effort. They have spent months improving rank-safe top-k query processing, optimizing low-level posting handling, and doing lots of profiling to find the hotspots in the new workload.
I had a lot of fun analyzing how GPT-5 searches in BrowseComp-Plus.
- Long queries with many terms
- Keyword-oriented, with query operators like phrase, "site:", "-", etc.
- 98% of the traces contain at least one phrase search
https://t.co/fEX2edQEmX
"It is tempting to treat hybrid search as something you can tune once: pick a merge algorithm, choose a lexical/semantic weight, and ship it - but there is no globally correct merge strategy"
https://t.co/KgucsXGaKr
You know me as the BM25 guy, but embeddings are cool too.
New post from the @HornetDev team just dropped: ANN tuning at 100M scale, covering embedding bias, graph connectivity, and the quantization ceiling.
https://t.co/aPWYLXiGtK
On the cost of learning.
Wrangling large embedding datasets is not trivial, and the feedback loop is slow. We take this on so we can better advise our partners and customers, so they don’t have to.
https://t.co/Pbiczk9wQn
Big day at @HornetDev HQ. The vision we set out with is taking shape.
Huge credit to the team for turning bold bets into something real. An exceptional team delivering exceptional results, way beyond what I imagined could be delivered in a few months.
I'm excited to announce that @BEBischof and @HamelHusain will also join us for our https://t.co/kWsq4haB04 event at @SHACK15sf.
The no-bs panel could get interesting.
https://t.co/eviqJjpkVS
I spent the last month researching the use of LLMs as relevance assessors, putting emerging IR research into practice for @vespaengine's RAG over our docs.
https://t.co/kDxHYTrkLu
The latest Vespa newsletter is here to help you stay up to date on what's happening on the leading edge in RAG, IR and vector search:
- A new SPLADE embedder
- ONNX models with float16
- @cohere embedding model guides
- Support for an array of chunks with ColBERT
- And a list of our latest blog posts you shouldn't miss
https://t.co/77NUlZkWXW
Seems everybody is migrating their search and recommendation systems from Elastic to Vespa now.
Here's the experience of Stanby, Japan's leading job search site: https://t.co/CbFwjrTwh0
The Singaporean government has deployed Vespa to search every word ever said in their Parliament.
"A good decision is an informed one [...] The heart of a good RAG system is a good search engine to retrieve the relevant data chunks for ingestion"
Many teams are racing to make use of these new methods to achieve superior quality, but the Singaporean government may have been first to put them in production!
GigaOm published their Sonar for Vector Databases today, positioning Vespa as a leader.
While what we are - and what you need - is much more than a vector database, it is gratifying to be recognized as a leader on these core features alone.