Very proud to see Transformers Interpret hit 1k stars today with over 200,000 downloads. Maintaining a library has been one of the hardest and most rewarding things I've done. Looking forward to adding some very cool new explainers **cough** Seq2Seq LLM models very soon
Toot! The team is shipping again!
We're seeing a massive wave of developers & agents building advanced AI memory capabilities on top of Weaviate. To make that even easier, we just launched Engram, a dedicated memory service built right on our database.
Most agent memory systems are just glorified context windows.
And this is exactly why production agents fail at scale.
We've been working on this for months, and it's finally here: Engram is now GA.
If you've been building agentic applications, you know the problem. Agents that should get smarter over time stay flat instead. They forget user preferences, re-solve the same problems repeatedly, and waste tokens on work that can't be reused. Long context windows help, but cramming them full degrades accuracy, inflates costs, and increases latency.
Engram solves this.
It's a managed memory service built on Weaviate that actively maintains memory instead of just storing it. Asynchronous pipelines extract relevant information from raw data, reconcile it with existing memories (handling deduplication, preference changes, time-evolving facts), and persist clean, structured memory state ready for retrieval.
Key capabilities:
Fire-and-forget API: Add raw data and continue working. Pipelines run asynchronously in the background with durable execution.
Topics as memory magnets: Natural language descriptions that pull matching information from raw data. You control what's worth remembering.
Scopes for isolation: Project-wide, user-scoped, or property-scoped memories with hard and soft isolation enforced at the platform level.
Composable pipelines: Extract, transform, buffer, and commit steps that manage memories dynamically based on data type and preferences.
Built on Weaviate: Memory retrieval inherits Weaviate's vector + keyword + metadata search on the same production stack you already trust, using native multi-tenancy to isolate instances.
Whether you're building chatbots that remember user preferences, agents that learn from experience, or multi-agent systems that need shared context, Engram gives you memory as infrastructure.
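To make the "reconcile with existing memories" step concrete, here is a minimal sketch in plain Python. It is not Engram's API: the `Memory` class, the `reconcile` function, and the one-memory-per-topic rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    topic: str       # hypothetical key, e.g. "editor"
    value: str       # the remembered fact
    updated_at: int  # logical timestamp of the source message


def reconcile(store: dict, extracted: list) -> dict:
    """Merge newly extracted memories into the store, one memory per topic."""
    for mem in extracted:
        current = store.get(mem.topic)
        if current is None:
            store[mem.topic] = mem   # new fact: persist it
        elif current.value == mem.value:
            continue                 # duplicate: nothing to write
        elif mem.updated_at > current.updated_at:
            store[mem.topic] = mem   # preference changed: newer value wins
        # older conflicting values are dropped
    return store


store = reconcile({}, [
    Memory("editor", "vim", 1),
    Memory("editor", "vim", 2),    # duplicate
    Memory("editor", "emacs", 3),  # preference change
    Memory("editor", "nano", 0),   # stale fact arriving late
])
```

After this call the store holds a single "editor" memory with the latest value, rather than four conflicting notes.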
As a promotional offer, we're giving $75 in credits for your first three months of Engram! Sign up before July 15th to claim it.
Read the blog: https://t.co/24koipQIjO
Get started: https://t.co/v9fMGMNCej
Introducing Suggest Queries Mode
Weaviate's Query Agent can now suggest queries that are answerable by the data in your collections.
This helps users discover what kinds of questions they can ask!
Unpopular opinion: The best hires aren't experienced.
They're hungry. And we're searching for two of them right now.
We have two open positions in the Growth team at @weaviate_io:
Developer Growth Intern
Developer Marketing Manager
We're looking for people who aren't afraid to work hard, who want to grind, and who are hungry to learn how to define narratives and win on social media.
Not people who check boxes. Not people who want a comfortable 9-5. We want the next diamonds - people who see an opportunity to learn from a team that's built a developer community from the ground up and think "I want to know how they did that."
You'll learn:
• How to craft narratives that cut through noise
• What actually works on social media (not what the courses tell you)
• How to build genuine engagement with developer communities
• The difference between marketing at developers and marketing with them
This isn't going to be easy. But if you're the type of person who gets excited by that statement rather than intimidated by it, you might be exactly who we're looking for.
Apply here: https://t.co/v0lU0QfX7B
Complete the challenge. Impress us.
How do we build search systems for Agents?
I am SUPER EXCITED to share a new episode of the Weaviate Podcast with Zijian Chen (@zijian42chen) and Xueguang Ma (@xueguang_ma) from the University of Waterloo on AgentIR!
When humans search, we write short queries and keep our reasoning in our heads. Deep Research agents do the opposite. They leave reasoning traces that reflect on prior results, clarify intent, and plan what to search next. Existing retrievers completely ignore this signal because they were designed for human queries.
AgentIR jointly embeds the agent's reasoning trace alongside its query, training a retriever that actually understands what the agent is thinking. AgentIR-4B hits 68% accuracy on BrowseComp-Plus compared to 52% for conventional embedding models twice its size.
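A toy illustration of why the reasoning trace helps: the sketch below ranks two documents with the bare query and then with the trace plus query. The bag-of-words `embed` function and the `agent_query` text template are stand-ins I made up; AgentIR itself trains a neural retriever, which this does not reproduce.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a trained encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agent_query(reasoning, query):
    """Hypothetical template that joins the trace and the query into one input."""
    return f"reasoning: {reasoning} query: {query}"


docs = {
    "python_lang": "python programming language release notes",
    "python_snake": "python snake habitat and diet",
}
reasoning = "the earlier results covered reptiles so i need the snake habitat next"
query = "python"

# The bare query scores both documents the same.
plain = {name: cosine(embed(query), embed(text)) for name, text in docs.items()}

# The trace plus query favors the document the agent actually needs.
joint = embed(agent_query(reasoning, query))
best = max(docs, key=lambda name: cosine(joint, embed(docs[name])))
```

With the bare query the two documents tie; with the trace included, the snake document ranks first.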
One idea I found especially interesting is how AgentIR raises context management questions for agents: what should be remembered, compacted, or retrieved just in time? The current reasoning trace naturally curates history by summarizing confirmed findings and filtering out wrong guesses. Forgetting becomes a feature, not a bug.
We also covered BrowseComp-Plus, their benchmark for disentangled evaluation of agents and retrievers, and the open question of scaling search deeper vs. wider.
If you're working at the intersection of Agents and Search, I think you'll get a lot out of this one! Links below!
We've been tinkering away on something new at @weaviate_io about the role memory plays not just for chatbots but for the agents you have deployed in prod.
It's become clear that memory isn't a nice-to-have: it's an essential layer for your apps and agents to succeed in the long run.
Your AI agent worked perfectly in January.
By June, it's confidently giving you wrong answers. Here's why:
As AI applications graduate from PoCs to production, we're hitting a wall that better models can't solve: lack of continuity.
The limited loop problem
Today's AI applications treat each interaction as largely disposable. You've felt it already: repeating preferences, restating context, and re-teaching the same facts.
At agent scale, the problem worsens. Agents re-derive the same conclusions, regenerate identical facts, and discard half-finished work, and what looks like forgetfulness for humans turns into systemic chaos for machines.
Why naive memory will fail
Here's what happens with a basic memory implementation:
Week 1: Magic! The agent remembers.
Month 3: Responses slow down as memory bloats.
Month 6: Answers drift wildly as the model pulls from conflicting and outdated context.
Helpful continuity has slowly turned into accumulated noise.
The shift: memory isn't stored, it's maintained.
Useful memory systems actively manage context through write control, deduplication, reconciliation, amendment, and purposeful forgetting.
Without these, memory becomes an ever-growing pile of notes. With them, it becomes reliable state.
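As a rough sketch of two of those controls, write control and purposeful forgetting, consider the snippet below. The `Note` fields, the importance threshold, and the idle window are hypothetical choices for illustration, not how any particular product does it.

```python
from dataclasses import dataclass


@dataclass
class Note:
    text: str
    importance: float  # 0..1 score assigned at write time (hypothetical)
    last_used: int     # day index when the note was last retrieved


def should_write(note, min_importance=0.3):
    """Write control: only persist notes that clear an importance bar."""
    return note.importance >= min_importance


def prune(notes, today, max_idle_days=90):
    """Purposeful forgetting: drop notes nobody has retrieved recently."""
    return [n for n in notes if today - n.last_used <= max_idle_days]


candidates = [
    Note("prefers metric units", 0.9, last_used=170),
    Note("said 'thanks' once", 0.1, last_used=175),
    Note("was debugging a 2023 build", 0.6, last_used=20),
]
written = [n for n in candidates if should_write(n)]
kept = prune(written, today=180)
```

Only the first note survives both steps, which is the difference between a pile of notes and a small, current state.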
At Weaviate, we treat memory as a first-class data problem: durable, governable, and safe under change.
Read the full blog post on our vision for memory and sign up for the product preview: https://t.co/fmC9r6wMgP
When we were building Weaviate's Transformation Agent out, we knew we needed to saturate our GPUs as much as possible and squeeze every bit of utilization we could. Not only was this possible with @modal but it made the whole process of implementation incredibly straightforward.
It's time to run your own LLM inference. The open models and open source engines are ready. Are you?
We've been working with leading teams like @DecagonAI, @weaviate_io, and @reductoai to ship production-grade inference.
Here's how we do it:
Search Mode is now available in the Weaviate Console!
A new button lets you toggle between Ask and Search!
Another cool aspect of this is that you can switch to Search Mode in the middle of a conversation with Ask Mode!
Check it out!
Multi-vector models like ColBERT and ColPali are incredible for retrieval quality, but they eat memory for breakfast. We're talking 10-13x more memory than single-vector embeddings because you're storing hundreds of vectors per document instead of one.
MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) changes this by encoding multi-vector embeddings into single fixed-size vectors through space partitioning and dimensionality reduction.
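Here is a simplified sketch of the space-partitioning idea in pure Python. It assumes random-hyperplane (SimHash-style) bucketing, sums query vectors per bucket, and averages document vectors per bucket. As far as I understand the method, the full version also repeats the partitioning, applies random projections, and fills empty buckets; all of that is omitted here. The function names are mine, not Weaviate's.

```python
import random


def make_planes(num_planes, dim, seed=0):
    """Random hyperplanes that split the space into 2**num_planes buckets."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def bucket_of(vec, planes):
    """Bucket index from the sign of the vector against each hyperplane."""
    index = 0
    for plane in planes:
        bit = sum(x * p for x, p in zip(vec, plane)) > 0
        index = (index << 1) | int(bit)
    return index


def fixed_dimensional_encoding(vectors, planes, average):
    """Collapse any number of vectors into one vector of size 2**k * dim."""
    dim = len(vectors[0])
    num_buckets = 2 ** len(planes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for vec in vectors:
        b = bucket_of(vec, planes)
        counts[b] += 1
        for i, x in enumerate(vec):
            sums[b][i] += x
    encoded = []
    for b in range(num_buckets):
        # Documents use the bucket centroid, queries use the bucket sum.
        scale = 1.0 / counts[b] if average and counts[b] else 1.0
        encoded.extend(x * scale for x in sums[b])
    return encoded


planes = make_planes(num_planes=3, dim=4)
rng = random.Random(1)
doc_vectors = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
query_vectors = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]

doc_fde = fixed_dimensional_encoding(doc_vectors, planes, average=True)
query_fde = fixed_dimensional_encoding(query_vectors, planes, average=False)
score = sum(q * d for q, d in zip(query_fde, doc_fde))  # one dot product per doc
```

Both encodings have the same fixed length no matter how many token vectors went in, so a regular single-vector index can store and search them.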
Real results from our LoTTE benchmark tests:
- Memory footprint: 12GB → <1GB (~70% reduction)
- Import speed: 20+ minutes → 3-6 minutes
- Recall: 80-90%+ with proper HNSW ef tuning
The trade-off? Some recall loss that you can mitigate with higher ef values, though that reduces query throughput. But when you're looking at tens or hundreds of thousands of dollars in cloud costs per year, this is a no-brainer for large-scale deployments.
Available in Weaviate 1.31+ and it's literally just a couple lines of code to enable!
I've been learning about agent memory for the past few weeks.
This new blog summarizes everything I've learned so far:
• What is agent memory, and why do you need it
• What are the types of memory (and what categorization approaches are there?)
• How do you manage memory in AI agents
Join me in learning about memory for agents:
https://t.co/iVTRKAp3Fz
I am SUPER EXCITED to publish the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin (@xiaoqiang_98), the lead author of REFRAG from Meta Superintelligence Labs!
Traditional RAG systems use vectors to retrieve relevant context, but then throw away the vectors, just giving the content to the LLM. REFRAG instead feeds the LLM these pre-computed vectors, achieving massive gains in long context processing and LLM inference speed!
REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also being able to handle much longer contexts!
There are so many interesting aspects to this and I loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!
As a quick TLDR, there are two key aspects to understanding how REFRAG works:
1. The particular way REFRAG represents context tokens and injects them into LLM decoding, as well as how this speeds up LLM inference.
2. The training algorithm used to align the encoder, projection layer, decoder, and selective chunk expansion policy!
We benchmarked the Query Agent's Search Mode vs. Hybrid Search across 12 IR benchmarks from BEIR, LoTTE, BRIGHT, EnronQA, and WixQA.
The results? +17% average improvement in Success @ 1 and +11% in Recall @ 5!
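For anyone unfamiliar with the two metrics, here is a short sketch using their common definitions (the blog post may define them slightly differently). The document IDs are made up.

```python
def success_at_k(ranked_ids, relevant_ids, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


ranked = ["d3", "d1", "d7", "d2", "d9", "d4"]  # hypothetical search results
relevant = {"d1", "d4"}                        # hypothetical ground truth

s1 = success_at_k(ranked, relevant, k=1)  # top hit "d3" is not relevant
r5 = recall_at_k(ranked, relevant, k=5)   # "d1" is in the top 5, "d4" is not
```

Benchmark numbers like the ones above are these per-query scores averaged over every query in a dataset.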
Learn more about the benchmarks and dive into our experimental details:
Blog post: https://t.co/8FGLNIhTxX
I just finished reading @weaviate_io's new blog on Search Mode benchmarks, and it's a real milestone for the vector search + LLM community.
Search Mode is Weaviate's new compound retrieval system, basically a smarter way of doing search that goes beyond "hybrid" (keyword + vector).
To go beyond standard hybrid search, Search Mode combines several important methods.
Query Expansion:
Adds related terms so the engine doesn't miss relevant docs.
Query Decomposition:
Breaks a complex user request into smaller parts.
Reranking:
Intelligently reorders results so the best answers rise to the top.
Think of it as search that not only finds but also understands how to shape the question, and the response!
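To show how the three steps fit together, here is a toy pipeline in plain Python. It is not Weaviate's implementation: the synonym table, the split on "and", and the word-overlap scoring are deliberately crude stand-ins for what a real system would presumably do with models.

```python
SYNONYMS = {"price": ["cost"], "laptop": ["notebook"]}  # hypothetical expansion table


def decompose(question):
    """Query decomposition: split a compound request into sub-queries."""
    return [part.strip() for part in question.lower().split(" and ") if part.strip()]


def expand(query):
    """Query expansion: add related terms for each query term."""
    terms = query.split()
    extra = [syn for term in terms for syn in SYNONYMS.get(term, [])]
    return terms + extra


def retrieve(terms, docs):
    """First-pass retrieval: keep documents sharing at least one term."""
    return [d for d in docs if set(terms) & set(d.lower().split())]


def rerank(question, candidates):
    """Reranking: order candidates by word overlap with the full question."""
    q_terms = set(question.lower().split())
    return sorted(candidates, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)


def search(question, docs):
    seen = []
    for sub_query in decompose(question):
        for doc in retrieve(expand(sub_query), docs):
            if doc not in seen:
                seen.append(doc)
    return rerank(question, seen)


docs = ["notebook cost guide", "laptop battery life tips", "garden tools"]
results = search("laptop price and battery life", docs)
```

Expansion is what surfaces "notebook cost guide" even though it shares no words with the question, and reranking then puts the closer match first.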
Benchmarks tested:
1. BEIR (standard IR benchmark across multiple datasets)
2. LoTTE (long-tail, hard questions)
3. BRIGHT (reasoning-intensive retrieval)
4. EnronQA (enterprise email data)
5. WixQA (real-world domain-specific QA)
On every one of these, Search Mode shows gains over hybrid search!
As more teams turn to Weaviate as their search engine for mission-critical applications, these gains have practical benefits.
For RAG pipelines, higher precision means fewer hallucinations.
For agentic systems, decomposition means the retriever can keep up with multi-step reasoning.
For enterprises, messy datasets (emails, docs, portals) become far more searchable without manual tuning.
Amazing, absolutely amazing!
I am SUPER excited to share our new Information Retrieval benchmarks!
Search Mode is a Compound Retrieval System that utilizes Query Expansion, Query Decomposition, Reranking, and more to achieve super accurate search results!
The blog post demonstrates how it performs compared to Hybrid Search on the BEIR, LoTTE, BRIGHT, EnronQA, and WixQA benchmarks!
We also describe further what these benchmarks are and why we chose them! I hope you find this interesting!
I am SUPER excited to publish the 128th episode of the Weaviate Podcast featuring Charles Pierse (@cdpierse)!
Charles has led the development behind the GA release of Weaviate's Query Agent!
The podcast explores the 6 month journey from alpha release to GA! We start with the meta: unexpected user feedback, collaboration across teams within Weaviate, and the design of the Python and TypeScript clients.
We then dove deep into the tech! Discussing citations in AI systems, schema introspection, multi-collection routing, and the Compound Retrieval System behind Search Mode.
Back into the meta around the Query Agent, we ended with its integration with Weaviate's GUI Cloud Console, our case study with MetaBuddy, and some predictions for the future of the Weaviate Query Agent!
I had so much fun chatting about these things with Charles! I really hope you enjoy the podcast!
We've been heads down the past few months getting the Query Agent ready for graduation to GA
Super proud of all the work from the team at @weaviate_io that has gone into getting it to where it is today.
If you want to supercharge your retrieval or build a complex chatbot on top of your Weaviate data, it's as simple as a single line of code with Query Agent!
We're excited to announce:
The Weaviate Query Agent is now GA!
WQA is a Weaviate-native agent that transforms natural language questions into precise database operations, giving you reliable, fully transparent results.
It supports:
• Dynamic filters
• Smart routing across collections
• Aggregations
• Accurate results with full source citations
The result? Faster, more reliable, and fully transparent data-aware AI.
How to get started:
• In the Console: Explore your data with natural language. See the Agent's "thought process" and the sources it used.
• Via APIs/SDKs: Embed this intelligent querying directly into your applications, reducing boilerplate and shipping faster.
Key benefits:
• Say goodbye to custom query-rewriting pipelines.
• Get structured, predictable data back.
• Full transparency: see every filter, aggregation, and source.
• Automatically handles queries across multiple data collections and tenants.
Try it yourself:
Colab quickstart: https://t.co/Z7ZMDUVwTf
Launch blog: https://t.co/wJHrZ70Lca