@karanjagtiani04 @milvusio Milvus operations can be observed via https://t.co/wUHt76xrmN
The performance and health of the embedding service can be observed through the service provider's dashboard.
I'm a fan of Claude Code, an exceptional AI coder created by @bcherny and @_catwu. I wish its context could hold my entire codebase. However, fitting millions of lines of code into each call would burn enough tokens to bankrupt me. To solve this problem, we developed an MCP plugin that efficiently stores large codebases in a vector database and retrieves only the related sections to use as context.
The result is Claude Context, a semantic search MCP plugin for Claude Code.
Here's how it works:
- Semantic Code Search: Ask questions such as "find functions that handle user authentication" and it retrieves the code of functions like ValidateLoginCredential(), overcoming the limitations of keyword matching.
- Incremental Indexing: Efficiently re-indexes only the changed files using Merkle trees.
- Intelligent Code Chunking: Chunks code by analyzing its Abstract Syntax Tree (AST), so it understands how different parts of your codebase relate.
- Scalable: Powered by @zilliz's scalable vector search, it works for codebases of any size.
Thanks @claudeai @AnthropicAI for the inspiration. Claude Code + semantic search is a powerful duo.
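To make the incremental indexing idea concrete, here is a minimal sketch of hash-based change detection. It is an illustration under assumptions, not Claude Context's actual implementation: the function names (`file_hash`, `merkle_root`, `changed_files`) and the in-memory snapshots are hypothetical. It also simplifies the tree: the Merkle root serves only as a fast "nothing changed" check, whereas a full Merkle tree would descend only into the subtrees whose hashes differ.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Leaf hash: a fingerprint of one file's content."""
    return hashlib.sha256(content).hexdigest()


def merkle_root(leaf_hashes: dict[str, str]) -> str:
    """Fold the sorted (path, hash) leaves pairwise into a single root hash."""
    level = [
        hashlib.sha256(f"{path}:{digest}".encode()).hexdigest()
        for path, digest in sorted(leaf_hashes.items())
    ]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots; equal roots short-circuit the no-change case."""
    if merkle_root(old) == merkle_root(new):
        return {"added": [], "modified": [], "removed": []}
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "removed": sorted(p for p in old if p not in new),
    }


# Hypothetical snapshots of a tiny codebase before and after an edit.
before = {
    "auth.py": file_hash(b"def login(): ..."),
    "db.py": file_hash(b"def connect(): ..."),
}
after = {
    "auth.py": file_hash(b"def login(user): ..."),
    "db.py": before["db.py"],
    "api.py": file_hash(b"def route(): ..."),
}
diff = changed_files(before, after)
print(diff)
```

Only the files listed under "added" and "modified" would need to be re-chunked and re-embedded; "removed" files would have their vectors deleted.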
I've been asked by many Milvus users how BM25 works in a vector database. Well, there is no secret. If Elasticsearch can implement it with an inverted index, a vector database can do the same with sparse vectors. Here's how we did it:
Key Innovation:
Traditionally (I mean a year ago), Milvus required users to pre-compute BM25 scores from term frequencies and corpus-level stats such as average document length. However, those stats keep changing as more docs are ingested, so that wasn't really practical in the real world. Since version 2.5, Milvus can compute the BM25 score internally. Users just need to ingest raw text and let Milvus generate the sparse vector.
We also added a new BM25 metric for the sparse vector field and compute the BM25 score dynamically at query time. This hides all the complexity from users and keeps the search accurate. Moreover, Milvus can speed up BM25 search with optimizations unique to sparse vectors. In addition, Milvus doesn't carry the burden of the JVM as Elasticsearch does. As a result, full-text search on Milvus can be 3-5x faster than on Elasticsearch.
Under the hood, Milvus stores raw TF (Term Frequency) values as the document vector, and keeps stats like average document length as system-wide metadata. So the doc vectors don't need to be updated as the average document length changes over time. We call this Sparse-BM25.
How Does Sparse-BM25 Work in Milvus:
- During text ingestion, Milvus tokenizes the text, removes stop-words, and stems the tokens.
- The tokenized text is converted into sparse vectors by Milvus and stored in a sparse vector field. Milvus also maintains the global term distribution statistics across the corpus that are required to compute IDF (Inverse Document Frequency) and avgdl (average document length).
- Milvus builds an index on the sparse vectors for efficient search.
- At search time, Milvus uses the global term distribution stats to compute the BM25 score dynamically and perform the ANN (Approximate Nearest Neighbour) search.
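The steps above can be sketched in a few lines of plain Python. This is a toy illustration, not Milvus code: the class name `SparseBM25Index`, the whitespace tokenizer, the tiny stop-word list, and the parameters `k1=1.2`, `b=0.75` are all assumptions, stemming is skipped, and the search is a brute-force scan instead of an ANN index. What it does show is the key design choice: each document stores only raw TF counts, while IDF and avgdl come from global stats at query time, so older document vectors never need rewriting.

```python
import math
from collections import Counter

# Assumed toy stop-word list; a real analyzer is far more complete.
STOP_WORDS = {"the", "a", "is", "in", "of", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop-words (no stemming here)."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


class SparseBM25Index:
    def __init__(self, k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tf: list[Counter] = []      # per-doc raw TF "sparse vectors"
        self.doc_freq: Counter = Counter()   # global: number of docs containing each term
        self.total_len = 0                   # global: total token count, used for avgdl

    def insert(self, text: str) -> None:
        tf = Counter(tokenize(text))
        self.doc_tf.append(tf)               # stored once, never updated afterwards
        self.doc_freq.update(tf.keys())
        self.total_len += sum(tf.values())

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        n = len(self.doc_tf)
        if n == 0:
            return []
        avgdl = self.total_len / n           # computed from global stats at query time
        terms = tokenize(query)
        scored = []
        for doc_id, tf in enumerate(self.doc_tf):
            dl = sum(tf.values())
            score = 0.0
            for term in terms:
                if term not in tf:
                    continue
                df = self.doc_freq[term]
                idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
                length_norm = 1 - self.b + self.b * dl / avgdl
                score += idf * tf[term] * (self.k1 + 1) / (tf[term] + self.k1 * length_norm)
            if score > 0:
                scored.append((doc_id, score))
        return sorted(scored, key=lambda pair: -pair[1])[:top_k]


index = SparseBM25Index()
index.insert("Milvus is a vector database")
index.insert("Elasticsearch is a search engine")
index.insert("BM25 ranks documents in full text search")
print(index.search("vector database"))
print(index.search("search"))
```

In the second query both matching docs contain "search" once, so the shorter document ranks first because of BM25's length normalization.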
Want to learn more?
Here is the full blog that unveils all the details: https://t.co/yaUDkay3CI
Drop a comment or DM me if you have any questions.
I think 2025 will be the year of multi-agent systems.
Google's new Agent-to-Agent (A2A) protocol tackles a critical challenge in agent systems: enabling multiple AI agents to work as a team. The framework lets agents communicate, assign tasks, and synchronize information with each other. If MCP is the USB protocol between agents and peripheral devices, A2A is the HTTP protocol that lets agents collaborate with each other, like a service mesh.
The A2A protocol carries several key features:
- Capability Discovery: Just like microservices need service discovery, so does an agent network. With A2A, agents can "show their capabilities" through JSON-formatted "Agent Cards" so that client agents can select the best remote agent to complete a task.
- Structured Task Lifecycle: Tasks are treated as entities with defined states such as pending, running, completed, or failed. This structure allows for clear tracking and management of tasks throughout their execution.
- Asynchronous Communication: Agents can handle long-running operations by communicating asynchronously, ensuring that tasks can progress without requiring constant real-time interaction.
- Collaboration and Error Handling: The protocol supports collaborative task execution, where agents can seek clarification, request additional information, or handle errors through specialized recovery agents, enhancing resilience in task management.
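As a rough illustration of capability discovery and the task lifecycle, here is a small sketch. It is not the A2A specification: the `AgentCard` fields, the `discover` helper, and the allowed state transitions are hypothetical names and rules made up for this example; only the four state names come from the description above.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class TaskState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition rules: completed and failed are terminal states.
ALLOWED = {
    TaskState.PENDING: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class AgentCard:
    """Hypothetical, simplified card; the real schema has more fields."""
    name: str
    skills: list

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class Task:
    description: str
    state: TaskState = TaskState.PENDING
    history: list = field(default_factory=lambda: [TaskState.PENDING])

    def transition(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


def discover(cards_json: list, skill: str) -> Optional[AgentCard]:
    """Return the first advertised agent whose card lists the requested skill."""
    for raw in cards_json:
        card = AgentCard(**json.loads(raw))
        if skill in card.skills:
            return card
    return None


published = [
    AgentCard("sourcing-agent", ["candidate_search"]).to_json(),
    AgentCard("screening-agent", ["background_check"]).to_json(),
]
chosen = discover(published, "background_check")
task = Task("Run a background check on the shortlisted candidate")
task.transition(TaskState.RUNNING)
task.transition(TaskState.COMPLETED)
print(chosen.name, [s.value for s in task.history])
```

Treating tasks as small state machines is what makes long-running, asynchronous work trackable: a client agent can poll or subscribe to the state instead of holding a connection open.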
To see how this works in action, let's imagine implementing robot recruiters with multiple agents:
1️⃣ The recruitment manager can instruct the agent to search for candidates based on job descriptions, locations, skills, etc.
2️⃣ The agent collaborates with other specialized recruitment agents, integrating with platforms like LinkedIn or internal HR systems, to summarize candidate suggestions using A2A.
3️⃣ After the manager reviews the suggestions, the agent can arrange interviews or engage another agent to conduct background checks.
Early prototypes of agents like Devin (AI software engineer) and Manus (general AI agent) are just the start. What will really unlock the potential of agents is wide adoption of tool use (MCP) and cross-agent collaboration (A2A).
How to boost RAG throughput by 20x using LlamaIndex & Milvus async API?
Async processing unlocks the parallelism in your RAG pipeline. Compared to synchronous scheduling, the async API can achieve significant performance gains for doc ingestion and search query serving at high throughput. You can build an async RAG pipeline with @LlamaIndex and Milvus to fully utilize the hardware's potential, e.g.
Ingest 1,000 doc chunks:
  - Sync: 62.91s → Async: 3.22s (19.5x faster)
Serve 1,000 search queries:
  - Sync: 308.80s → Async: 8.81s (35x faster)
Why is that possible? Asynchronous processing avoids being blocked by a serialized stream of requests: while one request waits on I/O, the others keep moving. For LlamaIndex, simply set "use_async=True" in your VectorStoreIndex with Milvus, and you can enjoy a 10~50x improvement in throughput and query latency relative to serial scheduling.
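The effect is easy to reproduce with nothing but `asyncio`. The sketch below is a simulation, not the LlamaIndex or Milvus API: `fake_search` is a hypothetical stand-in for a network call, and the 10 ms latency and 20 requests are arbitrary assumptions. The timings it prints are unrelated to the benchmark numbers above.

```python
import asyncio
import time

LATENCY_S = 0.01   # assumed per-request I/O wait
N_REQUESTS = 20    # assumed batch size


async def fake_search(query_id: int) -> int:
    """Hypothetical stand-in for one vector search round trip."""
    await asyncio.sleep(LATENCY_S)
    return query_id


async def run_serial() -> list:
    # Each request waits for the previous one, like synchronous scheduling.
    return [await fake_search(i) for i in range(N_REQUESTS)]


async def run_concurrent() -> list:
    # All requests are in flight at once; their waits overlap.
    return list(await asyncio.gather(*(fake_search(i) for i in range(N_REQUESTS))))


def timed(coro_fn):
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start


serial_results, serial_s = timed(run_serial)
concurrent_results, concurrent_s = timed(run_concurrent)
print(f"serial: {serial_s:.3f}s, concurrent: {concurrent_s:.3f}s")
```

The serial run takes roughly `N_REQUESTS * LATENCY_S`, while the concurrent run takes about one `LATENCY_S` plus scheduling overhead, which is where order-of-magnitude gains on I/O-bound pipelines come from.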
In addition, by leveraging local embedding models like BGE or E5 and high-performance inference frameworks like vLLM or NVIDIA Triton, this architecture eliminates bottlenecks like OpenAI's API rate limits. You can consider this reference architecture for mission-critical search applications such as real-time enterprise RAG and high-volume financial document processing.
Tutorial: https://t.co/PMug8kxEnu
Discussion: What's your biggest RAG performance challenge? Share your use case below.
If you're struggling with RAG quality, here are a few of the most popular advanced RAG techniques:
1. Agentic RAG / Deep Research: The core idea of Deep Research and agentic RAG is self-reflection plus query routing. This strategy can optimize the overall process by reasoning about user queries to understand complex intentions and break them down into sub-queries (e.g. "Differences between Milvus vs Zilliz Cloud?" → Sub-Query 1: "Milvus features" | Sub-Query 2: "Zilliz Cloud features"). https://t.co/7q9bZhniJv
2. Hybrid Search: Combining semantic search and full-text search captures both contextual semantics and special terms for more comprehensive results. This approach requires a vector database like @milvusio that supports both dense vector embeddings and BM25 scoring with sparse vectors. https://t.co/Uphp5ClbSl
3. Contextual Retrieval: To address the semantic fragmentation caused by chunking, Contextual Retrieval enriches each document chunk with relevant context before embedding. Leveraging a KV Cache reduces redundant computations, making the approach more cost-efficient. https://t.co/XCZfBSwrAW
4. Graph RAG: Graph RAG enhances RAG with Knowledge Graphs (KGs) to handle complex entity relationships and multi-hop questions effectively. By representing entities and their interrelations through knowledge engineering, it improves on traditional semantic search, leading to a better understanding of intricate connections within text corpora. https://t.co/XrukA0IVKn
A more detailed report: https://t.co/zFz6AH9CW2
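For the hybrid search technique, one common way to merge the dense and BM25 result lists is Reciprocal Rank Fusion (RRF). The post does not say which fusion method is used, so treat this as a minimal sketch of one option: the function name `rrf_fuse`, the constant `k=60`, and the document IDs are all assumptions for illustration.

```python
def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Score each doc by the sum of 1 / (k + rank) over every ranking it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from each retriever, best match first.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # semantic (dense vector) search
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # full-text (sparse BM25) search

fused = rrf_fuse([dense_hits, bm25_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

RRF only looks at ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales. Documents found by both retrievers (doc_a, doc_c) rise above those found by only one.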
A new SOTA model for multimodal search: BGE has just released the BGE-VL-MLLM series.
What is Multimodal Search: Also known as Composite Image Retrieval (CIR), it accepts a pair of image and text as input, using the text to augment the search intention expressed by the image, such as a photo of shoes with special design and the text "on the beach" to find images of designer shoes placed on the beach. In this live demo (https://t.co/vgAZkRVM6T) we showed the futuristic search experience of image + text retrieval.
The recently released BGE-VL-MLLM-S1 is trained with a technique from the paper MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval, published by @BAAIBeijing. Similar to @GoogleDeepMind's MagicLens paper, it uses large-scale data mining and synthetic data generation with a VLM to construct a massive dataset of image pairs and the text that describes the relation between each pair. It then uses the dataset and contrastive learning to train the multimodal model. BGE-VL-MLLM-S2 is further fine-tuned on the MMEB benchmark training set, achieving even better retrieval quality. The models are available on @huggingface: https://t.co/OB6FvzeSPc.
See the attached code snippets for how to use it with a vector db for search. Paired with a performant vector db like @milvusio, the new models can unlock more use cases for multimodal search, such as data exploration and mining on visual content at massive scale.
We are excited to add this to the upcoming vector lake feature in the Milvus 3.0 release. If you have such a use case, for example multimodal retrieval on billions of items with either a serving-grade sub-100ms latency target or a higher latency tolerance for interactive exploration, please reach out to me!
I'm really impressed by @manusai. It's close to how I imagine an AI agent in action, beyond passive suggestions. No shotgun position, take the driver's seat please.
I like the idea of it: From "Idea" 💡 to "Result"
End-to-end automation and sophisticated tool use that can handle:
PPT Creation | Stock Analysis | Property Search | Travel Planning | Contract Review | Audio Production
This feels like a true "agent": it automates workflows from idea → execution → delivery to close complex tasks in one click.
The core tech under the hood: A "Triple-Engine" Architecture
▫️ Smart Model × Tool Integration × Task Orchestration
▫️ Cloud-based async execution for long-chain tasks.
▫️ Memory × Knowledge Base
Historical data + real-time learning = AI evolves from "responder" to "collaborator"!
Next level: What if it's equipped with a highly scalable vector db like @milvusio?
- Infinite long-term memory, always context aware
- Knowledge base for massive proprietary data
- Multimodal retrieval, not limited to text
I think when AI stops being a "suggestion box" and becomes a true results-maker, the future is here.
Cool perspective! Are you interested in adding https://t.co/asYjLtcuL1 to the comparison? Feel free to DM me and I'm happy to help with the setup! Would also love to hear your feedback on the user experience of Milvus.
We also open sourced a benchmark with more production-level test cases (searching during ingestion, scaling up to a billion vectors, taking machine cost into consideration, etc.). Maybe you will find it useful: https://t.co/9wamUF0QKK
2024 was a transformative year for Information Retrieval! We've witnessed the production-scale adoption of RAG and breakthroughs in Graph RAG, multi-modality, ColBERT/ColPali, and Text2SQL, redefining search infrastructure, data discovery, and knowledge synthesis.
Deep learning-powered IR now seamlessly integrates LLMs, hybrid search, rerankers, and structured knowledge tools like Knowledge Graphs, making retrieval more precise and scalable than ever. This is a new Age of Discovery for humanity, as groundbreaking as Columbus reaching the New World over 500 years ago. I'm thrilled to be part of this exploration!
With Milvus 3.0, we are pushing the boundaries of vector databases, evolving from a search-serving infrastructure to a comprehensive unstructured data platform. And with 2025 on the horizon, the next wave of innovation is just beginning.
Read more about my reflection on IR in 2024: https://t.co/sso9YmFTWe
#AI #InformationRetrieval #RAG #ColBERT #Text2SQL #LLMs #DeepLearning #VectorDatabases
Deep Research is impressive! So we built one using open-source tools: @langchain + @milvusio + @deepseek_ai. It's amazing how far open source can take you these days!
But there's more: this agent isn't limited to the public web. It can search private data in your vector DB, deployed on-prem or in your cloud VPC (see Zilliz Cloud BYOC https://t.co/YiUT76hOyD), all while keeping your data secure.
Open source gives you the flexibility to shape this design pattern however you need. Check out the sample report from a demo research agent by @stefan_webb! #AI #VectorDB #OpenSource
⭐️ "I Built a Deep Research with Open Source - and So Can You!"
You've probably heard about OpenAI's latest release, Deep Research, designed for more detailed, informed, and nuanced responses. But how does the underlying technology actually work?
Our partner, Milvus, has a new post that breaks it down, showing how to build a research assistant similar to Deep Research using open-source tools. It combines LangChain for RAG, Milvus as the vector database, and DeepSeek R1 as the reasoning model.
The post explores key concepts behind agents and reasoning models (tool usage, memory, structured output, and planning), giving you a hands-on look at how this powerful capability can be unlocked.
Check it out: https://t.co/xdUHO3RYyo
Great question! Wikipedia data was just used as an example. You can put any data in a vector db, such as sensitive financial reports for enterprises. https://t.co/asYjLtcuL1 is open source; you can self-host it or use the fully managed version on Zilliz Cloud with a free trial: https://t.co/4T416iHNm7
While I strongly believe in upholding academic integrity, it's equally important to fight against racial discrimination and bias. Associating a lack of integrity with a specific nationality or racial group is a clear discriminatory act. This should not be overlooked just because it's disguised as a call for moral standards.
While attending graduate school in the U.S., I also heard claims like "Chinese, Indian, or Asian students bend the rules for better grades," which unfairly target certain groups. The truth is, academic dishonesty exists across all racial groups. It's a universal issue that the academic community must address together.
I hope NeurIPS and the speaker genuinely acknowledge this issue and take meaningful steps to prevent racial discrimination against any group, rather than masking it under the guise of "cultural generalization".