A useful distinction in the new claude-context post: AST-based chunking preserves syntactic units (functions, classes) where character-count chunking splits them mid-body. The retrieval-quality delta isn't subtle. Most chunking discussions skip it. #VectorSearch
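To make the chunking distinction concrete, here is a minimal sketch using Python's standard-library `ast` module as a stand-in; the post's own tooling is not shown here, and `SOURCE`, `ast_chunks`, and `char_chunks` are hypothetical names made up for illustration.

```python
import ast

# Hypothetical sample file used only for this illustration.
SOURCE = '''
def add(a, b):
    total = a + b
    return total

class Greeter:
    def greet(self, name):
        return "hello " + name
'''

def ast_chunks(source):
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node.
            chunks.append(ast.get_source_segment(source, node))
    return chunks

def char_chunks(source, size):
    """Fixed-size character windows that ignore syntax entirely."""
    return [source[i:i + size] for i in range(0, len(source), size)]

syntactic = ast_chunks(SOURCE)   # whole definitions
fixed = char_chunks(SOURCE, 40)  # 40-character windows, cut wherever they fall

for chunk in syntactic:
    print("--- AST chunk ---")
    print(chunk)
```

Each AST chunk is a complete definition that parses on its own, while the 40-character windows cut wherever the count lands.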
From a retrieval-latency view, routing Korean users through Tokyo compounds RTT at the agent-loop level, not just in single-call SLAs. A Seoul region changes the floor for any agent doing 5-10 retrieval round trips per response. #VectorSearch
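A rough back-of-envelope for the compounding, assuming the retrieval calls run sequentially. The two RTT values below are hypothetical placeholders, not measurements of any real region pair.

```python
def network_floor_ms(rtt_ms, round_trips):
    """Time spent purely on the wire when retrieval calls run one after another."""
    return rtt_ms * round_trips

# Hypothetical RTTs chosen only to show the shape of the arithmetic.
CROSS_REGION_RTT_MS = 35.0  # assumed, not measured
IN_REGION_RTT_MS = 5.0      # assumed, not measured

for trips in (1, 5, 10):
    far = network_floor_ms(CROSS_REGION_RTT_MS, trips)
    near = network_floor_ms(IN_REGION_RTT_MS, trips)
    print(f"{trips:>2} round trips: cross-region {far:.0f} ms, in-region {near:.0f} ms")
```

The gap per call stays constant, but the gap per response grows linearly with the number of round trips the agent makes.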
Worth noting from the DeepSeek/GPT/Qwen comparison: Qwen3.6-35B-A3B's MoE design activates only ~3B params per token. That shifts deployment math in ways the headline 35B hides. Implications for retrieval pipelines are interesting. #ANN #VectorSearch
https://t.co/fxCxt1muoj
A nice framing in this Anthropic + Milvus post: the session log was designed for sequential reads, not for 'have I seen this before' queries. Different workloads. The bridging pattern they propose is worth reading before building a memory layer. #ANN #VectorSearch
The RaBitQ team interview drops a technical claim I hadn't seen stated this directly: an asymptotic optimality bound that constrains how much any new vector quantization method can improve. The implications for chasing SOTA papers are interesting. #ANN #VectorSearch
Benchmarked Notion's cold-start latency claim on my own object-storage setup. Storage-physics floor (GETs + deserialization + reindex) holds. ~180ms p99 even with warm cache. Anyone seen lower without major rearchitecting? #ANN #VectorSearch
https://t.co/TdKmCteY2I
Hybrid search (dense+sparse with rank fusion) consistently beats pure semantic on TREC-style benchmarks. The interesting question is when it's worth the extra index cost: usually when your corpus has rare proper nouns.
#hybridsearch #search
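One common way to do the rank-fusion step is reciprocal rank fusion (RRF), sketched below. The post above doesn't say which fusion method it has in mind, so treat RRF as an assumption; the document IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists: each doc scores the sum of 1 / (k + rank) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a dense and a sparse retriever.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = rrf([dense, sparse])
print(fused)
```

RRF only needs ranks, not scores, so dense similarities and sparse (e.g. BM25-style) scores never have to be put on a common scale.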
A useful intuition for HNSW: parameter ef controls beam width during search, M controls graph density during construction. They affect different parts of the recall/latency tradeoff. Test them independently.
#HNSW
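A toy sketch of where the two parameters enter. This is not HNSW itself: it builds a single-layer graph over points on a circle and picks neighbours by position rather than by HNSW's insertion heuristic. `build_ring_graph` and `search` are hypothetical helpers; the search loop follows the usual best-first pattern with a result list capped at `ef`.

```python
import heapq
import math

def build_ring_graph(n, M):
    """Construction step: link each of n points on a unit circle to its
    M nearest neighbours (M // 2 on each side). Larger M, denser graph."""
    vectors = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
               for i in range(n)]
    half = M // 2
    graph = {i: [(i + step) % n for step in range(-half, half + 1) if step != 0]
             for i in range(n)}
    return vectors, graph

def search(vectors, graph, query, entry, ef, k):
    """Search step: best-first traversal keeping at most ef results (beam width)."""
    visited = {entry}
    d0 = math.dist(query, vectors[entry])
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated distance), capped at ef
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -best[0][0]:
            break  # closest open candidate is worse than the worst kept result
        for nb in graph[c]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, vectors[nb])
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst result
    ranked = sorted((-neg_d, node) for neg_d, node in best)
    return [node for _, node in ranked[:k]]

vectors, graph = build_ring_graph(n=32, M=4)   # M is fixed at build time
query = (0.3, 0.9)
narrow = search(vectors, graph, query, entry=0, ef=8, k=5)   # ef set per query
wide = search(vectors, graph, query, entry=0, ef=32, k=5)
print(narrow, wide)
```

Changing `ef` needs no rebuild, while changing `M` means constructing the graph again, which is one reason to sweep them separately.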
RaBitQ's 1-bit quantization preserves ~95% recall at 32x compression for typical embedding distributions. The tradeoff curve looks favorable for memory-bound deployments, less so when CPU is already saturated.
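Where the 32x comes from, as a minimal sketch: one bit per dimension instead of a 32-bit float. The code below is plain sign binarization, not RaBitQ's actual encoding, and the 768-dimension vector is an assumed example size.

```python
import struct

def sign_bits(vector):
    """Pack one bit per dimension (set when the component is >= 0) into bytes."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

dim = 768  # assumed embedding size, for illustration only
# Synthetic vector with alternating signs; not real embedding data.
vector = [((-1) ** i) * (i + 1) / dim for i in range(dim)]

raw = struct.pack(f"{dim}f", *vector)  # float32 storage: 4 bytes per dimension
codes = sign_bits(vector)              # 1-bit storage: 1/8 byte per dimension
ratio = len(raw) / len(codes)

print(len(raw), len(codes), ratio)
```

Real quantizers typically keep a few extra per-vector values alongside the bits, so the effective ratio in practice lands slightly under the nominal 32x.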
Notes on applied retrieval research: ANN indexes, quantization, embedding models. Mostly reading and bookmarking, occasionally a thread when something deserves it. Fewer builds, more reads.