Kronotop v2026.06-4 is out 🚀
New in this release:
• $slice projection operator - return just a slice of an array field,
• Fixed a bug where index cardinality could be over-counted when documents were written during index creation
https://t.co/El43WxSvZl
Kronotop is a distributed multi-model database built on FoundationDB.
It provides strictly serializable ACID transactions across documents and ordered key-value data, allowing multiple data models to share a single transaction boundary.
https://t.co/El43WxSvZl
Interesting to see Olric mentioned in a technical breakdown of NVIDIA’s open-source NVCF platform 🙂
According to the article, the LLM gateway uses Olric for token-aware rate limiting.
Nice to see a project you built show up in AI infrastructure.
https://t.co/oHitLdSY8v
Olric provides a simple way to create a fast, scalable, and shared pool of RAM across a cluster of machines. It's a distributed, in-memory key/value store and cache, written entirely in Go and designed for distributed environments.
Olric v0.7.0 is out!
https://t.co/hItLlN51SQ
Kronotop is an Open Source Redis-compatible, distributed and transactional document database backed by FoundationDB. https://t.co/v7mrk7QqpH #opensource#redis
@huseyinbabal Teşekkürler. Geri bildirimlerinizi beklerim. Bütün metadata zaten FoundationDB'de duruyor ve basit tutmak adına leader-follower tasarımını tercih ettim. Web sitesi isine henüz giremedim çünkü CSS çok zor bir şey. 😀
Every day on YouTube, people upload 4 million videos and watch 5 billion videos. Handling this staggering traffic requires a vast fleet of servers. So when a new request comes in, where does it go? How do you balance load across so many servers for so many jobs?
I love this paper because it proposes a clever load balancing system that works at the largest of scales. Let’s imagine a YouTube service is distributed across 1000 servers. Whenever a new query arrives, we send it to one of those servers to process. How do we choose which one?
The obvious load balancing strategy is CPU-based, sending requests to whichever server has the lowest utilization. This works pretty well! But the problem with CPU-based load balancing is that CPU utilization is a lagging indicator that’s averaged over a measuring period and might not reflect the exact current state of a server. That means CPU-based load balancing can risk momentarily overloading servers and causing spikes in tail latency.
How do we balance load better? The idea is simple and elegant: if your goal is to minimize query latency, why not send each query to the server with the lowest latency? This works through probing. When a new query comes in, the load balancer sends lightweight probes to a number of target servers. Each probe measures two instantaneous signals: estimated latency of the server (based on latency of recent queries) and the number of requests in flight. Usually, queries are routed to the server with the lowest estimated latency. The exception is if the servers have very high numbers of requests in flight, indicating high incoming load–then, requests are routed to the servers with the fewest requests in flight.
What’s most impressive about this paper is that Prequal really works. The authors report that when they deployed it in production at YouTube, tail latency dropped by 2x, significantly reducing error and lag spikes for users. It’s not often that we see a systems paper produce results like that in the real world.
Want to learn what it takes to cache billions of tiny objects?
I really like this paper because it not only presents a clever algorithmic solution to an important systems problem, but also thoroughly evaluates it on real-world data.
The basic challenge here is that many systems (especially, but not only, in social media) want to cache billions of tiny objects (like new posts/messages) on SSDs to improve serving performance. However, existing cache strategies don't work well.
Log-structured caches write objects sequentially and index them in memory, but for tiny objects that index grows too large to fit in memory. Set-associative caches hash objects into "sets" so you don't need an index--you can look up an object's page by its hashed key--but every update requires an entire page write which rapidly degrades the SSD (you can only write to an SSD so many times before it wears out).
This paper's clever idea is to combine the two cache strategies to get their advantages without their disadvantages. They buffer incoming writes in a small log-structured cache, which writes to the SSD efficiently (as you're writing sequentially, so you write a page at a time) but doesn't need much memory (as it's small). Periodically, they export keys to a much larger set-associative cache, doing the exports in large batches to the same set to avoid degrading the SSD.
When a read comes in, it first checks the log-structured cache, then goes to the larger set-associative cache.
This design produces a cache that's fast, doesn't require much memory, and doesn't degrade SSDs. The authors prove this with an extensive evaluation on production Facebook traces, verifying all these objectives.
One big takeaway--there are only so many ways you can optimize a system, no matter how large or complex. Caching and buffering are basic strategies, but if used cleverly are very effective!