Meet DOOMbench. We ran a multiplayer DOOM server entirely in SQL and stress-tested it against 6 different data stacks: Postgres, DuckDB, ClickHouse, CockroachDB, AlloyDB, and CedarDB.
https://t.co/SksWsakzwI
@cedar_db is incredibly cool and more people should know about it. They’re a team of PhDs in Munich building a new relational database, on top of almost 10 years of academic research, that crushes existing benchmarks and maybe (finally?) gets us to the HTAP grail.
The core idea is that existing RDBMSes like MySQL and Postgres were built more than 30 years ago, on assumptions about hardware constraints that are just not true anymore. These ecosystems have evolved admirably but ultimately…it’s a database. It’s built not to change very much.
Here are a few of the ways that CedarDB is rethinking every element of the database:
1) A better query optimizer
In the last 30 years we’ve made a lot of progress on how to optimize SQL queries, to the point where an optimized query can easily outperform a not-so-optimized query by a ton.
But not many query optimization improvements have made the leap from research into databases today.
CedarDB did a few things on this front:
- Implemented the unnesting algorithm developed by Thomas Neumann (one of the leaders of the Umbra research project CedarDB came from), an improvement of more than 1000x
- Developed a novel approach to join ordering using adaptive optimization that can handle 5K+ relations
- Created a statistics subsystem that tells the optimizer things that traditional databases can’t
2) What if your database was actually a compiler?
CedarDB doesn’t interpret queries; it generates code instead. For every SQL query that a user writes, CedarDB processes and optimizes it, then generates machine code that the CPU can directly execute.
This has been a holy grail for a while, and they implemented it via a custom low-level language that is cheap to convert into machine code via a custom assembler.
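To make the interpret-versus-generate distinction concrete, here is a toy sketch in Python. It is not CedarDB's pipeline: the tuple-based predicate format and the names `interpret`, `to_source`, and `compile_predicate` are all made up for illustration, and the "generated code" is Python source rather than machine code. The point is only that the plan is walked once at compile time instead of once per row.

```python
# Toy illustration only: a real compiling database emits machine code, not Python.
# A predicate is a nested tuple, e.g. ("and", (">", "price", 10), ("=", "region", "EU")).

def interpret(pred, row):
    """Tree-walking interpreter: re-inspects the predicate tree for every row."""
    op = pred[0]
    if op == "and":
        return interpret(pred[1], row) and interpret(pred[2], row)
    _, column, constant = pred
    if op == ">":
        return row[column] > constant
    if op == "=":
        return row[column] == constant
    raise ValueError(f"unknown operator {op!r}")


def to_source(pred):
    """Walk the predicate tree once and emit a Python expression as text."""
    op = pred[0]
    if op == "and":
        return f"({to_source(pred[1])} and {to_source(pred[2])})"
    _, column, constant = pred
    python_op = {">": ">", "=": "=="}[op]
    return f"(row[{column!r}] {python_op} {constant!r})"


def compile_predicate(pred):
    """Turn the generated source into a callable; no tree walk happens per row."""
    source = f"lambda row: {to_source(pred)}"
    return eval(compile(source, "<generated>", "eval"))
```

Calling `compile_predicate(pred)` once and reusing the result for every row mirrors, very loosely, why generated code can beat an interpreter on large inputs.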
Another way that CedarDB improves performance is through Adaptive Query Execution. Essentially they start executing each query immediately with a “quick and dirty” version, while working on better versions in the background.
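A rough sketch of that "start now, upgrade later" idea, under heavy assumptions: the function names are hypothetical, the "better version" is just a different Python callable rather than optimized machine code, and the switch happens between chunks of input. How many chunks run on which version depends on thread timing, but the result is the same either way.

```python
import threading


def slow_sum_of_squares(chunk):
    # Stand-in for the "quick and dirty" version that is available immediately.
    total = 0
    for x in chunk:
        total += x * x
    return total


def build_fast_version(slot, ready):
    # Stand-in for background optimization; a real system would compile here.
    slot["fn"] = lambda chunk: sum(x * x for x in chunk)
    ready.set()


def run_adaptively(chunks):
    """Process chunks right away, switching to the better version once it exists."""
    slot, ready = {}, threading.Event()
    background = threading.Thread(target=build_fast_version, args=(slot, ready))
    background.start()

    total = 0
    used = {"slow": 0, "fast": 0}
    for chunk in chunks:
        if ready.is_set():
            total += slot["fn"](chunk)
            used["fast"] += 1
        else:
            total += slow_sum_of_squares(chunk)
            used["slow"] += 1

    background.join()
    return total, used
```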
3) Taking advantage of all cores / Amdahl’s law
Distributing work fairly across all available cores is notoriously difficult, and the CedarDB team would argue that most databases underutilize their hardware.
Their clever approach to this problem is called morsel-driven parallelism. CedarDB breaks queries down into segments: pipelines of self-contained operations. The data for each segment is then divided into “morsels”: small chunks of input containing roughly 100K tuples each.
You can read more in the original paper here: https://t.co/0s0gmnuhXc
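A minimal sketch of the scheduling pattern, assuming a single filter-and-count pipeline: workers repeatedly grab the next morsel from a shared list, so a fast worker simply ends up processing more morsels. The names `morsels` and `parallel_filter_count` are hypothetical, and Python threads only illustrate the structure here (the interpreter lock keeps them from running truly in parallel).

```python
import threading

MORSEL_SIZE = 100_000  # the thread above cites roughly 100K tuples per morsel


def morsels(n_tuples, size=MORSEL_SIZE):
    """Yield (start, end) index ranges that cover the input in fixed-size chunks."""
    for start in range(0, n_tuples, size):
        yield start, min(start + size, n_tuples)


def parallel_filter_count(data, predicate, n_workers=4, size=MORSEL_SIZE):
    """Count matching tuples; each worker pulls morsels until none are left."""
    pending = list(morsels(len(data), size))
    lock = threading.Lock()
    partials = [0] * n_workers  # one slot per worker, so no contention on results

    def worker(worker_id):
        while True:
            with lock:  # only the hand-out of morsels is synchronized
                if not pending:
                    return
                start, end = pending.pop()
            partials[worker_id] += sum(1 for x in data[start:end] if predicate(x))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```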
4) Rethinking the buffer manager
Modern systems come equipped with massive amounts of RAM; there’s actually much more “room at the club” than database developers initially assumed.
So the idea of the revamped buffer manager in CedarDB is that you can (and should) expect variance not just in data access patterns, but in storage speed and location, page sizes and data organization, and memory hierarchy.
CedarDB’s buffer manager is designed from the ground up to work in a heavily multi-threaded environment. It decentralizes buffer management with Pointer Swizzling: Each pointer (memory address) knows whether its data is in memory or on disk, eliminating the global lock that throttles traditional buffer managers.
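Here is a small sketch of the swizzling idea, with made-up names (`Swip`, `BufferManager`, and a dict standing in for disk). A reference either holds the page itself (swizzled) or only its page id (unswizzled), so the hot path never consults a shared lookup table. Real systems typically pack this state into the pointer word itself and add latching and eviction policies, all of which are left out here.

```python
class Swip:
    """A reference that either holds the in-memory page or just its page id."""

    def __init__(self, page_id):
        self.page_id = page_id
        self.page = None  # None means the page is not in memory

    @property
    def swizzled(self):
        return self.page is not None


class BufferManager:
    def __init__(self, disk):
        self.disk = disk  # hypothetical stand-in for storage: page_id -> bytes
        self.loads = 0    # counts how often we had to go to "disk"

    def resolve(self, swip):
        if swip.swizzled:
            return swip.page  # hot path: the reference already knows where the data is
        swip.page = bytearray(self.disk[swip.page_id])  # cold path: load and swizzle
        self.loads += 1
        return swip.page

    def evict(self, swip):
        """Write the page back and unswizzle the reference."""
        if swip.swizzled:
            self.disk[swip.page_id] = bytes(swip.page)
            swip.page = None
```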
5) Building a database for change
Databases are built to not change. It’s exactly this stability that gives each generation the confidence to build their apps (no matter how different they are) on systems like Postgres. You know what you’re getting. But there’s also a clear downside to this rigidity.
CedarDB’s storage class system employs pluggable interfaces where adding new storage types doesn’t require rewriting other components. E.g. if CXL becomes the go-to storage interface at some point in the future, you don’t need to write another whole component, you just need another endpoint for the buffer manager.
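As a sketch of what "just another endpoint" could look like, assuming nothing about CedarDB's actual interfaces: the cache below only talks to a two-method protocol, so a new storage type is a new class and nothing else changes. `StorageBackend`, `DictBackend`, `RegionBackend`, and `PageCache` are all hypothetical names, and the page size is tiny on purpose.

```python
from typing import Protocol

PAGE_SIZE = 16  # deliberately tiny for the example


class StorageBackend(Protocol):
    def read_page(self, page_id: int) -> bytes: ...
    def write_page(self, page_id: int, data: bytes) -> None: ...


def _pad(data):
    """Pad or truncate to exactly one page."""
    return bytes(data).ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]


class DictBackend:
    """Pages stored as individual objects keyed by id."""

    def __init__(self):
        self._pages = {}

    def read_page(self, page_id):
        return self._pages.get(page_id, bytes(PAGE_SIZE))

    def write_page(self, page_id, data):
        self._pages[page_id] = _pad(data)


class RegionBackend:
    """Pages stored in one contiguous region, addressed by offset."""

    def __init__(self, n_pages):
        self._region = bytearray(n_pages * PAGE_SIZE)

    def read_page(self, page_id):
        offset = page_id * PAGE_SIZE
        return bytes(self._region[offset:offset + PAGE_SIZE])

    def write_page(self, page_id, data):
        offset = page_id * PAGE_SIZE
        self._region[offset:offset + PAGE_SIZE] = _pad(data)


class PageCache:
    """Depends only on the StorageBackend protocol, not on any concrete backend."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend
        self.cache = {}

    def get(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.backend.read_page(page_id)
        return self.cache[page_id]

    def put(self, page_id, data):
        self.backend.write_page(page_id, data)
        self.cache[page_id] = self.backend.read_page(page_id)
```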
Anyway these are just a few of the ideas they’re bringing to the table. Maybe it’s because they’re in Germany, maybe it’s because they’re just really humble, but more people should know about this team!!
Check out the full post here: https://t.co/HV28EElXsQ
What if a database could be your game engine?
During parental leave @VogelLu built DOOMQL: A multiplayer DOOM-like where everything (rendering, game loop, state) runs in pure SQL on CedarDB.
It's fast, ridiculous, and surprisingly elegant.
Full write-up: https://t.co/3j1TEEsvUD
Just released another episode in the "Modern Database" Playlist which has 45+ database episodes.
This time we are talking to @PhilippFent from @cedar_db about #CedarDB and learning about the innovations and the amazing engineering behind this modern ultra fast database.
Key Takeaways:
- CedarDB is built from the ground up to utilize modern hardware effectively.
- The system compiles SQL directly to machine code for performance.
- Parallel processing is a key feature, allowing efficient use of multiple cores.
- CedarDB aims to be Postgres compatible while innovating on performance.
- Transactional workloads are handled efficiently without sacrificing analytical capabilities.
- Data ingestion is optimized for both row-oriented and columnar formats.
- The system uses optimistic concurrency control to manage write conflicts.
- Query optimization leverages statistics to improve join performance.
- Future developments include schema evolution and disaggregated storage.
- CedarDB is designed to be flexible and adaptable for various workloads.
Watch the entire episode and enjoy! Link below
The next episode (releasing soon) is another banger on The GeekNarrator.
@cedar_db with @PhilippFent
We go really deep into the architecture, data structures, analytics, query optimisation and so on.
In case you missed the latest banger on unikernels, check this out:
You don't need Linux, Docker, k8s? Future with Unikernels ft. NanoVMs
https://t.co/JsGmS3bCJR
Congratulations to SortMergeJoins from TU Munich - winners of the 2025 SIGMOD Programming Contest! Built by the Umbra research group (CedarDB’s roots), their system ran 12× faster than median - entirely open-source and no sort-merge-joins to be found 😉: https://t.co/TV0E5NzQiv
You don’t need an army of C++ devs to hand-optimize every query. We let the code write the code. Read our latest blog post to see how we mix runtime flexibility with almost magical performance!
https://t.co/hEdOvnw0z4
@ifesdjeen @FilasienoF @sunbains @cedar_db This row ID B-Tree is our main data storage, so basically our "heap". Our blog has more details: https://t.co/OD5hE1ViOW
Have you ever wondered why you see the last entry again when switching to the second page of a website?
The culprit is "offset"! Read why in our blog post and find out what to use instead.
https://t.co/GZaUIXaEnI
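The duplicate-entry effect is easy to reproduce. The sketch below uses SQLite from Python's standard library with a made-up `posts` table: a new row arrives between two page loads, and the OFFSET-based second page repeats the last entry of the first. Keyset pagination ("give me rows after the last one I saw") is shown as one widely used alternative; whether it is the one the linked post recommends is an assumption.

```python
import sqlite3


def setup():
    """Create an in-memory table with posts 1..6 (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    con.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, 7)],
    )
    return con


def page_by_offset(con, page, size=3):
    """Newest first, positioned by counting rows; shifts when rows are inserted."""
    rows = con.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (size, page * size),
    )
    return [r[0] for r in rows]


def page_after(con, last_seen_id=None, size=3):
    """Newest first, positioned by the last id the reader saw (keyset pagination)."""
    if last_seen_id is None:
        rows = con.execute("SELECT id FROM posts ORDER BY id DESC LIMIT ?", (size,))
    else:
        rows = con.execute(
            "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen_id, size),
        )
    return [r[0] for r in rows]
```

Loading page 0, inserting one new post, and then loading page 1 with `page_by_offset` returns an id that was already on page 0, while `page_after(con, last_id_of_page_0)` does not.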
Proud advisor moment: happy to share that my PhD student Michael Jungmair has received the Google PhD Fellowship in Software Systems! This supports his further research on LingoDB. @tum_db @TU_Muenchen @Google https://t.co/k2kLU6typL
𝗔 𝗽𝗲𝗲𝗸 𝘂𝗻𝗱𝗲𝗿 𝘁𝗵𝗲 𝗵𝗼𝗼𝗱 𝗼𝗳 𝗗𝘂𝗰𝗸𝗗𝗕
If you want to know what makes DuckDB so fast, check out the new blog post by Tom Ebergen on how the "optimizers play a silent, but vital role when using a database": https://t.co/dSBKfh1Pic
How hard can summing up numbers be? ➕ 🔢 It turns out that correctly calculating the sum of integers has more pitfalls than you might expect! Read our latest blog post for details: https://t.co/m7Wfys95E2
@EmreSevinc @cedar_db The latest paper from @tum_db showed that ternary joins are enough for almost all graph queries:
https://t.co/DkqXjhdVuT
@andy_pavlo I'm not sure if we still need actual worst-case optimal joins.
The underlying data layout of your program can either help or hurt your algorithms.
In our latest post, we explore why optimizing the storage layout is the key to unlocking blazingly fast data processing.
https://t.co/4oonsn2NOh
Having data in memory is considered the gold standard for data analysis, but are you sure that RAM locality alone is enough?
Read on to learn about the new king of efficient data processing:
https://t.co/gpfq0Zkopq
Part 2 of how to use StringView / German Strings to *actually* make queries faster. Comparisons, inlining, gc, and buffer size tuning. https://t.co/sYEhGRnp5M Again all due to our incredible intern Xiangpeng Hao
Why old-school exceptions are still a good idea:
They provide a better user and developer experience and are even faster than error values!
Read more in our blog: https://t.co/isW7MZI2w7