Walid Chergui

@walidchergui

Scala, FP, Spark, Big Data, machine learning, JVM, distributed systems, performance.

Paris

Joined February 2011

395 Following

302 Followers

2.8K Posts

Pinned Tweet

Walid Chergui @walidchergui

almost 9 years ago

Mon talk "La reconnaissance d’image par réseau de neurones" https://t.co/DqulqPDqJJ @voxxed_lu

walidchergui retweeted

freeCodeCamp.org

@freeCodeCamp

about 2 months ago

A database is one of the key parts of many software systems. And over time, database design has changed and evolved, from the early B-Tree structures to more modern LSM Tree options. In this handbook, Ramesh shows you how to build an LSM Tree storage engine from scratch to help you understand how it works. https://t.co/JrhNP56haR

freeCodeCamp's tweet photo. A database is one of the key parts of many software systems.

And over time, database design has changed and evolved, from the early B-Tree structures to more modern LSM Tree options.

In this handbook, Ramesh shows you how to build an LSM Tree storage engine from scratch to help you understand how it works.

https://t.co/JrhNP56haR

286

206

26K

walidchergui retweeted

Apache Iggy (Incubating) @ApacheIggy

3 months ago

Thread-per-core + io_uring? See how we moved Apache Iggy from Tokio's work-stealing to a shared-nothing* architecture to make the most out of modern hardware and provide ultra-low, predictable tail latencies. https://t.co/lEu38aEMWV #io_uring #iggy #rust #streaming #apache #asf

ApacheIggy's tweet photo. Thread-per-core + io_uring? See how we moved Apache Iggy from Tokio's work-stealing to a shared-nothing* architecture to make the most out of modern hardware and provide ultra-low, predictable tail latencies.

https://t.co/lEu38aEMWV

#io_uring #iggy #rust #streaming #apache #asf https://t.co/Jup9AxSTIU

186

106

13K

walidchergui retweeted

Anton Zhiyanov @ohmypy

4 months ago

SwissTable, a high-performance open-addressing hash map originally developed by Google, is becoming more popular in the industry. First, Rust adopted it for its HashMap type. Then Go started using a custom version of SwissTable for its map type. Now, Valkey (a Redis fork) has rewritten its core hash table data structure, switching from the old chained implementation to SwissTable. Pretty cool! https://t.co/VuwuIiJOGn

546

383

39K

Who to follow

Pierre-Antoine Grégoire 🌊🐧🇫🇷📷🎸🎮

@zepag

Geek, photography enthusiast, Open Source and Devops fan, organizer of @devops_lu, @YaJUG and @voxxed_lu conference

walidchergui retweeted

sisyphus bar and grill

@itunpredictable

4 months ago

@cedar_db is incredibly cool and more people should know about it. They’re a team of PhDs in Munich building a new relational database, on top of almost 10 years of academic research, that crushes existing benchmarks and maybe (finally?) gets us to the HTAP grail. The core idea is that existing RDBMSes like MySQL and Postgres were built more than 30 years ago, on assumptions about hardware constraints that are just not true anymore. These ecosystems have evolved admirably but ultimately…it’s a database. It’s built not to change very much. Here are a few of the ways that CedarDB is rethinking every element of the database: 1) A better query optimizer In the last 30 years we’ve made a lot of progress on how to optimize SQL queries, to the point where an optimized query can easily outperform a not-so-optimized query by a ton. But not many query optimization improvements have made the leap from research into databases today. CedarDB did a few things on this front: Implemented the unnesting algorithm developed by Thomas Neumann (one of the leaders of the Umbra research project CedarDB came from) — an improvement of more than 1000x Developed a novel approach to join ordering using adaptive optimization that can handle 5K+ relations Created a statistics subsystem that tells the optimizer things that traditional databases can’t 2) What if your database was actually a compiler? CedarDB doesn’t interpret queries, it instead generates code. For every SQL query that a user writes, CedarDB processes, optimizes it, and generates machine code that the CPU can directly execute. This has been a holy grail for a while, and they implemented it via a custom low-level language that is cheap to convert into machine code via a custom assembler. Another way that CedarDB improves performance is through Adaptive Query Execution. Essentially they start executing each query immediately with a “quick and dirty” version, while working on better versions in the background. 3) Taking advantage of all cores / Ahmdal’s law Distributing fairly between all available cores is notoriously difficult, and the CedarDB team would argue that most databases underutilize their hardware. Their clever approach to this problem is called morsel-driven parallelism. CedarDB breaks down queries into segments: pipelines of self-contained operations. Then, data is divided into “morsels” per segment – small input data chunks containing roughly ~100K tuples each. You can read more in the original paper here: https://t.co/0s0gmnuhXc 4) Rethinking the buffer manager Modern systems come equipped with massive amounts of RAM; there’s actually much more “room at the club” than database developers initially assumed. So the idea of the revamped buffer manager in CedarDB is that you can (and should) expect variance not just in data access patterns, but in storage speed and location, page sizes and data organization, and memory hierarchy. CedarDB’s buffer manager is designed from the ground up to work in a heavily multi-threaded environment. It decentralizes buffer management with Pointer Swizzling: Each pointer (memory address) knows whether its data is in memory or on disk, eliminating the global lock that throttles traditional buffer managers. 5) Building a database for change Databases are built to not change. It’s exactly this stability that gives each generation the confidence to build their apps (no matter how different they are) on systems like Postgres. You know what you’re getting. But there’s also a clear downside to this rigidity. CedarDB’s storage class system employs pluggable interfaces where adding new storage types doesn’t require rewriting other components. E.g. if CXL becomes the go-to storage interface at some point in the future, you don’t need to write another whole component, you just need another endpoint for the buffer manager. Anyway these are just a few of the ideas they’re bringing to the table. Maybe it’s because they’re in Germany, maybe it’s because they’re just really humble, but more people should know about this team!! Check out the full post here: https://t.co/HV28EElXsQ

676

694

287K

walidchergui retweeted

Mitchell Troyanovsky

@mitch_troy

4 months ago

Making devs finally write docs by calling them “skills” is a great arb

250

262

187K

walidchergui retweeted

Milton Friedman Quotes

@MiltonFriedmanW

6 months ago

Milton Friedman: “Keep your eye on one thing and one thing only: how much government is spending, because that’s the true tax.” “If you’re not paying for it in the form of explicit taxes, you’re paying for it indirectly in the form of inflation or borrowing.”

161

18K

596K

walidchergui retweeted

Bilgin Ibryam

@bibryam

6 months ago

What Actually Makes You Senior https://t.co/PQ5ICn5kBS Unlike mid-level engineers who excel at clear, defined tasks, senior engineers thrive in fuzzy situations by asking smart questions, clarifying scope, identifying assumptions, and breaking down complexity. This skill often goes unnoticed because it prevents problems before they appear, making projects look “smooth."

496

562

33K

walidchergui retweeted

Vivek Galatage

@vivekgalatage

10 months ago

Designing a SIMD Algorithm from Scratch by Miguel is absolutely a great read! https://t.co/Xs5iTHR7ul

151

245K

walidchergui retweeted

internet hall of fame

@InternetH0F

6 months ago

A YouTuber recorded the speed of light with a 2 billion FPS camera

370

75K

13K

walidchergui retweeted

Papa qui paie 🇨🇵💸

@Papaquipaie

6 months ago

@EricLarch Créer un micro-état libéral en France

163

walidchergui retweeted

Vivek Galatage

@vivekgalatage

7 months ago

Periodic reminder of this amazing talk https://t.co/V1usGa2akA

731

912

124K

walidchergui retweeted

Frank

@jedisct1

7 months ago

blobd: single-machine object store with sub-millisecond reads and 15 GB/s uploads https://t.co/Zj6fSHuPID

178

136

14K

walidchergui retweeted

PragmaticProgrammers

@pragprog

about 1 year ago

Authors @smamol and @Molio sat down with @pragdave to talk about Self Selecting Teams. (4/4) Just out of Beta and 40% off in this week's Pragprog Best Sellers Sale (thru May 14): Creating Great Teams, Second Edition Links to full interview and to buy the book on sale - 🧵

593

walidchergui retweeted

Bytebytego

@bytebytego

about 1 year ago

25 Papers That Completely Transformed the Computer World. 1. Dynamo - Amazon’s Highly Available Key Value Store 2. Google File System: Insights into a highly scalable file system 3. Scaling Memcached at Facebook: A look at the complexities of Caching 4. BigTable: The design principles behind a distributed storage system 5. Borg - Large Scale Cluster Management at Google 6. Cassandra: A look at the design and architecture of a distributed NoSQL database 7. Attention Is All You Need: Into a new deep learning architecture known as the transformer 8. Kafka: Internals of the distributed messaging platform 9. FoundationDB: A look at how a distributed database 10. Amazon Aurora: To learn how Amazon provides high-availability and performance 11. Spanner: Design and architecture of Google’s globally distributed databas 12. MapReduce: A detailed look at how MapReduce enables parallel processing of massive volumes of data 13. Shard Manager: Understanding the generic shard management framework 14. Dapper: Insights into Google’s distributed systems tracing infrastructure 15. Flink: A detailed look at the uniﬁed architecture of stream and batch processing 16. A Comprehensive Survey on Vector Databases 17. Zanzibar: A look at the design, implementation and deployment of a global system for managing access control lists at Google 18. Monarch: Architecture of Google’s in-memory time series database 19. Thrift: Explore the design choices behind Facebook’s code-generation tool 20. Bitcoin: The ground-breaking introduction to the peer-to-peer electronic cash system 21. WTF - Who to Follow Service at Twitter: Twitter’s (now X) user recommendation system 22. MyRocks: LSM-Tree Database Storage Engine 23. GoTo Considered Harmful 24. Raft Consensus Algorithm: To learn about the more understandable consensus algorithm 25. Time Clocks and Ordering of Events: The extremely important paper that explains the concept of time and event ordering in a distributed system Over to you: I’m sure we missed many important papers. Which ones do you think should be included? -- Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): https://t.co/eVEdOFSYPY

354

492

36K

walidchergui retweeted

Geronimo

@artyshowboy

about 1 year ago

Le 3 avril 1965, Ennio Morricone enregistre ce chef-d’œuvre à Rome. Il y a soixante ans. La BO d’une vie. La mienne. For A Few Dollars More | Ennio Morricone | 1965.

241

10K

416K

walidchergui retweeted

Vlad Minaev

@Vlad_Minaev

about 1 year ago

Did you know you can already turn @WebStormIDE (and other JetBrains IDEs) into an #MCP server? That means Claude Desktop (and other hosts) can deeply integrate - read/add files, launch run configs, toggle breakpoints etc... It already looks insane in some scenarios. Still needs polishing, but the potential?🤯

walidchergui retweeted

Bilgin Ibryam

@bibryam

about 1 year ago

Free e-books by 37signals https://t.co/XHmarxpguX

walidchergui retweeted

Olivier Poncet 🦝 @ponceto91

over 1 year ago

Awesome CTO est un dépôt avec une curation importante de contenus à destination des CTO, mais pas que, touchant au développement, le recrutement, les startups, etc ... ⬇️ https://t.co/lJ6I9hNVbH

walidchergui retweeted

Distributed Bytes @DistribSystems

over 1 year ago

Best resources to learn about data and distributed systems https://t.co/lAyPGJAIUh

604

875

28K

walidchergui retweeted

InfoQ @InfoQ

over 1 year ago

In this #InfoQ talk, Liz Rice uses demos and examples to explain how #eBPF works, and why the ability to customize the kernel’s behavior leads to such powerful and efficient capabilities. 🔗 Watch the video now: https://t.co/dOUP68TIMR #SoftwareDevelopment #QConLondon

Walid Chergui

@walidchergui

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users