I am pleased to announce the release of a new #Haskell #ETL library, DBFunctor-0.1.0.0.
For more information please check out the project homepage:
https://t.co/ALiBv8fcg7
7/7 Paper, proofs & appendices:
https://t.co/EKHgdtaWFb
Feedback from the DB-theory and data-engineering folks very welcome. 🙏
#dataengineering #databases #typetheory
6/7 It's a mathematical proof, not a test: correctness for ALL inputs, from the type structure alone — without running a single query (zero compute cost on cloud warehouses).
All theorems machine-checked in Lean 4; cross-validated against PostgreSQL.
5/7 That atom determines how two data types integrate (their entity key) and the canonical way to read/write their elements (behavioral class).
And it scales: grain propagates through a whole pipeline DAG — hundreds of operations — verifying the output grain matches the target.
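A toy sketch of what checking grain at each pipeline step could look like. This is not the paper's formalism: it assumes a grain can be modelled as a set of key columns and that joins are equi-joins on same-named columns. The names `join_grain` and `duplicated_inputs` and the column names are hypothetical.

```python
# Toy model (an assumption, not the paper's definition): a grain is the set
# of columns that identifies one row of a table.
VISITS = frozenset({"customer", "channel", "date"})
SALES = frozenset({"customer", "product", "date"})
TARGET = frozenset({"customer", "date"})


def join_grain(left, right):
    """Grain of an equi-join on same-named columns.

    A joined row is pinned down by the keys of both of its source rows,
    so in this toy model the result grain is the union of the two grains.
    """
    return left | right


def duplicated_inputs(left, right):
    """Name the join inputs whose rows get repeated by the join.

    An input is repeated when the result grain is strictly finer than the
    input's own grain, which is where summed measures start to inflate.
    """
    out = join_grain(left, right)
    return [name for name, grain in (("left", left), ("right", right)) if grain < out]


# Joining the raw tables: both sides fan out.
print(duplicated_inputs(VISITS, SALES))
# Rolling both sides up to the target grain first: nothing fans out.
print(duplicated_inputs(TARGET, TARGET))
```

In this model the check runs at every join, not only at the final output: a join followed by a group-by on customer+date lands on the target grain either way, so comparing the output grain alone would not flag the duplication.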
4/7 New preprint: we make grain formal.
Grain theory is a type-theoretic framework that defines grain on *any* algebraic data type — relational data, but also recursive types (lists, trees) and streams of unbounded data.
Grain is the atom of a data type.
3/7 Fan traps & chasm traps (silent data loss) are just symptoms of one problem: a transformation that misaligns grain — the level of detail of data.
Grain's been around since Kimball, but only informally: prose, fact tables only, no rules for how transformations change it.
2/7 Join is on customer+date, but the tables sit at customer×channel×date vs customer×product×date. So it cross-products the unmatched parts, duplicating rows before aggregation — a fan trap.
Invisible to schema checks, type checks, and small-data unit tests.
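A minimal sketch of the fan trap described above, in plain Python rather than SQL. The rows, column order, and function names are made up for illustration; only the two grains (customer×channel×date and customer×product×date) and the join key (customer+date) come from the thread.

```python
from collections import defaultdict

# Hypothetical rows at grain customer x channel x date: (customer, channel, date, visits)
VISITS = [
    ("c1", "web", "2024-01-01", 10),
    ("c1", "app", "2024-01-01", 5),
]
# Hypothetical rows at grain customer x product x date: (customer, product, date, amount)
SALES = [
    ("c1", "p1", "2024-01-01", 100.0),
    ("c1", "p2", "2024-01-01", 50.0),
    ("c1", "p3", "2024-01-01", 25.0),
]


def join_then_aggregate(visits, sales):
    """Join on customer+date, then sum both metrics (the fan trap)."""
    totals = defaultdict(lambda: [0, 0.0])
    for cust_v, _channel, date_v, n in visits:
        for cust_s, _product, date_s, amount in sales:
            if (cust_v, date_v) == (cust_s, date_s):
                # Every visit row pairs with every sale row for the same key,
                # so each metric is counted once per row on the other side.
                totals[(cust_v, date_v)][0] += n
                totals[(cust_v, date_v)][1] += amount
    return {key: tuple(value) for key, value in totals.items()}


def aggregate_then_join(visits, sales):
    """Roll each table up to customer+date first, then join."""
    visit_totals = defaultdict(int)
    for cust, _channel, date, n in visits:
        visit_totals[(cust, date)] += n
    sale_totals = defaultdict(float)
    for cust, _product, date, amount in sales:
        sale_totals[(cust, date)] += amount
    shared = visit_totals.keys() & sale_totals.keys()
    return {key: (visit_totals[key], sale_totals[key]) for key in shared}


print(join_then_aggregate(VISITS, SALES))   # visits x3, amount x2
print(aggregate_then_join(VISITS, SALES))   # true totals
```

Both functions return one row per customer+date with the same shape, which is why a schema or type check sees nothing wrong. With one row per key on each side the two results coincide, so a small test fixture can pass while production data inflates.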
1/7 This SQL query compiles, runs, and returns the right-shaped result.
It also silently inflates both metrics — in production, not in your tests.
The reason: the two tables have different *grains*. 🧵
AI is writing a growing share of the world's software. No one is formally verifying any of it.
New essay: "When AI Writes the World's Software, Who Verifies It?"
https://t.co/8zjS9FkdA8
excerpt from a longer paper
Ensuring safety for powerful learned systems requires a fundamentally different foundation based on mathematically provable constraints on the acts an AI may perform. Such a foundation must rest on a simple principle: we should never trust an AI’s outputs or intentions by default, no matter how competent or aligned it appears; trust must be earned only through verifiable, enforceable proofs of safety for each act.
Our assumption that enacting unsafe AI acts is worse than rejecting safe AI acts leads to the central premise of our work, the Universal Declaration of AI Acts: No AI act may be treated as safe unless harmlessness is proven mathematically.
Note that this is the mathematical dual of the Universal Declaration of Human Rights: No person may be treated as guilty unless guilt is proven, which is based on the assumption that punishing an innocent person is worse than letting a guilty person go.
@catalinmpit Reduce your sleep hours and work very early in the morning or late at night, or both. Either way you are going to suffer, so you must be really determined. Don’t do this every day, so that you retain some balance.
You don’t hate math.
You hate the way it was taught to you.
But because you haven’t learned math properly, you confuse correlation and causation, and therefore think that you hate math.
Nice blogpost by @muratdemirbas on the foundational treatment of serializability theory in databases from Chapter 2 of the book Concurrency Control and Recovery in Database Systems (1987) by Bernstein, Hadzilacos, and Goodman.
I started reading the post and, after a bit of exploration, found the foundational 1979 paper "The Serializability of Concurrent Database Updates" by Christos H. Papadimitriou. Bernstein's book references this paper as well. Looks like Papadimitriou has also written quite a lot on the theory of database concurrency control.
Looking forward to some weekend readings ..
links to blog post and the paper: 👇
🏆 The ACM SIGMOD Test of Time Award (2025) goes to
k-Shape: Efficient and Accurate Clustering of Time Series
John Paparrizos, Luis Gravano
https://t.co/BnRx4JPNAW
This paper covering the internals and architecture of @ClickHouseDB is one of the best in database architecture that I read in 2024.
Some great insights on SIMD and Multicore parallelisation of the query processing layer, query compilation based on LLVM, the various data structures for aggregation and hash joins, data pruning techniques in the storage layer and lots and lots of information related to the architecture.
Loved it ..
The first two videos for @CMUDB's latest seminar series on Database Building Blocks are now posted. You should start off with @andrewlamb1111's fantastic introductory overview to @ApacheDataFusio: https://t.co/9Es7lVLTsy
I am thrilled to announce that my book, Functional Design and Architecture, has just been released by @ManningBooks!
FINALLY RELEASED!!!
😃😃😃😄😊😊❤️❤️❤️❤️❤️❤️
This has been a long journey, and I sincerely hope this book will make a significant contribution to the functional programming world. 🎇🎇
❓ Who is this book for? It's for all developers interested in practical functional programming. This book is useful for software architects, senior developers, and everyone else. The model language is Haskell, but the ideas are universal and applicable to languages like Scala, OCaml, F#, and even C# and C++.
❓ What is this book about? It’s about applying an engineering approach to functional programming. Design patterns, design principles, application architectures, best practices, approaches, and deep ideas, all combined into a comprehensive and highly consistent methodology for building real-world applications.
📔 Functional Design and Architecture is structured, consistent, well-written, and approachable. I’ve made a special effort to ensure the content is accessible to a wide audience. The narrative is engaging, free of jargon and complex mathematics, and progresses in a friendly, gradual manner.
💡 The ideas are universal; some were known before, but many were developed throughout this project. There was a significant knowledge gap, and this book covers much of it for our benefit.
A titanic amount of work went into this book. Specifically, the following were created:
🟠 A full-fledged application framework, Hydra;
🔴 A proof-of-concept platform for creating spaceship management scenarios;
🟡 The methodology of Functional Declarative Design, covering various aspects of design and software architecture in functional programming;
🟢 A unique architectural approach, Hierarchical Free Monads;
🔵 A multitude of new design patterns, approaches, and practices, in addition to those that already existed;
🟣 Several demo applications, included both in the book and in the Hydra framework;
🟤 A wealth of accompanying material: articles, talks, and side projects;
⚪️ And of course, these ideas have been successfully tested in practice in several places.
You'll also find many links to other valuable resources in this book because the subject is very broad and deep. I am especially grateful to all those who initiated this movement toward the practical application of functional programming. I stand on the shoulders of giants and deeply appreciate their contributions. On the cover, you’ll find testimonials from these distinguished individuals:
🟡 @ScottWlaschin, author of Domain Modeling Made Functional (Pragmatic Bookshelf)
🟤 @debasishg, author of Functional and Reactive Domain Modeling and DSLs in Action (both from Manning)
🟣 @VBragilevsky @_bravit, author of Haskell in Depth (Manning)
I hope you enjoy the book as much as I enjoyed writing it.
Bon voyage!
Learning Haskell/FP often expands people’s minds. In a world dominated by Java + Python intro courses, being forced to see computation in a new way is very empowering. There’s almost no downside to learning these techniques and having them in your toolbox.