RajeshVallarapu @dataarchipelago - Twitter Profile

Pinned Tweet

6 months ago

The Active-Active Fantasy: Why Postgres Can't Cheat Physics In the world of distributed systems, "Multi-Master" is the fantasy we tell ourselves to sleep at night. We want to believe we can cheat the fundamental trade-off between Latency and Coordination. We want to write to a database in New York and a database in Tokyo simultaneously, have them both accept the write instantly (Low Latency), and somehow end up with the exact same data (Consistency) without waiting for light to travel across the Pacific. We can’t. Tools like pgEdge (and its underlying engine, Spock) are incredible pieces of engineering, but they are not magic. They are an architectural negotiation with the laws of physics. To use them correctly, you must stop thinking of them as features and start thinking of them as compromises. Here is the deep mechanism of why multi-master Postgres works, and more importantly, why it breaks the way it does. Medium version: https://t.co/GGsoqJSrjc The Tyranny of the Log To understand the problem, you have to understand the "soul" of PostgreSQL. PostgreSQL is built on a single, rigid timeline: the Write-Ahead Log (WAL). In a standard single-node setup, this log is the absolute truth. It records physical changes to physical disk blocks (e.g., "Write bytes X to Page Y"). When you set up standard streaming replication, you are shipping this physical truth to a replica. The replica applies it blindly at the exact same byte offsets. This is why native multi-master is architecturally unsupported in the standard model. You cannot have Node A write to "Page 45" and Node B write to "Page 45" at the same time. Merging two divergent physical timelines is practically impossible without corruption because the storage engine lacks commutative merge semantics. The Loophole: Trading Truth for Opinion So how does Spock do it? It stops dealing in "Physical Truth" and starts dealing in "Logical Opinion." It uses Logical Decoding. Instead of shipping raw bytes ("Change offset 0x4F"), it decodes the WAL into a stream of semantic change events (INSERT ID=1, UPDATE NAME='Alice'). This is the loophole. It allows Node A and Node B to have different physical disk states but converge on the same data. But by decoupling the data from the disk, we lose the safety net of the Global Lock Manager. We enter the realm of Distributed Uncertainty. The Architecture of "Fragile Convergence" Most people loosely describe multi-master Postgres as an AP (Available) system. This is dangerously imprecise. A true AP system (like a leaderless Cassandra setup) acts as a "write sponge"—it prioritizes accepting the write above all else, often resolving conflicts later via Last-Write-Wins or vector clocks. Spock is different. It creates a state I call Fragile Convergence. Fragile Convergence means the system moves toward a unified state, but unlike a true AP system, it enforces rigid constraints that can cause the mesh to seize up. 1. The Atomicity Illusion (Why Foreign Keys Break) In a single Postgres node, transactions are ACID. If you delete a User and insert an Order for them in the same transaction, the database guarantees the Order will never exist without the User. Logical replication preserves this atomicity locally (the transaction applies all-or-nothing), but it loses Global Serializability. The Failure Mode: Conflicts are evaluated row-by-row, not transaction-by-transaction across the cluster. If Node A deletes a User and Node B inserts an Order for that user, both commit locally. The Result: When Node A's delete arrives at Node B, it succeeds. When Node B's insert arrives at Node A, it fails because the User is gone. The Dilemma: Postgres enforces constraints synchronously. It screams "Foreign Key Violation." Depending on your spock.exception_behaviour setting, you face a hard choice: Halt Replication (Protective Pause): The worker enters a retry loop. You keep Consistency, but lose Availability (the mesh stops). Discard Data (transdiscard): You drop the conflicting transaction. You keep Availability, but lose Consistency (and data). You haven't just lost a row; you've hit the hard wall of the CAP theorem. 2. The "Last Update Wins" Casino When conflicts happen (and they will), Spock needs a tie-breaker. The default is last_update_wins. It compares the commit timestamps of the conflicting rows. This sounds fair, but it relies on a hidden assumption: Time is uniform. It isn't. Postgres does not use a distributed logical clock (like Spanner's TrueTime). It uses the local system clock. If Node A's clock is drifting 50ms behind Node B, Node B will "win" every conflict in that window, even if Node A's write happened physically later. You are effectively outsourcing your data integrity to NTP (Network Time Protocol). 3. The Limits of Delta-Apply (CRDTs) For things like bank balances, last_update_wins is catastrophic (overwriting a balance instead of merging it). Spock solves this with delta_apply. It looks at the change (+10) rather than the value (=110). The Mechanism: New_Local = Old_Local + (New_Remote - Old_Remote). The Caveat: This is a valid CRDT (specifically a PN-Counter), but it is limited in scope. It handles simple counters well. It does not magically merge complex JSON documents, text fields, or ordered lists. If your application logic requires complex merging, you must build it yourself. Hidden Costs: The "Tax" on Your Architecture When you adopt this architecture, you pay a tax. It’s not a monetary tax; it’s a complexity tax. 1. The Ghost in the Machine (TOAST) PostgreSQL compresses large fields (like big JSON blobs) into "TOAST" tables. To save bandwidth, logical replication often elides these TOAST values if the column itself hasn't changed. The Trap: If a conflict resolution handler requires the full row (e.g., to convert an UPDATE into an INSERT on a missing row), and the update didn't include the TOAST data, the operation may fail due to incomplete data. The Fix: You often must set REPLICA IDENTITY FULL, which forces Postgres to log everything to the WAL. This turns your IOPS graph into a vertical wall. 2. The Identity Crisis You can't use SERIAL (1, 2, 3) for IDs anymore. Two nodes will inevitably generate ID 100 at the same time. You must use Snowflake Sequences—64-bit integers combining time, node ID, and a counter. The Risk: The safety mechanism is you. If you accidentally configure two nodes with the same snowflake.node ID, the system will not warn you during generation. The corruption surface appears only later, when you try to insert those IDs. If your conflict handler is last_update_wins, one of those records will be silently overwritten (unless conflict logging is explicitly enabled), resulting in data loss. The Mental Model Do not think of pgEdge/Spock as "Postgres, but distributed." Think of it as two separate databases that asynchronously email each other change events. Sometimes the email arrives late (Lag). Sometimes the email contradicts what you already know (Conflict). Sometimes the email asks you to do something impossible (FK Violation), and depending on your settings, you either ignore the email (Data Loss) or stop reading emails entirely (Replication Halt). Conclusion Is multi-master Postgres possible? Yes. Is it the right choice for you? Yes, if you need to survive a region failure and can tolerate "eventual" truth, or if you have geographically partitioned your workloads (User A only writes to Node A). No, if you need the database to guarantee that a user's balance never goes below zero in real-time across the globe. The physics of distributed systems are undefeated. When you choose Multi-Master, you are not removing complexity; you are simply moving it from the database engine to three new layers: your Application Logic, your Conflict Resolution Strategy, and your Cluster Topology. Choose wisely.

1

0

427

RajeshVallarapu

@DataArchipelago

1 day ago

@grok @SpencerHakimian So that was supposed to be your response right ?!

1

0

10

RajeshVallarapu

@DataArchipelago

1 day ago

@grok @SpencerHakimian And why is that ? And how ?

1

0

4

RajeshVallarapu

@DataArchipelago

6 months ago

Not every popular extension belongs in the @PostgreSQL core. This article explains why. #Postgres #RelationalDatabases #Extensions #SoftwareArchitecture

0

61

Who to follow

DigiWizard

@drrush7791

Curiosity, creativity and persistence lead us to improvement. Man who works hard to achieve his dreams.

Aimy

@Aimy981

«I am a Derry Girl» 🌈

Rui

@Rui60686496

Trying to learn something about Crypto. Usually not responding to DMs. Sorry, nothing personal.

RajeshVallarapu

@DataArchipelago

6 months ago

https://t.co/ToFQ5sjRUo

1

0

54

RajeshVallarapu

@DataArchipelago

6 months ago

If you want the Too Long - Didn't Read version: Multi-master Postgres works, but only when you accept: • No global atomicity • No consistent clocks • Conflicts are inevitable • FK constraints break globally • CRDTs only solve simple counters • Complexity shifts to the application layer

0

123

RajeshVallarapu

@DataArchipelago

6 months ago

The Active-Active Fantasy: Why Postgres Can't Cheat Physics In the world of distributed systems, "Multi-Master" is the fantasy we tell ourselves to sleep at night. We want to believe we can cheat the fundamental trade-off between Latency and Coordination. We want to write to a database in New York and a database in Tokyo simultaneously, have them both accept the write instantly (Low Latency), and somehow end up with the exact same data (Consistency) without waiting for light to travel across the Pacific. We can’t. Tools like pgEdge (and its underlying engine, Spock) are incredible pieces of engineering, but they are not magic. They are an architectural negotiation with the laws of physics. To use them correctly, you must stop thinking of them as features and start thinking of them as compromises. Here is the deep mechanism of why multi-master Postgres works, and more importantly, why it breaks the way it does. Medium version: https://t.co/GGsoqJSrjc The Tyranny of the Log To understand the problem, you have to understand the "soul" of PostgreSQL. PostgreSQL is built on a single, rigid timeline: the Write-Ahead Log (WAL). In a standard single-node setup, this log is the absolute truth. It records physical changes to physical disk blocks (e.g., "Write bytes X to Page Y"). When you set up standard streaming replication, you are shipping this physical truth to a replica. The replica applies it blindly at the exact same byte offsets. This is why native multi-master is architecturally unsupported in the standard model. You cannot have Node A write to "Page 45" and Node B write to "Page 45" at the same time. Merging two divergent physical timelines is practically impossible without corruption because the storage engine lacks commutative merge semantics. The Loophole: Trading Truth for Opinion So how does Spock do it? It stops dealing in "Physical Truth" and starts dealing in "Logical Opinion." It uses Logical Decoding. Instead of shipping raw bytes ("Change offset 0x4F"), it decodes the WAL into a stream of semantic change events (INSERT ID=1, UPDATE NAME='Alice'). This is the loophole. It allows Node A and Node B to have different physical disk states but converge on the same data. But by decoupling the data from the disk, we lose the safety net of the Global Lock Manager. We enter the realm of Distributed Uncertainty. The Architecture of "Fragile Convergence" Most people loosely describe multi-master Postgres as an AP (Available) system. This is dangerously imprecise. A true AP system (like a leaderless Cassandra setup) acts as a "write sponge"—it prioritizes accepting the write above all else, often resolving conflicts later via Last-Write-Wins or vector clocks. Spock is different. It creates a state I call Fragile Convergence. Fragile Convergence means the system moves toward a unified state, but unlike a true AP system, it enforces rigid constraints that can cause the mesh to seize up. 1. The Atomicity Illusion (Why Foreign Keys Break) In a single Postgres node, transactions are ACID. If you delete a User and insert an Order for them in the same transaction, the database guarantees the Order will never exist without the User. Logical replication preserves this atomicity locally (the transaction applies all-or-nothing), but it loses Global Serializability. The Failure Mode: Conflicts are evaluated row-by-row, not transaction-by-transaction across the cluster. If Node A deletes a User and Node B inserts an Order for that user, both commit locally. The Result: When Node A's delete arrives at Node B, it succeeds. When Node B's insert arrives at Node A, it fails because the User is gone. The Dilemma: Postgres enforces constraints synchronously. It screams "Foreign Key Violation." Depending on your spock.exception_behaviour setting, you face a hard choice: Halt Replication (Protective Pause): The worker enters a retry loop. You keep Consistency, but lose Availability (the mesh stops). Discard Data (transdiscard): You drop the conflicting transaction. You keep Availability, but lose Consistency (and data). You haven't just lost a row; you've hit the hard wall of the CAP theorem. 2. The "Last Update Wins" Casino When conflicts happen (and they will), Spock needs a tie-breaker. The default is last_update_wins. It compares the commit timestamps of the conflicting rows. This sounds fair, but it relies on a hidden assumption: Time is uniform. It isn't. Postgres does not use a distributed logical clock (like Spanner's TrueTime). It uses the local system clock. If Node A's clock is drifting 50ms behind Node B, Node B will "win" every conflict in that window, even if Node A's write happened physically later. You are effectively outsourcing your data integrity to NTP (Network Time Protocol). 3. The Limits of Delta-Apply (CRDTs) For things like bank balances, last_update_wins is catastrophic (overwriting a balance instead of merging it). Spock solves this with delta_apply. It looks at the change (+10) rather than the value (=110). The Mechanism: New_Local = Old_Local + (New_Remote - Old_Remote). The Caveat: This is a valid CRDT (specifically a PN-Counter), but it is limited in scope. It handles simple counters well. It does not magically merge complex JSON documents, text fields, or ordered lists. If your application logic requires complex merging, you must build it yourself. Hidden Costs: The "Tax" on Your Architecture When you adopt this architecture, you pay a tax. It’s not a monetary tax; it’s a complexity tax. 1. The Ghost in the Machine (TOAST) PostgreSQL compresses large fields (like big JSON blobs) into "TOAST" tables. To save bandwidth, logical replication often elides these TOAST values if the column itself hasn't changed. The Trap: If a conflict resolution handler requires the full row (e.g., to convert an UPDATE into an INSERT on a missing row), and the update didn't include the TOAST data, the operation may fail due to incomplete data. The Fix: You often must set REPLICA IDENTITY FULL, which forces Postgres to log everything to the WAL. This turns your IOPS graph into a vertical wall. 2. The Identity Crisis You can't use SERIAL (1, 2, 3) for IDs anymore. Two nodes will inevitably generate ID 100 at the same time. You must use Snowflake Sequences—64-bit integers combining time, node ID, and a counter. The Risk: The safety mechanism is you. If you accidentally configure two nodes with the same snowflake.node ID, the system will not warn you during generation. The corruption surface appears only later, when you try to insert those IDs. If your conflict handler is last_update_wins, one of those records will be silently overwritten (unless conflict logging is explicitly enabled), resulting in data loss. The Mental Model Do not think of pgEdge/Spock as "Postgres, but distributed." Think of it as two separate databases that asynchronously email each other change events. Sometimes the email arrives late (Lag). Sometimes the email contradicts what you already know (Conflict). Sometimes the email asks you to do something impossible (FK Violation), and depending on your settings, you either ignore the email (Data Loss) or stop reading emails entirely (Replication Halt). Conclusion Is multi-master Postgres possible? Yes. Is it the right choice for you? Yes, if you need to survive a region failure and can tolerate "eventual" truth, or if you have geographically partitioned your workloads (User A only writes to Node A). No, if you need the database to guarantee that a user's balance never goes below zero in real-time across the globe. The physics of distributed systems are undefeated. When you choose Multi-Master, you are not removing complexity; you are simply moving it from the database engine to three new layers: your Application Logic, your Conflict Resolution Strategy, and your Cluster Topology. Choose wisely.

1

0

427

RajeshVallarapu

@DataArchipelago

6 months ago

🚀 Kicking off my journey on X. I’ve spent the last decade living in #Postgres from “impossible” prod incidents to multi-TB migrations. Now I’m deep into #dataresearch and analysis across #blockchain and big-#techinfra. If that’s your world too, let’s connect. #data

0

107

RajeshVallarapu

@DataArchipelago

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users