Building a Bulletproof PostgreSQL Cluster with Patroni, etcd, and PGBackrest

Good morning everyone! Dimitri Bellini here, back on Quadrata, my channel dedicated to the open-source world and the IT topics I love – and hopefully, you do too!

Thanks for tuning in each week. If you haven’t already, please hit that subscribe button and give this video a thumbs up – it really helps!

Today, we’re diving into a crucial topic for anyone running important applications, especially (but not only!) those using Zabbix: database resilience and performance. Databases are often the heart of our applications, but they can also be the source of major headaches – slow queries, crashes, data loss. Ensuring your database is robust and performs well is fundamental.

Why PostgreSQL and This Specific Architecture?

A few years back, we made a strategic decision to shift from MySQL to PostgreSQL. Why? Several reasons:

  • The community and development activity around Postgres seemed much more vibrant.
  • It felt like a more “serious,” robust database, even if maybe a bit more complex to configure initially compared to MySQL’s out-of-the-box readiness.
  • For applications like Zabbix, which heavily utilize the database, especially in complex setups, having a reliable and performant backend is non-negotiable. Avoiding database disasters and recovery nightmares is paramount!

The architecture I’m showcasing today isn’t just for Zabbix; it’s a solid foundation for many applications needing high availability. We have clients using this exact setup for various purposes.

The Core Components

The solution we’ve settled on combines several powerful open-source tools:

  • PostgreSQL: The core relational database.
  • Patroni: A fantastic template for creating a High Availability (HA) PostgreSQL cluster. It manages the Postgres instances and orchestrates failover.
  • etcd: A distributed, reliable key-value store. Patroni uses etcd for coordination and sharing state information between cluster nodes, ensuring consensus.
  • PGBackrest: A reliable, feature-rich backup and restore solution specifically designed for PostgreSQL.
  • HAProxy (Optional but Recommended): A load balancer to direct application traffic to the current primary node seamlessly.

How It Fits Together

Imagine a setup like this:

  • Multiple PostgreSQL Nodes: Typically, at least two nodes running PostgreSQL instances.
  • Patroni Control: Patroni runs on these nodes, monitoring the health of Postgres and managing roles (leader/replica).
  • etcd Cluster: An etcd cluster (minimum 3 nodes for quorum – one can even be the backup server) stores the cluster state. Patroni instances consult etcd to know the current leader and overall cluster health.
  • PGBackrest Node: Often one of the etcd nodes also serves as the PGBackrest repository server, storing backups and Write-Ahead Logs (WALs) for point-in-time recovery. Backups can be stored locally or, even better, pushed to an S3-compatible object store.
  • Load Balancer: HAProxy (or similar) sits in front, checking an HTTP endpoint provided by Patroni on each node to determine which one is the current leader (primary) and directs all write traffic there.

This creates an active-standby (or active-passive) cluster. Your application connects to a single endpoint (the balancer), completely unaware of which physical node is currently active. HAProxy handles the redirection automatically during a switchover or failover.
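
To make that concrete, here is a minimal sketch of the check the balancer relies on, assuming Patroni's REST API is on its default port 8008 and using the lab addresses from the demo below: the current leader answers HTTP 200 on /primary, a replica answers 503, so HAProxy (with a simple httpchk) always routes writes to the node that returns 200.

$ curl -s -o /dev/null -w "%{http_code}\n" http://10.0.0.1:8008/primary   (leader: prints 200)
$ curl -s -o /dev/null -w "%{http_code}\n" http://10.0.0.2:8008/primary   (replica: prints 503)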

Key Advantages of This Approach

  • True High Availability: Provides a really bulletproof active-standby solution.
  • Easy Balancer Integration: Uses simple HTTP checks, avoiding the complexities of virtual IPs (VIPs) and Layer 2 network requirements often seen in traditional clustering (like Corosync/Pacemaker), making it great for modern Layer 3 or cloud environments.
  • “Simple” Configuration (Relatively!): Once you grasp the concepts, configuration is largely centralized in a single YAML file per node (patroni.yml); see the quick example after this list.
  • Highly Resilient & Automated: Handles node failures, switchovers, and even node reintegration automatically.
  • Powerful Backup & Recovery: PGBackrest makes backups and, crucially, Point-in-Time Recovery (PITR) straightforward (again, “straightforward” for those familiar with database recovery!).
  • 100% Open Source: No licensing costs or vendor lock-in. Test it, deploy it freely.
  • Enterprise Ready & Supportable: These are mature projects. For production environments needing formal support, companies like Cybertec PostgreSQL (no affiliation, just an example we partner with) offer commercial support for this stack. We at Quadrata can also assist with first-level support and implementation.

In my opinion, this architecture brings PostgreSQL very close to the robustness you might expect from expensive proprietary solutions like Oracle RAC, but using entirely open-source components.

Let’s See It In Action: A Practical Demo

Talk is cheap, right? Let’s walk through some common management tasks and failure scenarios. In my lab, I have three minimal VMs (2 vCPU, 4GB RAM, 50GB disk): two for PostgreSQL/Patroni (node1, node2) and one for PGBackrest/etcd (backup-node). Remember, 3 nodes is the minimum for a reliable etcd quorum.
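
Since everything hinges on etcd consensus, it is worth confirming the quorum itself is healthy before testing failovers. A hedged sketch, assuming etcd v3, the default client port 2379, and 10.0.0.3 as the backup node's address (the first two IPs are the lab nodes you will see in the status output below):

$ etcdctl --endpoints=10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379 endpoint health
$ etcdctl --endpoints=10.0.0.1:2379 member list   (all three members should be listed as started)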

1. Checking Cluster Status

The primary management tool is patronictl. Let’s list the cluster members:

$ patronictl -c /etc/patroni/patroni.yml list
+ Cluster: my_cluster (73...) ----------+----+-----------+
| Member | Host     | Role    | State   | TL | Lag in MB |
+--------+----------+---------+---------+----+-----------+
| node1  | 10.0.0.1 | Leader  | running | 10 |           |
| node2  | 10.0.0.2 | Replica | running | 10 |         0 |
+--------+----------+---------+---------+----+-----------+

Here, node1 is the current Leader (primary), and node2 is a Replica, perfectly in sync (Lag 0 MB) on the same timeline (TL 10).

2. Performing a Manual Switchover

Need to do maintenance on the primary? Let’s gracefully switch roles:

$ patronictl -c /etc/patroni/patroni.yml switchover
Current cluster leader is node1
Available candidates for switchover:
1. node2
Select candidate from list [1]: 1
When should the switchover take place (e.g. 2023-10-27T10:00:00+00:00) [now]: now
Are you sure you want to switchover cluster 'my_cluster', leader 'node1' to member 'node2'? [y/N]: y
Successfully switched over to "node2"
... (Check status again) ...
+ Cluster: my_cluster (73...) ----------+----+-----------+
| Member | Host     | Role    | State   | TL | Lag in MB |
+--------+----------+---------+---------+----+-----------+
| node1  | 10.0.0.1 | Replica | running | 11 |         0 |
| node2  | 10.0.0.2 | Leader  | running | 11 |           |
+--------+----------+---------+---------+----+-----------+

Patroni handled demoting the old leader, promoting the replica, and ensuring the old leader started following the new one. Notice the timeline (TL) incremented.
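
If you are curious about those timeline jumps, Patroni records every promotion; a quick way to review them (output omitted here) is:

$ patronictl -c /etc/patroni/patroni.yml history

Each row shows the timeline number, the LSN where it started, the reason, and a timestamp, which is handy when you need to reconstruct what happened during an incident.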

3. Simulating a Primary Node Failure

What if the primary node just dies? Let’s stop Patroni on node2 (the current leader):

# systemctl stop patroni (on node2)

Now, check the status from node1:

$ patronictl -c /etc/patroni/patroni.yml list
+ Cluster: my_cluster (73...) ----------+----+-----------+
| Member | Host     | Role    | State   | TL | Lag in MB |
+--------+----------+---------+---------+----+-----------+
| node1  | 10.0.0.1 | Leader  | running | 12 |           |
| node2  | 10.0.0.2 |         | stopped |    |   unknown |
+--------+----------+---------+---------+----+-----------+

Patroni automatically detected the failure and promoted node1 to Leader. When node2 comes back online (systemctl start patroni), Patroni will automatically reintegrate it as a replica.
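
To verify the reintegration, something like this (run from any node that has the config) is enough; after a short catch-up the lag should drop back to 0:

# systemctl start patroni    (on node2)
$ patronictl -c /etc/patroni/patroni.yml list   (node2 should reappear as Replica, State running, Lag 0)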

4. Recovering a Destroyed Node

What if a node is completely lost? Data disk corrupted, VM deleted? Let’s simulate this on node2 (assuming node1 is currently the leader):

# systemctl stop patroni (on node2)
# rm -rf /var/lib/patroni/data # Or wherever your PG data directory is
# systemctl start patroni (on node2)

Watching the Patroni logs on node2, you’ll see it detect that it has no data and initiate a `pg_basebackup` (or use PGBackrest, if configured) from the current leader (node1) to rebuild itself from scratch. Checking `patronictl list` shows its state transitioning through `creating replica` to `running` as a replica again, all automatically!
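
If you want to watch the rebuild happen, a couple of hedged commands I find useful (assuming the systemd unit is called patroni, as in the rest of this post):

# journalctl -u patroni -f    (on node2: follow the Patroni log while it re-creates the replica)
$ watch patronictl -c /etc/patroni/patroni.yml list    (from any node: see the state move from creating replica to running)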

5. Point-in-Time Recovery (PITR) – The Real Lifesaver!

This is why I made the video! Recently, a bad deployment caused data corruption. We needed to restore to a state just *before* the incident. Here’s how PGBackrest and Patroni help.

Scenario: I accidentally deleted all records from a critical table.

psql> SELECT COUNT(*) FROM my_table; -- Shows 1000 rows
psql> DELETE FROM my_table;
psql> SELECT COUNT(*) FROM my_table; -- Shows 0 rows! Disaster!
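
One hedged but important aside before the recovery steps: PITR only works if backups and WAL archiving were flowing into the repository before the incident, and PGBackrest lets you verify that at any time:

$ pgbackrest --stanza=my_stanza check   (validates that archiving and the repository are working for this stanza)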

Recovery Steps:

  1. STOP PATRONI EVERYWHERE: This is critical. We need to prevent Patroni from interfering while we manipulate the database state manually.

    # systemctl stop patroni (on ALL nodes: node1, node2)
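
    A quick sanity check that nothing can trigger a failover while we work (assuming the systemd unit is simply called patroni, as above):

    $ systemctl is-active patroni    (run on every node; expect "inactive")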

  2. Identify Target Time/Backup: Use PGBackrest to find the backup and approximate time *before* the data loss.

    $ pgbackrest --stanza=my_stanza info 
    ... (Find the latest FULL backup timestamp, e.g., '2023-10-27 11:30:00') ...

  3. Perform Restore on the (Ex-)Leader Node: Go to the node that *was* the leader (let’s say node1). Run the restore command, specifying the target time. The `--delta` option is efficient as it only restores changed files.

    $ pgbackrest --stanza=my_stanza --delta --type=time --target="2023-10-27 11:30:00" --target-action=pause restore

    (Note: `--target-action=pause` or `promote` might be needed depending on your exact recovery goal; for simplicity here, let’s assume we want to stop recovery at that point. Check the PGBackrest docs for specifics; the video used a slightly different target specification based on the backup label.)

    Correction based on the video: the video demonstrated restoring to the end time of a specific full backup. A more typical PITR might use `--type=time` and a specific timestamp like `YYYY-MM-DD HH:MM:SS`. Let’s assume we used the backup label as shown in the video’s logic:

    $ pgbackrest --stanza=my_stanza --delta --set=20231027-xxxxxxF --type=default --target-action=promote restore

    (Replace `20231027-xxxxxxF` with your actual backup label. `--target-action=promote` tells Postgres to end recovery and promote itself as soon as the target is reached.)

  4. Start Postgres Manually (on the restored node): Start the database *without* Patroni first.

    # pg_ctl -D /var/lib/patroni/data start

    PostgreSQL will perform recovery using the restored files and WAL archives up to the specified target. Because we used `--target-action=promote`, it should finish recovery and be ready. If we had used `pause`, we would need `pg_ctl promote`.
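
    A couple of hedged checks at this point: confirm whether the instance is still in recovery, and promote it manually only if you restored with `--target-action=pause`:

    $ psql -c "SELECT pg_is_in_recovery();"    (returns f once recovery has finished)
    # pg_ctl -D /var/lib/patroni/data promote    (only needed if recovery is paused)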

  5. Verify Data: Connect via `psql` and check if your data is back!

    psql> SELECT COUNT(*) FROM my_table; -- Should show 1000 rows again!

  6. Restart Patroni: Now that the database is in the desired state, start Patroni on the restored node first, then on the other nodes.

    # systemctl start patroni (on node1)
    # systemctl start patroni (on node2)

    Patroni on `node1` will see a valid database and assert leadership in etcd. Patroni on `node2` will detect that it has diverged (or has no data, if we wiped it) and automatically re-sync from the now-restored leader (`node1`).
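
    Once everything settles, a final check from any node should show both members running again, with the restored node1 as Leader on a new timeline and node2 as a Replica with Lag 0:

    $ patronictl -c /etc/patroni/patroni.yml list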

As you saw, we recovered from a potential disaster relatively quickly because the architecture and tools are designed for this.

Final Thoughts

Setting up this entire stack isn’t trivial – it requires understanding each component. That’s why I didn’t do a full step-by-step configuration in the video (it would be too long!). But I hope showing you *how it works* and its capabilities demonstrates *why* we chose this architecture.

It provides automation, resilience, and recovery options that are crucial for critical systems. Having an organized setup like this, combined with good documentation (please, write down your procedures!), turns stressful recovery scenarios into manageable tasks.

What do you think? Is PostgreSQL with Patroni something you’d consider? Are there comparable HA solutions in the MySQL/MariaDB world you think are as robust or easy to manage? Let me know your thoughts in the comments below!

Don’t forget to check out the Quadrata YouTube channel for more open-source and IT content, and join the discussion on the Zabbix Italia Telegram channel!

That’s all for this episode. A big greeting from me, Dimitri, and see you in the next one. Bye everyone!
