PostgreSQL DB data Replication

PostgreSQL, also known as Postgres, is a widely used database engine globally. It shares similarities with relational databases such as MySQL, MSSQL, and Oracle SQL but has some unique features. It is risky to rely on just one copy of your data for your project, so it's important to have a backup stored in a readily accessible location. PostgreSQL Replication can be employed for this purpose. Postgres Replication is a strong feature that can guarantee data availability and aid in disaster recovery efforts.

Let's learn first about PostgreSQL Data Replication.

Postgres Replication

Replication is just transferring data from one location to another location or duplicating the original data in the same location. In the context of PostgreSQL, it would involve maintaining duplicate data from one database in another database and ensuring they remain synchronized constantly. If the initial database experiences a failure, the secondary database must be capable of managing the tasks until the primary database is operational again. PostgreSQL offers various replication techniques, and today we will specifically utilize streaming replication. Streaming replication is used because of its ability to replicate data in real time.

Before we get started let's go through the prerequisites.

Prerequisites

2 Servers with PostgreSQL 16 installed
Private/Public Network connectivity between both servers

Setting Up

First, we will set up the first server which will act as the primary server.

Primary Server Setup

In this Postgres instance, we will configure WAL Archiving and Streaming, set up a user for replication and configure network connectivity.

Let's get started by setting up WAL Archiving

For WAL Archiving we'll configure the Postgres instance with the necessary parameters. let's edit the postgresql.conf file.

# /etc/postgresql/16/main/postgresql.conf

# Enable WAL archiving
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/16/main/archive/%f'

# Set WAL level to 'replica'
wal_level = replica

# Specify maximum number of concurrent connections from standby servers
max_wal_senders = 3

# Set the maximum number of WAL files created before recycling
wal_keep_size = 16MB

# Set a timeout for replication connections
wal_sender_timeout = 60s

Now we'll create the Replication user, open a SQL query console in the DB and run the query below which creates the user replicator-user and sets the password as example-password. Of course please feel free to adapt the user and password to your liking.

CREATE ROLE replicator-user WITH REPLICATION LOGIN PASSWORD 'example-password';

Finally, we'll configure network connectivity settings. For this, edit the pg_hba.conf file

# /etc/postgresql/16/main/pg_hba.conf

# Allow replication connections from standby server
host replication replicator 192.168.1.19/24 md5

Now that the first server configuration is ready, let's work on the second server.

Configuring the Second server

Before getting started make sure that the Postgres instance in the second server is down and all data is cleared. We'll now restore a current backup of the primary server database into the secondary server's database instance.

pg_basebackup -h 192.168.1.18 -D /var/lib/postgresql/16/main -U replicator -Fp -Xs -P

With the command above we will have a working backup of the primary DB server in our secondary DB server.

Next, we need to configure the recovery settings. For this, we'll have to create a signal file which will ensure that the DB server stays in standby mode and also the postgres.conf file will be populated with details to the primary server so that the secondary DB can connect to the first DB server to start and ensure consistent replication.

Let's create the signal file

touch /var/lib/postgresql/16/main/standby.signal

This will create the signal file for keeping the server on standby mode.

Now we'll create and populate the postgresql.auto.conf file

touch /var/lib/postgresql/16/main/postgresql.auto.conf

# /var/lib/postgresql/16/main/postgresql.auto.conf

primary_conninfo = 'host=192.168.1.18 port=5432 user=replicator-user password=example-password application_name=standby'

Now restart the stopped server instance, if the configuration has gone well then you should see that the secondary db instance is running healthy.

Checking Replication Status

To check if the replication is running or not run the following query on the primary server.

select * from pg_stat_replication;

The rows should return a column with the value active in it.

To check if the secondary server is receiving the replication data, run the following SQL query in the secondary server console.

select * from pg_stat_wal_receiver;

Alternatively, you can run the following query in the primary server to check if the replication data is being sent.

select * from pg_stat_wal_sender;

Synchronous Replication

What if your primary server goes down or Postgres reports that replication was successful but the secondary server does not have all the data?

To ensure that the primary and secondary servers are in sync we should enable synchronous replication. To enable, simply update the postgresql.conf file with the following config:

# /etc/postgresql/16/main/postgresql.conf

synchronous_standby_names = 'standby'

This should ensure that the data is available in both servers by ensuring that SQL query transactions are committed in both servers.

If the primary DB goes down then simply running the following command will promote the secondary server into the primary role and let it handle all connections.

pg_ctl promote -D /var/lib/postgresql/16/main

If you want to promote the old primary back to primary then run the command again on the now primary server but formerly secondary server.

Thank you for reading this article. See you in the next one.

PostgreSQL DB Data Replication

Nischal Subedi