PostgreSQL 10 Streaming Physical Replication and Logical Replication Master-Slave Best Practices

Keywords: PostgreSQL, database, Linux, network

Labels

PostgreSQL, streaming replication, master-slave, logical subscription

Background

The origin of streaming replication

PostgreSQL has supported streaming physical replication since version 9.0, released in 2010. Users can build read-only standbys through streaming replication (master-standby physical replication with block-level consistency). Streaming physical replication can achieve very low latency (usually within 1 millisecond).

Synchronous streaming replication

Version 9.1, released in 2011, added support for synchronous replication. At that time only one synchronous streaming standby was supported (for example, with three standby nodes configured, only one could run in synchronous mode; the others ran in asynchronous mode).

In synchronous replication mode, when a user commits a transaction, the commit must wait until the transaction's WAL has been replicated to the synchronous standby before the successful-commit ACK is returned to the client.

Synchronous mode can guarantee zero data loss: as long as the client has received the ACK for a successful commit, two copies of the transaction's WAL exist.

Cascading streaming replication

Version 9.2, launched in 2012, added support for cascading streaming replication, meaning a standby can also connect to another standby.

Cascading replication is especially useful across machine rooms. For example, with the primary in room A and standbys in room B, when room B needs more than one standby, only one standby in B has to connect directly to the primary; the remaining standbys in B can cascade from that one. This reduces cross-room network overhead.
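
For example, a minimal sketch of the cascaded standby's recovery.conf (host name and port here are illustrative placeholders):

# recovery.conf on a second standby in room B,
# pointing at the first standby in room B rather than at the primary
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=standby_b1 port=5432 user=rep password=pwd'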

Streaming virtual standby

Version 9.2 also added support for virtual standbys. What is a virtual standby? It is a standby that holds only WAL; it keeps no copy of the data files.

Through a virtual standby, WAL can be received as a stream and archived in real time, improving the timeliness of backups and archiving.
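
A minimal sketch of such a virtual standby using pg_receivewal (named pg_receivexlog before version 10; directory and connection values here are illustrative):

mkdir /disk1/digoal/walarchive
# stream WAL from the primary in real time; no data files are copied
pg_receivewal -D /disk1/digoal/walarchive -h 127.0.0.1 -p 2921 -U rep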

The Basis of Logical Replication

Version 9.4, launched in 2014, added to the WAL the information needed for logical replication. Logical replication can be implemented on top of this through logical decoding plugins.

Logical replication enables partial replication of the primary, such as table-level replication, rather than block-level consistent replication of the entire cluster.

A logical replica is not limited to read-only queries; it can also accept writes.
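
A minimal sketch of this plugin-based mechanism using the bundled test_decoding plugin (assumes wal_level = logical on the primary; the slot name is illustrative):

-- create a logical slot that decodes WAL through the test_decoding plugin
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');
-- run some DML, then read the decoded row changes
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
-- drop the slot when finished, otherwise the primary retains WAL for it
SELECT pg_drop_replication_slot('demo_slot');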

Additional synchronization levels

Version 9.6, released in 2016, added more synchronization levels. PG's streaming replication achieves synchronization by shipping WAL, so the synchronization levels are defined in terms of WAL. The synchronous_commit parameter configures a transaction's synchronization level.

1. on: local WAL fsync plus synchronous standby WAL fsync. Two persistent copies of the WAL.

2. remote_apply: local WAL fsync, and the synchronous standby has already replayed the WAL. This has the highest RT (response time).

3. remote_write: local WAL fsync, and the synchronous standby has completed an asynchronous write of the WAL. One persistent copy; the standby's WAL may still be in the OS cache.

4. local: local WAL fsync only.

5. off: the local WAL is only written to the wal buffer before the successful-commit ACK is returned to the client (asynchronous commit; a database crash may lose recent transactions, but does not leave the database inconsistent).

Ordered by impact on RT, from lowest to highest:

off, local, remote_write, on, remote_apply.
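
synchronous_commit can also be changed per session or per transaction, so critical writes can pay for a stronger level while bulk work runs asynchronously. A minimal sketch (the table name is hypothetical):

-- relax durability for this session only, e.g. for a bulk load
SET synchronous_commit = off;

-- require standby replay for a single critical transaction
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
INSERT INTO t_critical VALUES (1);
COMMIT;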

Streaming backup compression

In version 10, released in 2017, pg_basebackup and pg_receivewal support compressed streaming backup of WAL.
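
A minimal sketch of both tools (backup directories are illustrative; connection values follow the environment used later in this article):

# tar-format base backup, gzip-compressed, with WAL streamed alongside
pg_basebackup -D /backup/base -F t -z -X stream -h 127.0.0.1 -p 2921 -U rep

# continuously stream and compress WAL (compression level 0-9)
pg_receivewal -D /backup/wal -Z 5 -h 127.0.0.1 -p 2921 -U rep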

Quorum-based synchronous streaming replication

Version 10, released in 2017, supports quorum-based synchronous streaming replication. For example, with three standby nodes you can tell the primary that each transaction requires three WAL copies; the primary then waits for feedback from at least two standbys that they have synchronized the WAL before returning the successful-commit ACK to the client.

Quorum-based synchronous streaming replication, combined with a consensus protocol such as Raft, can deliver a highly available, highly reliable architecture with zero data loss.
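
For example, a sketch of the primary's configuration for "any two of three" (standby names are illustrative and must match each standby's application_name; the walkthrough later in this article uses 'ANY 1 (*)'):

# postgresql.conf on the primary
synchronous_commit = remote_write
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'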

Built-in logical subscription and replication

Version 10, launched in 2017, has built-in logical subscription (publication/subscription).

Multi-master

In version 10, released in 2017, a multi-master architecture can be implemented through logical subscriptions.
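
A minimal sketch of the built-in publication/subscription (table name and connection string are illustrative; the publisher must run with wal_level = logical, and the table definition must already exist on the subscriber):

-- on the publisher
CREATE TABLE t_order (id int primary key, info text);
CREATE PUBLICATION pub_order FOR TABLE t_order;

-- on the subscriber (same table definition)
CREATE TABLE t_order (id int primary key, info text);
CREATE SUBSCRIPTION sub_order
  CONNECTION 'host=127.0.0.1 port=2921 user=postgres dbname=postgres'
  PUBLICATION pub_order;

For multi-master, each node both publishes its own tables and subscribes to the other nodes' publications; conflict avoidance is left to the application design.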

1. Streaming Physical Replication Master-Slave Best Practice

Taking a one-primary, two-standby setup with quorum-based synchronization (any one standby must acknowledge) as an example, this article walks through the master-slave best practices of PG 10.

Environment

Three machines (each assumed to have 64 GB of memory) on the same LAN, mutually reachable (at least on the database listening ports).

Linux CentOS 7.x x64

HOSTA: Listen on port 1921

HOSTB: Listen on port 1921

HOSTC: Listen on port 1921

(This article simulates the setup on a single physical machine: IP 127.0.0.1, ports 2921, 2922, and 2923. Adjust to your actual environment.)

Software installation (omitted)

Please refer to the "PostgreSQL on Linux Best Deployment Manual".

Initialize the primary

mkdir /disk1/digoal/pgdata2921  
  
chown digoal /disk1/digoal/pgdata2921  
  
initdb -D /disk1/digoal/pgdata2921 -E UTF8 --locale=C -U postgres  

Configure postgresql.conf

cd /disk1/digoal/pgdata2921  
  
vi postgresql.conf  
  
listen_addresses = '0.0.0.0'  
port = 2921  
max_connections = 1000  
unix_socket_directories = '.'  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 8GB  
maintenance_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 500  
bgwriter_lru_multiplier = 5.0  
bgwriter_flush_after = 0  
effective_io_concurrency = 0  
max_worker_processes = 16  
backend_flush_after = 0  
wal_level = replica  
fsync = on  
synchronous_commit = remote_write  
full_page_writes = on  
wal_buffers = 128MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 0  
checkpoint_timeout = 30min  
max_wal_size = 16GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.5  
checkpoint_flush_after = 0  
max_wal_senders = 10  
wal_keep_segments = 1024  
synchronous_standby_names = 'ANY 1 (*)'  
hot_standby = on  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
wal_receiver_status_interval = 1s  
hot_standby_feedback = off  
log_destination = 'csvlog'  
logging_collector = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose      
log_timezone = 'PRC'  
log_autovacuum_min_duration = 0  
autovacuum_vacuum_scale_factor = 0.1  
autovacuum_analyze_scale_factor = 0.1  
autovacuum_freeze_max_age = 1000000000  
autovacuum_multixact_freeze_max_age = 1200000000  
autovacuum_vacuum_cost_delay = 0  
autovacuum_vacuum_cost_limit = 0  
vacuum_freeze_table_age = 800000000  
vacuum_multixact_freeze_table_age = 800000000  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  

Configure pg_hba.conf

cd /disk1/digoal/pgdata2921  
  
vi pg_hba.conf  
  
# "local" is for Unix domain socket connections only  
local   all             all                                     trust  
# IPv4 local connections:  
host    all             all             127.0.0.1/32            trust  
# IPv6 local connections:  
host    all             all             ::1/128                 trust  
# Allow replication connections from localhost, by a user with the  
# replication privilege.  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  
host    replication     all             0.0.0.0/0               md5  

Configure recovery.done

cd /disk1/digoal/pgdata2921  
  
vi recovery.done  
  
recovery_target_timeline = 'latest'  
standby_mode = on  
primary_conninfo = 'host=127.0.0.1 port=2921 user=rep password=pwd'  
# recovery_min_apply_delay = 0   # delay WAL apply by this long; can be used to build a delayed standby, e.g. to leave a buffer window against operator mistakes. Not needed for this setup.  

Start the primary

pg_ctl start -D /disk1/digoal/pgdata2921  

Create a streaming replication role

(synchronous_commit is set to off for this session because synchronous_standby_names is already configured but no standby is connected yet; a synchronous commit would otherwise wait indefinitely.)

psql -h 127.0.0.1 -p 2921  
  
psql (10beta1)  
Type "help" for help.  
  
postgres=# set synchronous_commit =off;  
SET  
postgres=# create role rep login replication encrypted password 'pwd';  
CREATE ROLE  

Create Standby 1

mkdir /disk1/digoal/pgdata2922  
chown digoal /disk1/digoal/pgdata2922  
chmod 700 /disk1/digoal/pgdata2922  
  
export PGPASSWORD="pwd"  
pg_basebackup -D /disk1/digoal/pgdata2922 -F p -X stream -h 127.0.0.1 -p 2921 -U rep  

Configure standby 1 postgresql.conf

cd /disk1/digoal/pgdata2922  
  
vi postgresql.conf  
  
port = 2922  

Configure Standby 1 recovery.conf

cd /disk1/digoal/pgdata2922  
  
mv recovery.done recovery.conf  

Start Standby 1

pg_ctl start -D /disk1/digoal/pgdata2922  

Create Standby 2

mkdir /disk1/digoal/pgdata2923  
chown digoal /disk1/digoal/pgdata2923  
chmod 700 /disk1/digoal/pgdata2923  
  
export PGPASSWORD="pwd"  
pg_basebackup -D /disk1/digoal/pgdata2923 -F p -X stream -h 127.0.0.1 -p 2921 -U rep  

Configure standby 2 postgresql.conf

cd /disk1/digoal/pgdata2923  
  
vi postgresql.conf  
  
port = 2923  

Configure Standby 2 recovery.conf

cd /disk1/digoal/pgdata2923  
  
mv recovery.done recovery.conf  

Start Standby 2

pg_ctl start -D /disk1/digoal/pgdata2923  

Monitoring streaming replication status

Query on the primary

psql -h 127.0.0.1 -p 2921  
psql (10beta1)  
Type "help" for help.  
  
postgres=# \x  
Expanded display is on.  
  
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,   
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,   
  *  
from pg_stat_replication;  
  
-[ RECORD 1 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 0 bytes  
flush_delay      | 0 bytes  
replay_delay     | 0 bytes  
pid              | 11962  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63083  
backend_start    | 2017-07-11 17:15:31.231492+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/C0000060  
write_lsn        | 1/C0000060  
flush_lsn        | 1/C0000060  
replay_lsn       | 1/C0000060  
write_lag        |   
flush_lag        |   
replay_lag       |   
sync_priority    | 1  
sync_state       | quorum  
-[ RECORD 2 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 0 bytes  
flush_delay      | 0 bytes  
replay_delay     | 0 bytes  
pid              | 11350  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63077  
backend_start    | 2017-07-11 17:15:13.818043+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/C0000060  
write_lsn        | 1/C0000060  
flush_lsn        | 1/C0000060  
replay_lsn       | 1/C0000060  
write_lag        |   
flush_lag        |   
replay_lag       |   
sync_priority    | 1  
sync_state       | quorum  

Query on the standby

psql -h 127.0.0.1 -p 2922  
  
-- Check whether WAL replay is currently paused  
postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  
  
-- Check the last received WAL LSN  
postgres=# select pg_last_wal_receive_lsn();  
 pg_last_wal_receive_lsn   
-------------------------  
 1/C0000060  
(1 row)  
  
-- Check the last replayed WAL LSN  
postgres=# select pg_last_wal_replay_lsn();  
 pg_last_wal_replay_lsn   
------------------------  
 1/C0000060  
(1 row)  
  
-- View WAL receiver statistics  
postgres=# \x  
Expanded display is on.  
postgres=# select * from pg_stat_get_wal_receiver();  
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------  
pid                   | 11349  
status                | streaming  
receive_start_lsn     | 1/C0000000  
receive_start_tli     | 1  
received_lsn          | 1/C0000060  
received_tli          | 1  
last_msg_send_time    | 2017-07-11 17:23:14.372327+08  
last_msg_receipt_time | 2017-07-11 17:23:14.372361+08  
latest_end_lsn        | 1/C0000060  
latest_end_time       | 2017-07-11 17:15:13.819553+08  
slot_name             |   
conninfo              | user=rep password=******** dbname=replication host=127.0.0.1 port=2921 fallback_application_name=walreceiver sslmode=disable sslcompression=1 target_session_attrs=any  
  
  
-- Pause WAL replay, e.g. for troubleshooting  
postgres=# select pg_wal_replay_pause();  
-[ RECORD 1 ]-------+-  
pg_wal_replay_pause |   
  
postgres=# select pg_is_wal_replay_paused();  
-[ RECORD 1 ]-----------+--  
pg_is_wal_replay_paused | t  
  
-- Resume WAL replay  
postgres=# select pg_wal_replay_resume();  
-[ RECORD 1 ]--------+-  
pg_wal_replay_resume |   
  
postgres=# select pg_is_wal_replay_paused();  
-[ RECORD 1 ]-----------+--  
pg_is_wal_replay_paused | f  

Notes

1. There are two ways to prevent the primary from removing WAL files that a standby has not yet received: use replication slots, or set wal_keep_segments large enough.

Both methods carry risks. If a standby dies, or is decommissioned and the operator forgets to drop its slot, the primary's WAL can grow without bound.

wal_keep_segments permanently reserves that much WAL space on the primary.

Relevant parameters

primary postgresql.conf  
# max_replication_slots = 10  
# wal_keep_segments = 1024  
  
standby recovery.conf  
# primary_slot_name = ''  
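
A minimal sketch of managing a physical slot by hand (the slot name is illustrative; the standby then sets primary_slot_name to it in recovery.conf):

-- on the primary: create a slot for a standby
SELECT pg_create_physical_replication_slot('standby1_slot');
-- monitor slots; an inactive slot holds back WAL removal indefinitely
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
-- drop the slot once the standby is decommissioned
SELECT pg_drop_replication_slot('standby1_slot');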

2. Alternatively, instead of retaining WAL on the primary as above, you can configure WAL archiving on the primary; the standby must then be able to fetch the archived WAL files (via restore_command).

Relevant parameters

primary postgresql.conf  
#archive_mode = off             # enables archiving; off, on, or always  
                                # (change requires restart)  
#archive_command = ''           # command to use to archive a logfile segment  
                                # placeholders: %p = path of file to archive  
                                #               %f = file name only  
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  
  
standby recovery.conf  
# restore_command = ''           # e.g. 'cp /mnt/server/archivedir/%f %p'  

3. Protect the password in the recovery.conf file: it is stored in plaintext.
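
One option is to omit the password from primary_conninfo and rely on libpq's ~/.pgpass file instead; a sketch (the file belongs to the OS user running PostgreSQL and must have 0600 permissions):

# ~/.pgpass on the standby host, format hostname:port:database:username:password
# the special database name "replication" matches streaming replication connections
127.0.0.1:2921:replication:rep:pwd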

4. The network bandwidth between primary and standbys must be sufficient; otherwise replication lag may build up.

Stress testing

Connect to the primary and run a TPC-B stress test

pgbench -i -s 100 -h 127.0.0.1 -p 2921 -U postgres  
  
pgbench -n -r -P 1 -h 127.0.0.1 -p 2921 -U postgres -c 32 -j 32 -T 120  

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 1326066
latency average = 2.896 ms
latency stddev = 2.030 ms
tps = 11050.199659 (including connections establishing)
tps = 11051.140876 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.043  BEGIN;
         0.154  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.112  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.159  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.423  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.092  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.910  END;

Observe standby lag

psql -h 127.0.0.1 -p 2921  
psql (10beta1)  
Type "help" for help.  
  
postgres=# \x  
Expanded display is on.  
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,   
postgres-#   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,   
postgres-#   *  
postgres-# from pg_stat_replication;  
-[ RECORD 1 ]----+------------------------------  
sent_delay       | 4024 bytes  
write_delay      | 4024 bytes  
flush_delay      | 9080 bytes  
replay_delay     | 13 kB  
pid              | 11962  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63083  
backend_start    | 2017-07-11 17:15:31.231492+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/1C61E398  
write_lsn        | 2/1C61E398  
flush_lsn        | 2/1C61CFD8  
replay_lsn       | 2/1C61BEF8  
write_lag        | 00:00:00.000129  
flush_lag        | 00:00:00.001106  
replay_lag       | 00:00:00.001626  
sync_priority    | 1  
sync_state       | quorum  
-[ RECORD 2 ]----+------------------------------  
sent_delay       | 0 bytes  
write_delay      | 4024 bytes  
flush_delay      | 9080 bytes  
replay_delay     | 12 kB  
pid              | 11350  
usesysid         | 16384  
usename          | rep  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63077  
backend_start    | 2017-07-11 17:15:13.818043+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/1C61F350  
write_lsn        | 2/1C61E398  
flush_lsn        | 2/1C61CFD8  
replay_lsn       | 2/1C61C388  
write_lag        | 00:00:00.000542  
flush_lag        | 00:00:00.001582  
replay_lag       | 00:00:00.001952  
sync_priority    | 1  
sync_state       | quorum  
  
postgres=# \watch 1  

2. Streaming Logical Subscription (Logical Master-Slave) Best Practice

Please refer to:

"PostgreSQL Logical Subscription - What Does It Bring to Business Architecture?"

"PostgreSQL 10.0 preview Logical Replication - Principles and Best Practices"

"Using PostgreSQL Logical Subscription to Implement multi-master"
