Tags
PostgreSQL, streaming replication, primary-standby, logical subscription
Background
Origins of streaming replication
PostgreSQL has supported streaming physical replication since version 9.0, released in 2010. With it, users can build read-only standbys (primary-standby physical replication with block-level consistency). Streaming physical replication achieves very low latency, usually within 1 millisecond.
Synchronous streaming replication
Version 9.1, released in 2011, added synchronous replication. At that time only one synchronous standby was supported (for example, with three standbys configured, only one could be in synchronous mode; the others were asynchronous).
In synchronous replication mode, when a user commits a transaction, the commit does not return a success ACK to the client until the transaction's WAL has been replicated to the synchronous standby.
Synchronous mode can guarantee zero data loss: as long as the client has received the commit-success ACK, two copies of the transaction's WAL exist.
Cascading streaming replication
Version 9.2, released in 2012, supports cascading streaming replication, meaning a standby can itself feed another standby.
Cascading replication is especially useful across data centers. For example, with the primary in data center A and standbys in data center B: when B needs more than one standby, only one standby in B connects directly to the primary, and the other standbys in B cascade from it. This reduces cross-datacenter network overhead.
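Configuration-wise, a cascaded standby simply points its primary_conninfo at the intermediate standby instead of the primary. A minimal sketch of recovery.conf on a cascaded standby in data center B, assuming the intermediate standby there listens on hostb1:1921 (hypothetical host and port):

recovery_target_timeline = 'latest'
standby_mode = on
# connect to the intermediate standby in B, not the primary in A
primary_conninfo = 'host=hostb1 port=1921 user=rep password=pwd'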
Streaming virtual standby
Version 9.2, released in 2012, supports not only cascading replication but also virtual standbys. What is a virtual standby? It receives only WAL; it keeps no copy of the data files.
A virtual standby receives WAL in streaming fashion, enabling real-time streaming WAL archiving and improving the timeliness of backups and archives.
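In PostgreSQL 10 the client tool for this is pg_receivewal (formerly pg_receivexlog). A minimal sketch, with a hypothetical archive directory:

# stream WAL from the primary into a local archive directory in real time
pg_receivewal -D /disk1/digoal/walarchive -h 127.0.0.1 -p 2921 -U rep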
Foundations of logical replication
Version 9.4, released in 2014, added to the WAL the information needed for logical replication. Logical replication can then be implemented through output plugins (logical decoding).
Logical replication enables partial replication from the source, such as table-level replication, rather than block-level consistent replication of the entire cluster.
A logically replicated subscriber is not limited to read-only use; it can also accept writes.
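A minimal sketch of logical decoding using the bundled test_decoding plugin, assuming wal_level = logical is set on the server (the slot and table names are hypothetical):

-- create a logical replication slot using the test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- generate some changes
CREATE TABLE t (id int PRIMARY KEY, info text);
INSERT INTO t VALUES (1, 'hello');

-- read the decoded row-level changes from the slot
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- drop the slot when finished, or WAL will be retained indefinitely
SELECT pg_drop_replication_slot('demo_slot');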
More synchronization levels
Version 9.6, released in 2016: since PostgreSQL streaming replication achieves synchronization by shipping WAL, the synchronization level is also defined in terms of WAL. The synchronous_commit parameter configures the synchronization level of transactions:
1. on: local WAL fsync, plus fsync of the WAL on the synchronous standby. Two persisted copies of the WAL.
2. remote_apply: local WAL fsync, plus the WAL has been replayed on the synchronous standby. This gives the highest RT (response time).
3. remote_write: local WAL fsync, plus an asynchronous write of the WAL on the synchronous standby. One persisted copy; the standby's WAL may still be in the OS cache.
4. local: local WAL fsync only.
5. off: the local WAL is written to the WAL buffer and the commit-success ACK is returned immediately (asynchronous commit; a database crash may lose recent transactions, but never leaves the database inconsistent).
Impact on RT, from lowest to highest:
off, local, remote_write, on, remote_apply.
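Because synchronous_commit can be changed per session or per transaction, critical writes can use a stricter level than bulk writes. A minimal sketch (the accounts table is hypothetical):

BEGIN;
-- for this transaction only: wait until the synchronous standby has replayed the WAL
SET LOCAL synchronous_commit = remote_apply;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;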
Streaming backup compression
Version 10, released in 2017: pg_basebackup and pg_receivewal support compressed streaming backup of WAL.
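A sketch of the compression options of both tools (directories are hypothetical; pg_basebackup compression requires tar format):

# compressed real-time streaming WAL archiving
pg_receivewal -D /disk1/digoal/walarchive -Z 5 -h 127.0.0.1 -p 2921 -U rep

# compressed base backup in tar format
pg_basebackup -D /disk1/digoal/backup -F t -z -X fetch -h 127.0.0.1 -p 2921 -U rep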
Quorum-based synchronous streaming replication
Version 10, released in 2017, supports quorum-based synchronous streaming replication. For example, with three standbys you can tell the primary that each transaction requires three copies of the WAL (the primary's plus two standbys'); the primary then waits for feedback that at least two standbys have received the WAL before returning the commit-success ACK to the client.
Quorum-based synchronous streaming replication, combined with a raft-style protocol, can form a highly available, highly reliable architecture with zero data loss.
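The quorum is expressed in synchronous_standby_names on the primary. A sketch, assuming the three standbys set application_name to s1, s2 and s3 in their primary_conninfo (hypothetical names):

# any 2 of the 3 listed standbys must confirm before commit returns
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

# '*' matches any standby name, e.g. the 'ANY 1 (*)' used later in this article
# synchronous_standby_names = 'ANY 1 (*)'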
Built-in logical subscription and replication
Version 10, released in 2017, has built-in logical subscription (publication/subscription), with no external plugin required.
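A minimal sketch of the built-in mechanism (publication, subscription, and table names are hypothetical; the publisher must run with wal_level = logical, and the table must already exist on the subscriber):

-- on the publisher
CREATE PUBLICATION pub1 FOR TABLE t1;

-- on the subscriber
CREATE SUBSCRIPTION sub1
  CONNECTION 'host=127.0.0.1 port=2921 user=rep password=pwd dbname=postgres'
  PUBLICATION pub1;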
Multi-master
In version 10, a multi-master architecture can be built on top of logical subscription.
I. Best Practices for Streaming Physical Primary-Standby Replication
Taking one primary with two standbys and quorum-based synchronous replication (one synchronous copy) as an example, this article walks through primary-standby best practices for PostgreSQL 10.
Environment
Three machines (assume 64 GB of memory each), interconnected on the same LAN (at minimum, the database listening ports must be mutually reachable).
Linux CentOS 7.x x64
HOSTA: Listen on port 1921
HOSTB: Listen on port 1921
HOSTC: Listen on port 1921
(This article simulates all three nodes on one physical machine, IP 127.0.0.1, using ports 2921, 2922, and 2923 respectively. Adjust to your actual environment.)
Software Installation (omitted)
Please refer to
"PostgreSQL on Linux Best Deployment Manual"
Initialize the primary
mkdir /disk1/digoal/pgdata2921
chown digoal /disk1/digoal/pgdata2921
initdb -D /disk1/digoal/pgdata2921 -E UTF8 --locale=C -U postgres
Configure postgresql.conf
cd /disk1/digoal/pgdata2921
vi postgresql.conf

listen_addresses = '0.0.0.0'
port = 2921
max_connections = 1000
unix_socket_directories = '.'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 5.0
bgwriter_flush_after = 0
effective_io_concurrency = 0
max_worker_processes = 16
backend_flush_after = 0
wal_level = replica
fsync = on
synchronous_commit = remote_write
full_page_writes = on
wal_buffers = 128MB
wal_writer_delay = 10ms
wal_writer_flush_after = 0
checkpoint_timeout = 30min
max_wal_size = 16GB
min_wal_size = 8GB
checkpoint_completion_target = 0.5
checkpoint_flush_after = 0
max_wal_senders = 10
wal_keep_segments = 1024
synchronous_standby_names = 'ANY 1 (*)'
hot_standby = on
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
wal_receiver_status_interval = 1s
hot_standby_feedback = off
log_destination = 'csvlog'
logging_collector = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 1000000000
autovacuum_multixact_freeze_max_age = 1200000000
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 0
vacuum_freeze_table_age = 800000000
vacuum_multixact_freeze_table_age = 800000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
Configure pg_hba.conf
cd /disk1/digoal/pgdata2921
vi pg_hba.conf

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     all             0.0.0.0/0               md5
Configure recovery.done
cd /disk1/digoal/pgdata2921
vi recovery.done

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=127.0.0.1 port=2921 user=rep password=pwd'
# recovery_min_apply_delay = 0
# Replay delay: lets you run a delayed standby, e.g. to leave some buffer time
# to recover from an accidental operation. Not needed on a freshly built standby.
Start the primary
pg_ctl start -D /disk1/digoal/pgdata2921
Create a streaming replication role
psql -h 127.0.0.1 -p 2921
psql (10beta1)
Type "help" for help.

postgres=# set synchronous_commit = off;
SET
postgres=# create role rep login replication encrypted password 'pwd';
CREATE ROLE
Create Standby 1
mkdir /disk1/digoal/pgdata2922
chown digoal /disk1/digoal/pgdata2922
chmod 700 /disk1/digoal/pgdata2922
export PGPASSWORD="pwd"
pg_basebackup -D /disk1/digoal/pgdata2922 -F p -X stream -h 127.0.0.1 -p 2921 -U rep
Configure standby 1 postgresql.conf
cd /disk1/digoal/pgdata2922
vi postgresql.conf

port = 2922
Configure Standby 1 recovery.conf
cd /disk1/digoal/pgdata2922
mv recovery.done recovery.conf
Start Standby 1
pg_ctl start -D /disk1/digoal/pgdata2922
Create Standby 2
mkdir /disk1/digoal/pgdata2923
chown digoal /disk1/digoal/pgdata2923
chmod 700 /disk1/digoal/pgdata2923
export PGPASSWORD="pwd"
pg_basebackup -D /disk1/digoal/pgdata2923 -F p -X stream -h 127.0.0.1 -p 2921 -U rep
Configure standby 2 postgresql.conf
cd /disk1/digoal/pgdata2923
vi postgresql.conf

port = 2923
Configure Standby 2 recovery.conf
cd /disk1/digoal/pgdata2923
mv recovery.done recovery.conf
Start Standby 2
pg_ctl start -D /disk1/digoal/pgdata2923
Monitoring Streaming Replication Status
Query on the primary
psql -h 127.0.0.1 -p 2921
psql (10beta1)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,
postgres-# *
postgres-# from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
sent_delay       | 0 bytes
write_delay      | 0 bytes
flush_delay      | 0 bytes
replay_delay     | 0 bytes
pid              | 11962
usesysid         | 16384
usename          | rep
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 63083
backend_start    | 2017-07-11 17:15:31.231492+08
backend_xmin     |
state            | streaming
sent_lsn         | 1/C0000060
write_lsn        | 1/C0000060
flush_lsn        | 1/C0000060
replay_lsn       | 1/C0000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | quorum
-[ RECORD 2 ]----+------------------------------
sent_delay       | 0 bytes
write_delay      | 0 bytes
flush_delay      | 0 bytes
replay_delay     | 0 bytes
pid              | 11350
usesysid         | 16384
usename          | rep
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 63077
backend_start    | 2017-07-11 17:15:13.818043+08
backend_xmin     |
state            | streaming
sent_lsn         | 1/C0000060
write_lsn        | 1/C0000060
flush_lsn        | 1/C0000060
replay_lsn       | 1/C0000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | quorum
Query on a standby
psql -h 127.0.0.1 -p 2922

-- Is WAL replay currently paused?
postgres=# select pg_is_wal_replay_paused();
 pg_is_wal_replay_paused
-------------------------
 f
(1 row)

-- WAL receive position
postgres=# select pg_last_wal_receive_lsn();
 pg_last_wal_receive_lsn
-------------------------
 1/C0000060
(1 row)

-- WAL replay position
postgres=# select pg_last_wal_replay_lsn();
 pg_last_wal_replay_lsn
------------------------
 1/C0000060
(1 row)

-- wal receiver statistics
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_get_wal_receiver();
-[ RECORD 1 ]---------+-----------------------------------------------------------
pid                   | 11349
status                | streaming
receive_start_lsn     | 1/C0000000
receive_start_tli     | 1
received_lsn          | 1/C0000060
received_tli          | 1
last_msg_send_time    | 2017-07-11 17:23:14.372327+08
last_msg_receipt_time | 2017-07-11 17:23:14.372361+08
latest_end_lsn        | 1/C0000060
latest_end_time       | 2017-07-11 17:15:13.819553+08
slot_name             |
conninfo              | user=rep password=******** dbname=replication host=127.0.0.1 port=2921 fallback_application_name=walreceiver sslmode=disable sslcompression=1 target_session_attrs=any

-- pause WAL replay, e.g. for troubleshooting
postgres=# select pg_wal_replay_pause();
-[ RECORD 1 ]-------+-
pg_wal_replay_pause |

postgres=# select pg_is_wal_replay_paused();
-[ RECORD 1 ]-----------+--
pg_is_wal_replay_paused | t

-- resume WAL replay
postgres=# select pg_wal_replay_resume();
-[ RECORD 1 ]--------+-
pg_wal_replay_resume |

postgres=# select pg_is_wal_replay_paused();
-[ RECORD 1 ]-----------+--
pg_is_wal_replay_paused | f
Notes
1. There are two ways to prevent the primary from removing WAL files that a standby has not yet received:
use replication slots, or configure wal_keep_segments large enough.
Both methods carry risks or problems. For example, if a standby goes down or is decommissioned and the user forgets to drop the corresponding slot, the primary's WAL can grow without bound.
wal_keep_segments makes the primary permanently reserve that much WAL space.
Relevant parameters
Primary postgresql.conf:

# max_replication_slots = 10
# wal_keep_segments = 1024

Standby recovery.conf:

# primary_slot_name = ''
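A sketch of the slot approach (the slot name is hypothetical): create a physical slot on the primary, reference it via primary_slot_name in the standby's recovery.conf, and remember to drop it if the standby is decommissioned.

-- on the primary: create a named physical replication slot
SELECT pg_create_physical_replication_slot('standby1_slot');

-- check slot status and the WAL position it retains
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

-- drop it when the standby is gone, or the primary's WAL will grow unbounded
SELECT pg_drop_replication_slot('standby1_slot');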
2. If you do not want to use the methods above to keep the standby's required WAL from being removed, you can configure archiving on the primary; the standby must then be able to fetch the archived WAL files.
Relevant parameters
Primary postgresql.conf:

#archive_mode = off        # enables archiving; off, on, or always
                           # (change requires restart)
#archive_command = ''      # command to use to archive a logfile segment
                           # placeholders: %p = path of file to archive
                           #               %f = file name only
                           # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Standby recovery.conf:

# restore_command = ''     # e.g. 'cp /mnt/server/archivedir/%f %p'
3. Protect the password in recovery.conf, since it is stored there in plaintext. One alternative is sketched below.
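A sketch of keeping the password out of recovery.conf by using the libpq password file ~/.pgpass (the walreceiver connects through libpq, and the database field 'replication' matches replication connections):

# format: hostname:port:database:username:password
echo "127.0.0.1:2921:replication:rep:pwd" >> ~/.pgpass
chmod 600 ~/.pgpass
# the password= part can then be omitted from primary_conninfo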
4. Ensure there is enough network bandwidth between primary and standby; otherwise replication lag can build up.
Stress Testing
Run a TPC-B stress test against the primary
pgbench -i -s 100 -h 127.0.0.1 -p 2921 -U postgres
pgbench -n -r -P 1 -h 127.0.0.1 -p 2921 -U postgres -c 32 -j 32 -T 120

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 1326066
latency average = 2.896 ms
latency stddev = 2.030 ms
tps = 11050.199659 (including connections establishing)
tps = 11051.140876 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.043  BEGIN;
         0.154  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.112  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.159  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.423  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.092  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.910  END;
Observing standby lag
psql -h 127.0.0.1 -p 2921
psql (10beta1)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as sent_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) as write_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) as flush_delay,
postgres-# pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_delay,
postgres-# *
postgres-# from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
sent_delay       | 4024 bytes
write_delay      | 4024 bytes
flush_delay      | 9080 bytes
replay_delay     | 13 kB
pid              | 11962
usesysid         | 16384
usename          | rep
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 63083
backend_start    | 2017-07-11 17:15:31.231492+08
backend_xmin     |
state            | streaming
sent_lsn         | 2/1C61E398
write_lsn        | 2/1C61E398
flush_lsn        | 2/1C61CFD8
replay_lsn       | 2/1C61BEF8
write_lag        | 00:00:00.000129
flush_lag        | 00:00:00.001106
replay_lag       | 00:00:00.001626
sync_priority    | 1
sync_state       | quorum
-[ RECORD 2 ]----+------------------------------
sent_delay       | 0 bytes
write_delay      | 4024 bytes
flush_delay      | 9080 bytes
replay_delay     | 12 kB
pid              | 11350
usesysid         | 16384
usename          | rep
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 63077
backend_start    | 2017-07-11 17:15:13.818043+08
backend_xmin     |
state            | streaming
sent_lsn         | 2/1C61F350
write_lsn        | 2/1C61E398
flush_lsn        | 2/1C61CFD8
replay_lsn       | 2/1C61C388
write_lag        | 00:00:00.000542
flush_lag        | 00:00:00.001582
replay_lag       | 00:00:00.001952
sync_priority    | 1
sync_state       | quorum

postgres=# \watch 1
II. Best Practices for Streaming Logical Subscription (Logical Primary-Standby)
Please refer to:
"PostgreSQL Logical Subscription - What Does It Bring to Business Architecture?"
"PostgreSQL 10.0 preview Logical Replication - Principles and Best Practices"
"Using PostgreSQL Logical Subscription to Realize multi-master"