Master/slave switchover based on asynchronous streaming replication in pgsql 9.1


The VMs run on Oracle VM VirtualBox 5.2.4 r119785 (Qt5.6.2)
The OS is Debian 8.2
PostgreSQL (pgsql) is 9.1.22

The IP plan is as follows, using two network segments:
# 192.168.56.x
# 192.168.165.x

node1
bond0 192.168.56.100
bond0:1 192.168.56.101
eth0
eth1
bond1 192.168.165.100
bond1:1 192.168.165.101
eth2
eth3

node2
bond0 192.168.56.200
bond0:1
eth0
eth1
bond1 192.168.165.200
bond1:1
eth2
eth3

node3
eth0 192.168.56.50
eth1 192.168.165.50

192.168.56.101 and 192.168.165.101 are the VIPs of pgsql. The application only needs to connect to one of these two IPs.
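
For illustration, a client simply connects to one of the VIPs instead of a physical node address (appuser and appdb are hypothetical application credentials, not part of the original setup):

$ psql -h 192.168.56.101 -p 5432 -U appuser -d appdb

After a switchover the same connection string keeps working, because the VIP moves to the new master.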

#######################################################################
Install pgsql 9.1 on Debian 8.2

#vi /etc/apt/sources.list.d/pgdg.list
deb http://apt.postgresql.org/pub/repos/apt/ jessie-pgdg main

#wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
apt-key add -
#apt-get update
#apt-get install postgresql-9.1

This pulls in the following packages:
postgresql-9.1
postgresql-client-9.1
postgresql-client-common
postgresql-common
postgresql-contrib-9.1

Additionally, install the following package manually:

#apt-get install postgresql-server-dev-9.1

This pulls in the following packages:
libpq-dev
postgresql-server-dev-9.1
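
A quick sanity check of the installation using the Debian wrapper tools (my own addition, not in the original):

# pg_lsclusters
# su - postgres -c "psql -c 'select version();'"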

Since bonding is used, install ifenslave:

# apt-get install ifenslave

For the bonding configuration, you can refer to another blog post.
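
Purely as a rough sketch (interface names taken from the plan above, active-backup mode is my assumption), /etc/network/interfaces on node1 might contain something like:

auto bond0
iface bond0 inet static
    address 192.168.56.100
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode active-backup
    bond-miimon 100

auto bond1
iface bond1 inet static
    address 192.168.165.100
    netmask 255.255.255.0
    bond-slaves eth2 eth3
    bond-mode active-backup
    bond-miimon 100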

#######################################################################
Set up a pgsql streaming-replication environment with one master and two slaves

Create the replication user on node1

$ psql
psql (9.1.22)
postgres=# CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'rep123';

Modify pg_hba.conf on node1, node2, and node3

$ cd /etc/postgresql/9.1/main
$ vi pg_hba.conf

host     replication     replicator     192.168.56.100/32         md5
host     replication     replicator     192.168.165.100/32        md5
host     replication     replicator     192.168.56.200/32         md5
host     replication     replicator     192.168.165.200/32        md5
host     replication     replicator     192.168.56.50/32          md5
host     replication     replicator     192.168.165.50/32         md5
host     replication     replicator     192.168.56.101/32         md5
host     replication     replicator     192.168.165.101/32        md5
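
pg_hba.conf changes take effect on a reload; a restart works as well and is needed anyway once postgresql.conf is changed in the next step (this step is implied rather than spelled out in the original):

# service postgresql reload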

Modify postgresql.conf as needed (the exact changes are omitted here; a rough sketch of typical settings follows the directory listing below). On node2 and node3 the data directory /var/lib/postgresql/9.1/main looks like this before the base backup:

$ ls -l
drwx------ 5 postgres postgres 4096 Jan 16 16:12 base
drwx------ 2 postgres postgres 4096 Jan 16 18:51 global
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_clog
drwx------ 4 postgres postgres 4096 Jan 16 16:12 pg_multixact
drwx------ 2 postgres postgres 4096 Jan 16 18:48 pg_notify
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_serial
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_subtrans
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_tblspc
drwx------ 2 postgres postgres 4096 Jan 16 16:12 pg_twophase
-rw------- 1 postgres postgres    4 Jan 16 16:12 PG_VERSION
drwx------ 3 postgres postgres 4096 Jan 16 16:12 pg_xlog
-rw------- 1 postgres postgres  133 Jan 16 18:48 postmaster.opts
lrwxrwxrwx 1 root     root       36 Jan 16 16:12 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
lrwxrwxrwx 1 root     root       38 Jan 16 16:12 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key
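
Since the postgresql.conf changes are only hinted at above, here is a minimal sketch of the usual 9.1 streaming-replication settings (the values are my assumptions, not the author's exact configuration):

# postgresql.conf on the master (node1) -- assumed values
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 64

# postgresql.conf on the standbys (node2, node3)
hot_standby = on

Changing wal_level or max_wal_senders requires a restart of the master.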

Pull the base backup onto node2 and node3 with pg_basebackup

$ pg_basebackup -h 192.168.56.101 -D /var/lib/postgresql/9.1/main -U replicator -v -P
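
pg_basebackup refuses to write into a non-empty data directory, and it does not recreate the server.crt / server.key symlinks shown in the listing above, so on node2 and node3 a preparation step along these lines is needed first (a sketch assuming the Debian snakeoil certificates; adjust paths if your setup differs):

# service postgresql stop
# su - postgres
$ mv /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main.old
$ mkdir -m 700 /var/lib/postgresql/9.1/main
  ... run the pg_basebackup command above ...
$ ln -s /etc/ssl/certs/ssl-cert-snakeoil.pem /var/lib/postgresql/9.1/main/server.crt
$ ln -s /etc/ssl/private/ssl-cert-snakeoil.key /var/lib/postgresql/9.1/main/server.key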

Configure recovery.conf on node 2 and node 3

$ cd /var/lib/postgresql/9.1/main
$ vi recovery.conf 
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'
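
With recovery.conf in place, start the standbys and confirm they are actually in recovery (a quick check of my own, not part of the original walk-through):

# service postgresql start
$ psql -c "select pg_is_in_recovery();"

This should return t on node2 and node3.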

#######################################################################
Generate data on node1 with pgbench before shutting pgsql down on node1

$ psql
psql (9.1.22)
postgres=# create database peiybdb;
$ pgbench -i -s 200 peiybdb 
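
With a scale factor of 200, pgbench loads 20,000,000 rows into pgbench_accounts (100,000 rows per unit of scale), which gives the standbys something visible to replay. A simple spot check on any node (my own addition):

$ psql -d peiybdb -c "select count(*) from pgbench_accounts;"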

Check the LSN on node2

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp(), 
       now() - pg_last_xact_replay_timestamp() AS replication_delay;
-[ RECORD 1 ]-----------------+------------------------------
pg_last_xlog_receive_location | 0/84001D8
pg_last_xlog_replay_location  | 0/84001D8
pg_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
replication_delay             | 00:03:41.62903

Check the LSN on node3

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp(), 
       now() - pg_last_xact_replay_timestamp() AS replication_delay;
-[ RECORD 1 ]-----------------+------------------------------
pg_last_xlog_receive_location | 0/84001D8
pg_last_xlog_replay_location  | 0/84001D8
pg_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
replication_delay             | 00:04:04.309157

On node1, use pg_controldata to check the latest LSN generated

$ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main
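
pg_controldata prints a long list of fields; the ones worth comparing with the standby LSNs above are the cluster state, the checkpoint/REDO locations and the TimeLineID. Filtering them out is my own addition:

$ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main | \
  grep -E 'cluster state|checkpoint location|REDO location|TimeLineID'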

#######################################################################
Close PostgreSQL client sessions on node1

$ psql
psql (9.1.22)
postgres=# SELECT 'select pg_terminate_backend('|| psa.procpid || ');' as term_spid, 
       psa.usename,
       psa.procpid,
       psa.waiting,
       psa.client_addr,
       psa.current_query,
       now() - psa.xact_start as xact_time,
       now() - psa.query_start as query_time,
       psa.* 
FROM pg_stat_activity psa
WHERE 1=1
  and psa.usename <> 'postgres'
order by case when psa.current_query = '<IDLE>' then 99
              when psa.current_query = '<insufficient privilege>' then 98
              else 1
          end,
         now() - psa.xact_start desc 
;

The generated select pg_terminate_backend(...) statements are then executed manually.
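
Alternatively (a shortcut of my own, not in the original), all non-postgres sessions can be terminated in a single statement; note that 9.1 still calls the backend pid column procpid:

$ psql -c "select pg_terminate_backend(procpid) from pg_stat_activity where usename <> 'postgres';"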

Prepare the recovery file (recovery.done) on node1.

$ cd /var/lib/postgresql/9.1/main/
$ vi recovery.done
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'

Shut down postgresql on node1

# service postgresql stop 
// or
$ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main 

On node1, rename recovery.done to recovery.conf; this prevents node1 from coming back up as a second master (dual master / split brain) after startup

$ cd /var/lib/postgresql/9.1/main/
$ mv recovery.done recovery.conf

Delete the VIPs from the bond sub-interfaces on node1

# ip addr del 192.168.56.101/24 dev bond0 label bond0:1
# ip addr del 192.168.165.101/24 dev bond1 label bond1:1
# ifconfig
# ip a

Add the VIPs to the bond sub-interfaces on node2

# ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
# ip addr add 192.168.165.101/24 brd 192.168.165.255 dev bond1 label bond1:1
# ifconfig
# ip a

Send gratuitous ARP broadcasts from node2

# arping -q -A -c 1 -I bond0:1 192.168.56.101
# arping -q -A -c 1 -I bond1:1 192.168.165.101

Promote the standby on node2 to master

$ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main
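
After the promote, node2 should leave recovery mode and pg_ctl renames its recovery.conf to recovery.done automatically; a quick confirmation (my own check):

$ psql -c "select pg_is_in_recovery();"
$ ls /var/lib/postgresql/9.1/main/recovery.done

pg_is_in_recovery() should now return f.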

Check pgsql status on node2

$ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main  

Copy the timeline history file generated on node2 to the remaining standby nodes.
pgsql 9.1 still requires the timeline history file to be copied by hand; from 9.3 onward it is shipped to the standbys automatically over the streaming connection. This must be done, otherwise the standby log will show FATAL messages such as:

timeline 2 of the primary does not match recovery target timeline 1

$ scp ./00000002.history postgres@192.168.56.100:/var/lib/postgresql/9.1/main/pg_xlog/
$ scp ./00000002.history postgres@192.168.56.50:/var/lib/postgresql/9.1/main/pg_xlog/

Start pgsql 9.1 on node1 (it now comes up as a standby)

# service postgresql start 
// or
$ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"

From node2, check replication of the standbys

$ psql
psql (9.1.22)
postgres=# select client_addr,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), pg_current_xlog_location() ) as local_noflush_delay,
       pg_xlog_location_diff(pg_current_xlog_location(), sent_location) as local_sent_delay,
       pg_xlog_location_diff(sent_location, write_location) as stream_write_delay,
       pg_xlog_location_diff(sent_location, flush_location) as stream_flush_delay,
       pg_xlog_location_diff(sent_location, replay_location) as stream_replay_delay 
from pg_stat_replication;

On node2, check the master's current LSN

$ psql
psql (9.1.22)
postgres=# select pg_current_xlog_insert_location(), 
       pg_current_xlog_location() 
;

On node3, check the standby's current LSN

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp(), 
       now() - pg_last_xact_replay_timestamp() AS replication_delay
;

At this point the master/slave switchover of pgsql 9.1 is complete. Because a VIP is used, the application needs no changes, although during the switchover its log will show messages that the database connection was lost or that it was unable to connect to the database.
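
From any client machine, whether the switchover has completed can be checked by probing the VIP. A crude wait loop (my own sketch; pg_isready only appears in 9.3, so plain psql is used, and appuser/appdb are the same hypothetical application credentials as above):

$ until psql -h 192.168.56.101 -U appuser -d appdb -c "select 1" >/dev/null 2>&1; do sleep 2; done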

#######################################################################
For reference,
#######################################################################
1. Check the IP addresses

# ifconfig
// or
# ip a

In the output, note that the primary address appears as "scope global bond0" and the VIP as "scope global secondary bond0:1".

2. Remove bond0:1 (192.168.56.101)

# ip addr del 192.168.56.101/24 dev bond0 label bond0:1
// or
# ifconfig bond0:1 down

3. Add bond0:1 (192.168.56.101)

# ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
// or
# ifconfig bond0:1 192.168.56.101 netmask 255.255.255.0 up

4. Stop PostgreSQL

# service postgresql stop 
// or
$ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main 

5. Start PostgreSQL

# service postgresql start 
// or
$ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"

6. Promote a PostgreSQL standby to master

$ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main  
// or create the trigger_file defined in recovery.conf
$ touch /tmp/postgresql.trigger.5432

7. recovery.conf file

vi recovery.conf 
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'

Note that recovery_target_timeline must be set to 'latest'.

8. Run arping after changing bond0:1 and bond1:1

# arping -q -A -c 1 -I bond0:1 192.168.56.101
# arping -q -A -c 1 -I bond1:1 192.168.165.101

For the arping command, you can refer to http://man.linuxde.net/arping
