Recovery of Table Volume by Using pg_repack

Keywords: Linux Database yum github CentOS

For some tables that often do archiver or purge operations, if we reclaim table space irregularly, the volume of tables will increase.

However, the vacuum full in pg will block the read and write operations in the process of recovery, and can not run directly in the production environment.

Therefore, in the production environment, the commonly used table space shrinkage tools are pg_squeeze and pg_repack.


Here we first paste the usage of pg_repack:


Project address: https://github.com/reorg/pg_repack

Principle: Create an identical shadow table, copy the data of the original table, and rename replaces the original table.

Note: Tables to be processed must have primary keys


yum install centos-release-scl-rh
yum install llvm-toolset-7-clang

cd /home/postgres

tar xf pg_repack-ver_1.4.4.tar.gz 

export PATH=/usr/local/pgsql-11.5/bin:$PATH   -- You need to load environment variables, otherwise you may not find them during compilation pg_config This document

cd pg_repack-ver_1.4.4

make && make install

In addition, an executable file is generated: / home/postgres/pg_repack-ver_1.4.4/bin/pg_repack


Modify the configuration file:

vim  /usr/local/pgsql-11.5/data/postgresql.conf

shared_preload_libraries = 'pg_repack'

Then, restart the pg process



Usage method:

create database db1;

\c db1

create extension pg_repack;
 
create table testdata (id integer,course int,grade numeric(4,2),testtime date);
alter table testdata add primary key (id);

insert into testdata 
 select generate_series(1,100) as id,
 10 as course,
 10.11 as grade,
 '2017-07-06' as testtime;


Then we can see that the physical file size of PG datadir has increased from 1.1GB to 1.6GB.

 

Then, we use the command delete from test data where id is between 5000000 and 10000000; delete half of the data from the test data table, and you can see that the physical file does not shrink.

 

Then, use pg_repack externally to reclaim space from the color table:

cd /home/postgres/pg_repack-ver_1.4.4/bin

./pg_repack -h 127.0.0.1  --port 5434 -Upostgres -d db1 -t testdata -j 2 -D -k

The results returned are as follows (depending on the size of the actual table, the waiting time for this result is determined):

NOTICE: Setting up workers.conns

INFO: repacking table "public.testdata"


pg_repack parameter description:

  -a, --all                 repack all databases
  -t, --table=TABLE         repack specific table only
  -I, --parent-table=TABLE  repack specific parent table and its inheritors
  -c, --schema=SCHEMA       repack tables in specific schema only
  -s, --tablespace=TBLSPC   move repacked tables to a new tablespace
  -S, --moveidx             move repacked indexes to TBLSPC too
  -o, --order-by=COLUMNS    order by columns instead of cluster keys
  -n, --no-order            do vacuum full instead of cluster
  -N, --dry-run             print what would have been repacked
  -j, --jobs=NUM            Use this many parallel jobs for each table
  -i, --index=INDEX         move only the specified index
  -x, --only-indexes        move only indexes of the specified table
  -T, --wait-timeout=SECS   timeout to cancel other backends on conflict
  -D, --no-kill-backend     don't kill other backends when timed out
  -Z, --no-analyze          don't analyze at end
  -k, --no-superuser-check  skip superuser checks in client
  -C, --exclude-extension   don't repack tables which belong to specific extension
Connection options:
  -d, --dbname=DBNAME       database to connect
  -h, --host=HOSTNAME       database server host or socket directory
  -p, --port=PORT           database server port
  -U, --username=USERNAME   user name to connect as
  -w, --no-password         never prompt for password
  -W, --password            force password prompt




 

We looked at the physical file size again and found that the entire file size of PG was reduced back to 1.4 GHz (the recovered 200 MB is almost half the space size of the testdata table).

 

 

Finally, we can use script timing detection to perform pg_repack operations on tables that exceed certain thresholds to recover disk space.







Posted by timvw on Sat, 24 Aug 2019 06:16:25 -0700