For tables that see frequent archive or purge operations, bloat accumulates if table space is not reclaimed regularly, and the tables keep growing on disk.
However, VACUUM FULL in PostgreSQL holds an exclusive lock that blocks both reads and writes for the whole duration of the rewrite, so it cannot be run directly in a production environment.
Therefore, in production the commonly used online table-shrinking tools are pg_squeeze and pg_repack.
Here we first walk through the usage of pg_repack.
Project address: https://github.com/reorg/pg_repack
Principle: create an identical shadow table, copy the original table's data into it, and then swap it in for the original table with a RENAME.
Note: tables to be processed must have a primary key.
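For intuition, this swap can be sketched in plain SQL, using the testdata table created later in this post. This is a simplified illustration of the principle only, not what pg_repack actually executes; the real tool also installs a trigger and a log table so that writes arriving during the copy are replayed before the switch:

create table testdata_shadow (like testdata including all);   -- same columns, defaults, indexes
insert into testdata_shadow select * from testdata;           -- rows come out tightly packed
begin;
alter table testdata rename to testdata_old;
alter table testdata_shadow rename to testdata;
commit;
drop table testdata_old;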
yum install centos-release-scl-rh
yum install llvm-toolset-7-clang
cd /home/postgres
tar xf pg_repack-ver_1.4.4.tar.gz
export PATH=/usr/local/pgsql-11.5/bin:$PATH   # load the environment variable first, otherwise pg_config may not be found during compilation
cd pg_repack-ver_1.4.4
make && make install
In addition, an executable file is generated at /home/postgres/pg_repack-ver_1.4.4/bin/pg_repack.
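After make install finishes, you can sanity-check the freshly built client, e.g.:

cd /home/postgres/pg_repack-ver_1.4.4/bin
./pg_repack --version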
Modify the configuration file:
vim /usr/local/pgsql-11.5/data/postgresql.conf
shared_preload_libraries = 'pg_repack'
Then restart the PostgreSQL server process for the setting to take effect.
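With the paths used in this post, the restart and a quick confirmation could look like this (a sketch; adjust paths and port to your installation):

/usr/local/pgsql-11.5/bin/pg_ctl -D /usr/local/pgsql-11.5/data restart
# confirm the library is loaded
psql -h 127.0.0.1 -p 5434 -U postgres -c "show shared_preload_libraries;"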
Usage:
create database db1;
\c db1
create extension pg_repack;
create table testdata (id integer, course int, grade numeric(4,2), testtime date);
alter table testdata add primary key (id);
insert into testdata select generate_series(1,10000000) as id, 10 as course, 10.11 as grade, '2017-07-06' as testtime;
We can then see that the physical size of the PG data directory has grown from 1.1 GB to 1.6 GB.
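The growth can be checked either at the OS level or from inside the database, for example:

du -sh /usr/local/pgsql-11.5/data
psql -h 127.0.0.1 -p 5434 -U postgres -d db1 -c "select pg_size_pretty(pg_total_relation_size('testdata'));"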
Then we delete half of the data from the testdata table with delete from testdata where id between 5000000 and 10000000; and we can see that the physical files do not shrink.
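The deleted rows are only marked dead: autovacuum will eventually make that space reusable within the table, but the data files themselves are not returned to the OS. The dead tuples are visible in the statistics views, e.g.:

select n_live_tup, n_dead_tup from pg_stat_user_tables where relname = 'testdata';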
Next, use the pg_repack client from outside the database to reclaim the table's space:
cd /home/postgres/pg_repack-ver_1.4.4/bin
./pg_repack -h 127.0.0.1 --port 5434 -U postgres -d db1 -t testdata -j 2 -D -k
The output is as follows (how long it takes to appear depends on the size of the table):
NOTICE: Setting up workers.conns
INFO: repacking table "public.testdata"
pg_repack parameter description:
-a, --all                 repack all databases
-t, --table=TABLE         repack specific table only
-I, --parent-table=TABLE  repack specific parent table and its inheritors
-c, --schema=SCHEMA       repack tables in specific schema only
-s, --tablespace=TBLSPC   move repacked tables to a new tablespace
-S, --moveidx             move repacked indexes to TBLSPC too
-o, --order-by=COLUMNS    order by columns instead of cluster keys
-n, --no-order            do vacuum full instead of cluster
-N, --dry-run             print what would have been repacked
-j, --jobs=NUM            use this many parallel jobs for each table
-i, --index=INDEX         move only the specified index
-x, --only-indexes        move only indexes of the specified table
-T, --wait-timeout=SECS   timeout to cancel other backends on conflict
-D, --no-kill-backend     don't kill other backends when timed out
-Z, --no-analyze          don't analyze at end
-k, --no-superuser-check  skip superuser checks in client
-C, --exclude-extension   don't repack tables which belong to specific extension

Connection options:
-d, --dbname=DBNAME       database to connect
-h, --host=HOSTNAME       database server host or socket directory
-p, --port=PORT           database server port
-U, --username=USERNAME   user name to connect as
-w, --no-password         never prompt for password
-W, --password            force password prompt
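For example, to preview what would be repacked without doing any work, the --dry-run flag can be combined with a schema filter (a usage sketch reusing this post's connection settings):

./pg_repack --dry-run -h 127.0.0.1 -p 5434 -U postgres -d db1 -c public -k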
Checking the physical file size again, we found that the PG data directory had shrunk back to 1.4 GB (the reclaimed 200 MB is roughly half the size of the testdata table).
Finally, we can run a scheduled script that detects tables whose bloat exceeds a certain threshold and performs pg_repack on them to reclaim disk space, as sketched below.
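A minimal sketch of such a job, assuming the paths and port used earlier in this post; the script name (repack_bloated.sh), the 30% dead-tuple threshold, and the use of pg_stat_user_tables as the bloat estimate are illustrative choices, not the only way to measure bloat:

#!/bin/bash
# repack_bloated.sh -- repack tables whose dead-tuple ratio exceeds a threshold
PGBIN=/usr/local/pgsql-11.5/bin
REPACK=/home/postgres/pg_repack-ver_1.4.4/bin/pg_repack
DB=db1
PORT=5434
THRESHOLD=0.3   # repack when dead tuples exceed 30% of the table

# list schema-qualified tables over the threshold, one per line
TABLES=$($PGBIN/psql -h 127.0.0.1 -p $PORT -U postgres -d $DB -At -c "
  select schemaname || '.' || relname
  from   pg_stat_user_tables
  where  n_live_tup + n_dead_tup > 0
  and    n_dead_tup::float / (n_live_tup + n_dead_tup) > $THRESHOLD;")

for t in $TABLES; do
    echo "repacking $t"
    $REPACK -h 127.0.0.1 -p $PORT -U postgres -d $DB -t "$t" -D -k
done

The script could then be scheduled, e.g. nightly at 02:00 via cron:

0 2 * * * /home/postgres/repack_bloated.sh >> /var/log/pg_repack.log 2>&1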