PostgreSQL pgbench tpcb test on a huge database - partition table optimization

Keywords: Database SQL PostgreSQL github git

Tags

PostgreSQL, pgbench, tpcb

Background

pgbench is the benchmarking tool shipped with PostgreSQL. Its built-in test case is a TPC-B-like test, and it also lets users write their own custom test cases.

A large collection of custom test cases can be found here:

https://github.com/digoal/blog/blob/master/201711/readme.md

When running the tpcb test against a very large data set (for example, I recently generated 1 trillion rows), partitioned tables are worth considering. However, with PostgreSQL's built-in partitioning, a very large number of partitions combined with PREPARED STATEMENTs leads to a performance drop.

See: "PostgreSQL performance problems caused by queries over too many partitions - diagnosis and optimization (large numbers of BIND, spin lock, and SLEEP processes)"

It is recommended to use the pg_pathman partitioning extension to eliminate this performance problem.

Alternatively, you can modify pgbench's SQL and use a UDF with dynamic SQL to achieve the same performance.

Implementing the pgbench tpcb partition-table test with a UDF and dynamic SQL

1. The original queries are as follows

Read and write case

\set aid random(1, 100000 * :scale)  
\set bid random(1, 1 * :scale)  
\set tid random(1, 10 * :scale)  
\set delta random(-5000, 5000)  
BEGIN;  
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
END;  

Read-only case

\set aid random(1, 100000 * :scale)  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

Rebuilding pgbench_accounts as a partitioned table

1. Create the table definitions

For example, scale 10000000 (1 trillion accounts):

pgbench -i -I dt --tablespace=tbs1 -s 10000000  

2. Create the partitioned parent table

create table p (like pgbench_accounts) partition by RANGE ( aid ) tablespace tbs1;  

3. Create 4097 partitions, each holding 244140625 records

do language plpgsql $$                                                           
declare  
  i_rows_perpartition int8 := 244140625;  
begin  
  for i in 0..4096 loop  
    execute format ('create table pgbench_accounts%s partition of p for values from (%s) to (%s) tablespace tbs1', i, i*i_rows_perpartition, (i+1)*i_rows_perpartition);  
  end loop;  
end;  
$$;  
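As a quick sanity check on the layout above (a Python sketch, not part of the original workflow): 4096 partitions of 244140625 rows cover exactly 10^12 accounts, and the 4097th partition exists to cover the upper boundary value that pgbench's random(1, 100000 * :scale) can still draw at scale 10000000.

```python
# Sanity check for the partition layout: 4097 range partitions,
# 244140625 rows per partition (constants taken from the DO block above).
rows_per_partition = 244140625          # 5**12
n_partitions = 4097

# Partition i holds aid values in [i * rows, (i + 1) * rows).
bounds = [(i * rows_per_partition, (i + 1) * rows_per_partition)
          for i in range(n_partitions)]

# The first 4096 partitions cover exactly 1 trillion aids:
assert 4096 * rows_per_partition == 10**12

# At scale 10000000, pgbench draws aid from random(1, 100000 * scale),
# i.e. up to 10**12; the extra partition 4096 covers that boundary value.
max_aid = 100000 * 10000000
assert bounds[4096][0] <= max_aid < bounds[4096][1]
```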

4. Drop the original table and rename the partitioned one

drop table pgbench_accounts;  
  
alter table p rename to pgbench_accounts;  
  
-- alter table pgbench_accounts add constraint pk_pgbench_accounts_aid primary key (aid) using index tablespace tbs2;  

5. Create a task table for parallel data loading

drop table if exists task;  
create table task(id int primary key);  
insert into task select i from generate_series(0,4095) t(i);  

6. Create a seed data table containing 244140625 rows.

create table init_accounts(aid int8);  
insert into init_accounts select generate_series(0,244140624);  

7. Create a UDF; each call claims one task and writes 244140625 rows into the corresponding partition.

create or replace function tpcb_init_accounts() returns void as $$  
declare  
  v_id int;  
begin  
  -- atomically claim one task id (skip locked, so 64 parallel workers do not block each other)  
  with tmp as (select * from task limit 1 for update skip locked),  
    tmp1 as (delete from task using tmp where task.id=tmp.id)  
    select id into v_id from tmp;  
    
  if found then  
    execute format ('insert into pgbench_accounts%s select aid+%s*244140625::int8, ((aid+%s*244140625::int8)-1)/100000 + 1, 0 from init_accounts on conflict do nothing', v_id, v_id, v_id);  
  end if;  
end;  
$$ language plpgsql strict;  
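To see what one call writes, here is a small Python model (an illustration only; the constants come from the function above) of the aid and bid values generated for task v_id: partition v_id receives exactly the global aids in its range, and bid follows pgbench's 100000-accounts-per-branch layout.

```python
rows = 244140625   # rows per partition, as in tpcb_init_accounts above

def generated_rows(v_id, local_aids):
    """Model of the INSERT performed by tpcb_init_accounts for task v_id."""
    out = []
    for aid in local_aids:
        gaid = aid + v_id * rows                # global account id
        bid = (gaid - 1) // 100000 + 1          # 100000 accounts per branch
        out.append((gaid, bid))
    return out

# Task 3 fills partition 3: global aids in [3*rows, 4*rows).
sample = generated_rows(3, [0, rows - 1])
assert sample[0][0] == 3 * rows
assert sample[1][0] == 4 * rows - 1
# bids stay within pgbench's branch range 1 .. scale (scale = 10000000):
assert all(1 <= bid <= 10000000 for _, bid in generated_rows(3, [0, 1, rows - 1]))
```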

8. Use pgbench with 64 concurrent clients to generate the 1 trillion rows of tpcb pgbench_accounts test data. 64 clients x 100 transactions = 6400 calls, more than the 4096 tasks, so every partition gets loaded; calls that find no remaining task simply do nothing.

vi test.sql  
select tpcb_init_accounts();  
  
nohup pgbench -M prepared -n -r -f ./test.sql -c 64 -j 64 -t 100 >./init.log 2>&1 &  

9. Recreate the task table, this time for building the pgbench_accounts partition primary keys

drop table task;  
create table task(id int primary key);  
insert into task select i from generate_series(0,4095) t(i);  

10. Create a UDF that analyzes one partition and adds its primary key

create or replace function tpcb_init_accounts_pkey() returns void as $$  
declare  
  v_id int;  
begin  
  -- atomically claim one task id, as in the loading step  
  with tmp as (select * from task limit 1 for update skip locked),  
    tmp1 as (delete from task using tmp where task.id=tmp.id)  
    select id into v_id from tmp;  
    
  if found then  
    execute format ('analyze pgbench_accounts%s', v_id);  
    execute format ('alter table pgbench_accounts%s add constraint pk_pgbench_accounts%s_aid primary key (aid) using index tablespace tbs2', v_id, v_id);  
  end if;  
end;  
$$ language plpgsql strict;  

11. Build the pgbench_accounts partition primary keys

vi test.sql  
select tpcb_init_accounts_pkey();  
  
nohup pgbench -M prepared -n -r -f ./test.sql -c 64 -j 64 -t 100 >./init.log 2>&1 &  

If you use the pg_pathman extension, none of this manual work is needed.

1. Use pgbench to generate the structure

pgbench -i -I dt --tablespace=tbs1 -s 10000000  

2. Convert pgbench_accounts into a partitioned table

https://github.com/postgrespro/pg_pathman

PostgreSQL 9.5 + Efficient Partition Table Implementation - pg_pathman

3. Use pgbench to generate the data

pgbench -i -I g -n --tablespace=tbs1 -s 10000000  

4. Add the primary key

Parallelism here is at the table level; the tables are analyzed as well.
  
pgbench -i -I p -n -s 10000000 --index-tablespace=tbs2  

tpcb UDF

1. Read-write

create or replace function tpcb_rw(     
  i_aid int8,     
  i_bid int4,     
  i_tid int4,     
  i_delta int4,     
  i_rows_perpartition int8     
) returns int as $$    
declare    
  i_suffix int := i_aid/i_rows_perpartition;     
  res int;     
begin    
  execute format('execute p1_%s(%s,%s)', i_suffix, i_delta, i_aid);    
  execute format('execute p2_%s(%s)', i_suffix, i_aid) into res;    
  UPDATE pgbench_tellers SET tbalance = tbalance + i_delta WHERE tid = i_tid;    
  UPDATE pgbench_branches SET bbalance = bbalance + i_delta WHERE bid = i_bid;    
  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (i_tid, i_bid, i_aid, i_delta, CURRENT_TIMESTAMP);    
  return res;    
exception when others then    
  -- first call in this session for this partition: the prepared statements    
  -- do not exist yet, so create them and retry the two statements once    
  execute format('prepare p1_%s(int,int8) as UPDATE pgbench_accounts%s SET abalance = abalance + $1 WHERE aid = $2', i_suffix, i_suffix);    
  execute format('prepare p2_%s(int8) as SELECT abalance FROM pgbench_accounts%s WHERE aid = $1', i_suffix, i_suffix);    
  execute format('execute p1_%s(%s,%s)', i_suffix, i_delta, i_aid);    
  execute format('execute p2_%s(%s)', i_suffix, i_aid) into res;    
  return res;    
end;    
$$ language plpgsql strict;     
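The routing trick in tpcb_rw is the integer division i_aid / i_rows_perpartition: it picks the partition suffix directly in the function, instead of asking the planner to prune 4097 partitions on every statement. A quick boundary check of that mapping (a Python sketch; constants as in the functions above):

```python
rows = 244140625    # i_rows_perpartition

def suffix(aid):
    """Partition suffix used by tpcb_rw/tpcb_ro: integer division of aid."""
    return aid // rows   # SQL int division truncates; aid >= 1 here, so same result

# The first partition's range [0, rows) maps to suffix 0:
assert suffix(1) == 0
assert suffix(rows - 1) == 0
# aid values just inside partition 1's range [rows, 2*rows) map to suffix 1:
assert suffix(rows) == 1
assert suffix(2 * rows - 1) == 1
# the largest aid pgbench can draw, 10**12, maps to the boundary partition 4096:
assert suffix(10**12) == 4096
```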

2. Read-only

create or replace function tpcb_ro(     
  i_aid int8,     
  i_rows_perpartition int8     
) returns int as $$    
declare    
  i_suffix int := i_aid/i_rows_perpartition;    
  res int;    
begin    
  execute format('execute p2_%s(%s)', i_suffix, i_aid) into res;    
  return res;    
exception when others then    
  -- prepared statement missing in this session: create it and retry once    
  execute format('prepare p2_%s(int8) as SELECT abalance FROM pgbench_accounts%s WHERE aid = $1', i_suffix, i_suffix);    
  execute format('execute p2_%s(%s)', i_suffix, i_aid) into res;    
  return res;    
end;    
$$ language plpgsql strict;     

3. Switch to custom pgbench scripts

Read-write case

vi rw.sql  
  
\set aid random(1, 100000 * :scale)  
\set bid random(1, 1 * :scale)  
\set tid random(1, 10 * :scale)  
\set delta random(-5000, 5000)  
SELECT tpcb_rw(:aid,:bid,:tid,:delta,244140625);  

Read-only case

vi ro.sql  
  
\set aid random(1, 100000 * :scale)  
SELECT tpcb_ro(:aid,244140625);  

References

https://github.com/digoal/blog/blob/master/201711/readme.md

https://github.com/postgrespro/pg_pathman

PostgreSQL 9.5 + Efficient Partition Table Implementation - pg_pathman

PostgreSQL Oracle Compatibility - DBMS_SQL (Using Binding Variables in Stored Procedure Dynamic SQL - DB-side prepare statement)

https://www.postgresql.org/docs/devel/static/pgbench.html

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=60e612b602999e670f2d57a01e52799eaa903ca9

Posted by jariizumi on Thu, 02 May 2019 08:30:37 -0700