Research on Citus Data Shard Distribution (1)

Keywords: SQL

(Unless otherwise noted, all SQL in this article is executed on the coordinator node.)

Worker nodes

mydb1=# SELECT * FROM master_get_active_worker_nodes();
   node_name   | node_port 
---------------+-----------
 192.168.7.131 |      5432
 192.168.7.135 |      5432
 192.168.7.136 |      5432
 192.168.7.137 |      5432
 192.168.7.133 |      5432
 192.168.7.132 |      5432
 192.168.7.134 |      5432
 192.168.7.130 |      5432
(8 rows)
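For reference, worker nodes are typically registered on the coordinator with master_add_node before they appear in this list. A minimal sketch, assuming the hosts and port shown above (newer Citus releases also expose citus_add_node with the same arguments):

-- run on the coordinator node; repeat for each worker
SELECT * FROM master_add_node('192.168.7.130', 5432);
SELECT * FROM master_add_node('192.168.7.131', 5432);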

Create a test table

create table test_table(id int, name varchar(16));

At this point, you can see the table test_table by running \d on the coordinator node.

Configure the distribution column and method

SELECT master_create_distributed_table('test_table', 'id', 'hash');
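To confirm how the table was distributed, the Citus metadata catalog pg_dist_partition can be queried. A small check, assuming the standard catalog columns, where partmethod 'h' indicates hash distribution and partkey records the distribution column:

-- run on the coordinator node
SELECT logicalrelid, partmethod, partkey
FROM pg_dist_partition
WHERE logicalrelid = 'test_table'::regclass;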

Create shards according to the shard count and replication factor

SELECT master_create_worker_shards('test_table', 8, 1);

At this point, you can see tables named test_table_XXXXXX by running \d on the worker nodes.
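If the \d output is noisy, the shard tables on a worker can also be listed from the system catalog. A small sketch, assuming the default shard naming scheme test_table_<shardid>:

-- run on a worker node
SELECT relname
FROM pg_class
WHERE relname ~ '^test_table_[0-9]+$' AND relkind = 'r'
ORDER BY relname;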

View the shards

mydb1=# SELECT * from pg_dist_shard;
 logicalrelid | shardid | shardstorage | shardminvalue | shardmaxvalue 
--------------+---------+--------------+---------------+---------------
 test_table   |  102024 | t            | -2147483648   | -1610612737
 test_table   |  102025 | t            | -1610612736   | -1073741825
 test_table   |  102026 | t            | -1073741824   | -536870913
 test_table   |  102027 | t            | -536870912    | -1
 test_table   |  102028 | t            | 0             | 536870911
 test_table   |  102029 | t            | 536870912     | 1073741823
 test_table   |  102030 | t            | 1073741824    | 1610612735
 test_table   |  102031 | t            | 1610612736    | 2147483647
(8 rows)

As you can see, the numeric suffix in the worker-node table name test_table_XXXXXX is the shardid shown here.
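The hash ranges above do not show which worker holds each shard. That mapping lives in the placement metadata; a minimal sketch, assuming the pg_dist_shard_placement view available in Citus releases of this era:

-- run on the coordinator node; shows which worker hosts each shard
SELECT shardid, nodename, nodeport
FROM pg_dist_shard_placement
ORDER BY shardid;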

Operate on the table

mydb1=# INSERT INTO test_table VALUES(1,'a');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(2,'b');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(3,'c');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(4,'d');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(5,'e');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(6,'f');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(7,'g');
INSERT 0 1
mydb1=# INSERT INTO test_table VALUES(8,'h');
INSERT 0 1
mydb1=# select * from test_table;
 id | name 
----+------
  1 | a
  8 | h
  5 | e
  4 | d
  7 | g
  3 | c
  6 | f
  2 | b
(8 rows)
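To see which shard a particular id value is routed to, more recent Citus versions provide a helper function; the sketch below assumes it is available in the version being tested:

-- run on the coordinator node; returns the shardid whose hash range contains hash(1)
SELECT get_shard_id_for_distribution_column('test_table', 1);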

Operate on the table directly on a worker node

On the worker node:

mydb1=# select * from test_table_102024;
 id | name 
----+------
  1 | a
  8 | h
(2 rows)

As you can see, the rows of test_table are scattered across the 8 worker nodes according to the hash of the id column.
Next, try inserting data directly into a shard table. Execute on the worker node:

mydb1=# INSERT INTO test_table_102024 VALUES(999,'999');
INSERT 0 1

Then query:

mydb1=# select * from test_table_102024;
 id  | name 
-----+------
   1 | a
   8 | h
 999 | 999
(3 rows)

Query on the coordinator node:

mydb1=# select * from test_table;
 id  | name 
-----+------
   1 | a
   8 | h
 999 | 999
   5 | e
   4 | d
   7 | g
   3 | c
   6 | f
   2 | b
(9 rows)

As you can see, data inserted directly on a worker node is still visible from the coordinator node, but it is not placed according to the hash of the id column.
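One consequence worth noting: the coordinator routes queries that filter on the distribution column by hashing the filter value. Since id=999 was placed in shard 102024 by hand rather than by hash, a filtered query such as the one below may be routed to a different shard and miss the row, even though the full-table scan above touches every shard and still returns it:

-- run on the coordinator node; may return zero rows if hash(999)
-- does not fall into shard 102024's hash range
SELECT * FROM test_table WHERE id = 999;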
