Doris temporary partition

Temporary partition

Temporary partitions belong to a partition table. Only partition tables can create temporary partitions. This is very core. Only the partition table can have temporary partitions.

Rules:
The partition field of temporary partition is the same as that of formal partition
The name of the temporary partition must be different from that of the formal partition
Temporary partitions support add, delete, and replace operations
Adding temporary partitions is similar to adding formal partitions. The partition scope of the temporary partition is independent of the formal partition.

Add temporary partition
ALTER TABLE expamle_range_tbl ADD TEMPORARY PARTITION tp2 VALUES LESS THAN("2020-02-01");
ALTER TABLE tbl2 ADD TEMPORARY PARTITION tp1 VALUES [("2020-01-01"), ("2020-02-01"));
ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01")
("in_memory" = "true", "replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;

Delete temporary partition
Deleting a temporary partition does not affect the data of a formal partition

ALTER TABLE expamle_range_tbl DROP TEMPORARY PARTITION tp1;

Insert temporary partition test data;
Note here that if no temporary partition is specified during insertion, the formal partition will be inserted by default. If the data is not within the range of the formal partition, an error will be reported; The following is an error message when the data does not fall into any partition during normal insertion.

When inserting the specified temporary partition below, the corresponding data must be able to be put into the range of the temporary partition, otherwise an unmatched error will be reported as follows: Reason: no partition for this tuple. Tuple = (11111 2022-04-01 2000-11-11 11 11:11:11 BJ 111 1 2022-01-01 00:00 1000 11 22). SRC line: [];

INSERT INTO expamle_range_tbl TEMPORARY PARTITION(tp2) values(11111,'2018-04-01','2000-11-11 11:11:11','bj',111,1,'2022-01-01 00:00:00',1000,11,22);


INSERT INTO expamle_range_tbl TEMPORARY PARTITION(tp2) values(11111,'2022-04-01','2000-11-11 11:11:11','bj',111,1,'2022-01-01 00:00:00',1000,11,22);

Query temporary partition data

SELECT ... FROM
tbl1 TEMPORARY PARTITION(tp1, tp2, ...)
JOIN
tbl2 TEMPORARY PARTITION(tp1, tp2, ...)
ON ...
WHERE ...;

select * from expamle_range_tbl TEMPORARY PARTITION(tp2);

The above query will find the data inserted into the temporary partition

Replace partition

You can use the ALTER TABLE REPLACE PARTITION statement to replace the formal partition of a table with a temporary partition.
strict_range :
The default is true. When this parameter is true, it means that the range union of all formal partitions to be replaced should be exactly the same as that of the temporary partition to be replaced. When set to false, you only need to ensure that the ranges between new formal partitions do not overlap after replacement.
See the official website for specific examples:

use_temp_partition_name:
The default is false. When the parameter is false and the number of partitions to be replaced and the number of replaced partitions are the same, the formal partition name after replacement remains unchanged. If true, the name of the formal partition after replacement is the name of the replacement partition.
After the partition is replaced successfully, the replaced partition will be deleted and unrecoverable: that is, the replaced partition will be deleted and unrecoverable

--  The following statement is used to tp1 Partition replacement p1 partition
ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2, tp3);

ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
PROPERTIES (
    "strict_range" = "false",
    "use_temp_partition_name" = "true"
);

Practical operation:

-- Create table
CREATE TABLE `expamle_range_tbl` (
  `user_id` largeint(40) NOT NULL COMMENT "user id",
  `date` date NOT NULL COMMENT "Data entry date and time",
  `timestamp` datetime NOT NULL COMMENT "Timestamp of data injection",
  `city` varchar(20) NULL COMMENT "User's City",
  `age` smallint(6) NULL COMMENT "User age",
  `sex` tinyint(4) NULL COMMENT "User gender",
  `last_visit_date` datetime REPLACE NULL DEFAULT "1970-01-01 00:00:00" COMMENT "Last access time of user",
  `cost` bigint(20) SUM NULL DEFAULT "0" COMMENT "Total user consumption",
  `max_dwell_time` int(11) MAX NULL DEFAULT "0" COMMENT "Maximum user residence time",
  `min_dwell_time` int(11) MIN NULL DEFAULT "99999" COMMENT "User minimum residence time"
) ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
COMMENT "OLAP"
PARTITION BY RANGE(`date`)
(PARTITION p201701 VALUES [('0000-01-01'), ('2017-02-01')),
PARTITION p201702 VALUES [('2017-02-01'), ('2017-03-01')),
PARTITION p201703 VALUES [('2017-03-01'), ('2017-04-01')))
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "V2"
);
-- Insert test data
insert into expamle_range_tbl  values(11111,'2017-01-01','2000-11-11 11:11:11','bj',111,1,'2022-01-01 00:00:00',1000,11,22);
insert into expamle_range_tbl  values(11112,'2017-02-22','2000-11-11 11:11:11','bj',111,1,'2022-01-01 00:00:00',1000,11,22);
insert into expamle_range_tbl  values(11113,'2017-03-22','2000-11-11 11:11:11','bj',111,1,'2022-01-01 00:00:00',1000,11,22);
-- Query formal partition validation 
select * from expamle_range_tbl PARTITION(p201701,p201702,p201703);

Verify partition insertion results

-- Create temporary partition
ALTER TABLE expamle_range_tbl ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2017-02-01");
-- Insert data into temporary partition
INSERT INTO expamle_range_tbl TEMPORARY PARTITION(tp1) values(11115,'2017-01-01','2017-01-01 11:11:11','sh',111,1,'2022-11-11 00:00:00',100,11,22);
-- Query data validation from temporary partition
select * from expamle_range_tbl TEMPORARY PARTITION(tp1);
select * from expamle_range_tbl  PARTITION(tp1);

-- replace partition  use_temp_partition_name  The function of is if true  Then partition p201701 Will be deleted. Only the modified formal partition can be used   tp1 query
-- if it is false  Then the original formal partition p201701  It will also retain only the following data and replace it with a temporary partition  tp1 After the following, you can continue to use the formal partition query
ALTER TABLE expamle_range_tbl REPLACE PARTITION (p201701) WITH TEMPORARY PARTITION (tp1)
PROPERTIES (
    "strict_range" = "true",
    "use_temp_partition_name" = "true"
);
-- The end result is  p201701 The partition will be deleted, including partition information and data under the partition; The temporary partition will also not exist, use tp1 Name as official partition

select * from expamle_range_tbl PARTITION(p201701);

select * from expamle_range_tbl PARTITION(p201702,p201703);

select * from expamle_range_tbl PARTITION(tp1); The following results can be found to be the data of the previous temporary partition TP1

select * from expamle_range_tbl; It can be found that the data of the temporary partition is not visible to the formal partition.

ALTER TABLE expamle_range_tbl DROP TEMPORARY PARTITION tp1;
ALTER TABLE expamle_range_tbl DROP PARTITION tp1;
Potholes:
If strict_ If the range is in the true mode and their partition range union is different, the following error will occur:

strict_ If the range is false, but it is necessary to ensure that the partition ranges do not overlap, the following error will be reported. It means that the leftmost boundary and the rightmost boundary must be the same, otherwise it won't work

Import and query of temporary partition

Import temporary partition

INSERT INTO tbl TEMPORARY PARTITION(tp1, tp2, ...) SELECT ....
curl --location-trusted -u root: -H "label:123" -H "temporary_partitions: tp1, tp2, ..." -T testData http://host:port/api/testDb/testTbl/_stream_load

LOAD LABEL example_db.label1
(
DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
INTO TABLE my_table
TEMPORARY PARTITION (tp1, tp2, ...)
...
)
WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");

CREATE ROUTINE LOAD example_db.test1 ON example_tbl
COLUMNS(k1, k2, k3, v1, v2, v3 = k1 * 100),
TEMPORARY PARTITIONS(tp1, tp2, ...),
WHERE k1 > 100
PROPERTIES
(...)
FROM KAFKA
(...);

Relationship with other operations

DROP
After the Drop operation is used to directly delete the database or table, the database or table can be recovered through the Recover command (within a limited time), but the temporary partition will not be recovered.
After deleting a formal partition with the Alter command, you can Recover the partition with the Recover command (within a limited time). The operation has nothing to do with the formal partition and the temporary partition.
After deleting a temporary partition with the Alter command, the temporary partition cannot be recovered with the Recover command.

TRUNCATE
Use the Truncate command to empty the table. The temporary partition of the table will be deleted and cannot be recovered.
When you empty a formal partition with the Truncate command, the temporary partition is not affected.
You cannot empty a temporary partition using the Truncate command.

ALTER
When the table has temporary partitions, you cannot use the Alter command to change the table, such as Schema Change and Rollup.
You cannot add a temporary partition to a table while it is changing.

Best practices

  1. Atomic overwrite write operation
    Doris does not support overwrite writing. It can only be deleted and then inserted, but this will lead to unreadability for a period of time. Therefore, you can write to the temporary partition first and then replace it
  2. Modify bucket number
    The setting of the number of buckets in the original table is unreasonable. You can set a virtual partition consistent with the range of the original partition, and then use insert into select to write the data of the original partition into the virtual partition, and then replace the partition to achieve the effect of modifying the partition
  3. Merge or split partitions
    The partition range setting is unreasonable. If you want to reduce or expand the partition, you can set the corresponding virtual partition, and then use insert into select to write the data of the original partition to the virtual partition, and then replace the partition to achieve the effect of partition expansion.

Posted by dearsina on Wed, 24 Nov 2021 04:22:40 -0800