pt-archiver - Archive rows from a MySQL table into another table or a file.
pt-archiver nibbles records from a MySQL table. The --source and --dest arguments use DSN syntax; if COPY is yes, --dest defaults to the key's value from --source.
pt-archiver is a component of the Percona Toolkit, mainly used to archive and purge MySQL table data. It can archive rows into another table or into a file, and it is designed to do so without significantly affecting OLTP query performance while it runs. Data can be archived to a table on the same or another server, or to a file that can later be loaded with LOAD DATA INFILE. It can also simply delete historical table data incrementally.
This article reorganizes my earlier learning notes on pt-archiver and revisits how the tool works using its latest version.
For obtaining and installing the Percona Toolkit, see the Installation section of my other blog post, "Summary of pt-online-schema-change in Percona-Toolkit".
Basic explanation
pt-archiver [OPTIONS] --source DSN --where WHERE
Common Options (OPTIONS)
--analyze
Run ANALYZE TABLE on the tables after archiving completes. The argument specifies which side: s for the source table, d for the destination table, or both, e.g. --analyze=ds.

--ask-pass
Prompt for the password on the command line instead of passing it as an argument, to keep it out of the process list. Requires the perl-TermReadKey module.

--buffer
Buffer output to the file given by --file and flush it only at transaction commit. The default is to flush the file to disk after every row; with --buffer, explicit flushing is disabled and the operating system flushes in blocks, so some data may reach disk before the transaction commits. This improves performance at some risk to consistency.

--bulk-delete
Delete each chunk of rows with a single DELETE statement instead of deleting row by row through the primary key; implicitly enables --commit-each. Bulk deletion is much faster, but may be slower if the WHERE condition is complex.

--[no]bulk-delete-limit
Default: yes. Append the --limit clause to the bulk DELETE statement.

--bulk-insert
Insert each chunk of rows with LOAD DATA LOCAL INFILE instead of row-by-row INSERT statements, which is much faster; implicitly enables --bulk-delete and --commit-each. Rows for each chunk are first buffered in a temporary file and loaded into the destination when the chunk completes. For data safety this option forces --bulk-delete, which guarantees that deletion happens only after the insert has fully succeeded.

--channel
In a multi-source replication environment, specify which master's data to archive; applies when several masters replicate into one slave.

--charset, -A
Connection character set.

--[no]check-charset
Default: yes. Check that the connection character set matches the table character set.

--[no]check-columns
Default: yes. Check that the source table (--source) and destination table (--dest) have the same columns. Only column existence is checked, not order or type. If the columns differ, the tool exits with an error. Disable with --no-check-columns.

--check-slave-lag
Pause archiving while replication lag on the specified slave (given as a DSN) exceeds the value of --max-lag.

--check-interval
Default: 1s. How long to pause when --check-slave-lag detects replication lag. The check is performed every 100 rows.

--columns, -c
Table columns to archive, separated by ',' (comma) if there is more than one.

--commit-each
Commit after each fetch-and-archive batch, which disables --txn-size. The commit, and the flush of the file given by --file, happen after each batch is archived and before the next fetch and any --sleep pause; the batch size is controlled by --limit.

--host, -h
Host (IP address) of the database to connect to.

--port, -P
Port of the database to connect to.

--user, -u
Database user to connect as.

--password, -p
Password of the database user.

--socket, -S
Socket file to use for the connection.

--databases, -d
Database to connect to.

--source
The table to archive, given as a DSN. Required.

--dest
The destination table, given as a DSN. Keys not specified here default to the corresponding values from --source.

--where
WHERE condition selecting the rows to archive. Required; do not include the WHERE keyword itself. If no restriction is really needed, specify --where '1=1'.

--file
File to archive rows into. The file name may use MySQL DATE_FORMAT()-style tokens, and the file content has the same format as a SELECT INTO OUTFILE statement. The naming tokens are:
  %Y: year, numeric, four digits
  %m: month, numeric (01..12)
  %d: day of month, numeric (01..31)
  %H: hour (00..23)
  %i: minutes (00..59)
  %s: seconds (00..59)
  %D: database name
  %t: table name
For example: --file '/var/log/archive/%Y-%m-%d-%D.%t'

--output-format
Format of the --file contents. By default fields are tab-delimited; with --output-format=dump, fields are separated by ',' (comma) and enclosed in double quotes.

--for-update
Add the FOR UPDATE clause to each archive SELECT statement.

--share-lock
Add the LOCK IN SHARE MODE clause to each archive SELECT statement.

--header
Write the column names as the first line of the file.

--ignore
Add the IGNORE option to the INSERT statement.

--limit
Default: 1. Number of rows fetched and archived per statement.

--local
Do not write the OPTIMIZE and ANALYZE statements to the binlog.

--max-lag
Default: 1s. Maximum allowed replication lag. If the lag exceeds this value after a fetch, archiving pauses for the time given by --check-interval, then re-checks Seconds_Behind_Master on the slave. If the lag stays above the threshold, or the slave stops replicating, the tool waits until replication resumes and the lag drops below the threshold.

--no-delete
Do not delete the archived rows from the source table.

--optimize
Run OPTIMIZE TABLE on the tables after archiving completes; same syntax as --analyze, e.g. --optimize=ds, where s is the source table and d the destination table.

--primary-key-only
Archive only the primary key columns; shorthand for --columns with the primary key columns. Most effective when the archive operation is a DELETE-only purge, which needs to read only the primary key rather than every column of the row.

--progress
Print progress every N rows: the current time, elapsed time, and rows archived so far.

--purge
Purge rows instead of archiving them; --dest and --file may be omitted. Combining --primary-key-only with a purge-only run is more efficient.

--quiet, -q
Run silently, without printing execution information.

--replace
Write to the --dest table with REPLACE statements instead of INSERT.

--retries
Default: 1. Number of retries when the archive operation hits a deadlock or timeout; when exceeded, the tool exits with an error.

--run-time
How long to run before exiting. Allowed suffixes are s=seconds, m=minutes, h=hours, d=days; if not specified, the default is s.

--[no]safe-auto-increment
Default: yes. Do not archive the row holding the maximum AUTO_INCREMENT value. The tool adds an extra WHERE clause to avoid deleting the row with the largest value of a single-column ascending AUTO_INCREMENT key, so that the value is not reused after a database restart. The side effect is that this maximum-value row can never be archived or purged.

--set-vars
Default: wait_timeout=10000 innodb_lock_wait_timeout=1 lock_wait_timeout=60. Session variables to set while archiving; separate multiple values with ',' (comma), e.g. --set-vars=wait_timeout=5000.

--skip-foreign-key-checks
Disable foreign key checking with SET FOREIGN_KEY_CHECKS = 0.

--sleep
How long to sleep between SELECT fetches; the default is no sleep. Transactions are not committed, and the file given by --file is not flushed, before sleeping; if --commit-each is specified, commit and flush happen before the sleep.

--statistics
Collect and print timing statistics for the operations, for example:

  Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
  Source: D=db,t=table
  SELECT 4
  INSERT 4
  DELETE 4
  Action     Count    Time     Pct
  commit        10    0.1079   88.27
  select         5    0.0047    3.87
  deleting       4    0.0028    2.29
  inserting      4    0.0028    2.28
  other          0    0.0040    3.29

--txn-size
Default: 1. Number of rows per transaction; 0 disables transactions.

--version
Display the tool version and exit.

--[no]version-check
Default: yes. Check for the latest versions of Percona Toolkit, MySQL, and other programs.

--why-quit
Print the reason for exiting unless it is because all matching rows were archived. Useful with --run-time in automated archiving jobs, to determine whether the task finished within the allotted time. If --statistics is also specified, all exit reasons are printed.
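The --file naming tokens above can be pictured with a short sketch. The helper below is purely illustrative (pt-archiver performs this expansion internally in Perl); it expands a subset of the tokens in a template such as '/var/log/archive/%Y-%m-%d-%D.%t':

```python
from datetime import date

def expand_archive_filename(template: str, database: str, table: str) -> str:
    """Expand pt-archiver-style %Y/%m/%d/%D/%t tokens in a --file template.

    Illustrative helper only, not part of pt-archiver; time-of-day tokens
    (%H, %i, %s) are omitted for brevity.
    """
    today = date.today()
    replacements = {
        "%Y": f"{today.year:04d}",   # four-digit year
        "%m": f"{today.month:02d}",  # two-digit month
        "%d": f"{today.day:02d}",    # two-digit day of month
        "%D": database,              # database name
        "%t": table,                 # table name
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template

print(expand_archive_filename("/var/log/archive/%Y-%m-%d-%D.%t",
                              "employees", "employees"))
```

With the template from the examples above, this yields a dated file name ending in "-employees.employees".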
DSN Options (DSN)
DSNs are used to connect to the database. DSN options take the form key=value: there must be no spaces around the equal sign, keys are case-sensitive, and multiple options are separated by ',' (comma). The main options are as follows:
- a: Database in which to perform the archiving operation; equivalent to USE.
- A: Default character set.
- b: If true, disable binary logging; equivalent to SET SQL_LOG_BIN=0.
- D: Database containing the table to archive.
- h: Host to connect to.
- u: User for the connection.
- p: Password for the connection.
- P: Port for the connection.
- S: Socket file for the connection.
- t: Table to archive.
- i: Index to use.
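The DSN format above is simple enough to parse mechanically. The following snippet is an illustrative stand-in (not pt-archiver code) showing how a DSN string like the ones used throughout this article decomposes into its single-letter keys:

```python
def parse_dsn(dsn: str) -> dict:
    """Parse a pt-archiver-style DSN such as
    'h=192.168.58.3,P=3306,u=admin,D=employees,t=employees'.

    Keys are single case-sensitive letters; options are comma-separated
    with no spaces around '='. Illustrative helper, not pt-archiver code.
    """
    options = {}
    for pair in dsn.split(","):
        key, _, value = pair.partition("=")  # split on the first '=' only
        options[key] = value
    return options

dsn = parse_dsn("h=192.168.58.3,P=3306,u=admin,D=employees,t=employees")
print(dsn["h"], dsn["P"], dsn["t"])  # 192.168.58.3 3306 employees
```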
Option Usage Description
- At least one of the options --dest, --file, or --purge must be specified;
- The options --ignore and --replace are mutually exclusive;
- The options --txn-size and --commit-each are mutually exclusive;
- The options --share-lock and --for-update are mutually exclusive;
- The options --analyze and --optimize are mutually exclusive.
Usage example
Environment and data preparation
MySQL: 5.7.24 192.168.58.3:3306
Percona Server: 192.168.58.3:3308
This article is based on the official MySQL sample database employees (Example Databases). The tests use the employees table and a newly created employees_ptarc table.
- Create test table employees_ptarc
mysql admin@192.168.58.3:employees> show create table employees_ptarc;
CREATE TABLE `employees_ptarc` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `v_int` int(11) DEFAULT NULL,
  `v_string` varchar(50) DEFAULT NULL,
  `s_string` char(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set
Time: 0.019s
- Create stored procedure i_employees_ptarc to insert test data
delimiter $$
CREATE PROCEDURE i_employees_ptarc (IN row_num INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < row_num DO
    INSERT INTO employees_ptarc (v_int, v_string, s_string)
    VALUES (
      floor(1 + rand() * 1000000),
      substring(MD5(RAND()), 1, floor(1 + rand() * 20)),
      substring(MD5(RAND()), 1, 20)
    );
    SET i = i + 1;
  END WHILE;
END$$
delimiter ;

mysql admin@192.168.58.3:employees> call i_employees_ptarc(200000);
Query OK, 1 row affected
Time: 697.054s
- Basic information of test table employees_ptarc
mysql admin@192.168.58.3:employees> select min(id),max(id) from employees_ptarc;
+---------+---------+
| min(id) | max(id) |
+---------+---------+
| 1       | 200000  |
+---------+---------+
1 row in set
Time: 0.025s

mysql admin@192.168.58.3:employees> select count(*) from employees_ptarc;
+----------+
| count(*) |
+----------+
| 200000   |
+----------+
1 row in set
Time: 0.064s

mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.029s
Archiving table to table (row by row)
- Archive table data, but do not delete the source table archived data
-- The number of rows to be archived
mysql admin@192.168.58.3:employees> select count(*) from employees where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.025s

-- Perform the archiving operation
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --statistics --no-delete --ask-pass
Enter password:
Enter password:
DBD::mysql::db selectrow_hashref failed: Table 'test.employees_arc' doesn't exist [for Statement "SHOW CREATE TABLE `test`.`employees_arc`"] at /usr/bin/pt-archiver line 1923, <STDIN> line 2.
The error message shows that the destination table does not exist. Create the destination table with the same structure as the source table, then run the archive again.
-- Create the destination table
percona admin@192.168.58.3:test> show create table employees_arc;
CREATE TABLE `employees_arc` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `idx_first_last` (`first_name`,`last_name`),
  KEY `idx_birth_hire` (`birth_date`,`hire_date`),
  KEY `idx_empno` (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set
Time: 0.024s

-- Archive again
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --statistics --no-delete --ask-pass
Enter password:
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:31:36       0       0
2019-04-16T10:31:37       0      50
2019-04-16T10:31:37       0     100
2019-04-16T10:31:37       0     150
2019-04-16T10:31:37       0     200
2019-04-16T10:31:37       0     225
Started at 2019-04-16T10:31:36, ended at 2019-04-16T10:31:37
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_arc,u=admin
SELECT 225
INSERT 225
DELETE 0
Action     Count    Time     Pct
select       114    0.5027   80.47
inserting    225    0.0572    9.15
commit         2    0.0469    7.50
other          0    0.0180    2.88

-- Check the row counts in the source and archive tables
mysql admin@192.168.58.3:employees> select count(*) from employees where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.049s

percona admin@192.168.58.3:test> select count(*) from employees_arc where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.023s
Archiving table to table (bulk)
The options involved in bulk archiving are --limit and --bulk-insert. The option --bulk-insert implicitly enables --bulk-delete, so if the archived rows must not be deleted from the source table, you also need to specify --no-delete.
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --no-delete --bulk-insert --ask-pass
Enter password:
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:34:17       0       0
2019-04-16T10:34:17       0      50
2019-04-16T10:34:17       0     100
2019-04-16T10:34:17       0     150
2019-04-16T10:34:17       0     200
2019-04-16T10:34:17       0     225
Started at 2019-04-16T10:34:17, ended at 2019-04-16T10:34:17
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_arc,u=admin
SELECT 225
INSERT 225
DELETE 0
Action          Count    Time      Pct
select              6    0.1171    81.13
bulk_inserting      5    0.0086     5.98
commit              2    0.0025     1.72
print_bulkfile    225   -0.0004    -0.25
other               0    0.0165    11.42
Archiving table to file
To archive a table into a file, replace the option --dest with --file, and add the option --output-format as needed.
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --file='/data/employees_arc_%Y-%m-%d.sql' --charset=utf8 --output-format='dump' --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --no-delete --ask-pass
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:36:02       0       0
2019-04-16T10:36:02       0      50
2019-04-16T10:36:02       0     100
2019-04-16T10:36:02       0     150
2019-04-16T10:36:02       0     200
2019-04-16T10:36:02       0     225
Started at 2019-04-16T10:36:02, ended at 2019-04-16T10:36:02
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 0
Action        Count    Time     Pct
select            6    0.1253   93.68
print_file      225    0.0004    0.33
commit            1    0.0001    0.05
other             0    0.0079    5.93
Purging table data
To purge table data without archiving it, omit the options --dest and --file and specify --purge. It is a good idea to preview the statements that will be executed with --dry-run first, and run the purge only after confirming them.
-- First preview with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --ask-pass --dry-run
Enter password:
SELECT /*!40001 SQL_NO_CACHE */ `emp_no`,`birth_date`,`first_name`,`last_name`,`gender`,`hire_date` FROM `employees`.`employees` FORCE INDEX(`PRIMARY`) WHERE (first_name = 'Anneke') ORDER BY `emp_no` LIMIT 50
SELECT /*!40001 SQL_NO_CACHE */ `emp_no`,`birth_date`,`first_name`,`last_name`,`gender`,`hire_date` FROM `employees`.`employees` FORCE INDEX(`PRIMARY`) WHERE (first_name = 'Anneke') AND ((`emp_no` >= ?)) ORDER BY `emp_no` LIMIT 50
DELETE FROM `employees`.`employees` WHERE (`emp_no` = ?)

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --ask-pass
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:40:57       0       0
2019-04-16T10:40:57       0      50
2019-04-16T10:40:58       0     100
2019-04-16T10:40:58       0     150
2019-04-16T10:40:58       0     200
2019-04-16T10:40:58       0     225
Started at 2019-04-16T10:40:57, ended at 2019-04-16T10:40:58
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 225
Action      Count    Time     Pct
deleting      225    0.6943   79.10
select          6    0.1386   15.79
commit          1    0.0265    3.02
other           0    0.0183    2.08

-- The option --bulk-delete can also be used for batch purging
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --bulk-delete --statistics --ask-pass
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:43:07       0       0
2019-04-16T10:43:07       0      50
2019-04-16T10:43:07       0     100
2019-04-16T10:43:07       0     150
2019-04-16T10:43:07       0     200
2019-04-16T10:43:07       0     225
Started at 2019-04-16T10:43:07, ended at 2019-04-16T10:43:07
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 225
Action          Count    Time     Pct
select              6    0.1131   66.43
bulk_deleting       5    0.0384   22.54
commit              1    0.0153    8.97
other               0    0.0035    2.06
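The purge-with---bulk-delete path above can be condensed into a runnable sketch. The snippet below uses Python's sqlite3 as a stand-in for MySQL and is illustrative only: it fetches the primary keys of one chunk, then deletes the whole chunk with a single statement bounded by the chunk's key range, re-applying the WHERE condition inside the DELETE, which is the idea behind --bulk-delete.

```python
import sqlite3

# Illustrative sketch of --purge with --bulk-delete (SQLite stand-in, not
# pt-archiver code): one DELETE per chunk instead of one DELETE per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, "Anneke" if i % 2 else "Georgi") for i in range(1, 101)])

limit = 20  # rows per chunk, as with --limit=20
while True:
    # Fetch the next chunk of matching primary keys in key order.
    chunk = conn.execute(
        "SELECT emp_no FROM employees WHERE first_name = 'Anneke' "
        "ORDER BY emp_no LIMIT ?", (limit,)).fetchall()
    if not chunk:
        break
    # Delete the whole chunk with one statement bounded by its key range,
    # re-checking the WHERE condition inside the DELETE.
    conn.execute(
        "DELETE FROM employees WHERE emp_no >= ? AND emp_no <= ? "
        "AND first_name = 'Anneke'", (chunk[0][0], chunk[-1][0]))
    conn.commit()  # commit per chunk, as --bulk-delete implies --commit-each

print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 50
```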
Handling AUTO_INCREMENT columns
By default, when pt-archiver archives or purges table data, it adds a WHERE condition that excludes the row holding the current maximum value of the AUTO_INCREMENT column. This preserves the AUTO_INCREMENT value across a database restart, but it means that the maximum-value row is never archived or purged.
-- Information on the table to be archived, employees_ptarc, is shown in 'Environment and data preparation' above.

-- Preview the archive with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_ptarc,A=utf8 --charset=utf8 --where "1 = 1" --progress=10000 --txn-size=10000 --limit=10000 --statistics --no-delete --bulk-insert --ask-pass --dry-run
Enter password:
Enter password:
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') ORDER BY `id` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') AND ((`id` > ?)) ORDER BY `id` LIMIT 10000
LOAD DATA LOCAL INFILE ? INTO TABLE `test`.`employees_ptarc` CHARACTER SET utf8 (`id`,`v_int`,`v_string`,`s_string`)

-- Preview the purge with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "1 = 1" --progress=10000 --txn-size=10000 --limit=10000 --statistics --bulk-delete --ask-pass --dry-run
Enter password:
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') ORDER BY `id` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') AND ((`id` >= ?)) ORDER BY `id` LIMIT 10000
DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (1 = 1) LIMIT 10000
The --dry-run output shows that the tool added the condition `id` < '200000' to the data-fetching SELECT statements, even though the table actually contains a row with id = 200000 and the user-supplied WHERE clause does not exclude it. This condition is added automatically by pt-archiver.
This prevents the maximum AUTO_INCREMENT value from being reused after a database restart (InnoDB in MySQL 5.7 recalculates it from the maximum id at startup). The following test demonstrates this:
-- Current AUTO_INCREMENT value of table employees_ptarc
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.048s

-- The number of rows to be purged
mysql admin@192.168.58.3:employees> select count(*) from employees_ptarc where id <= 199990 or id > 199995;
+----------+
| count(*) |
+----------+
| 199995   |
+----------+
1 row in set
Time: 0.102s

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --ask-pass
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:47:59       0       0
2019-04-16T10:48:00       0   50000
2019-04-16T10:48:00       1  100000
2019-04-16T10:48:01       2  150000
2019-04-16T10:48:02       2  199994
Started at 2019-04-16T10:47:59, ended at 2019-04-16T10:48:02
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199994
INSERT 0
DELETE 199994
Action          Count    Time     Pct
bulk_deleting       4    0.9030   28.23
select              5    0.2095    6.55
commit              4    0.1562    4.88
other               0    1.9298   60.33
// Although 199,995 rows matched the WHERE condition, only 199,994 rows were purged.
-- Query the rows that were not purged
mysql admin@192.168.58.3:employees> select * from employees_ptarc;
+--------+--------+---------------+----------------------+
| id     | v_int  | v_string      | s_string             |
+--------+--------+---------------+----------------------+
| 199991 | 60305  | 526ed         | 0240a2d81e255c915b5a |
| 199992 | 546438 | a85b6a18d     | 0bf1d636cd0e536eb044 |
| 199993 | 543327 | 1367a1c       | 68908231ca18ed631907 |
| 199994 | 99632  | 2f            | 5c10f8d106a30bb1ef95 |
| 199995 | 164172 | e57bba13eb3c1 | 3208ac758bd8c912c39f |
| 200000 | 108936 | 3bc1db70b     | 079f744bf2800ad62a9b |
+--------+--------+---------------+----------------------+
6 rows in set
Time: 0.018s

-- After a restart, query the current AUTO_INCREMENT value
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.014s
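The behaviour just observed reduces to a few lines of logic: read MAX(id) once up front and restrict every statement to id < max, which is exactly why the row with id = 200000 survived the purge. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL (illustrative only, not pt-archiver code):

```python
import sqlite3

# Sketch of the default --safe-auto-increment behaviour: the maximum key is
# read once at startup and every statement is restricted to id < max, so the
# row holding the current AUTO_INCREMENT maximum is never touched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])

max_id = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]  # 10
# The user's WHERE (here "1 = 1") is ANDed with the auto-added boundary.
conn.execute("DELETE FROM t WHERE 1 = 1 AND id < ?", (max_id,))
conn.commit()

print(conn.execute("SELECT id FROM t").fetchall())  # [(10,)]
```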
To stop protecting the row with the maximum AUTO_INCREMENT value, the tool provides the option --no-safe-auto-increment, which is specified in the following test:
-- Preview the purge with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --no-safe-auto-increment --ask-pass --dry-run
Enter password:
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199990 or id > 199995) ORDER BY `id` LIMIT 50000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199990 or id > 199995) AND ((`id` >= ?)) ORDER BY `id` LIMIT 50000
DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (id <= 199990 or id > 199995) LIMIT 50000
// The statements above no longer exclude the row with the maximum AUTO_INCREMENT value.

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --no-safe-auto-increment --ask-pass
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:52:19       0       0
2019-04-16T10:52:20       0   50000
2019-04-16T10:52:20       1  100000
2019-04-16T10:52:21       1  150000
2019-04-16T10:52:22       2  199995
Started at 2019-04-16T10:52:19, ended at 2019-04-16T10:52:22
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199995
INSERT 0
DELETE 199995
Action          Count    Time     Pct
bulk_deleting       4    0.7500   27.80
select              5    0.1851    6.86
commit              4    0.0622    2.30
other               0    1.7007   63.03
// This time all 199,995 matching rows were purged.
-- Query the rows that were not purged
mysql admin@192.168.58.3:employees> select * from employees_ptarc;
+--------+--------+---------------+----------------------+
| id     | v_int  | v_string      | s_string             |
+--------+--------+---------------+----------------------+
| 199991 | 60305  | 526ed         | 0240a2d81e255c915b5a |
| 199992 | 546438 | a85b6a18d     | 0bf1d636cd0e536eb044 |
| 199993 | 543327 | 1367a1c       | 68908231ca18ed631907 |
| 199994 | 99632  | 2f            | 5c10f8d106a30bb1ef95 |
| 199995 | 164172 | e57bba13eb3c1 | 3208ac758bd8c912c39f |
+--------+--------+---------------+----------------------+
5 rows in set
Time: 0.027s

-- After a restart, query the current AUTO_INCREMENT value
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 199996         |
+----------------+
1 row in set
Time: 0.046s
The query shows that the AUTO_INCREMENT value has changed. To restore it to its previous value, run the ALTER TABLE statement manually:
mysql admin@192.168.58.3:employees> alter table employees_ptarc auto_increment = 200001;
Query OK, 0 rows affected
Time: 0.013s

mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.020s
Workflow
Enable the general log on both instances to observe how pt-archiver works. The workflow is illustrated with a bulk table-to-table archive that also deletes the archived rows from the source table.
The command executed is as follows:
```
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_ptarc,A=utf8 --charset=utf8 --where "id <= 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-insert --ask-pass
Enter password:
Enter password:
TIME                ELAPSED   COUNT
2019-04-16T10:56:18       0       0
2019-04-16T10:56:19       1   50000
2019-04-16T10:56:21       2  100000
2019-04-16T10:56:23       4  150000
2019-04-16T10:56:24       6  199995
Started at 2019-04-16T10:56:18, ended at 2019-04-16T10:56:25
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199995
INSERT 199995
DELETE 199995
Action          Count     Time    Pct
bulk_inserting      4   1.3027  19.51
bulk_deleting       4   0.7103  10.64
select              5   0.1872   2.80
commit              8   0.1455   2.18
print_bulkfile 199995  -0.3881  -5.81
other               0   4.7192  70.68
```
- Source instance general log
-- Initial checks of database parameters and instance information

```
13 Connect admin@dbabd1 on employees using TCP/IP
13 Query   set autocommit=0
13 Query   /*!40101 SET NAMES "utf8"*/
13 Query   SHOW VARIABLES LIKE 'wait\_timeout'
13 Query   SET SESSION wait_timeout=10000
13 Query   SELECT @@SQL_MODE
13 Query   SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
13 Query   SELECT version()
13 Query   SHOW VARIABLES LIKE 'version%'
13 Query   SHOW ENGINES
13 Query   SHOW VARIABLES LIKE 'innodb_version'
13 Query   show variables like 'innodb_rollback_on_timeout'
13 Query   /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
13 Query   USE `employees`
13 Query   SHOW CREATE TABLE `employees`.`employees_ptarc`
13 Query   /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
13 Query   SELECT CONCAT(/*!40100 @@session.character_set_connection, */ "")
13 Query   SHOW VARIABLES LIKE 'wsrep_on'
13 Query   SHOW VARIABLES LIKE 'wsrep_on'
13 Query   SHOW VARIABLES LIKE 'version%'
13 Query   SHOW ENGINES
13 Query   SHOW VARIABLES LIKE 'innodb_version'
```

-- Determine the upper boundary of the data to archive from the primary key; the --limit option sets the rows fetched per chunk

```
13 Query   SELECT MAX(`id`) FROM `employees`.`employees_ptarc`
13 Query   SELECT CONCAT(@@hostname, @@port)
13 Query   SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') ORDER BY `id` LIMIT 50000
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
...omitted...
```
```
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   commit
13 Query   SELECT 'pt-archiver keepalive'
```

-- After archiving a chunk, the rows are deleted from the source unless --no-delete is specified

```
13 Query   DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '1'))) AND (((`id` <= '50000'))) AND (id <= 199995) LIMIT 50000
```

-- Archive the next chunk

```
13 Query   SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '50000')) ORDER BY `id` LIMIT 50000
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
...omitted...
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   commit
13 Query   SELECT 'pt-archiver keepalive'
13 Query   DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '50001'))) AND (((`id` <= '100000'))) AND (id <= 199995) LIMIT 50000
13 Query   SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '100000')) ORDER BY `id` LIMIT 50000
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
...omitted...
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   commit
13 Query   SELECT 'pt-archiver keepalive'
13 Query   DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '100001'))) AND (((`id` <= '150000'))) AND (id <= 199995) LIMIT 50000
13 Query   SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '150000')) ORDER BY `id` LIMIT 50000
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
...omitted...
13 Query   SELECT 'pt-archiver keepalive'
13 Query   SELECT 'pt-archiver keepalive'
13 Query   DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '150001'))) AND (((`id` <= '199995'))) AND (id <= 199995) LIMIT 50000
```

-- A final query confirms that no rows remain to archive

```
13 Query   SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '199995')) ORDER BY `id` LIMIT 50000
13 Query   commit
13 Quit
```
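The chunked SELECT/DELETE loop visible in the source log can be sketched as follows. This is a simplified, hypothetical model that operates on an in-memory list of rows (pt-archiver itself is a Perl tool working against live tables); `archive_in_chunks` and its arguments are illustrative names only:

```python
def archive_in_chunks(rows, where, limit):
    """Model pt-archiver's nibble loop.

    rows:  list of dicts sorted by ascending "id" (the PRIMARY KEY)
    where: predicate playing the role of the --where condition
    limit: rows per chunk, like the --limit option
    Returns (archived_rows, remaining_rows).
    """
    archived = []
    last_id = None
    while True:
        # SELECT ... WHERE <where> [AND id >= last boundary] ORDER BY id LIMIT <limit>
        chunk = [r for r in rows
                 if where(r) and (last_id is None or r["id"] >= last_id)][:limit]
        if not chunk:
            break                        # nothing left to archive
        archived.extend(chunk)           # bulk-insert the chunk into the destination
        lo, hi = chunk[0]["id"], chunk[-1]["id"]
        # DELETE ... WHERE id >= lo AND id <= hi AND <where> LIMIT <limit>
        rows = [r for r in rows if not (lo <= r["id"] <= hi and where(r))]
        last_id = hi                     # next SELECT resumes at the chunk boundary
    return archived, rows
```

For example, archiving ids 1-8 out of 1-10 with a chunk size of 3 produces three full SELECT/DELETE rounds plus a final empty SELECT, mirroring the log above.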
- Target instance general log
-- Initial checks of database parameters and instance information

```
62 Connect admin@dbabd1 on test using TCP/IP
62 Query   set autocommit=0
62 Query   /*!40101 SET NAMES "utf8"*/
62 Query   SHOW VARIABLES LIKE 'wait\_timeout'
62 Query   SET SESSION wait_timeout=10000
62 Query   SELECT @@SQL_MODE
62 Query   SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
62 Query   SELECT version()
62 Query   SHOW VARIABLES LIKE 'version%'
62 Query   SHOW ENGINES
62 Query   SHOW VARIABLES LIKE 'innodb_version'
62 Query   show variables like 'innodb_rollback_on_timeout'
62 Query   /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
```

-- Switch to the target database

```
62 Query   USE `test`
62 Query   SHOW CREATE TABLE `test`.`employees_ptarc`
62 Query   /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
62 Query   SELECT CONCAT(/*!40100 @@session.character_set_connection, */ "")
62 Query   SELECT CONCAT(@@hostname, @@port)
62 Query   commit
```

-- With --bulk-insert, each chunk is written to a temporary file and imported via LOAD DATA LOCAL INFILE

```
62 Query   LOAD DATA LOCAL INFILE '/tmp/UGeTe6hEIPpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query   commit
62 Query   LOAD DATA LOCAL INFILE '/tmp/PNNmzgvqx6pt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query   commit
62 Query   LOAD DATA LOCAL INFILE '/tmp/pJDb48JQvQpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query   LOAD DATA LOCAL INFILE '/tmp/xvb0NmYJiGpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query   commit
62 Quit
```
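The temporary files referenced by the LOAD DATA LOCAL INFILE statements use the statement's default text format: tab-separated fields, `\N` for NULL, and newline-terminated rows. A minimal Python sketch of producing such a file (`write_bulk_file` is a hypothetical helper for illustration, not part of pt-archiver):

```python
import os
import tempfile

def write_bulk_file(rows):
    """Write rows of field tuples in LOAD DATA INFILE's default format:
    tab-separated fields, \\N for NULL, newline-terminated lines.
    Returns the path of the temporary file."""
    fd, path = tempfile.mkstemp(suffix="pt-archiver")
    with os.fdopen(fd, "w") as f:
        for row in rows:
            fields = ("\\N" if value is None else str(value) for value in row)
            f.write("\t".join(fields) + "\n")
    return path
    # The path is then used as:
    # LOAD DATA LOCAL INFILE '<path>' INTO TABLE `test`.`employees_ptarc`
    #   CHARACTER SET utf8 (`id`,`v_int`,`v_string`,`s_string`)
```

Because the file lives on the machine where the tool runs and `LOCAL` makes the MySQL client stream it to the server, this approach works even when source and destination are different servers.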
Summary
- pt-archiver is a very efficient tool for archiving table data. Rows are processed in batches (chunks), which reduces the performance impact on the instance.
- If GTID is enabled on the instance, CREATE TABLE ... AS SELECT (CTAS) syntax is not supported, so pt-archiver can be used to copy the data instead.
- For cross-instance or cross-server archiving, pt-archiver can be run on the target server, because the temporary files used by --bulk-insert are generated on the server where the tool executes.
- Batch deletion of expired data from large tables can also be handled with the pt-archiver option --purge.
Reference resources
https://www.percona.com/doc/percona-toolkit/LATEST/pt-archiver.html
My knowledge is limited; if there are any errors in this article, please leave a comment to point them out!