Summary of pt-archiver in Percona-Toolkit


pt-archiver - Archive rows from a MySQL table into another table or a file.

pt-archiver nibbles records from a MySQL table. The --source and --dest arguments use DSN syntax; if COPY is yes, --dest defaults to the same key's value as --source.

pt-archiver is part of the Percona Toolkit and is mainly used to archive and purge MySQL table data. It can archive rows into another table or into a file, and it is designed to have minimal impact on OLTP query performance while it works. Rows can be archived to a table on the same or a different server, or to a file that can later be loaded with LOAD DATA INFILE. It can also simply delete historical table data incrementally, without archiving it anywhere.
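In essence, pt-archiver repeats a small loop: SELECT the next chunk of rows in index order, copy them to the destination, DELETE them from the source, and commit. Below is a minimal sketch of that loop, using SQLite in place of two MySQL servers; all table and function names here are illustrative, not part of the tool.

```python
import sqlite3

# Toy source and destination tables (in-memory; stand-ins for two MySQL servers).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO t (payload) VALUES (?)",
                [(f"row{i}",) for i in range(10)])
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE t_arc (id INTEGER PRIMARY KEY, payload TEXT)")

def archive(where, limit=3):
    """Nibble rows in primary-key order: SELECT a chunk, copy it, DELETE it, commit."""
    last_id = -1
    archived = 0
    while True:
        rows = src.execute(
            f"SELECT id, payload FROM t WHERE {where} AND id > ? ORDER BY id LIMIT ?",
            (last_id, limit)).fetchall()
        if not rows:
            break
        dst.executemany("INSERT INTO t_arc VALUES (?, ?)", rows)
        src.executemany("DELETE FROM t WHERE id = ?", [(r[0],) for r in rows])
        src.commit(); dst.commit()   # one transaction per chunk, like --commit-each
        last_id = rows[-1][0]
        archived += len(rows)
    return archived

print(archive("payload LIKE 'row%'"))   # archives all 10 rows, 3 at a time
```

The real tool adds much more on top of this skeleton (lag checks, retries, bulk operations), but the chunked select-copy-delete-commit cycle is the core idea.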

This article reorganizes my earlier learning notes on pt-archiver and, using the latest version of the tool, reviews how it works.

For obtaining and installing the Percona Toolkit, refer to the Installation section of my other blog post, Summary of pt-online-schema-change in Percona-Toolkit.

Basic explanation

pt-archiver [OPTIONS] --source DSN --where WHERE

Common Options (OPTIONS)

--analyze
 Run ANALYZE TABLE on the tables after archiving completes. Specify which tables with a value such as '--analyze=ds', where s means the source table and d the destination table; either letter can be given alone.

--ask-pass
 Prompt for the password on the command line instead of passing it as an argument, which keeps it out of the process list; requires the perl-TermReadKey module.

--buffer
 Buffer output to the file specified by --file and flush it only at transaction commit.
Without this option, the file is flushed after every row. With it, automatic flushing is disabled and writes go through the operating system's block buffering, so some data may reach disk before the transaction commits; the explicit flush happens only at commit.

--bulk-delete
 Delete each chunk of rows with a single DELETE statement (implies --commit-each).
Normally rows are deleted one by one through the primary key. Deleting each chunk with one DELETE is much faster, but it may be slower if the WHERE condition is complex.

--[no]bulk-delete-limit
 Default: yes
 Append the --limit clause to the DELETE statement used by --bulk-delete.

--bulk-insert
 Insert each chunk of rows in bulk with LOAD DATA LOCAL INFILE (implies --bulk-delete and --commit-each).
Instead of inserting rows one at a time with INSERT, the tool buffers each chunk into a temporary file and loads it once per chunk, which is much faster. To keep the data safe, this option forces --bulk-delete, which ensures the delete happens only after the insert has fully succeeded.

--channel
 In a multi-source replication environment, specify which master's data should be archived; useful when several masters replicate into one slave.

--charset,-A
 Specifies the connection character set.

--[no]check-charset
 Default: yes
 Check that the connection character set and the table character set are the same.

--[no]check-columns
 Default: yes
 Check that the source table (--source) and the destination table (--dest) have the same columns.
Column order and types are not checked, only whether each column exists in both tables. If the columns differ, the tool exits with an error. To disable the check, specify --no-check-columns.

--check-slave-lag
 Pause archiving while the replication lag of the specified slave (given as a DSN) is greater than --max-lag. By default the tool checks all slaves, but this option restricts the check to the given slave.

--check-interval
 Default: 1s
 When --check-slave-lag is given, how long the tool pauses each time it finds the slave lagging. The lag check is performed every 100 rows.

--columns,-c
 Specify the table columns to archive, separated by ',' (comma) if there is more than one.

--commit-each
 Commit after each batch of rows is fetched and archived (disables --txn-size).
After fetching and archiving one batch, the tool commits the transaction and flushes the file given by --file before fetching the next batch and sleeping for --sleep. The batch size, and therefore the transaction size, is controlled by --limit.

--host,-h
 Specify the host IP address to connect to.

--port,-P
 Specify the port to connect to.

--user,-u
 Specify the database user to connect as.

--password,-p
 Specify the database user's password.

--socket,-S
 Specify the socket file to connect through.

--databases,-d
 Specify the database to connect to.

--source
 Specify the table to archive, in DSN syntax. This option is required.

--dest
 Specify the destination table for archived rows, in DSN syntax.
If not specified, it defaults to the same values as --source.

--where
 Specify which rows to archive with a WHERE clause. This option is required. Do not include the WHERE keyword itself; if no restriction is really needed, specify --where "1=1".

--file
 Specify the file to archive table data into. The file name may use formatting placeholders similar to MySQL's DATE_FORMAT().
The file content is in the same format as MySQL's SELECT INTO OUTFILE. The naming placeholders are:

%Y: year, numeric, four digits
%m: month, numeric (01..12)
%d: day of the month, numeric (01..31)
%H: hour (00..23)
%i: minutes, numeric (00..59)
%s: seconds (00..59)
%D: database name
%t: table name

For example: --file '/var/log/archive/%Y-%m-%d-%D.%t'
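A rough sketch of how such a template expands into a file name; `expand_archive_name` is a hypothetical helper written for illustration (pt-archiver does this internally), and note that MySQL's %i means minutes, unlike strftime:

```python
import datetime

def expand_archive_name(template, db, table, now=None):
    """Expand pt-archiver-style --file placeholders into a concrete file name."""
    now = now or datetime.datetime.now()
    out = (template.replace("%D", db)                  # database name
                   .replace("%t", table)               # table name
                   .replace("%i", now.strftime("%M"))  # MySQL %i = minutes
                   .replace("%s", now.strftime("%S"))) # MySQL %s = seconds
    return now.strftime(out)  # %Y, %m, %d, %H match strftime directly

name = expand_archive_name("/var/log/archive/%Y-%m-%d-%D.%t",
                           "employees", "employees",
                           datetime.datetime(2019, 4, 16))
print(name)  # /var/log/archive/2019-04-16-employees.employees
```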

--output-format
 Specify the format of the content written to the --file file.
By default, fields are separated by tabs. With --output-format=dump, fields are separated by ',' (comma) and enclosed in '"' (double quotes).
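The difference between the two output formats can be sketched like this (the row values below are made up for illustration):

```python
import csv, io

row = (10001, "1953-09-02", "Georgi", "Facello")

# Default: tab-separated fields, like SELECT ... INTO OUTFILE
default_line = "\t".join(str(v) for v in row)

# --output-format=dump: comma-separated, every field in double quotes
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
dump_line = buf.getvalue().rstrip("\r\n")

print(default_line)  # 10001	1953-09-02	Georgi	Facello
print(dump_line)     # "10001","1953-09-02","Georgi","Facello"
```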

--for-update
 Add a FOR UPDATE clause to each archiving SELECT statement.

--share-lock
 Add a LOCK IN SHARE MODE clause to each archiving SELECT statement.

--header
 Write the column names as a header on the first line of the --file file.

--ignore
 Add the IGNORE option to INSERT statements.

--limit
 Default: 1
 Specify the number of rows fetched and archived per statement.

--local
 Do not write the OPTIMIZE and ANALYZE statements to the binlog.

--max-lag
 Default: 1s
 Specify the maximum allowed replication lag, in seconds. If the slave's lag exceeds this value after a batch of rows is archived, archiving pauses for the time given by --check-interval, then the lag is checked again via Seconds_Behind_Master on the slave. If the lag stays above this value, or the slave stops replicating, the tool waits until replication resumes and the lag drops below the value.

--no-delete
 Do not delete the archived rows from the source table.

--optimize
 Run OPTIMIZE TABLE on the tables after archiving completes. Specify which tables with a value such as '--optimize=ds', where s means the source table and d the destination table; either letter can be given alone.

--primary-key-only
 Archive only the primary key columns; shorthand for --columns with the table's primary key columns.
This is most useful with --purge, where the DELETE only needs the primary key, not every column of the row.

--progress
 Print progress information every so many rows: the current time, the elapsed time, and how many rows have been archived.

--purge
 Purge rows instead of archiving them; --dest and --file may then be omitted. Combining a pure purge with --primary-key-only is more efficient.

--quiet,-q
 Run silently, without printing any execution information.

--replace
 When writing to the --dest table, rewrite INSERT statements as REPLACE statements.

--retries
 Default: 1
 Specify the number of retries when an archive operation hits a deadlock or timeout. When the retries are exhausted, the tool exits with an error.

--run-time
 Specify how long the archiving run may last before the tool exits. Allowed time suffixes are s=seconds, m=minutes, h=hours, d=days; the default unit is s.

--[no]safe-auto-increment
 Default: yes
 Do not archive the row with the maximum AUTO_INCREMENT value.
This adds an extra WHERE clause that protects the row holding the maximum value of a single-column ascending AUTO_INCREMENT key, so that the AUTO_INCREMENT value is not reused after the database restarts. The side effect is that the row with the maximum value cannot be archived or purged.
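The effect of the extra clause can be shown with a tiny sketch over made-up id values (these match the test data used later in this article):

```python
ids = [199991, 199992, 199993, 199994, 199995, 200000]
max_id = max(ids)  # row holding the current maximum AUTO_INCREMENT value

# safe-auto-increment (default): the added clause `id < max_id` protects the last row
safe = [i for i in ids if i < max_id]
# --no-safe-auto-increment: every matching row is eligible
unsafe = list(ids)

print(len(safe), len(unsafe))  # 5 6
```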

--set-vars
 Defaults:
    wait_timeout=10000
    innodb_lock_wait_timeout=1
    lock_wait_timeout=60
 Set session variables for the archiving connections; separate multiple variables with ',' (comma). For example, --set-vars=wait_timeout=5000.

--skip-foreign-key-checks
 Disable foreign key checking with SET FOREIGN_KEY_CHECKS=0.

--sleep
 Specify how long to sleep between the SELECT statements that fetch rows to archive; the default is no sleep. By default the transaction is not committed and the --file file is not flushed before sleeping; if --commit-each is specified, the commit and flush happen before the sleep.

--statistics
 Collect and print timing statistics for the operations.
Example output:
'
Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
Source: D=db,t=table
SELECT 4
INSERT 4
DELETE 4
Action         Count       Time        Pct
commit            10     0.1079      88.27
select             5     0.0047       3.87
deleting           4     0.0028       2.29
inserting          4     0.0028       2.28
other              0     0.0040       3.29
'

--txn-size
 Default: 1
 Specify the number of rows per transaction. If 0, transactions are disabled entirely.

--version
 Display the version of the tool and exit.

--[no]version-check
 Default: yes
 Check for newer versions of Percona Toolkit, MySQL, and other programs.

--why-quit
 Print the reason for exiting unless the tool quit simply because it ran out of rows to archive.
This is handy with --run-time in automated archiving jobs, to determine whether the job finished within the allotted time. If --statistics is also specified, all exit reasons are printed.

DSN Options (DSN)

DSN syntax is used to connect to the database. A DSN is a list of key=value options: no spaces around the '=', keys are case-sensitive, and multiple options are separated by ',' (comma). The main options are:

  • a
    The database to USE while archiving (equivalent to a USE statement).
  • A
    Specify the default character set.
  • b
    When true, disable binary logging for the session, equivalent to SET SQL_LOG_BIN=0.
  • D
    Specify the database that contains the tables that need to be archived.
  • h
    Specify the host to connect to.
  • u
    Specify the user of the connection.
  • p
    Specify the password required for the connection.
  • P
    Specify the port of the connection.
  • S
    Specify the connection SOCKET file.
  • t
    Specify the tables that need to be archived.
  • i
    Specify the index to be used.
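Parsing such a DSN string is straightforward; a minimal sketch (`parse_dsn` is a hypothetical helper written for illustration, not part of the toolkit):

```python
def parse_dsn(dsn):
    """Parse a pt-archiver DSN ('key=value,key=value'; keys are case-sensitive,
    no spaces around '=') into a dict."""
    return dict(part.split("=", 1) for part in dsn.split(","))

dsn = parse_dsn("h=192.168.58.3,P=3306,u=admin,D=employees,t=employees")
print(dsn["h"], dsn["P"], dsn["D"], dsn["t"])
# 192.168.58.3 3306 employees employees
```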

Option Usage Description

  1. At least one of the options --dest, --file, or --purge must be specified;
  2. --ignore and --replace are mutually exclusive;
  3. --txn-size and --commit-each are mutually exclusive;
  4. --share-lock and --for-update are mutually exclusive;
  5. --analyze and --optimize are mutually exclusive.

Usage example

Environment and data preparation

MySQL: 5.7.24 192.168.58.3:3306
Percona Server: 192.168.58.3:3308

This article is based on the official MySQL example database, employees.

The tests use the employees table and a new table, employees_ptarc.

  • Create test table employees_ptarc
mysql admin@192.168.58.3:employees> show create table employees_ptarc;
+-----------------+-----------------------------------------+
| Table           | Create Table                            |
+-----------------+-----------------------------------------+
| employees_ptarc | CREATE TABLE `employees_ptarc` (        |
|                 |   `id` int(11) NOT NULL AUTO_INCREMENT, |
|                 |   `v_int` int(11) DEFAULT NULL,         |
|                 |   `v_string` varchar(50) DEFAULT NULL,  |
|                 |   `s_string` char(20) NOT NULL,         |
|                 |   PRIMARY KEY (`id`)                    |
|                 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8    |
+-----------------+-----------------------------------------+
1 row in set
Time: 0.019s
  • Create stored procedure i_employees_ptarc to insert test data
delimiter $$
CREATE PROCEDURE i_employees_ptarc (IN row_num INT)
BEGIN

DECLARE i INT DEFAULT 0 ;
WHILE i < row_num DO
    INSERT INTO employees_ptarc (v_int, v_string, s_string)
VALUES
    (
        floor(1 + rand() * 1000000),
        substring(
            MD5(RAND()),
            1,
            floor(1 + rand() * 20)
        ),
        substring(MD5(RAND()), 1, 20)
    ) ;
SET i = i + 1 ;
END WHILE;
END$$

delimiter ;

mysql admin@192.168.58.3:employees> call i_employees_ptarc(200000);
Query OK, 1 row affected
Time: 697.054s
  • Basic information of test table employees_ptarc
mysql admin@192.168.58.3:employees> select min(id),max(id) from employees_ptarc;
+---------+---------+
| min(id) | max(id) |
+---------+---------+
| 1       | 200000  |
+---------+---------+
1 row in set
Time: 0.025s
mysql admin@192.168.58.3:employees> select count(*) from employees_ptarc;
+----------+
| count(*) |
+----------+
| 200000   |
+----------+
1 row in set
Time: 0.064s
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.029s

Archiving from table to table (row by row)

  • Archive table data without deleting the archived rows from the source table
-- The amount of data that needs to be archived
mysql admin@192.168.58.3:employees> select count(*) from employees where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.025s

-- Perform archiving operations
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress 50 --txn-size=1000 --statistics --no-delete --ask-pass
Enter password:
Enter password:

DBD::mysql::db selectrow_hashref failed: Table 'test.employees_arc' doesn't exist [for Statement "SHOW CREATE TABLE `test`.`employees_arc`"] at /usr/bin/pt-archiver line 1923, <STDIN> line 2.

The error message shows that the destination table does not exist. Create the destination table with the same structure as the source table first, then run the archive again.

-- Create the destination table
percona admin@192.168.58.3:test> show create table employees_arc;
+---------------+----------------------------------------------------+
| Table         | Create Table                                       |
+---------------+----------------------------------------------------+
| employees_arc | CREATE TABLE `employees_arc` (                     |
|               |   `emp_no` int(11) NOT NULL,                       |
|               |   `birth_date` date NOT NULL,                      |
|               |   `first_name` varchar(14) NOT NULL,               |
|               |   `last_name` varchar(16) NOT NULL,                |
|               |   `gender` enum('M','F') NOT NULL,                 |
|               |   `hire_date` date NOT NULL,                       |
|               |   PRIMARY KEY (`emp_no`),                          |
|               |   KEY `idx_first_last` (`first_name`,`last_name`), |
|               |   KEY `idx_birth_hire` (`birth_date`,`hire_date`), |
|               |   KEY `idx_empno` (`emp_no`)                       |
|               | ) ENGINE=InnoDB DEFAULT CHARSET=utf8               |
+---------------+----------------------------------------------------+
1 row in set
Time: 0.024s

-- Archive again
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --statistics --no-delete --ask-pass
Enter password:
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:31:36       0       0
2019-04-16T10:31:37       0      50
2019-04-16T10:31:37       0     100
2019-04-16T10:31:37       0     150
2019-04-16T10:31:37       0     200
2019-04-16T10:31:37       0     225
Started at 2019-04-16T10:31:36, ended at 2019-04-16T10:31:37
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_arc,u=admin
SELECT 225
INSERT 225
DELETE 0
Action         Count       Time        Pct
select           114     0.5027      80.47
inserting        225     0.0572       9.15
commit             2     0.0469       7.50
other              0     0.0180       2.88

-- Query the amount of archived data in source and archive tables
mysql admin@192.168.58.3:employees> select count(*) from employees where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.049s

percona admin@192.168.58.3:test> select count(*) from employees_arc where first_name = 'Anneke';
+----------+
| count(*) |
+----------+
| 225      |
+----------+
1 row in set
Time: 0.023s

Archiving from table to table (in bulk)

Bulk archiving involves the options --limit, --bulk-insert, and --bulk-delete. The option --bulk-insert implies --bulk-delete, so if you do not want the archived rows deleted, you must also specify --no-delete.

# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_arc,A=utf8 --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --no-delete --bulk-insert --ask-pass
Enter password:
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:34:17       0       0
2019-04-16T10:34:17       0      50
2019-04-16T10:34:17       0     100
2019-04-16T10:34:17       0     150
2019-04-16T10:34:17       0     200
2019-04-16T10:34:17       0     225
Started at 2019-04-16T10:34:17, ended at 2019-04-16T10:34:17
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_arc,u=admin
SELECT 225
INSERT 225
DELETE 0
Action              Count       Time        Pct
select                  6     0.1171      81.13
bulk_inserting          5     0.0086       5.98
commit                  2     0.0025       1.72
print_bulkfile        225    -0.0004      -0.25
other                   0     0.0165      11.42

Archiving a table to a file

To archive a table into a file, replace the option --dest with --file, and add --output-format as needed.

# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --file='/data/employees_arc_%Y-%m-%d.sql' --charset=utf8 --output-format='dump' --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --no-delete --ask-pass
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:36:02       0       0
2019-04-16T10:36:02       0      50
2019-04-16T10:36:02       0     100
2019-04-16T10:36:02       0     150
2019-04-16T10:36:02       0     200
2019-04-16T10:36:02       0     225
Started at 2019-04-16T10:36:02, ended at 2019-04-16T10:36:02
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 0
Action          Count       Time        Pct
select              6     0.1253      93.68
print_file        225     0.0004       0.33
commit              1     0.0001       0.05
other               0     0.0079       5.93

Purging table data

To purge table data without archiving it, omit --dest and --file and specify --purge. You can first print the statements that would be executed with --dry-run, then run the purge after confirming them.

-- First run with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --ask-pass --dry-run
Enter password:

SELECT /*!40001 SQL_NO_CACHE */ `emp_no`,`birth_date`,`first_name`,`last_name`,`gender`,`hire_date` FROM `employees`.`employees` FORCE INDEX(`PRIMARY`) WHERE (first_name = 'Anneke') ORDER BY `emp_no` LIMIT 50
SELECT /*!40001 SQL_NO_CACHE */ `emp_no`,`birth_date`,`first_name`,`last_name`,`gender`,`hire_date` FROM `employees`.`employees` FORCE INDEX(`PRIMARY`) WHERE (first_name = 'Anneke') AND ((`emp_no` >= ?)) ORDER BY `emp_no` LIMIT 50
DELETE FROM `employees`.`employees` WHERE (`emp_no` = ?)

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --statistics --ask-pass
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:40:57       0       0
2019-04-16T10:40:57       0      50
2019-04-16T10:40:58       0     100
2019-04-16T10:40:58       0     150
2019-04-16T10:40:58       0     200
2019-04-16T10:40:58       0     225
Started at 2019-04-16T10:40:57, ended at 2019-04-16T10:40:58
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 225
Action        Count       Time        Pct
deleting        225     0.6943      79.10
select            6     0.1386      15.79
commit            1     0.0265       3.02
other             0     0.0183       2.08

-- You can also use --bulk-delete for bulk purging
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees,A=utf8 --purge --charset=utf8 --where "first_name = 'Anneke'" --progress=50 --txn-size=1000 --limit=50 --bulk-delete --statistics --ask-pass
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:43:07       0       0
2019-04-16T10:43:07       0      50
2019-04-16T10:43:07       0     100
2019-04-16T10:43:07       0     150
2019-04-16T10:43:07       0     200
2019-04-16T10:43:07       0     225
Started at 2019-04-16T10:43:07, ended at 2019-04-16T10:43:07
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees,u=admin
SELECT 225
INSERT 0
DELETE 225
Action             Count       Time        Pct
select                 6     0.1131      66.43
bulk_deleting          5     0.0384      22.54
commit                 1     0.0153       8.97
other                  0     0.0035       2.06

Handling the AUTO_INCREMENT column

By default, when pt-archiver archives or purges table data, it adds a WHERE clause that excludes the row holding the maximum value of the AUTO_INCREMENT column. This preserves the AUTO_INCREMENT value across a database restart, but it means one fewer row is archived or purged than expected.

-- Information about the table to be archived, employees_ptarc
//See the 'Environment and data preparation' section above.

-- Run the archive with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_ptarc,A=utf8 --charset=utf8 --where "1 = 1" --progress=10000 --txn-size=10000 --limit=10000 --statistics --no-delete --bulk-insert --ask-pass --dry-run
Enter password:
Enter password:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') ORDER BY `id` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') AND ((`id` > ?)) ORDER BY `id` LIMIT 10000
LOAD DATA LOCAL INFILE ? INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)

-- Run the purge with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "1 = 1" --progress=10000 --txn-size=10000 --limit=10000 --statistics --bulk-delete --ask-pass --dry-run
Enter password:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') ORDER BY `id` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (1 = 1) AND (`id` < '200000') AND ((`id` >= ?)) ORDER BY `id` LIMIT 10000
DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (1 = 1) LIMIT 10000

The SELECT statements above include the condition `id` < '200000', even though id = 200000 exists in the table and the --where condition does nothing to exclude it. This condition was clearly added by pt-archiver itself.

It prevents the AUTO_INCREMENT value from being reused after a database restart, as the following test shows:

-- Current AUTO_INCREMENT value of table employees_ptarc
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.048s

-- Number of rows to be purged
mysql admin@192.168.58.3:employees> select count(*) from employees_ptarc where id <=199990 or id > 199995;
+----------+
| count(*) |
+----------+
| 199995   |
+----------+
1 row in set
Time: 0.102s

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --ask-pass
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:47:59       0       0
2019-04-16T10:48:00       0   50000
2019-04-16T10:48:00       1  100000
2019-04-16T10:48:01       2  150000
2019-04-16T10:48:02       2  199994
Started at 2019-04-16T10:47:59, ended at 2019-04-16T10:48:02
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199994
INSERT 0
DELETE 199994
Action             Count       Time        Pct
bulk_deleting          4     0.9030      28.23
select                 5     0.2095       6.55
commit                 4     0.1562       4.88
other                  0     1.9298      60.33

//Note that only 199994 rows were purged.

-- Query for Uncleared Data
mysql admin@192.168.58.3:employees> select * from employees_ptarc;
+--------+--------+---------------+----------------------+
| id     | v_int  | v_string      | s_string             |
+--------+--------+---------------+----------------------+
| 199991 | 60305  | 526ed         | 0240a2d81e255c915b5a |
| 199992 | 546438 | a85b6a18d     | 0bf1d636cd0e536eb044 |
| 199993 | 543327 | 1367a1c       | 68908231ca18ed631907 |
| 199994 | 99632  | 2f            | 5c10f8d106a30bb1ef95 |
| 199995 | 164172 | e57bba13eb3c1 | 3208ac758bd8c912c39f |
| 200000 | 108936 | 3bc1db70b     | 079f744bf2800ad62a9b |
+--------+--------+---------------+----------------------+
6 rows in set
Time: 0.018s

-- After a restart, query the current AUTO_INCREMENT value
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.014s

To allow the row with the maximum AUTO_INCREMENT value to be purged as well, the tool provides the option --no-safe-auto-increment. Specify it and test again:

-- Run the purge with --dry-run
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --no-safe-auto-increment --ask-pass --dry-run
Enter password:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199990 or id > 199995) ORDER BY `id` LIMIT 50000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199990 or id > 199995) AND ((`id` >= ?)) ORDER BY `id` LIMIT 50000
DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (id <= 199990 or id > 199995) LIMIT 50000

//The statements above show that the row with the maximum AUTO_INCREMENT value is no longer excluded.

-- Perform the purge
# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --purge --charset=utf8 --where "id <= 199990 or id > 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-delete --no-safe-auto-increment --ask-pass
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:52:19       0       0
2019-04-16T10:52:20       0   50000
2019-04-16T10:52:20       1  100000
2019-04-16T10:52:21       1  150000
2019-04-16T10:52:22       2  199995
Started at 2019-04-16T10:52:19, ended at 2019-04-16T10:52:22
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199995
INSERT 0
DELETE 199995
Action             Count       Time        Pct
bulk_deleting          4     0.7500      27.80
select                 5     0.1851       6.86
commit                 4     0.0622       2.30
other                  0     1.7007      63.03

//You can see that 199995 rows were deleted.

-- Query the remaining (undeleted) data
mysql admin@192.168.58.3:employees> select * from employees_ptarc;
+--------+--------+---------------+----------------------+
| id     | v_int  | v_string      | s_string             |
+--------+--------+---------------+----------------------+
| 199991 | 60305  | 526ed         | 0240a2d81e255c915b5a |
| 199992 | 546438 | a85b6a18d     | 0bf1d636cd0e536eb044 |
| 199993 | 543327 | 1367a1c       | 68908231ca18ed631907 |
| 199994 | 99632  | 2f            | 5c10f8d106a30bb1ef95 |
| 199995 | 164172 | e57bba13eb3c1 | 3208ac758bd8c912c39f |
+--------+--------+---------------+----------------------+
5 rows in set
Time: 0.027s

-- After the restart, query the current AUTO_INCREMENT value
mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 199996         |
+----------------+
1 row in set
Time: 0.046s

The query above shows that the AUTO_INCREMENT value has changed. To restore it to its previous value, repair it manually with ALTER TABLE:

mysql admin@192.168.58.3:employees> alter table employees_ptarc auto_increment = 200001;
Query OK, 0 rows affected
Time: 0.013s

mysql admin@192.168.58.3:employees> select auto_increment from information_schema.tables where table_schema = 'employees' and table_name = 'employees_ptarc';
+----------------+
| auto_increment |
+----------------+
| 200001         |
+----------------+
1 row in set
Time: 0.020s

Workflow

To understand the workflow of the pt-archiver tool, enable the general log on both instances. The workflow is illustrated here by archiving from table to table in batch mode, while deleting the archived data from the source table.
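To capture these statements, the general log can be enabled temporarily on each instance. The file path below is only an example; remember to turn the log back off afterwards, since it grows quickly under load:

```sql
-- On each instance, before running pt-archiver (example path):
SET GLOBAL general_log_file = '/tmp/mysql_general.log';
SET GLOBAL general_log = ON;

-- ... run pt-archiver ...

-- Disable the log again when the test is done:
SET GLOBAL general_log = OFF;
```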

The command executed is as follows:

# pt-archiver --source h=192.168.58.3,P=3306,u=admin,D=employees,t=employees_ptarc,A=utf8 --dest h=192.168.58.3,P=3308,u=admin,D=test,t=employees_ptarc,A=utf8 --charset=utf8 --where "id <= 199995" --progress=50000 --txn-size=50000 --limit=50000 --statistics --bulk-insert --ask-pass
Enter password:
Enter password:

TIME                ELAPSED   COUNT
2019-04-16T10:56:18       0       0
2019-04-16T10:56:19       1   50000
2019-04-16T10:56:21       2  100000
2019-04-16T10:56:23       4  150000
2019-04-16T10:56:24       6  199995
Started at 2019-04-16T10:56:18, ended at 2019-04-16T10:56:25
Source: A=utf8,D=employees,P=3306,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
Dest:   A=utf8,D=test,P=3308,h=192.168.58.3,p=...,t=employees_ptarc,u=admin
SELECT 199995
INSERT 199995
DELETE 199995
Action              Count       Time        Pct
bulk_inserting          4     1.3027      19.51
bulk_deleting           4     0.7103      10.64
select                  5     0.1872       2.80
commit                  8     0.1455       2.18
print_bulkfile     199995    -0.3881      -5.81
other                   0     4.7192      70.68
  • Source instance general log
-- Initial checks of database parameters and session settings
13 Connect  admin@dbabd1 on employees using TCP/IP
13 Query    set autocommit=0
13 Query    /*!40101 SET NAMES "utf8"*/
13 Query    SHOW VARIABLES LIKE 'wait\_timeout'
13 Query    SET SESSION wait_timeout=10000
13 Query    SELECT @@SQL_MODE
13 Query    SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
13 Query    SELECT version()
13 Query    SHOW VARIABLES LIKE 'version%'
13 Query    SHOW ENGINES
13 Query    SHOW VARIABLES LIKE 'innodb_version'
13 Query    show variables like 'innodb_rollback_on_timeout'
13 Query    /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
13 Query    USE `employees`
13 Query    SHOW CREATE TABLE `employees`.`employees_ptarc`
13 Query    /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
13 Query    SELECT CONCAT(/*!40100 @@session.character_set_connection, */ "")
13 Query    SHOW VARIABLES LIKE 'wsrep_on'
13 Query    SHOW VARIABLES LIKE 'wsrep_on'
13 Query    SHOW VARIABLES LIKE 'version%'
13 Query    SHOW ENGINES
13 Query    SHOW VARIABLES LIKE 'innodb_version'

-- Determine the upper boundary of the data to be archived, then fetch each chunk by primary key according to the --limit option
13 Query    SELECT MAX(`id`) FROM `employees`.`employees_ptarc`
13 Query    SELECT CONCAT(@@hostname, @@port)
13 Query    SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') ORDER BY `id` LIMIT 50000
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'

......Omit...

13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    commit
13 Query    SELECT 'pt-archiver keepalive'

-- After archiving each chunk, delete it from the source unless --no-delete is specified
13 Query    DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '1'))) AND (((`id` <= '50000'))) AND (id <= 199995) LIMIT 50000

-- Perform the next data archiving operation
13 Query    SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '50000')) ORDER BY `id` LIMIT 50000
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'

......Omit...

13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    commit
13 Query    SELECT 'pt-archiver keepalive'
13 Query    DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '50001'))) AND (((`id` <= '100000'))) AND (id <= 199995) LIMIT 50000
13 Query    SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '100000')) ORDER BY `id` LIMIT 50000
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'

......Omit...

13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    commit
13 Query    SELECT 'pt-archiver keepalive'
13 Query    DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '100001'))) AND (((`id` <= '150000'))) AND (id <= 199995) LIMIT 50000
13 Query    SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '150000')) ORDER BY `id` LIMIT 50000
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'

......Omit...

13 Query    SELECT 'pt-archiver keepalive'
13 Query    SELECT 'pt-archiver keepalive'
13 Query    DELETE FROM `employees`.`employees_ptarc` WHERE (((`id` >= '150001'))) AND (((`id` <= '199995'))) AND (id <= 199995) LIMIT 50000

-- A final query checks whether any data remains to be archived
13 Query    SELECT /*!40001 SQL_NO_CACHE */ `id`,`v_int`,`v_string`,`s_string` FROM `employees`.`employees_ptarc` FORCE INDEX(`PRIMARY`) WHERE (id <= 199995) AND (`id` < '200000') AND ((`id` >= '199995')) ORDER BY `id` LIMIT 50000
13 Query    commit
13 Quit 
  • Target instance general log
-- Initial checks of database parameters and session settings
62 Connect  admin@dbabd1 on test using TCP/IP
62 Query    set autocommit=0
62 Query    /*!40101 SET NAMES "utf8"*/
62 Query    SHOW VARIABLES LIKE 'wait\_timeout'
62 Query    SET SESSION wait_timeout=10000
62 Query    SELECT @@SQL_MODE
62 Query    SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
62 Query    SELECT version()
62 Query    SHOW VARIABLES LIKE 'version%'
62 Query    SHOW ENGINES
62 Query    SHOW VARIABLES LIKE 'innodb_version'
62 Query    show variables like 'innodb_rollback_on_timeout'
62 Query    /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */

-- Switch to the target database
62 Query    USE `test`
62 Query    SHOW CREATE TABLE `test`.`employees_ptarc`
62 Query    /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
62 Query    SELECT CONCAT(/*!40100 @@session.character_set_connection, */ "")
62 Query    SELECT CONCAT(@@hostname, @@port)
62 Query    commit

-- The batch-insert option --bulk-insert writes each chunk to a temporary file and imports it with LOAD DATA LOCAL INFILE
62 Query    LOAD DATA LOCAL INFILE '/tmp/UGeTe6hEIPpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query    commit
62 Query    LOAD DATA LOCAL INFILE '/tmp/PNNmzgvqx6pt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query    commit
62 Query    LOAD DATA LOCAL INFILE '/tmp/pJDb48JQvQpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query    LOAD DATA LOCAL INFILE '/tmp/xvb0NmYJiGpt-archiver' INTO TABLE `test`.`employees_ptarc`CHARACTER SET utf8(`id`,`v_int`,`v_string`,`s_string`)
62 Query    commit
62 Quit 

Summary

  1. pt-archiver is a very efficient tool for archiving table data; processing the data in batches reduces the performance impact.
  2. If GTID is enabled on the instance, CREATE TABLE ... AS SELECT (CTAS) is not allowed because it violates GTID consistency; pt-archiver can be used instead to copy the data.
  3. For cross-instance or cross-server archiving, it is advisable to run pt-archiver on the target server, because the temporary files generated by --bulk-insert reside on the server where the tool executes.
  4. Expired data in large tables can also be deleted in batches with the --purge option.
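Regarding point 2, the restriction looks like this when enforce_gtid_consistency is ON: CTAS is rejected outright, so pre-creating the table and letting pt-archiver (or INSERT ... SELECT) copy the rows is the workaround. The error number and text below are from MySQL 5.7 and may vary slightly by version; the table name is illustrative:

```sql
-- With GTID enabled (enforce_gtid_consistency = ON), CTAS fails:
CREATE TABLE test.employees_copy AS
    SELECT * FROM employees.employees_ptarc;
-- ERROR 1786 (HY000): Statement violates GTID consistency:
-- CREATE TABLE ... SELECT.

-- Workaround: create the empty table first, then copy the data
-- with pt-archiver (or INSERT ... SELECT):
CREATE TABLE test.employees_copy LIKE employees.employees_ptarc;
```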

Reference resources

https://www.percona.com/doc/percona-toolkit/LATEST/pt-archiver.html

My knowledge is limited; if there are any errors in this article, please leave a comment to point them out!

Posted by Jewelie on Fri, 10 May 2019 06:43:39 -0700