Slow logs record SQL statements whose execution time in the database exceeds a specified threshold. Through the slow query log, you can find out which statements are inefficient and optimize them. This article briefly introduces the slow log structure in MySQL and TiDB, and deepens the understanding of slow logs in a distributed database through a horizontal comparison.
1. Slow logs in MySQL
1.1 Introduction to MySQL slow logs
By default, MySQL does not enable slow logging; you can set the slow_query_log parameter to enable it. The main parameters related to the slow log are slow_query_log (the on/off switch), slow_query_log_file (the log file location), and long_query_time (the threshold, in seconds).
Generally, you only need to enable the slow log and configure the threshold time; the other parameters can be left at their defaults. In a real production environment they can be adjusted as needed; long_query_time is usually set to 100 ms.
1.2 Slow log analysis tools
1.2.1 Enable the slow query log
By default, the value of slow_query_log is OFF, meaning the slow query log is disabled; you can turn it on by setting slow_query_log. The following method enables the slow query log temporarily, without restarting the database, but the setting is lost once MySQL restarts. To make it take effect after a restart, modify the configuration file.
mysql> show variables like 'slow_query%';
+---------------------+-----------------------------------------------+
| Variable_name       | Value                                         |
+---------------------+-----------------------------------------------+
| slow_query_log      | OFF                                           |
| slow_query_log_file | /usr/local/mysql/data/tango-GDB-DB01-slow.log |
+---------------------+-----------------------------------------------+
2 rows in set (0.11 sec)

mysql> show variables like 'long%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)

#Enable the slow query log and set the threshold to 100ms
mysql> set global slow_query_log=1;
Query OK, 0 rows affected (0.10 sec)

mysql> set long_query_time=0.1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'slow_query%';
+---------------------+-----------------------------------------------+
| Variable_name       | Value                                         |
+---------------------+-----------------------------------------------+
| slow_query_log      | ON                                            |
| slow_query_log_file | /usr/local/mysql/data/tango-GDB-DB01-slow.log |
+---------------------+-----------------------------------------------+
2 rows in set (0.00 sec)

mysql> show variables like 'long%';
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 0.100000 |
+-----------------+----------+
1 row in set (0.00 sec)
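To make the setting survive a restart, the same parameters can be written into the MySQL configuration file. A minimal sketch, assuming the configuration file is /etc/my.cnf and reusing the log path shown above:

[mysqld]
# Enable the slow query log persistently
slow_query_log = 1
# Log file location (same path as shown by slow_query_log_file above)
slow_query_log_file = /usr/local/mysql/data/tango-GDB-DB01-slow.log
# Threshold in seconds: log statements slower than 100ms
long_query_time = 0.1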
After the slow log is enabled, run a simple SQL statement and then check the slow query log:
mysql> select count(1) from sbtest.sbtest1 where id between 100 and 10000;
+----------+
| count(1) |
+----------+
|     9901 |
+----------+
1 row in set (0.55 sec)

[root@tango-GDB-DB01 data]# less tango-GDB-DB01-slow.log
# Time: 2021-10-17T00:26:03.079183Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.546503  Lock_time: 0.386796  Rows_sent: 1  Rows_examined: 9901
SET timestamp=1634430362;
select count(1) from sbtest.sbtest1 where id between 100 and 10000;
Check the number of slow queries recorded so far with the following command:
mysql> show global status like '%Slow_queries%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 3     |
+---------------+-------+
1 row in set (0.49 sec)
1.2.2 Log analysis tool: mysqldumpslow
In a production environment, manually reading the log to find and analyze SQL is clearly laborious, so MySQL provides the log analysis tool mysqldumpslow. It is easy to use; run it with --help to see the specific usage.
[root@tango-GDB-DB01 data]# ../bin/mysqldumpslow tango-GDB-DB01-slow.log

Reading mysql slow query log from tango-GDB-DB01-slow.log
Count: 1  Time=0.16s (0s)  Lock=0.39s (0s)  Rows=1.0 (1), root[root]@localhost
  select count(N) from sbtest.sbtest1 where id between N and N
Its main function is to aggregate the following attributes of the different SQL statements:

- Number of occurrences (Count)
- Maximum execution time (Time)
- Time spent waiting for locks (Lock)
- Total number of rows sent to the client (Rows)
- The user, plus the SQL statement itself in abstracted form (for example, limit 1, 20 is represented as limit N,N)
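In practice you usually sort and truncate the output. A small usage sketch against the same slow log as above: -s t sorts by total query time (c would sort by count, l by lock time) and -t 10 keeps only the top 10 statements:

[root@tango-GDB-DB01 data]# ../bin/mysqldumpslow -s t -t 10 tango-GDB-DB01-slow.log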
1.2.3 Open source tool pt-query-digest
pt-query-digest is a tool for analyzing MySQL slow queries. It can analyze the binlog, general log, slow log, SHOW PROCESSLIST output, or MySQL protocol data captured with tcpdump, and the analysis results can be written to a file. The analysis process parameterizes the conditions of the query statements, then groups and aggregates the parameterized queries, computing the execution time, execution count, and proportion of each query class. With the help of the results, problematic queries can be identified and optimized.
1) Install pt-query-digest
# yum install perl-DBI
# yum install perl-DBD-MySQL
# yum install perl-Time-HiRes
# yum install perl-IO-Socket-SSL
# yum -y install perl-Digest-MD5
cd /usr/local/src
wget percona.com/get/percona-toolkit.tar.gz
tar zxf percona-toolkit.tar.gz
cd percona-toolkit-3.3.1
perl Makefile.PL PREFIX=/usr/local/percona-toolkit
make && make install
2) pt-query-digest syntax and important options
pt-query-digest [OPTIONS] [FILES] [DSN]

--create-review-table   When --review is used to save results to a table, create the table automatically if it does not exist.
--create-history-table  When --history is used to save results to a table, create the table automatically if it does not exist.
--filter                Match and filter the input slow queries against the specified string before analyzing them.
--limit                 Limit the percentage or number of results in the output. The default is 20, i.e. the 20 slowest statements are output. With 50%, results are sorted by total response time in descending order and cut off once the cumulative total reaches 50%.
--host                  MySQL server address.
--user                  MySQL user name.
--password              MySQL user password.
--history               Save the analysis results to a table; the results are detailed. The next time --history is used, if the same statement exists but its query time range differs from what is in the history table, it is recorded again, so you can compare the historical changes of a query class by its CHECKSUM.
--review                Save the analysis results to a table. This analysis only parameterizes the query conditions, one record per query class, which is relatively simple. The next time --review is used, an already-analyzed statement is not recorded again.
--output                Output type: report (standard analysis report), slowlog (MySQL slow log), json, or json-anon. report is the usual choice, for readability.
--since                 Start time of the analysis. The value is a string: either a point in time in "yyyy-mm-dd [hh:mm:ss]" format, or a simple relative value using s (seconds), h (hours), m (minutes), d (days); e.g. 12h means statistics start 12 hours ago.
--until                 End time; combined with --since, slow queries over a given period can be analyzed.
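For example, a minimal invocation that analyzes only the last 12 hours of the slow log and writes a standard report (reusing the log path from the earlier MySQL examples):

pt-query-digest --since=12h /usr/local/mysql/data/tango-GDB-DB01-slow.log > /tmp/slow_report_12h.txt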
3) Analyze the pt-query-digest output
- Part I: overall statistical results
# Resource usage of the tool itself: user time, system time, physical memory (rss), virtual memory (vsz)
# 200ms user time, 130ms system time, 25.73M rss, 220.28M vsz
# Tool execution time
# Current date: Sat Oct 24 10:30:19 2021
# Hostname the analysis tool runs on
# Hostname: tango-GDB-DB01
# File being analyzed
# Files: /usr/local/mysql/data/tango-GDB-DB01-slow.log
# Total statements, unique statements, QPS, concurrency
# Overall: 4 total, 4 unique, 0.01 QPS, 0.11x concurrency ________________
# Time range of the logged queries
# Time range: 2021-10-17T00:24:42 to 2021-10-17T00:29:33
# Column meanings: total, minimum, maximum, average, 95th percentile, standard deviation, median
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Statement execution time
# Exec time            33s   398ms     32s      8s     32s     14s     16s
# Lock hold time
# Lock time          390ms   147us   387ms    97ms   374ms   162ms   188ms
# Rows sent to the client
# Rows sent              4       1       1       1       1       0       1
# Rows scanned by the select statements
# Rows examine       9.67k       0   9.67k   2.42k   9.33k   4.04k   4.67k
# Query size in characters
# Query size           208      28      74      52   72.65   18.73   69.27
- Overall: total number of queries
- Time range: the time range over which the queries executed
- Unique: number of unique queries, i.e. the total number of distinct queries after parameterizing the query conditions
- total / min / max / avg: total, minimum, maximum, and average values
- 95%: with all values sorted from small to large, the value at the 95% position; generally the most useful reference value
- median: with all values sorted from small to large, the value in the middle position
- Part II: query grouping statistics results
# Profile
# Rank Query ID                            Response time  Calls R/Call  V/M
# ==== =================================== ============== ===== ======= ====
#    1 0x028E79DCDB99AC4C8AE06173AA02BA16  31.9112  95.7%     1 31.9112 0.00 SELECT sbtest?
# MISC 0xMISC                               1.4378   4.3%     3  0.4793  0.0 <3 ITEMS>
- Rank: rank of the statement among all statements, by default in descending order of query time; the order can be changed with --order-by
- Query ID: ID of the statement (a hash computed after stripping extra whitespace and literal text)
- Response time: total response time, plus the percentage of the total time in this analysis it accounts for
- Calls: number of executions, i.e. the total number of queries of this class in this analysis
- R/Call: average response time per execution
- V/M: variance-to-mean ratio of the response time
- Item: the query object
- Part III: detailed statistical results of each query
# Query 1: 0 QPS, 0x concurrency, ID 0x028E79DCDB99AC4C8AE06173AA02BA16 at byte 382
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: all events occurred at 2021-10-17T00:24:42
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         25       1
# Exec time     95     32s     32s     32s     32s     32s       0     32s
# Lock time      0     2ms     2ms     2ms     2ms     2ms       0     2ms
# Rows sent     25       1       1       1       1       1       0       1
# Rows examine   0       0       0       0       0       0       0       0
# Query size    13      28      28      28      28      28       0      28
# String:
# Hosts        localhost
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS LIKE 'sbtest1'\G
#    SHOW CREATE TABLE `sbtest1`\G
# EXPLAIN /*!50100 PARTITIONS*/
select count(1) from sbtest1\G
- The table at the top lists the detailed statistics for this query: execution count, maximum, minimum, average, 95th percentile, and so on.
- ID: the ID of the query, corresponding to the Query ID in the Profile above
- Databases: database name
- Users: execution count (and percentage) per user
- Query_time distribution: distribution of query times; the length of each bar reflects the proportion of queries in that interval. In the output above, all executions fall into the 10s+ bucket.
- Tables: the tables involved in the query
- Explain: the SQL statement, prefixed with EXPLAIN so its plan can be inspected directly
2. Slow logs in TiDB
2.1 Introduction to slow logs in TiDB
TiDB writes statements whose execution time exceeds the slow log threshold (default 300 ms) to the slow query log file, to help users locate slow statements and analyze and solve SQL performance problems. TiDB enables the slow query log by default; you can turn it on or off with the enable-slow-log configuration item.
- tidb_slow_log_threshold: sets the slow log threshold; SQL statements whose execution time exceeds it are recorded in the slow log. The default value is 300 ms.
- tidb_query_log_max_len: sets the maximum length of the SQL text recorded in the slow log. The default value is 4096 bytes.
- tidb_redact_log: sets whether to redact user data (replacing it with ?) in the SQL recorded in the slow log. The default value is 0, i.e. the feature is off.
- tidb_enable_collect_execution_info: sets whether to record the physical execution information of each operator in the execution plan. The default value is 1. The performance impact of this feature is about 3%.
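These are ordinary system variables, so they can be inspected and adjusted from a SQL session. A small sketch (variable scopes differ slightly between TiDB versions, so check the documentation for your version):

-- Check the current threshold, then lower it to 100ms and allow longer SQL text in the log
show variables like 'tidb_slow_log_threshold';
set tidb_slow_log_threshold = 100;
set tidb_query_log_max_len = 8192;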
Users can query the contents of the slow query log through the INFORMATION_SCHEMA.SLOW_QUERY table; its column names correspond one-to-one to the field names in the slow log. In TiDB 4.0, SLOW_QUERY already supports querying the slow log over any time period, i.e. including the data of slow log files that have been rotated. The user only needs to specify a time range in the query to locate the slow log files that must be parsed; if the query specifies no time range, only the current slow log file is parsed.
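For instance, a query along the following lines (column names follow the slow log fields described in section 2.3; the time window is arbitrary) pulls the ten slowest user statements of a given day:

-- Ten slowest non-internal statements within a time window (rotated files included)
select query_time, query
from information_schema.slow_query
where is_internal = false
  and time between '2021-10-17 00:00:00' and '2021-10-17 23:59:59'
order by query_time desc
limit 10;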
2.2 Using pt-query-digest to analyze slow logs
The slow logs in TiDB can also be parsed with pt-query-digest:
[root@tango-centos01 bin]# ./pt-query-digest --report /tidb-deploy/tidb-4000/log/tidb_slow_query.log > /tmp/result.txt

# Query 1: 0.00 QPS, 0.00x concurrency, ID 0xF56F9650E481C1EAE6740F83A3E451C9 at byte 1705729
# This item is included in the report because it matches --limit.
# Scores: V/M = 45.86
# Time range: 2020-11-15T18:02:59 to 2021-03-15T19:32:58
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         27    1151
# Exec time     28   5698s   301ms    344s      5s     21s     15s      1s
# Query size    20 201.21k     179     183  179.01  174.84       0  174.84
# Backoff time  32     18s     2ms      6s      1s      5s      2s    91ms
# Compile time  33    577s   190us     42s   501ms      2s      2s     3ms
# Cop backoff   29     15s      2s      6s      4s      6s      2s      5s
# Cop backoff   29     57s     11s     19s     14s     19s      3s     17s
# Cop backoff    9     2ms     2ms     2ms     2ms     2ms       0     2ms
# Cop backoff   14      1s      1s      1s      1s      1s       0      1s
# Cop backoff   78      3s    61ms      2s   256ms   163ms   513ms    82ms
# Cop backoff   61     16s      1s      5s      1s      2s      1s   992ms
# Cop backoff   31   222ms     2ms   164ms    56ms   163ms    66ms   108ms
# Cop backoff   25     27s      1s     18s      7s     18s      7s     12s
# Cop proc avg  19    1.57       0    0.93    0.00       0    0.03       0
# Cop time      37    715s     1us     38s   639ms      2s      2s   219ms
# Cop wait avg  45   16.85       0    2.46    0.02    0.00    0.15       0
# Mem max       39  33.54M  30.69k  30.70k  30.69k  30.09k       0  30.09k
# Num cop task  27   1.09k       0       1    0.97    0.99    0.16    0.99
# Parse time    22    359s    47us     66s   312ms   705ms      3s   144us
# Process time  15      2s     1ms   934ms    31ms    56ms   134ms   972us
# Request coun  27   1.09k       1       1       1       1       0       1
# Total keys     0   1.09k       1       1       1       1       0       1
# Txn start ts  27 410.92E       0 376.21P 365.58P   1.25P 209.72T   1.25P
# Wait time     35     17s     1ms      2s    86ms   393ms   346ms   972us
# String:
# Cop proc add 192.168.112.102:20160 (656/56%)... 2 more
# Cop wait add 192.168.112.102:20160 (656/56%)... 2 more
# Digest       e0cf0e2dccc824f34e52c6341356354f77cce9342074...
# Has more res false
# Index names  [bind_info:time_index]
# Is internal  true
# Plan         tidb_decod... (1149/99%), tidb_decod... (2/0%)
# Plan digest  76900b4634678820c876c2855d6bd7e7a7b0f9b46a8d...
# Plan from ca false
# Prepared     false
# Stats        bind_info:pseudo
# Succ         true (1119/97%), false (32/2%)
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms  #######################################################
#    1s  ################################################################
#  10s+  ############
# Tables
#    SHOW TABLE STATUS FROM `mysql` LIKE 'bind_info'\G
#    SHOW CREATE TABLE `mysql`.`bind_info`\G
# EXPLAIN /*!50100 PARTITIONS*/
select original_sql, bind_sql, default_db, status, create_time, update_time, charset, collation from mysql.bind_info where update_time > "0000-00-00 00:00:00" order by update_time\G
2.3 Fields in the slow query log
1) Slow Query basic information:
- Time: indicates the log printing time.
- Query_time: indicates the time spent executing this statement.
- Parse_time: indicates the time spent in the syntax parsing phase of this statement.
- Compile_time: indicates the time spent in the query optimization phase of this statement.
- Query: the SQL statement. The field name Query is not printed in the slow log itself, but after the log is mapped into the memory table, the corresponding column is named Query.
- Digest: indicates the fingerprint of the SQL statement.
- Txn_start_ts: indicates the start timestamp of the transaction, which is also its unique ID. You can use this value to find other transaction-related logs in the TiDB logs.
- Is_internal: indicates whether it is an internal SQL statement of TiDB. true indicates the SQL statement executed inside the TiDB system, and false indicates the SQL statement executed by the user.
- Index_ids: indicates the ID of the index involved in the statement.
- Succ: indicates whether the statement was executed successfully.
- Backoff_time: indicates the time spent waiting before retrying when the statement hits an error that requires a retry. Common retriable errors include lock conflicts, Region splits, and tikv server is busy.
- Plan: represents the execution plan of the statement; you can parse it into the concrete plan with the SQL statement select tidb_decode_plan('xxx...').
- Prepared: indicates whether this statement is a Prepare or Execute request.
- Plan_from_cache: indicates whether the statement hits the execution plan cache.
- Rewrite_time: indicates the time spent in the query rewrite phase of this statement.
- Preproc_subqueries: indicates the number of subqueries in this statement that were executed in advance. For example, the subquery in where id in (select id from t) may be executed ahead of time.
- Preproc_subqueries_time: indicates the time spent on the subqueries of this statement that were executed in advance.
- Exec_retry_count: indicates the number of times execution of this statement was retried. This usually happens in pessimistic transactions: when taking a lock fails, the statement is executed again.
- Exec_retry_time: indicates the time spent on retried executions of this statement. For example, if a statement was executed three times in total (the first two failed), Exec_retry_time is the sum of the first two execution times, and Query_time minus Exec_retry_time is the time of the final execution.
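Since these fields surface as columns of INFORMATION_SCHEMA.SLOW_QUERY, a grouping query in roughly the following spirit (the column choice is illustrative) finds the SQL fingerprints that appear most often in the slow log:

-- Most frequent slow query classes, grouped by fingerprint (Digest)
select min(time), max(time), count(*) as cnt, digest
from information_schema.slow_query
where is_internal = false
group by digest
order by cnt desc
limit 10;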
2) Fields related to transaction execution:
- Prewrite_time: indicates the time spent in the first phase (prewrite phase) of the two-phase commit of a transaction.
- Commit_time: indicates the time spent in the second phase (commit phase) of the two-phase commit of a transaction.
- Get_commit_ts_time: indicates the time taken to obtain the commit timestamp in the second phase (commit phase) of the two-phase commit of a transaction.
- Local_latch_wait_time: indicates the time spent waiting for locks on the TiDB side before initiating the second phase (commit phase) in the two-phase commit of a transaction.
- Write_keys: indicates the number of keys the transaction wrote to TiKV's Write CF.
- Write_size: indicates the total write size of key or value when the transaction is committed.
- Prewrite_region: indicates the number of TiKV Regions involved in the first phase (prewrite phase) of the two-phase commit of a transaction. Each Region triggers a remote procedure call.
3) Fields related to memory usage:
- Mem_max: indicates the maximum memory space used by TiDB during execution, in bytes.
4) Fields related to hard disk usage:
- Disk_max: indicates the maximum hard disk space used by TiDB during execution, in bytes.
5) Fields related to users executing SQL:
- User: indicates the user name to execute the statement.
- Conn_ID: indicates the user's connection ID. You can use a keyword like con:3 to find other logs related to this connection in the TiDB logs.
- DB: represents the database used when executing the statement.
6) Fields related to TiKV Coprocessor Task:
- Request_count: indicates the number of Coprocessor requests sent by this statement.
- Total_keys: indicates the number of keys scanned by the Coprocessor.
- Process_time: the sum of the SQL processing time in TiKV. Because data is sent to TiKV for execution in parallel, this value may exceed Query_time.
- Wait_time: indicates the sum of the waiting time of this statement in TiKV. Because the number of Coprocessor threads in TiKV is limited, requests queue up when all Coprocessor threads are busy; when some requests in the queue take a long time, the waiting time of subsequent requests increases.
- Process_keys: indicates the number of keys processed by the Coprocessor. Unlike Total_keys, Process_keys does not include old MVCC versions; a large gap between Process_keys and Total_keys indicates that many old versions exist (see the example query after this list).
- Cop_proc_avg: average execution time of cop task.
- Cop_proc_p90: P90 quantile execution time of cop task.
- Cop_proc_max: the maximum execution time of the cop task.
- Cop_proc_addr: the address of the cop task with the longest execution time.
- Cop_wait_avg: average waiting time of cop task.
- Cop_wait_p90: P90 quantile waiting time of cop task.
- Cop_wait_max: the maximum waiting time of the cop task.
- Cop_wait_addr: the address of the cop task with the longest waiting time.
- Cop_backoff_{backoff-type}_total_times: the total number of backoffs caused by a given error type.
- Cop_backoff_{backoff-type}_total_time: the total backoff time caused by a given error type.
- Cop_backoff_{backoff-type}_max_time: the maximum backoff time caused by a given error type.
- Cop_backoff_{backoff-type}_max_addr: the address of the cop task with the maximum backoff time caused by a given error type.
- Cop_backoff_{backoff-type}_avg_time: the average backoff time caused by a given error type.
- Cop_backoff_{backoff-type}_p90_time: the P90 quantile backoff time caused by a given error type.
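As an illustration of the Total_keys / Process_keys relationship described above, a query of roughly this shape (the 10x threshold is arbitrary) lists slow statements that scanned many old MVCC versions:

-- Statements where scanned keys far exceed processed keys, i.e. many old MVCC versions
select query_time, total_keys, process_keys, query
from information_schema.slow_query
where is_internal = false
  and total_keys > 10 * process_keys
order by query_time desc
limit 10;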