Database Series: Slow Log Analysis in MySQL and TiDB

Keywords: database, MySQL, TiDB

Slow logs record SQL statements whose execution time in the database exceeds a specified threshold. Through the slow query log, you can find out which statements are inefficient and target them for optimization. This article briefly introduces the slow log structure in MySQL and TiDB, and uses a side-by-side comparison to deepen understanding of slow logs in a distributed database.

1. Slow logs in MySQL

1.1 Introduction to the MySQL slow log

Slow logs record SQL statements whose execution time exceeds a specified threshold, and the slow query log lets you find inefficient statements for optimization. By default, MySQL does not enable slow logging; you can turn it on by setting the slow_query_log parameter. The parameters related to the slow log are described as follows:

The main parameters are: slow_query_log (enables or disables the slow log), slow_query_log_file (the slow log file path), long_query_time (the threshold in seconds; default 10), log_queries_not_using_indexes (also log statements that use no index), and log_output (write the log to FILE, TABLE, or both).

Generally, you only need to enable the slow log and configure the threshold; the other parameters can keep their defaults. In a production environment, adjust as needed; long_query_time is commonly set to 100 ms.

1.2 Slow log analysis tools

1.2.1 Enabling the slow query log

By default, the value of slow_query_log is OFF, meaning the slow query log is disabled; you can turn it on by setting slow_query_log. The following method enables the slow query log temporarily, without restarting the database, but the setting is lost once MySQL restarts. To make it take effect after a restart, modify the configuration file instead (a my.cnf sketch follows the session below).

mysql> show variables like 'slow_query%';
+---------------------+-----------------------------------------------+
| Variable_name       | Value                                         |
+---------------------+-----------------------------------------------+
| slow_query_log      | OFF                                           |
| slow_query_log_file | /usr/local/mysql/data/tango-GDB-DB01-slow.log |
+---------------------+-----------------------------------------------+
2 rows in set (0.11 sec)
mysql> show variables like 'long%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)

#Enable the slow query log and set the threshold to 100ms
#(without GLOBAL, set long_query_time only affects the current session)
mysql> set global slow_query_log=1;
Query OK, 0 rows affected (0.10 sec)

mysql> set long_query_time=0.1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'slow_query%';
+---------------------+-----------------------------------------------+
| Variable_name       | Value                                         |
+---------------------+-----------------------------------------------+
| slow_query_log      | ON                                            |
| slow_query_log_file | /usr/local/mysql/data/tango-GDB-DB01-slow.log |
+---------------------+-----------------------------------------------+
2 rows in set (0.00 sec)

mysql> show variables like 'long%';
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 0.100000 |
+-----------------+----------+
1 row in set (0.00 sec)
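
To make these settings survive a restart, write them to the MySQL configuration file instead. A minimal sketch, assuming my.cnf lives at /etc/my.cnf (the log path matches the output above; restart mysqld afterwards):

# /etc/my.cnf (sketch)
[mysqld]
slow_query_log = 1
slow_query_log_file = /usr/local/mysql/data/tango-GDB-DB01-slow.log
long_query_time = 0.1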

After the slow log is enabled, run a simple SQL statement and then view the slow query log:

mysql> select count(1) from sbtest.sbtest1 where id between 100 and 10000 ;
+----------+
| count(1) |
+----------+
|     9901 |
+----------+
1 row in set (0.55 sec)

[root@tango-GDB-DB01 data]# less tango-GDB-DB01-slow.log 
# Time: 2021-10-17T00:26:03.079183Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.546503  Lock_time: 0.386796 Rows_sent: 1  Rows_examined: 9901
SET timestamp=1634430362;
select count(1) from sbtest.sbtest1 where id between 100 and 10000;

Check the number of recorded slow queries with the following command:

mysql> show global status like '%Slow_queries%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 3     |
+---------------+-------+
1 row in set (0.49 sec)

1.2.2 Log analysis tool: mysqldumpslow

In a production environment, manually combing through the log to find and analyze slow SQL is laborious, so MySQL provides the log analysis tool mysqldumpslow. It is easy to use; run mysqldumpslow --help to see the available options.

[root@tango-GDB-DB01 data]# ../bin/mysqldumpslow  tango-GDB-DB01-slow.log
Reading mysql slow query log from tango-GDB-DB01-slow.log
Count: 1  Time=0.16s (0s)  Lock=0.39s (0s)  Rows=1.0 (1), root[root]@localhost
  select count(N) from sbtest.sbtest1 where id between N and N

Its main function is to aggregate the following attributes for each distinct SQL pattern (a usage sketch with common sort options follows the list):

  • Number of occurrences (Count)
  • Maximum execution time (Time)
  • Time spent waiting for locks (Lock)
  • Total number of rows sent to the client (Rows)
  • The user, and the SQL statement itself in abstracted form (literals are replaced with N; for example, limit 1, 20 becomes limit N,N)
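
A few common invocations, as a sketch (-s chooses the sort column, -t limits the output, -g filters by pattern; the log path matches the one above):

# Top 10 statements by total query time
../bin/mysqldumpslow -s t -t 10 tango-GDB-DB01-slow.log
# Top 10 statements by number of occurrences
../bin/mysqldumpslow -s c -t 10 tango-GDB-DB01-slow.log
# Only statements matching a pattern, e.g. those touching sbtest1
../bin/mysqldumpslow -g 'sbtest1' tango-GDB-DB01-slow.log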

1.2.3 Open source tool: pt-query-digest

pt-query-digest is a tool for analyzing MySQL slow queries. It can analyze the binlog, the general log, the slow log, SHOW PROCESSLIST output, or MySQL protocol data captured via tcpdump, and the analysis results can be written to a file. The analysis parameterizes the conditions of the query statements, then groups and aggregates the parameterized queries, computing each query class's execution time, execution count, and share of the total. With the help of these results, problems can be identified and optimized.

1) Install pt-query-digest

# yum install perl-DBI
# yum install perl-DBD-MySQL
# yum install perl-Time-HiRes
# yum install perl-IO-Socket-SSL
# yum -y install perl-Digest-MD5
cd /usr/local/src
wget percona.com/get/percona-toolkit.tar.gz
tar zxf percona-toolkit.tar.gz
cd percona-toolkit-3.3.1
perl Makefile.PL PREFIX=/usr/local/percona-toolkit
make && make install
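
If the build succeeds, the tools land under the PREFIX given above; a quick sanity check (the path follows from PREFIX=/usr/local/percona-toolkit):

/usr/local/percona-toolkit/bin/pt-query-digest --version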

2) pt-query-digest syntax and important options

pt-query-digest [OPTIONS] [FILES] [DSN]
--create-review-table  When --review writes the analysis results to a table, create the table automatically if it does not exist.
--create-history-table  When --history writes the analysis results to a table, create the table automatically if it does not exist.
--filter  Match and filter the input slow queries against the specified string before analyzing them.
--limit  Limit the output to a number or percentage of results. The default is 20, i.e., output the 20 slowest statements. With 50%, results are sorted by total response time in descending order and output is cut off once 50% of the total is reached.
--host  MySQL server address
--user  MySQL user name
--password  MySQL user password
--history  Save the analysis results to a table. The results are detailed; on the next run with --history, if the same statement appears and its time interval differs from what is in the history table, a new row is recorded, so you can compare the historical changes of a query class via its CHECKSUM.
--review  Save the analysis results to a table. This analysis only parameterizes the query conditions, with one row per query class, which is comparatively simple. On the next run with --review, a statement that already exists is not recorded again.
--output  Output format of the analysis results: report (standard analysis report), slowlog (MySQL slow log), json, or json-anon. report is generally used, for readability.
--since  Start time of the analysis. The value is a string: either a time point in "yyyy-mm-dd [hh:mm:ss]" format, or a simple interval using s (seconds), h (hours), m (minutes), or d (days); e.g. 12h means statistics start from 12 hours ago.
--until  End time; combined with --since, slow queries within a given period can be analyzed.
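
As a sketch, two typical invocations combining these options (paths follow this environment; the slow log is the default input type):

# Report the slowest statements from the last 12 hours
/usr/local/percona-toolkit/bin/pt-query-digest --since=12h \
    /usr/local/mysql/data/tango-GDB-DB01-slow.log > slow_report.txt

# Analyze only SELECT statements (--filter takes a Perl expression)
/usr/local/percona-toolkit/bin/pt-query-digest \
    --filter='$event->{fingerprint} =~ m/^select/' \
    /usr/local/mysql/data/tango-GDB-DB01-slow.log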

3) Analyzing pt-query-digest output

  • Part I: overall statistics
# Resource usage of the analysis run: user time, system time, resident memory (RSS), virtual memory (VSZ)
# 200ms user time, 130ms system time, 25.73M rss, 220.28M vsz
# Date and time the tool ran
# Current date: Sat Oct 24 10:30:19 2021
# The hostname on which the analysis tool is running
# Hostname: tango-GDB-DB01
# File name to be analyzed
# Files: /usr/local/mysql/data/tango-GDB-DB01-slow.log
# Total statements, unique statements, QPS, concurrency
# Overall: 4 total, 4 unique, 0.01 QPS, 0.11x concurrency
# Time range of logging
# Time range: 2021-10-17T00:24:42 to 2021-10-17T00:29:33
# Attribute, total, minimum, maximum, average, 95th percentile, standard deviation, median
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Statement execution time
# Exec time            33s   398ms     32s      8s     32s     14s     16s
# Lock occupancy time
# Lock time          390ms   147us   387ms    97ms   374ms   162ms   188ms
# Number of rows sent to the client
# Rows sent              4       1       1       1       1       0       1
# Number of rows scanned by select statement
# Rows examine       9.67k       0   9.67k   2.42k   9.33k   4.04k   4.67k
# Size of the query text in characters
# Query size           208      28      74      52   72.65   18.73   69.27
  1. Overall: the total number of queries in the log
  2. Time range: the time range over which the queries executed
  3. Unique: the number of unique queries, i.e., the number of distinct query classes after the query conditions are parameterized
  4. total: sum; min: minimum; max: maximum; avg: average
  5. 95%: all values sorted from small to large, the value at the 95% position; generally the most useful reference number
  6. median: all values sorted from small to large, the value in the middle position
  • Part II: grouped statistics by query class (Profile)
# Profile
# Rank Query ID                            Response time Calls R/Call  V/M
# ==== =================================== ============= ===== ======= ===
#    1 0x028E79DCDB99AC4C8AE06173AA02BA16  31.9112 95.7%     1 31.9112  0.00 SELECT sbtest?
# MISC 0xMISC                               1.4378  4.3%     3  0.4793   0.0 <3 ITEMS>
  1. Rank: the rank of the statement among all statements, by default in descending order of query time; the ordering can be changed with --order-by
  2. Query ID: the statement's ID (a hash computed after removing extra whitespace and literal text)
  3. Response: total response time
  4. time: this query's share of the total time in this analysis
  5. Calls: the number of executions, i.e., the total number of queries of this class in this analysis
  6. R/Call: average response time per execution
  7. V/M: the variance-to-mean ratio of the response time
  8. Item: the query object
  • Part III: detailed statistics for each query
# Query 1: 0 QPS, 0x concurrency, ID 0x028E79DCDB99AC4C8AE06173AA02BA16 at byte 382
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: all events occurred at 2021-10-17T00:24:42
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         25       1
# Exec time     95     32s     32s     32s     32s     32s       0     32s
# Lock time      0     2ms     2ms     2ms     2ms     2ms       0     2ms
# Rows sent     25       1       1       1       1       1       0       1
# Rows examine   0       0       0       0       0       0       0       0
# Query size    13      28      28      28      28      28       0      28
# String:
# Hosts        localhost
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS LIKE 'sbtest1'\G
#    SHOW CREATE TABLE `sbtest1`\G
# EXPLAIN /*!50100 PARTITIONS*/
select count(1) from sbtest1\G
  1. The table at the top lists detailed statistics for this query: execution count, maximum, minimum, average, 95th percentile, and so on.
  2. ID: the query's ID number, corresponding to the Query ID in Part II above
  3. Databases: the database name
  4. Users: the number of executions per user (with percentage)
  5. Query_time distribution: the query time distribution; the bar length reflects each interval's share. In this example, all executions fall in the 10s+ bucket.
  6. Tables: the tables involved in the query
  7. Explain: the SQL statement itself, preceded by ready-to-run SHOW TABLE STATUS, SHOW CREATE TABLE, and EXPLAIN templates

2. Slow logs in TiDB

2.1 Introduction to slow logs in TiDB

TiDB writes statements whose execution time exceeds the slow threshold (default 300 ms) to the slow query log file, to help users locate slow statements and analyze and resolve SQL performance problems. TiDB enables the slow query log by default; you can turn it on or off via the enable-slow-log configuration item. The related variables are described below (a sketch of adjusting them follows the list):

  • tidb_slow_log_threshold: sets the slow log threshold; SQL statements whose execution time exceeds it are recorded in the slow log. The default is 300 ms.
  • tidb_query_log_max_len: sets the maximum length of SQL statements recorded in the slow log. The default is 4096 bytes.
  • tidb_redact_log: whether to redact user data in the logged SQL, replacing it with ?. The default is 0, i.e., the feature is off.
  • tidb_enable_collect_execution_info: whether to record the physical execution information of each operator in the execution plan. The default is 1; the performance impact of this feature is about 3%.
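
A minimal sketch of adjusting these variables from a SQL client (the values are illustrative, not recommendations):

-- Lower the slow threshold to 100 ms and allow longer SQL text in the log
set tidb_slow_log_threshold = 100;
set tidb_query_log_max_len = 8192;
-- Redact user data in the logged SQL (literals replaced with ?)
set tidb_redact_log = 1;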

Users can query the INFORMATION_SCHEMA.SLOW_QUERY table to view the contents of the slow query log; the column names in the table correspond one-to-one to the field names in the slow log. Since TiDB 4.0, SLOW_QUERY supports querying the slow log over any time period, including data in slow log files that have already been rotated. The user only needs to specify a time range to locate the slow log files that need to be parsed; if the query specifies no time range, only the current slow log file is parsed.
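
A minimal sketch of such a time-bounded query (the time bounds here are placeholders):

-- Top 10 user statements by execution time within a window
select time, query_time, query
from information_schema.slow_query
where is_internal = false
  and time >= '2021-10-17 00:00:00'
  and time <= '2021-10-17 23:59:59'
order by query_time desc
limit 10;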

2.2 Using pt-query-digest to analyze slow logs

The slow logs in TiDB can be parsed with pt-query-digest:

[root@tango-centos01 bin]# ./pt-query-digest --report /tidb-deploy/tidb-4000/log/tidb_slow_query.log > /tmp/result.txt

# Query 1: 0.00 QPS, 0.00x concurrency, ID 0xF56F9650E481C1EAE6740F83A3E451C9 at byte 1705729
# This item is included in the report because it matches --limit.
# Scores: V/M = 45.86
# Time range: 2020-11-15T18:02:59 to 2021-03-15T19:32:58
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         27    1151
# Exec time     28   5698s   301ms    344s      5s     21s     15s      1s
# Query size    20 201.21k     179     183  179.01  174.84       0  174.84
# Backoff time  32     18s     2ms      6s      1s      5s      2s    91ms
# Compile time  33    577s   190us     42s   501ms      2s      2s     3ms
# Cop backoff   29     15s      2s      6s      4s      6s      2s      5s
# Cop backoff   29     57s     11s     19s     14s     19s      3s     17s
# Cop backoff    9     2ms     2ms     2ms     2ms     2ms       0     2ms
# Cop backoff   14      1s      1s      1s      1s      1s       0      1s
# Cop backoff   78      3s    61ms      2s   256ms   163ms   513ms    82ms
# Cop backoff   61     16s      1s      5s      1s      2s      1s   992ms
# Cop backoff   31   222ms     2ms   164ms    56ms   163ms    66ms   108ms
# Cop backoff   25     27s      1s     18s      7s     18s      7s     12s
# Cop proc avg  19    1.57       0    0.93    0.00       0    0.03       0
# Cop time      37    715s     1us     38s   639ms      2s      2s   219ms
# Cop wait avg  45   16.85       0    2.46    0.02    0.00    0.15       0
# Mem max       39  33.54M  30.69k  30.70k  30.69k  30.09k       0  30.09k
# Num cop task  27   1.09k       0       1    0.97    0.99    0.16    0.99
# Parse time    22    359s    47us     66s   312ms   705ms      3s   144us
# Process time  15      2s     1ms   934ms    31ms    56ms   134ms   972us
# Request coun  27   1.09k       1       1       1       1       0       1
# Total keys     0   1.09k       1       1       1       1       0       1
# Txn start ts  27 410.92E       0 376.21P 365.58P   1.25P 209.72T   1.25P
# Wait time     35     17s     1ms      2s    86ms   393ms   346ms   972us
# String:
# Cop proc add 192.168.112.102:20160 (656/56%)... 2 more
# Cop wait add 192.168.112.102:20160 (656/56%)... 2 more
# Digest       e0cf0e2dccc824f34e52c6341356354f77cce9342074...
# Has more res false
# Index names  [bind_info:time_index]
# Is internal  true
# Plan         tidb_decod... (1149/99%), tidb_decod... (2/0%)
# Plan digest  76900b4634678820c876c2855d6bd7e7a7b0f9b46a8d...
# Plan from ca false
# Prepared     false
# Stats        bind_info:pseudo
# Succ         true (1119/97%), false (32/2%)
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms  #######################################################
#    1s  ################################################################
#  10s+  ############
# Tables
#    SHOW TABLE STATUS FROM `mysql` LIKE 'bind_info'\G
#    SHOW CREATE TABLE `mysql`.`bind_info`\G
# EXPLAIN /*!50100 PARTITIONS*/
select original_sql, bind_sql, default_db, status, create_time, update_time, charset, collation from mysql.bind_info where update_time > "0000-00-00 00:00:00" order by update_time\G

2.3 Fields in the slow query log

1) Slow Query basic information:

  • Time: indicates the log printing time.
  • Query_time: indicates the time spent executing this statement.
  • Parse_time: indicates the time spent in the syntax parsing phase of this statement.
  • Compile_time: indicates the time spent in the query optimization phase of this statement.
  • Query: the SQL statement. The Query field name is not printed in the slow log itself; after the log is mapped into the memory table, the corresponding column is called Query.
  • Digest: indicates the fingerprint of the SQL statement.
  • Txn_start_ts: the start timestamp of the transaction, which also serves as its unique ID; you can use this value to find other transaction-related logs in the TiDB log.
  • Is_internal: indicates whether it is an internal SQL statement of TiDB. true indicates the SQL statement executed inside the TiDB system, and false indicates the SQL statement executed by the user.
  • Index_ids: indicates the ID of the index involved in the statement.
  • Succ: indicates whether the statement was executed successfully.
  • Backoff_time: the time the statement waits before retrying when it encounters an error that requires a retry. Common retryable errors include: lock conflicts, Region splits, and tikv server is busy.
  • Plan: the execution plan of the statement; the SQL statement select tidb_decode_plan('xxx...') decodes it into the concrete execution plan (see the sketch after this list).
  • Prepared: indicates whether this statement is a Prepare or Execute request.
  • Plan_from_cache: indicates whether the statement hits the execution plan cache.
  • Rewrite_time: indicates the time spent in the query rewrite phase of this statement.
  • Preproc_subqueries: the number of subqueries in this statement that are executed ahead of time; for example, the subquery in where id in (select if(a > 1, 1, 2) from t) may be executed in advance.
  • Preproc_subqueries_time: the time spent on the subqueries executed ahead of time in this statement.
  • Exec_retry_count: the number of retries when executing this statement. It usually happens in pessimistic transactions: when taking a lock fails, the statement is executed again.
  • Exec_retry_time: the retry execution time of this statement. For example, if a statement was executed three times (the first two failed), Exec_retry_time is the sum of the first two execution times, and Query_time minus Exec_retry_time is the duration of the final execution.
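
For the Plan field, a minimal decoding sketch (replace the placeholder with the encoded plan string from your own log):

select tidb_decode_plan('xxx...');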

2) Fields related to transaction execution:

  • Prewrite_time: indicates the time spent in the first phase (prewrite phase) of the two-phase commit of a transaction.
  • Commit_time: indicates the time spent in the second phase (commit phase) of the two-phase commit of a transaction.
  • Get_commit_ts_time: indicates the time taken to obtain the commit timestamp in the second phase (commit phase) of the two-phase commit of a transaction.
  • Local_latch_wait_time: indicates the time spent waiting for locks on the TiDB side before initiating the second phase (commit phase) in the two-phase commit of a transaction.
  • Write_keys: the number of keys the transaction writes to the Write CF in TiKV.
  • Write_size: indicates the total write size of key or value when the transaction is committed.
  • Prewrite_region: the number of TiKV Regions involved in the first phase (prewrite) of the transaction's two-phase commit; each Region triggers one remote procedure call.

3) Fields related to memory usage:

  • Mem_max: the maximum memory space used by TiDB during execution, in bytes.

4) Fields related to hard disk usage:

  • Disk_max: the maximum hard disk space used by TiDB during execution, in bytes.

5) Fields related to users executing SQL:

  • User: the user name that executed the statement.
  • Conn_ID: the user's connection ID; you can use a keyword such as con:3 to find other logs related to this connection in the TiDB logs.
  • DB: represents the database used when executing the statement.

6) Fields related to TiKV Coprocessor tasks (a query sketch using some of these columns follows the list):

  • Request_count: indicates the number of Coprocessor requests sent by this statement.
  • Total_keys: the number of keys scanned by the Coprocessor.
  • Process_time: the sum of this statement's processing time in TiKV. Because data is sent to TiKV for execution in parallel, this value may exceed Query_time.
  • Wait_time: the sum of this statement's waiting time in TiKV. Because the number of Coprocessor threads in TiKV is limited, requests queue when all Coprocessor threads are busy; when some queued requests take a long time, the waiting time of subsequent requests increases.
  • Process_keys: the number of keys processed by the Coprocessor. Unlike Total_keys, Process_keys does not include old MVCC versions; a large gap between Process_keys and Total_keys indicates many old versions.
  • Cop_proc_avg: average execution time of cop task.
  • Cop_proc_p90: P90 quantile execution time of cop task.
  • Cop_proc_max: the maximum execution time of the cop task.
  • Cop_proc_addr: the address of the cop task with the longest execution time.
  • Cop_wait_avg: average waiting time of cop task.
  • Cop_wait_p90: P90 quantile waiting time of cop task.
  • Cop_wait_max: the maximum waiting time of the cop task.
  • Cop_wait_addr: the address of the cop task with the longest waiting time.
  • Cop_backoff_{backoff-type}_total_times: the total number of backoffs caused by a particular error type.
  • Cop_backoff_{backoff-type}_total_time: the total backoff time caused by a particular error type.
  • Cop_backoff_{backoff-type}_max_time: the maximum backoff time caused by a particular error type.
  • Cop_backoff_{backoff-type}_max_addr: the address of the cop task with the maximum backoff time for a particular error type.
  • Cop_backoff_{backoff-type}_avg_time: the average backoff time caused by a particular error type.
  • Cop_backoff_{backoff-type}_p90_time: the P90 quantile backoff time caused by a particular error type.
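
As a sketch, these fields can be combined with the INFORMATION_SCHEMA.SLOW_QUERY table from section 2.1 to surface Coprocessor-heavy statements (the limit is illustrative):

-- User statements that scanned the most keys in TiKV
select query_time, process_time, wait_time, total_keys, process_keys, query
from information_schema.slow_query
where is_internal = false
order by total_keys desc
limit 10;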

