MySQL slow query optimization

Keywords: MySQL

In the early stage of application development, the amount of data is small. When developing functions, the program pays more attention to the realization of functions. With the growth of production data, many SQL statements begin to expose performance problems and have a greater and greater impact on production. Sometimes these problematic SQL slow queries are the bottleneck of the performance of the whole system.

I   Introduction to SQL slow query

1.1. What is SQL slow query?

This refers to MySQL slow query, which specifically means that the running time exceeds long_ query_ SQL for time value.

We often hear that common MySQL logs include binary log binlog, relay log relaylog, redo rollback log redolog, undo, etc. For slow queries, there is also a slow query log, which is used to record statements whose response time exceeds the threshold in MySQL.

We should not be misled by the name of slow query. We think that the slow query log will only record the select statement. In fact, it will also record that the execution time exceeds long_ query_ DML statements such as insert and update with the threshold set by time.

#Check whether slow SQL is enabled

show variables like "slow_query_log%";

#View the threshold set by slow query unit: seconds

show variables like "long_query_time";

For the AliSQL-X-Cluster we use, namely XDB, the default slow query is on, long_query_time is set to 1 second.

1.2 why does slow SQL query cause failure?

Real slow SQL is often accompanied by a large number of line scanning, temporary file sorting or frequent disk flush ing. The direct impact is that the disk IO increases, and the normal SQL becomes slow SQL, resulting in large-area execution timeout.

What are the consequences of a slow query? Before, I always thought that it would be slower to return data, and the user experience would be worse? In fact, it's far more than that. I've experienced several online accidents, one of which was caused by a slow SQL query.

That time, an SQL query took 2-3 seconds and missed the index, resulting in a full table scan. Due to the high-frequency query, the concurrency quickly filled the DB thread pool, resulting in a large number of query requests piled up. The DB server CPU was 100% + for a long time and a large number of request timeout s... Finally, the system crashed and the boss came on stage! It can be seen that if the team does not pay enough attention to slow queries, the risk is great.

2, General steps of slow SQL optimization

1. Locate SQL statements with low execution efficiency through slow log query

2. explain analyze the execution plan of SQL

Focus on type, rows, filtered, and extra.

From top to bottom, the efficiency is getting higher and higher

  • ALL full table scan

  • index full scan

  • During range index range scanning, queries such as >, <, in and between appear in general condition queries

  • ref uses non unique index scanning or unique index prefix scanning to return a single record, which often appears in association queries

  • eq_ref is similar to ref, except that it uses a unique index and an associated query using a primary key

  • const/system for a single record, the system will treat other columns in the matching row as constants, such as primary key or unique index query

  • null MySQL does not access any tables or indexes and returns results directly. Although the efficiency is getting higher and higher from top to bottom, according to the cost model, it is assumed that there are two indexes idx1 (a, B, c) and idx2 (a, c), and the SQL is "select * from t where a = 1 and b in (1, 2) order by c"; If idx1 is used, the type is range. If idx2 is used, the type is ref; When the number of lines to be scanned is more than 5 times that of idx1, idx1 will be used, otherwise idx2 will be used

Extra

  • Using filesort: MySQL needs an extra pass to find out how to retrieve rows in sorted order. Sorting is accomplished by browsing all rows according to the join type and saving sorting keywords and row pointers for all rows that match the WHERE clause. The keywords are then sorted and the rows are retrieved in the sort order.

  • Using temporary: temporary tables are used to save intermediate results. The performance is particularly poor and needs to be optimized

  • Using index: indicates that the covering index is used in the corresponding select operation to avoid accessing the data rows of the table. The efficiency is good! If using where appears at the same time, it means that qualified data cannot be queried directly through index lookup.

  • Using index condition: the ICP added after MySQL 5.6 uses the ICP (index push down) to filter data at the storage engine layer instead of at the service layer, and uses the existing index data to reduce the data returned to the table.

3. show profile analysis

Understand the status and time consumption of SQL execution threads. It is closed by default, and the statement "set profiling = 1" is enabled

SHOW PROFILES ;

SHOW PROFILE FOR QUERY  #{id};

4,trace

Trace analyzes how the optimizer selects the execution plan. Through the trace file, you can further understand why the coupon selects the A execution plan instead of the B execution plan.

set optimizer_trace="enabled=on";

set optimizer_trace_max_mem_size=1000000;

select * from information_schema.optimizer_trace;

5. Identify problems and take corresponding measures

Optimize index

Optimize SQL statements: modify SQL, IN query segmentation, time query segmentation, and filter based on the last data

Use other implementation methods: ES, data warehouse, etc

Data fragment processing

3, Explain parsing slow query SQL

3.1. explain SQL execution plan

explain explains how mysql handles SQL statements, the loading order of tables, how tables are connected, and index usage. Is an important tool for SQL optimization.

Analyze the MySQL slow query log, and use the Explain keyword to simulate the optimizer to execute SQL query statements to analyze SQL slow query statements.

The following test table is an app information table of 137w data. Let's analyze it with an example.

The SQL example is as follows:

-- 1.185s

SELECT * from vio_basic_domain_info where app_name like '%translate%' ;

This is an ordinary fuzzy query statement. The query took 1.185s and 148 data were found.

The analysis results of Explain are shown in the following table. According to the table information, the SQL does not use the field app_ The index on name. The query type is full table scan. The number of rows scanned is 137w.

mysql> EXPLAIN SELECT * from vio_basic_domain_info where app_name like '%translate%' ;

+----+-------------+-----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+

| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |

+----+-------------+-----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+

|  1 | SIMPLE      | vio_basic_domain_info | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1377809 |    11.11 | Using where |

+----+-------------+-----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+

1 row in set, 1 warning (0.00 sec)

When the index is used in this SQL, the SQL is as follows: query time: 0.156s, 141 data are found:

-- 0.156s

SELECT * from vio_basic_domain_info where app_name like 'translate%' ;

The results of Explain analysis are shown in the table below; According to the table information, IDX is used in this SQL_ app_ Name index. The query type is index range query. The number of rows scanned is 141.

Because the columns of the query are not all in the index (select *), the table is returned once and the data of other columns is retrieved.

mysql> EXPLAIN SELECT * from vio_basic_domain_info where app_name like 'translate%' ;

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+

| id | select_type | table                 | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                 |

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+

|  1 | SIMPLE      | vio_basic_domain_info | NULL       | range | idx_app_name  | idx_app_name | 515     | NULL |  141 |   100.00 | Using index condition |

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+

1 row in set, 1 warning (0.00 sec)

When this SQL is used to overwrite the index, the SQL is as follows: query time: 0.091s, 141 data are found.

-- 0.091s

SELECT app_name from vio_basic_domain_info where app_name like 'translate%' ;

The results of Explain analysis are shown in the table below; According to the table information, the index is used as in the SQL above. Since the query column is included in the index column, the table return time of 0.06s is saved.

mysql> EXPLAIN SELECT app_name from vio_basic_domain_info where app_name like 'translate%' ;

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

| id | select_type | table                 | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                    |

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

|  1 | SIMPLE      | vio_basic_domain_info | NULL       | range | idx_app_name  | idx_app_name | 515     | NULL |  141 |   100.00 | Using where; Using index |

+----+-------------+-----------------------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

1 row in set, 1 warning (0.00 sec)

So how to analyze SQL through EXPLAIN parsing results? What does each column attribute represent? Look down together.

3.2. The interviewer asked: what are your opinions on the explain fields in mysql?

First, we need to understand the meaning of each field in order to make better use of the keyword explain

(1) The properties of each column are described below

  • id: the query sequence number of the SELECT, which reflects the execution priority. If it is a sub query, the id sequence number will increase. The larger the id value, the higher the priority, and the first to be executed;

  • select_type: indicates the type of query;

  • Table: the table that outputs the result set. If an alias is set, it will also be displayed;

  • Partitions: matching partitions;

  • type: access method to the table;

  • possible_keys: indicates the index that may be used during query;

  • key: indicates the index actually used;

  • key_len: length of index field;

  • ref: comparison between column and index;

  • Rows: number of rows scanned (estimated number of rows);

  • Filtered: percentage of rows filtered by table criteria;

  • Extra: description and description of implementation. This column shows some additional information, which is important.

(2) common attributes for slow query analysis

1)type

The access method to the table indicates how MySQL finds the required row in the table, also known as "access type".

The existing types are: ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, performance from low to high).

Introduce three things we see every day:

  • ALL: (Full Table Scan) MySQL will traverse the whole table to find matching rows, which is often called Full Table Scan;

  • Index: (Full Index Scan) the difference between index and ALL is that the index type only traverses the index tree;

  • Range: retrieve only rows in a given range, and use an index to select rows.

system: the table has only one row of records. This is a special case of const. It generally does not appear and can be ignored

const: indicates that the index can be found once. const is used to compare the primary key or unique index. Because only one row of data is matched, it is very fast.

eq_ref: unique index scan. Only one record in the table matches it. Generally, two tables are associated. The fields in the association conditions are primary keys or unique indexes.

ref: non unique row index scan, which returns all rows matching a single value

Range: Retrieves rows in a given range. Queries such as >, <, in and between appear in general condition queries

Index: traverse the index tree. It is usually faster than all because the index file is usually smaller than the data file. Both all and index read the whole table, but index is retrieved from the index, while all is retrieved from the hard disk.

all: traverse the entire table to find matching rows

2)key

The key column shows the index actually used by SQL, which is usually possible_ One of the indexes in the keys column. The MySQL optimizer usually selects a more suitable index by calculating the number of scan rows. If no index is selected, NULL is returned.

Of course, the MySQL optimizer has an error in selecting an index. You can force Mysql to "use or ignore an index" by modifying SQL:

  • Force to use one index: FORCE INDEX (index_name), USE INDEX (index_name);

  • Force an index to be ignored: IGNORE INDEX (index_name).

3)rows

Rows is the number of rows MySQL estimates to read (scan) in order to find the required rows. It may not be accurate.

4)Extra

This column shows some additional information, which is important.

Using index: the query column is overwritten by the index, and the where filter condition is that the index is the leading column, and in Extra is using index. It means that qualified data can be found directly through index lookup without returning to the table.

Note: the leading column generally refers to the first column or "previous columns" in the joint index and the case of single column index; Here, for convenience of understanding, I collectively refer to it as the leading column.

Using where: indicates that the MySQL server will filter the rows after the storage engine retrieves them; That is, the index is not used and the query is returned to the table.

Possible causes:

  • The query column is not overwritten by the index;

  • The where filter is not the leading column of the index or cannot use the index correctly.

Using temporary: this means that MySQL will use a temporary table when sorting query results.

Using filesort: indicates that MySQL will use an external index to sort the results instead of reading rows from the table in index order.

Using index condition: the columns queried are not all in the index. The where condition is the range of a leading column. Newly added after 5.6 indicates that the queried columns have non indexed columns. First judge the index conditions to reduce disk IO

Using where; Using index: the query column is overwritten by the index, and the where filter condition is one of the index columns, but not the leading column of the index, or there are other situations that affect the direct use of the index (such as range filter conditions). In Extra, it is using where; Using index means that qualified data cannot be queried directly through index lookup, which has little impact.

4, Scene analysis

4.1 leftmost matching

Indexes

KEY `idx_shopid_orderno` (`shop_id`,`order_no`)

SQL statement

select * from _t where orderno=''

Query matching is from left to right, and order is used_ No, if you want to go to the index, you must carry the shop in the query criteria_ ID or index (shop_id,order_no). In addition, wechat search pays attention to the Java technology stack and replies in the background: for the interview, you can get my java series interview questions and answers, which are very complete.

4.2 implicit conversion

Indexes

KEY `idx_mobile` (`mobile`)

SQL statement

select * from _user where mobile=12345678901

Implicit conversion is equivalent to an operation on the index, which will invalidate the index. mobile is a character type and uses numbers. String matching should be used, otherwise MySQL will use implicit replacement, resulting in index failure.

4.3 large paging

Indexes

KEY `idx_a_b_c` (`a`, `b`, `c`)

SQL statement

select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;

For large paging scenarios, the product optimization requirements can be given priority. If there is no optimization, there are two optimization methods. One is to transfer the last data of the last time, that is, the above c, and then process it with "c < XXX". However, this generally needs to change the interface protocol, which is not necessarily feasible.

The other is to use the delayed association method to reduce SQL back to the table, but remember that the index needs to be completely overwritten. The SQL changes are as follows

select t1.* from _t t1, (select id from _t where a = 1 and b = 2 order by c desc limit 10000, 10) t2 where t1.id = t2.id;

4.4,in + order by

Indexes

KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)

SQL statement

select * from _order where shop_id = 1 and order_status in (1, 2, 3) order by created_at desc limit 10

In query is searched through n*m at the bottom of MySQL. It is similar to union, but its efficiency is higher than Union. When calculating the cost of in query (cost = number of tuples * average IO value), the number of tuples is obtained by querying the values contained in in one by one, so the calculation process will be relatively slow. Therefore, MySQL sets a critical value (eq_range_index_dive_limit). After 5.6, the cost of this column will not participate in the calculation.

Therefore, the selection of execution plan will be inaccurate. The default value is 200, that is, if the in condition exceeds 200 data, there will be problems in the cost calculation of in, and the index selected by Mysql may be inaccurate.

The processing method can be (order_status,   created_at) swap the sequence before and after, and adjust the SQL to delay Association.

4.5. Range query is blocked, and subsequent fields cannot be indexed

Indexes

KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)

SQL statement

select * from _order where shop_id = 1 and created_at > '2021-01-01 00:00:00' and order_status = 10

Range query also includes "IN, between"

4.6. Quick search not equal to or including indexes that cannot be used. (ICP can be used)

select * from _order where shop_id=1 and order_status not in (1,2)
select * from _order where shop_id=1 and order_status != 1

On the index, avoid NOT,! =, < >,! <,! > NOT EXISTS, NOT IN, NOT LIKE, etc

4.7. The optimizer chooses not to use the index

If the amount of data required to be accessed is very small, the optimizer will still select the secondary index. However, when the accessed data accounts for a large part of the data in the whole table (generally about 20%), the optimizer will select the clustered index to find the data.

select * from _order where  order_status = 1

Query all unpaid orders. Generally, there are few such orders. Even if the index is built, the index cannot be used.

4.8 complex query

select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;

If some data is collected, data warehouse may be used to solve it;

If there is such a complex query in business, it may not be recommended to continue to use SQL, but use other methods to solve it, such as ES.

4.9 mixed use of asc and desc

select * from _t where a=1 order by b desc, c asc

When desc and asc are mixed, the index will become invalid

4.10 big data

For the data storage of push business, the amount of data may be large. If you choose to store it on MySQL and save it with a validity period of 7 days.

It should be noted that frequent data cleaning will result in data fragmentation. It is necessary to contact the DBA for data fragmentation processing.

Reference link:

MySQL learning notes - how to write better SQL

Query too slow? SQL optimization is right!

Boss: who makes this kind of SQL slow query accident again, just leave!

Posted by mad81 on Mon, 11 Oct 2021 10:42:54 -0700