Execution Order of SQL Logical Query Statements

Keywords: MySQL SQL

The first condition for efficient index use is to know what queries will be used in the index. This problem is related to the "Leftmost Prefix Principle" in B+Tree. Here is an example to illustrate the Leftmost Prefix Principle.

Let's start with the concept of joint indexing. In the above, we all assume that the index refers to only a single column. In fact, the index in MySQL can refer to multiple columns in a certain order. This kind of index is called a joint index. Generally, a joint index is an ordered tuple, in which each element is a column of a data table. In fact, relational algebra is needed to define the index strictly, but I don't want to discuss it here. There are too many topics about relational algebra, because that would seem boring, so there is no more strict definition here. In addition, a single column index can be considered as a special case of a joint index element number of 1.

Take the employees.titles table as an example. Let's first look at the indexes on it:

SHOW INDEX FROM employees.titles;
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Null | Index_type |
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+
| titles |          0 | PRIMARY  |            1 | emp_no      | A         |        NULL |      | BTREE      |
| titles |          0 | PRIMARY  |            2 | title       | A         |        NULL |      | BTREE      |
| titles |          0 | PRIMARY  |            3 | from_date   | A         |      443308 |      | BTREE      |
| titles |          1 | emp_no   |            1 | emp_no      | A         |      443308 |      | BTREE      |
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+

From the results, you can see that the main index of the titles table is, and there is an auxiliary index. To avoid complicating things with multiple indexes (MySQL's SQL optimizer behaves more complex with multiple indexes), here we drop the auxiliary index:

ALTER TABLE employees.titles DROP INDEX emp_no;

This allows you to concentrate on analyzing the behavior of index PRIMARY.

Case 1: Full column matching.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title='Senior Engineer' AND from_date='1986-06-26';
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref               | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
|  1 | SIMPLE      | titles | const | PRIMARY       | PRIMARY | 59      | const,const,const |    1 |       |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+

Obviously, an index can be used when an exact match is made according to all columns in the index (in this case, an exact match refers to a "=" or "IN" match). One thing to note here is that indexing is theoretically sequence sensitive, but because MySQL's query optimizer automatically adjusts the conditional order of where clauses to use suitable indexes, such as reversing the conditional order in where:

EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26' AND emp_no='10001' AND title='Senior Engineer';
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref               | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
|  1 | SIMPLE      | titles | const | PRIMARY       | PRIMARY | 59      | const,const,const |    1 |       |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+

The effect is the same.

Case 2: Leftmost prefix matching.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | titles | ref  | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+

When the query condition exactly matches one or more columns on the left side of the index, if or, it can be used, but only part of it, the leftmost prefix of the condition, can be used. The query above uses the PRIMARY index from the analysis results, but key_len is 4, indicating that only the first prefix of the index is used.

Case 3: Query conditions use exact matching of columns in the index, but a middle condition is not provided.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | titles | ref  | PRIMARY       | PRIMARY | 4       | const |    1 | Using where |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+

At this time, the index usage is the same as the second case, because title is not provided, so the query only uses the first column of the index, while the latter from_date is also in the index, but because title does not exist and can not be connected with the left prefix, the result needs to be scanned and filtered from_date (because emp_no is unique here, there is no scan). If you want from_date to use an index instead of where filtering, you can add an auxiliary index, which will be used by the query above. In addition, an optimization method called "isolation column" can be used to fill the "pit" between emp_no and from_date.

First, let's look at title s that have several different values:

SELECT DISTINCT(title) FROM employees.titles;
+--------------------+
| title              |
+--------------------+
| Senior Engineer    |
| Staff              |
| Engineer           |
| Senior Staff       |
| Assistant Engineer |
| Technique Leader   |
| Manager            |
+--------------------+

There are only seven. In the case that the column value of "pit" is relatively small, we can consider filling the "pit" with "IN" to form the leftmost prefix:

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no='10001'
AND title IN ('Senior Engineer', 'Staff', 'Engineer', 'Senior Staff', 'Assistant Engineer', 'Technique Leader', 'Manager')
AND from_date='1986-06-26';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | titles | range | PRIMARY       | PRIMARY | 59      | NULL |    7 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+

This time key_len is 59, indicating that the index has been used completely, but from the type and rows it can be seen that IN actually performs a range query, where seven keys are checked. Look at the performance comparison of the two queries:

SHOW PROFILES;
+----------+------------+-------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                         |
+----------+------------+-------------------------------------------------------------------------------+
|       10 | 0.00058000 | SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26'|
|       11 | 0.00052500 | SELECT * FROM employees.titles WHERE emp_no='10001' AND title IN ...          |
+----------+------------+-------------------------------------------------------------------------------+

After filling, the performance is improved a little. If there is a lot of data left after emp_no filtering, the performance advantage of the latter will be more obvious. Of course, if the title has a lot of values, filling the pit is not appropriate, and auxiliary index must be established.

Case 4: Query conditions do not specify the first column of the index.

EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | titles | ALL  | NULL          | NULL | NULL    | NULL | 443308 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

Since it is not the leftmost prefix, queries such as indexes obviously do not need indexes.

Case 5: Match a prefix string of a column.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title LIKE 'Senior%';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | titles | range | PRIMARY       | PRIMARY | 56      | NULL |    1 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+

Indexes can be used at this time, but they cannot be used if wildcards do not appear only at the end. (The original expression is incorrect. If wildcard% does not appear at the beginning, it can be used for indexing, but only one prefix may be used depending on the specific situation.)

Situation 6: Range query.

EXPLAIN SELECT * FROM employees.titles WHERE emp_no < '10010' and title='Senior Engineer';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | titles | range | PRIMARY       | PRIMARY | 4       | NULL |   16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+

Range columns can be used for indexing (they must be the leftmost prefix), but columns behind range columns cannot be used for indexing. At the same time, the index is used for at most one range column, so if there are two range columns in the query condition, the index cannot be used completely.

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no < '10010'
AND title='Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | titles | range | PRIMARY       | PRIMARY | 4       | NULL |   16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+

You can see that the index has nothing to do with the second range index. In particular, one interesting aspect of MySQL is that explain ing alone may not distinguish range indexes from multivalued matches, because both are shown as ranges in type. At the same time, the use of "between" does not mean that it is a range query, such as the following query:

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no BETWEEN '10001' AND '10010'
AND title='Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | titles | range | PRIMARY       | PRIMARY | 59      | NULL |   16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+

It seems that two range queries are used, but "BETWEEN" acting on emp_no is actually equivalent to "IN", that is to say, emp_no is actually a multi-valued exact match. You can see that this query uses all three columns of the index. Therefore, we should carefully distinguish between multi-value matching and range matching in MySQL, otherwise it will confuse the behavior of MySQL.

Case 7: Query conditions contain functions or expressions.

Unfortunately, if the query condition contains functions or expressions, MySQL does not use indexes for this column (although some can be used mathematically). For example:

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND left(title, 6)='Senior';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | titles | ref  | PRIMARY       | PRIMARY | 4       | const |    1 | Using where |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+

Although this query has the same function as in Case 5, it is not possible to apply an index to the title column because of the use of the function left, whereas in Case 5, LIKE can be used. Again,

EXPLAIN SELECT * FROM employees.titles WHERE emp_no - 1='10000';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | titles | ALL  | NULL          | NULL | NULL    | NULL | 443308 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+

Obviously this query is equivalent to querying emp_no as a function of 10001, but because the query condition is an expression, MySQL cannot use an index for it. It seems that MySQL is not intelligent enough to automatically optimize constant expressions, so when writing query statements, try to avoid expressions appearing in queries, but first hand algebraic operations, converted to non-expression query statements.


from: https://www.kancloud.cn/kancloud/theory-of-mysql-index/41857

Posted by garygay on Wed, 22 May 2019 17:26:16 -0700