Summary of MySQL 8 Anti-Join

Keywords: Database MySQL SQL socket Unix

Guidance:

Author: Zheng Songhua, Zhi Shu Tang's SQL Optimized Class Teacher, Net Name: Rabbit Riding Tortoise

Today, I would like to share some articles about not in, not exists.In fact, this can be summarized as exists to in, which was treated differently in previous versions of mysql until version 8.0.16.

Prior to version 16, in could be optimized to semi join several optimizations about semi join, such as loosescan=on, first match=on, duplicateweedout=on, materialization=on. For in, only one operation of exists was DEPENDENT SUBQUERY. Until version 16, exists could be equivalent to in or could enjoy semi join. This time, the latest version of mysqlservertiam.com websiteThere are new articles about anti join in.

Antijoin in MySQL 8                     mysqlserverteam.com

We can simply understand that not exists and not in are just sub-queries that become semi-joined at the straight point. Here are two different execution plans for the same sql for versions 5.7 and 8.0.18

First 5.7

--------------mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper
Connection id:          2
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.14-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql3306.sock
Uptime:                 5 min 4 sec
Threads: 1  Questions: 21  Slow queries: 0  Opens: 116  Flush tables: 1  Open tables: 109  Queries per second avg: 0.069
--------------
root@mysql3306.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY            | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+

As in version 5.7.14 above, the not exists execution plan is DEPENDENT SUBQUERY

We looked at the same sql in version 8.0.18:

root@mysql3308.sock>[employees]>\s
--------------
/usr/local/mysql8/bin/mysql  Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)
Connection id:          7
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         8.0.18 MySQL Community Server - GPL
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
mb4Db     characterset:    utf8
mb4Client characterset:    utf8
mb4Conn.  characterset:    utf8
mb4UNIX socket:            /tmp/mysql3308.sock
Uptime:                 19 min 24 sec
Threads: 1  Questions: 46  Slow queries: 0  Opens: 175  Flush tables: 3  Open tables: 95  Queries per second avg: 0.039
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+

We can see that it's turned into an anti join

root@mysql3308.sock>[employees]>desc format=tree  select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.25 rows=10)
   -> Table scan on t  (cost=1.25 rows=10)
   -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.01 rows=1) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Since there is no new explain format=tree for version 8.0.18 in 5.7 to see how to execute the plan, we compare show warningsG

5.7

root@mysql3306.sock>[employees]>show warnings\G
*************************** 1. row ***************************  
Level: Note   
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************  
Level: Note   
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`))))
2 rows in set (0.00 sec)

8.0

root@mysql3308.sock>[employees]>show warnings\G
*************************** 1. row ***************************  
Level: Note   
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************  
Level: Note   
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true
2 rows in set (0.00 sec)

You can see whether 5.7 or not exists and 8.0.18 became anti-join, so someone asked DEPENDENT SUBQUERY or what is anti-join really good for?

Simply put, DEPENDENT SUBQUERY is similar to a function call, requiring an important premise: parameters must be accepted!That is, it can only be nested loop join.But anti join does not necessarily mean it can also be a nested loop join or a hash join; hash join is a new feature in version 8.0.18!

The article about hash join can be read as follows:

hash join                          mysqlserverteam.com

In addition, DEPENDENT SUBQUERY will take longer to execute as the amount of data in the result set of the outer table increases. Previously, if we encountered this extreme problem, we could use left join to solve it. Specific methods can be used to see my public classes.

Now with the new version of anti join we can omit that.

Finally, what I want to say is that optimization cannot leave version, and it is not possible to leave version and talk about optimization.

There are also new introductions in this 8.0.18 release

Explain format=tree/analyze A serious problem was found in the way the plan was executed

root@mysql3308.sock>[employees]>desc  select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.17 rows=10)
   -> Table scan on t  (cost=1.25 rows=10)
   -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.00 rows=1) 
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

As mentioned above, you can see that nested loop anti-join has occurred, and now I want to change this to the following:

root@mysql3308.sock>[employees]>desc  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |     10 |   100.00 | NULL                                                           |
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299246 |   100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+root@mysql3308.sock>[employees]>desc analyze  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'

As shown above, this time desc will have an execution plan, but desc analyze will have an error!

My guess is that this might be a bug! Or it's not ready to support it, because it's new and I hope to change it later.

Limited level. If there are any errors, please correct them.

Thank you ~Welcome to Forward ~

I am a SQL Optimized Class Teacher ~^^

For questions and exchanges on SQL optimization, please add:

High performance MySQL,SQL optimization group

also @Rabbit @Zhi Shu Tang SQL Optimization

Welcome to Zhishutang.

Posted by eyespark on Tue, 07 Jan 2020 18:55:46 -0800