Guidance:
Author: Zheng Songhua, Zhi Shu Tang's SQL Optimized Class Teacher, Net Name: Rabbit Riding Tortoise
Today, I would like to share some articles about not in, not exists.In fact, this can be summarized as exists to in, which was treated differently in previous versions of mysql until version 8.0.16.
Prior to version 16, in could be optimized to semi join several optimizations about semi join, such as loosescan=on, first match=on, duplicateweedout=on, materialization=on. For in, only one operation of exists was DEPENDENT SUBQUERY. Until version 16, exists could be equivalent to in or could enjoy semi join. This time, the latest version of mysqlservertiam.com websiteThere are new articles about anti join in.
Antijoin in MySQL 8 mysqlserverteam.com
We can simply understand that not exists and not in are just sub-queries that become semi-joined at the straight point. Here are two different execution plans for the same sql for versions 5.7 and 8.0.18
First 5.7
--------------mysql Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using EditLine wrapper Connection id: 2 Current database: employees Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.7.14-log MySQL Community Server (GPL) Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 UNIX socket: /tmp/mysql3306.sock Uptime: 5 min 4 sec Threads: 1 Questions: 21 Slow queries: 0 Opens: 116 Flush tables: 1 Open tables: 109 Queries per second avg: 0.069 -------------- root@mysql3306.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+ | 1 | PRIMARY | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using index | +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
As in version 5.7.14 above, the not exists execution plan is DEPENDENT SUBQUERY
We looked at the same sql in version 8.0.18:
root@mysql3308.sock>[employees]>\s -------------- /usr/local/mysql8/bin/mysql Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL) Connection id: 7 Current database: employees Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.18 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 mb4Db characterset: utf8 mb4Client characterset: utf8 mb4Conn. characterset: utf8 mb4UNIX socket: /tmp/mysql3308.sock Uptime: 19 min 24 sec Threads: 1 Questions: 46 Slow queries: 0 Opens: 175 Flush tables: 3 Open tables: 95 Queries per second avg: 0.039 root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
We can see that it's turned into an anti join
root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop anti-join (cost=12.25 rows=10) -> Table scan on t (cost=1.25 rows=10) -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.01 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)
Since there is no new explain format=tree for version 8.0.18 in 5.7 to see how to execute the plan, we compare show warningsG
5.7
root@mysql3306.sock>[employees]>show warnings\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1 *************************** 2. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)))) 2 rows in set (0.00 sec)
8.0
root@mysql3308.sock>[employees]>show warnings\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1 *************************** 2. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true 2 rows in set (0.00 sec)
You can see whether 5.7 or not exists and 8.0.18 became anti-join, so someone asked DEPENDENT SUBQUERY or what is anti-join really good for?
Simply put, DEPENDENT SUBQUERY is similar to a function call, requiring an important premise: parameters must be accepted!That is, it can only be nested loop join.But anti join does not necessarily mean it can also be a nested loop join or a hash join; hash join is a new feature in version 8.0.18!
The article about hash join can be read as follows:
hash join mysqlserverteam.com
In addition, DEPENDENT SUBQUERY will take longer to execute as the amount of data in the result set of the outer table increases. Previously, if we encountered this extreme problem, we could use left join to solve it. Specific methods can be used to see my public classes.
Now with the new version of anti join we can omit that.
Finally, what I want to say is that optimization cannot leave version, and it is not possible to leave version and talk about optimization.
There are also new introductions in this 8.0.18 release
Explain format=tree/analyze A serious problem was found in the way the plan was executed
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop anti-join (cost=12.17 rows=10) -> Table scan on t (cost=1.25 rows=10) -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.00 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)
As mentioned above, you can see that nested loop anti-join has occurred, and now I want to change this to the following:
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299246 | 100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+root@mysql3308.sock>[employees]>desc analyze select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ;ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'
As shown above, this time desc will have an execution plan, but desc analyze will have an error!
My guess is that this might be a bug! Or it's not ready to support it, because it's new and I hope to change it later.
Limited level. If there are any errors, please correct them.
Thank you ~Welcome to Forward ~
I am a SQL Optimized Class Teacher ~^^
For questions and exchanges on SQL optimization, please add:
High performance MySQL,SQL optimization group
also @Rabbit @Zhi Shu Tang SQL Optimization
Welcome to Zhishutang.