Originality: Translated by Guan ChanglongAikesheng Open Source Community Yesterday
Author: Guilhem Bichot Translation: Guan Changlong
With LATERAL, JOIN can have a second table - a derived table based on subqueries - defined based on the values of columns in the first table, so it can recalculate each row of the first table. Typical:
SELECT ... FROM t1, LATERAL (SELECT ... FROM t2 ^ WHERE t2.col=t1.col ... ) AS derived; | | | | +---------------------------+
In the second table (derived), t1.col is the "horizontal external reference" of the first table t1. Referenced tables are placed next to "derived tables" (i.e., both are part of the same FROM clause).
In implementing this LATERAL function, I also added another related function: support for non-horizontal external references in derived tables.
Hierarchical data example:
CREATE TABLE employees ( id INT PRIMARY KEY NOT NULL, name VARCHAR(100) NOT NULL, manager_id INT NULL, INDEX (manager_id), FOREIGN KEY (manager_id) REFERENCES employees (id) ); INSERT INTO employees VALUES (333, "Yasmina", NULL), # Yasmina is the CEO (manager_id is NULL) (198, "John", 333), # John has ID 198 and reports to 333 (Yasmina) (692, "Tarek", 333), (29, "Pedro", 198), (4610, "Sarah", 29), (72, "Pierre", 29), (123, "Adil", 692);
The number of direct and indirect reports received by each person? This process contains MySQL recursive syntax
SELECT emp.*, ( WITH RECURSIVE reports AS ( SELECT emp.id UNION ALL SELECT e.id FROM reports AS rep JOIN employees AS e ON rep.id = e.manager_id ) SELECT COUNT(*)-1 FROM reports # Statistical results returned by each calculation ) AS count_of_all_reports FROM employees AS emp;
Description: For each employee:
-
Evaluate a scalar quantum query (lines 2-12) count_of_all_reports, where:
-
Construct CTE by recursively finding all direct and indirect reports of employees (lines 3-10)
-
Calculate the number of rows in CTE (line 11), subtract one line and do not count employees
-
Returns the count.
CTE, which means Common Table Expression, is usually used to build complex queries.
Result:
+------+---------+------------+----------------------+ | id | name | manager_id | count_of_all_reports | +------+---------+------------+----------------------+ | 29 | Pedro | 198 | 2 | | 72 | Pierre | 29 | 0 | | 123 | Adil | 692 | 0 | | 198 | John | 333 | 3 | | 333 | Yasmina | NULL | 6 | | 692 | Tarek | 333 | 1 | | 4610 | Sarah | 29 | 0 | +------+---------+------------+----------------------+ 7 rows in set (0.02 sec)
Interpretation of CTE: Recursion begins with SELECT emp.id, which is a reference to the current employee we want to compute; this emp.id comes from one of the lines EMP (outside CTE).
If we draw an arrow from "reference" to "reference column", the arrow starts from CTE, traverses to the boundary, then traverses to the boundary of the surrounding scalar quantum query, and finally reaches the top query. That's why it's not a "horizontal external reference".
SELECT emp.*, ( WITH RECURSIVE reports AS ( +----------------------------------+ | | SELECT emp.id | UNION ALL | SELECT e.id | FROM reports AS rep JOIN employees AS e | ON rep.id = e.manager_id | ) | crosses CTE's bounds SELECT COUNT(*)-1 FROM reports | ) AS count_of_all_reports | crosses scalar subquery's bounds FROM employees AS emp; | ^ | | | +-----------------------------+ reaches to farthest outside
Before MySQL 8.0.14, this was impossible (MySQL did not know what emp.id was in the definition of CTE).
The new version of MySQL detects this reference; it concludes that the scalar quantum query and the CTE it contains must be recalculated for each line of emp.id.
Check the EXPLAIN query:
+----+--------------------+------------+------------+------+---------------+------------+---------+--------+------+----------+------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+------------+------------+------+---------------+------------+---------+--------+------+----------+------------------------+ | 1 | PRIMARY | emp | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 100.00 | NULL | | 2 | DEPENDENT SUBQUERY | | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL | | 3 | DEPENDENT DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | | 4 | UNCACHEABLE UNION | rep | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where | | 4 | UNCACHEABLE UNION | e | NULL | ref | manager_id | manager_id | 5 | rep.id | 1 | 100.00 | Using index | +----+--------------------+------------+------------+------+---------------+------------+---------+--------+------+----------+------------------------+
We have seen that MySQL has realized that scalar quantum queries are "dependent" (depending on external data), as is the case with derived tables. It also sees that the content of UNION in CTE is "not cacheable" and must be recalculated every time.
In retrospect, starting with MySQL 8.0.14:
-
By default, when parsing the definition of derived tables, MySQL accepts non-horizontal external references, as shown in the example query above.
-
If you add the LATERAL keyword, MySQL also accepts horizontal external references; in other words, it also searches in the FROM clause that contains derived tables.
Note: There are other solutions to the report counting problem. One solution is to use a recursive CTE to build a big result in a pass, list all connections between all employees and each indirect manager, and then aggregate each manager with that big result. It works, but it's hard to read. Instead, what we did above is to generate smaller collections one by one from the hierarchy. So it's part of hierarchy/aggregation/repetition rather than whole hierarchy/aggregation.