Mysql Join Parsing and Performance Analysis

Keywords: MySQL SQL Database less

I. An Overview of Join's Grammar

join is used to link fields in multiple tables. The grammar is as follows:

... FROM table1 INNER|LEFT|RIGHT JOIN table2 ON conditiona

Table 1: left table; table 2: right table.

JOIN can be roughly divided into three categories according to its functions:

INNER JOIN (inner join, or equivalent join): Gets records of connection matching relationships in two tables.

LEFT JOIN (left join): Get a complete record of the left table (table1), that is, the right table (table2) has no matching record.

RIGHT JOIN (Right Connection): Contrary to LEFT JOIN, a complete record of the right table (table2) is obtained, that is, the left table (table1) has no matching corresponding record.

Note: mysql does not support Full join, but you can combine LEFT JOIN with RIGHT JOIN to simulate FULL join by using UNION keywords.

Next, a column is given to explain the following categories. The following two tables (A,B)

mysql> select A.id,A.name,B.name from A,B where A.id=B.id;
+----+-----------+-------------+
| id | name       | name             |
+----+-----------+-------------+
|  1 | Pirate       | Rutabaga      |
|  2 | Monkey    | Pirate            |
|  3 | Ninja         | Darth Vader |
|  4 | Spaghetti  | Ninja             |
+----+-----------+-------------+
4 rows in set (0.00 sec)

II. Inner join

Internal join, also known as equivalent join, produces a set of data that simultaneously conforms to A and B.

mysql> select * from A inner join B on A.name = B.name;
+----+--------+----+--------+
| id | name   | id | name   |
+----+--------+----+--------+
|  1 | Pirate |  2 | Pirate |
|  3 | Ninja  |  4 | Ninja  |
+----+--------+----+--------+

III. Left join

mysql> select * from A left join B on A.name = B.name;
#Or:select * from A left outer join B on A.name = B.name;

+----+-----------+------+--------+
| id | name      | id   | name   |
+----+-----------+------+--------+
|  1 | Pirate    |    2 | Pirate |
|  2 | Monkey    | NULL | NULL   |
|  3 | Ninja     |    4 | Ninja  |
|  4 | Spaghetti | NULL | NULL   |
+----+-----------+------+--------+
4 rows in set (0.00 sec)

Left join (or left outer join: Both are equivalent in Mysql. It is recommended to use left join.) Left join generates a complete set of records from the left table (A) and matches the records (right table (B). If there is no match, the right side will contain null.

If you want to generate only a set of records from the left table (A), but not from the right table (B), you can do so by setting where statement, as follows:

mysql> select * from A left join B on A.name=B.name where A.id is null or B.id is null;
+----+-----------+------+------+
| id | name      | id   | name |
+----+-----------+------+------+
|  2 | Monkey    | NULL | NULL |
|  4 | Spaghetti | NULL | NULL |
+----+-----------+------+------+
2 rows in set (0.00 sec)

Similarly, inner join can also be simulated.

mysql> select * from A left join B on A.name=B.name where A.id is not null and B.id is not null;
+----+--------+------+--------+
| id | name   | id   | name   |
+----+--------+------+--------+
|  1 | Pirate |    2 | Pirate |
|  3 | Ninja  |    4 | Ninja  |
+----+--------+------+--------+
2 rows in set (0.00 sec)

Difference sets:

According to the example above, we can find the difference set as follows:

SELECT * FROM A LEFT JOIN B ON A.name = B.name
WHERE B.id IS NULL
union
SELECT * FROM A right JOIN B ON A.name = B.name
WHERE A.id IS NULL;
# Result
    +------+-----------+------+-------------+
| id   | name      | id   | name        |
+------+-----------+------+-------------+
|    2 | Monkey    | NULL | NULL        |
|    4 | Spaghetti | NULL | NULL        |
| NULL | NULL      |    1 | Rutabaga    |
| NULL | NULL      |    3 | Darth Vader |
+------+-----------+------+-------------+

IV. Right join

mysql> select * from A right join B on A.name = B.name;
+------+--------+----+-------------+
| id   | name   | id | name        |
+------+--------+----+-------------+
| NULL | NULL   |  1 | Rutabaga    |
|    1 | Pirate |  2 | Pirate      |
| NULL | NULL   |  3 | Darth Vader |
|    3 | Ninja  |  4 | Ninja       |
+------+--------+----+-------------+
4 rows in set (0.00 sec)

With left join.

V. Cross join

cross join: cross join, the result is the product of two tables, namely Cartesian product

Descartes product is also called direct product. Assuming that set A={a,b} and set B={0,1,2}, the Cartesian product of the two sets is {(a,0),(a,1),(a,2),(b,0),(b,1), (b,2)}. It can be extended to multiple sets. Similarly, if A represents a collection of students in a school and B represents a collection of all courses in that school, then the Cartesian product of A and B represents all possible courses.

mysql> select * from A cross join B;
+----+-----------+----+-------------+
| id | name      | id | name        |
+----+-----------+----+-------------+
|  1 | Pirate    |  1 | Rutabaga    |
|  2 | Monkey    |  1 | Rutabaga    |
|  3 | Ninja     |  1 | Rutabaga    |
|  4 | Spaghetti |  1 | Rutabaga    |
|  1 | Pirate    |  2 | Pirate      |
|  2 | Monkey    |  2 | Pirate      |
|  3 | Ninja     |  2 | Pirate      |
|  4 | Spaghetti |  2 | Pirate      |
|  1 | Pirate    |  3 | Darth Vader |
|  2 | Monkey    |  3 | Darth Vader |
|  3 | Ninja     |  3 | Darth Vader |
|  4 | Spaghetti |  3 | Darth Vader |
|  1 | Pirate    |  4 | Ninja       |
|  2 | Monkey    |  4 | Ninja       |
|  3 | Ninja     |  4 | Ninja       |
|  4 | Spaghetti |  4 | Ninja       |
+----+-----------+----+-------------+
16 rows in set (0.00 sec)

#Re-execution: MySQL > select * from A inner join B; Have a try

#Execute MySQL > select * from A cross join B on A. name = B. name; try

In fact, CROSS JOIN and INNER JOIN perform the same in MySQL (MySQL only). The results obtained without specifying ON conditions are Cartesian products, whereas the results obtained by two tables match perfectly.
INNER JOIN and CROSS JOIN can omit INNER or CROSS keywords, so the following SQL effect is the same:

... FROM table1 INNER JOIN table2
... FROM table1 CROSS JOIN table2
... FROM table1 JOIN table2

6. Full join

mysql> select * from A left join B on B.name = A.name 
    -> union 
    -> select * from A right join B on B.name = A.name;
+------+-----------+------+-------------+
| id   | name      | id   | name        |
+------+-----------+------+-------------+
|    1 | Pirate    |    2 | Pirate      |
|    2 | Monkey    | NULL | NULL        |
|    3 | Ninja     |    4 | Ninja       |
|    4 | Spaghetti | NULL | NULL        |
| NULL | NULL      |    1 | Rutabaga    |
| NULL | NULL      |    3 | Darth Vader |
+------+-----------+------+-------------+
6 rows in set (0.00 sec)

All records generated by full connection (matching records of both sides) are shown in tables A and B. If there is no match, null will be included opposite.

VII. Performance optimization

1. Display (explicit) inner join VS implicit inner join

Such as:

select * from
table a inner join table b
on a.id = b.id;

VS

select a.*, b.*
from table a, table b
where a.id = b.id;

I compared them in the database (10w data). They used almost the same time. The first one was the inner join displayed, and the second one was the implicit inner join.

Reference: Explicit vs implicit SQL joins

2.left join/right join VS inner join

Avoid LEFT JOIN and NULL.

When using left join (or right join), the following points should be clearly known:

(1). The execution order of on and where

The ON condition ("ON in A LEFT JOIN B ON conditional expression") is used to determine how to retrieve rows from table B. If none of the rows in table B matches the ON condition, an additional row of all NULL-listed data will be generated, and the condition of the WHERE clause will not be used during the matching phase. WHERE clause conditions are used only after the matching phase is completed. It retrieves and filters the data generated in the matching phase.

So we should pay attention to: when using Left (right) join, we must first give as many matching conditions as possible to reduce Where execution. Such as:

PASS

select * from A
inner join B on B.name = A.name
left join C on C.name = B.name
left join D on D.id = C.id
where C.status>1 and D.status=1;

Great

select * from A
inner join B on B.name = A.name
left join C on C.name = B.name and C.status>1
left join D on D.id = C.id and D.status=1

From the above example, we can see that the ON condition is satisfied as far as possible, and the Where condition is used less. The second is obviously more time-saving in terms of execution performance.

(2) Note the difference between ON clause and WHERE clause

For example, the author cites a list:

mysql> SELECT * FROM product LEFT JOIN product_details
       ON (product.id = product_details.id)
       AND product_details.id=2;
+----+--------+------+--------+-------+
| id | amount | id   | weight | exist |
+----+--------+------+--------+-------+
|  1 |    100 | NULL |   NULL |  NULL |
|  2 |    200 |    2 |     22 |     0 |
|  3 |    300 | NULL |   NULL |  NULL |
|  4 |    400 | NULL |   NULL |  NULL |
+----+--------+------+--------+-------+
4 rows in set (0.00 sec)
 
mysql> SELECT * FROM product LEFT JOIN product_details
       ON (product.id = product_details.id)
       WHERE product_details.id=2;
+----+--------+----+--------+-------+
| id | amount | id | weight | exist |
+----+--------+----+--------+-------+
|  2 |    200 |  2 |     22 |     0 |
+----+--------+----+--------+-------+
1 row in set (0.01 sec)

As you can see from the above, the first query uses ON condition to retrieve all data rows from the product_details table of LEFT JOIN. The second query makes a simple LEFT JOIN, and then uses the WHERE clause to filter out the unqualified data rows from the LEFT JOIN data.

(3) Avoid subqueries as much as possible and use join

Often, performance is more often reflected in the large amount of data, at this time, we should avoid complex sub-queries. As follows:

PASS

insert into t1(a1) select b1 from t2 where not exists(select 1 from t1 where t1.id = t2.r_id); 

Great

insert into t1(a1)  
select b1 from t2  
left join (select distinct t1.id from t1 ) t1 on t1.id = t2.r_id   
where t1.id is null;  

This can be referred to. The performance differences between exists and inner join and not exists and left join of mysql are astonishing

Supplement: Use of MySQL STRAIGHT_JOIN and NATURAL JOIN

Thank netizens ** Broken-winged Birds** Give a reminder.

To make a long story short: straight_join implements the forced loading order of multiple tables, from left to right, such as:

...A straight_join B on A.name = B.name 

Straight_join is exactly the same as inner join except that the join grammar decides who loads first according to "which table has a small result set and which table is the driving table", and straight_join forces the table on its left to load first.

Often when we analyze the performance of mysql processing, such as Explain, we can use this statement if we find that the loading order of mysql is unreasonable, but often mysql can analyze and process automatically.

For more information: MySQL STRAIGHT_JOIN and NATURAL JOIN
and MySQL Optimized StraAIGHT_JOIN

8. Reference:

A Visual Explanation of SQL Joins

Five Ways to Improve SQL Performance

Three things you may need to know about MySQL LEFT JOIN

Posted by sottwell on Tue, 02 Apr 2019 23:27:29 -0700