MySQL Query Optimization 6 - Nested Join Optimization

Keywords: MySQL SQL Windows

MySQL Query Optimization 6 - Nested Join Optimization

If you need to reproduce, please indicate the source: http://blog.csdn.net/itas109
QQ Technology Exchange Group: 12951803

Environmental Science:
MySQL Version: 5.5.15
Operating system: windows

This article discusses Nested Join Optimization.

Compared with the SQL standard, the table_factor syntax has been extended. The latter accepts only table_reference, not a list in parentheses. This is a conservative extension if we treat each comma in the table_reference project list as equivalent to an internal join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                 ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

Equate to

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                 ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)


In MySQL, CROSS JOIN is grammatically equivalent to INNER JOIN; they can replace each other. In standard SQL, they are not equivalent. INNER JOIN is used with the ON clause; otherwise CROSS JOIN is used.

Usually, parentheses can be ignored in join expressions that contain only internal join operations. Consider this join expression:

t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
   ON t1.a=t2.a

After removing the left parentheses and grouping operations, the join expression is converted to the following expression:

(t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3
    ON t2.b=t3.b OR t2.b IS NULL

However, the two expressions are not the same. To see this, suppose that tables t1, t2 and t3 have the following states:

  • Table t1 contains rows (1, NULL), (2, NULL)
  • Table t2 contains rows (1, 101)
  • Table t3 contains rows (NULL, 101)

In this case, the first expression returns the result set containing rows (1, 1, 101, 101), (2, NULL, NULL, NULL), and the second expression returns rows (1, 1, 101, 101), (2, NULL, NULL, 101):

mysql> SELECT *
       FROM t1
            LEFT JOIN
            (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
            ON t1.a=t2.a;
+------+------+------+------+
| a    | a    | b    | b    |
+------+------+------+------+
|    1 |    1 |  101 |  101 |
|    2 | NULL | NULL | NULL |
+------+------+------+------+

mysql> SELECT *
       FROM (t1 LEFT JOIN t2 ON t1.a=t2.a)
            LEFT JOIN t3
            ON t2.b=t3.b OR t2.b IS NULL;
+------+------+------+------+
| a    | a    | b    | b    |
+------+------+------+------+
|    1 |    1 |  101 |  101 |
|    2 | NULL | NULL |  101 |
+------+------+------+------+

In the following example, the outer connection operation is used with the inner connection operation:

t1 LEFT JOIN (t2, t3) ON t1.a=t2.a

This expression cannot be converted to the following expression:

t1 LEFT JOIN t2 ON t1.a=t2.a, t3

For a given table state, these two expressions return different sets of rows:

mysql> SELECT *
       FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;
+------+------+------+------+
| a    | a    | b    | b    |
+------+------+------+------+
|    1 |    1 |  101 |  101 |
|    2 | NULL | NULL | NULL |
+------+------+------+------+

mysql> SELECT *
       FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;
+------+------+------+------+
| a    | a    | b    | b    |
+------+------+------+------+
|    1 |    1 |  101 |  101 |
|    2 | NULL | NULL |  101 |
+------+------+------+------+

Therefore, if we omit parentheses in join expressions with outer join operators, we may change the result set of the original expression.

To be more precise, we cannot ignore the brackets in the right operand of the left external join operation and the left operand of the right join operation. In other words, we can't ignore the parentheses of the inner table expression of the outer join operation. The parentheses of other operands (operands of external tables) can be ignored.

The following expression:

(t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)

This expression corresponds to any table t1, t2, t3, and any condition P for attributes t2.b and t3.b:

t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)

Nested joins are discussed whenever the order of execution of join operations in join_table is not from left to right. Consider the following queries:

SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a
  WHERE t1.a > 1

SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
  WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1

These queries are considered to contain these nested joins:

t2 LEFT JOIN t3 ON t2.b=t3.b
t2, t3

In the first query, nested joins are formed by left join operations. In the second query, it consists of an internal join operation.

In the first query, parentheses can be omitted: the grammatical structure of the join expression determines the order in which the join operation is performed. For the second query, parentheses cannot be ignored, although the join expression here can be interpreted without ambiguity. In our extended grammar, parentheses in the second query (t2, t3) are necessary, although theoretically queries can be parsed without them: because of the roles played by LEFT JOIN and ON, we still have explicit query grammar structures for left and right separators of expressions (t2, t3).

The preceding examples prove these points:

  • For connection expressions involving only internal connections (but not external connections), parentheses can be removed and connections can be calculated from left to right. In fact, tables can be evaluated in any order.
  • Generally speaking, the same is true for external connections or mixtures of external connections and internal connections. Removing parentheses may change the result.

Queries with nested external connections are executed in the same pipeline as queries with internal connections. To be more precise, a variant of nested loop join algorithm is utilized. Think back to the nested loop join algorithm for executing queries (see "nested loop join algorithm"). Suppose that join queries on three tables T1, T2 and T3 have this form:

SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
                 INNER JOIN T3 ON P2(T2,T3)
  WHERE P(T1,T2,T3)

Here, P1 (T1, T2) and P 2 (T3, T3) are some join conditions (in expression), while P (T1, T2, T3) is a condition for the columns of Tables T1, T2, T3.

The nested loop join algorithm will execute the query as follows:

FOR each row t1 in T1 {
  FOR each row t2 in T2 such that P1(t1,t2) {
    FOR each row t3 in T3 such that P2(t2,t3) {
      IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
      }
    }
  }
}

The symbol T1 T2 T3 denotes rows formed by joining columns of rows t1, T2 and t3. In some examples below, NULL with a table name means that each column of the table uses NULL rows. For example, t1 || t2 || NULL represents rows constructed by linking columns of rows T1 and t2, and NULL for each column of t3. Such lines are considered NULL supplements.

Now consider a nested query with external connections:

FOR each row t1 in T1 {
  BOOL f1:=FALSE;
  FOR each row t2 in T2 such that P1(t1,t2) {
    BOOL f2:=FALSE;
    FOR each row t3 in T3 such that P2(t2,t3) {
      IF P(t1,t2,t3) {
        t:=t1||t2||t3; OUTPUT t;
      }
      f2=TRUE;
      f1=TRUE;
    }
    IF (!f2) {
      IF P(t1,t2,NULL) {
        t:=t1||t2||NULL; OUTPUT t;
      }
      f1=TRUE;
    }
  }
  IF (!f1) {
    IF P(t1,NULL,NULL) {
      t:=t1||NULL||NULL; OUTPUT t;
    }
  }
}

Typically, for any nested loop of the first internal table in an external join operation, a flag is introduced that closes before the loop and checks after the loop. This flag is opened when the current row in the external table matches the table representing the internal operands. If the flag is still closed at the end of the loop, the current row of the appearance does not find a match. In this case, the row is supplemented by the NULL value of the column in the internal table. The result row is passed to the final check of the output or to the next nested loop, but only if it satisfies the join conditions for all embedded external connections.

In this example, an external join table represented by the following expression is embedded:

(T2 LEFT JOIN T3 ON P2(T2,T3))

For queries with internal connections, the optimizer can select different nested loops, such as:

FOR each row t3 in T3 {
  FOR each row t2 in T2 such that P2(t2,t3) {
    FOR each row t1 in T1 such that P1(t1,t2) {
      IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
      }
    }
  }
}

For queries with external joins, the optimizer can only select the order in which the loops of external tables precede those of internal tables. Therefore, for queries using external connections, it is possible to have only one nested order. For the following queries, the optimizer evaluates two different nestings. In both nests, T1 must be processed in an external loop because it is used in an external connection. T2 and T3 are used in internal connections, so connections must be processed in internal loops. However, because joins are internal joins, T2 and T3 can be processed in any order.

SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3)
  WHERE P(T1,T2,T3)

A nested assessment of T2, then T3:

FOR each row t1 in T1 {
  BOOL f1:=FALSE;
  FOR each row t2 in T2 such that P1(t1,t2) {
    FOR each row t3 in T3 such that P2(t1,t3) {
      IF P(t1,t2,t3) {
        t:=t1||t2||t3; OUTPUT t;
      }
      f1:=TRUE
    }
  }
  IF (!f1) {
    IF P(t1,NULL,NULL) {
      t:=t1||NULL||NULL; OUTPUT t;
    }
  }
}

Another nested evaluation T3, and then T2:

FOR each row t1 in T1 {
  BOOL f1:=FALSE;
  FOR each row t3 in T3 such that P2(t1,t3) {
    FOR each row t2 in T2 such that P1(t1,t2) {
      IF P(t1,t2,t3) {
        t:=t1||t2||t3; OUTPUT t;
      }
      f1:=TRUE
    }
  }
  IF (!f1) {
    IF P(t1,NULL,NULL) {
      t:=t1||NULL||NULL; OUTPUT t;
    }
  }
}

When discussing nested looping algorithms for inline joins, we omitted some details that have a significant impact on query execution performance. We did not mention the so-called "knock-down" situation. Suppose that our WHERE condition P (T1, T2, T3) can be expressed by a connection formula:

P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).

In this case, MySQL actually uses the following nested loop algorithm to execute queries with internal connections:

FOR each row t1 in T1 such that C1(t1) {
  FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2)  {
    FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) {
      IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
      }
    }
  }
}

You can see that each conjunction C1 (T1), C2 (T2), C3 (T3) is pushed from the innermost loop to the outermost loop, so that it can be evaluated. If C1 (T1) is a very strict condition, then pushing under this condition may greatly reduce the number of rows passed to table T1 in the internal loop. Therefore, query execution time may be greatly increased.

For queries with external joins, the WHERE condition is checked only after the matching of the current row in the appearance with that in the inner table is found. Therefore, optimization of conditions derived from internal nested loops cannot be directly applied to queries with external connections. Here, we have to introduce pushing predicates under the condition of open logo protection when encountering matches.

Recall the examples of external connections:

P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3)

For this example, the nested loop algorithm using push-down conditions is as follows:

FOR each row t1 in T1 such that C1(t1) {
  BOOL f1:=FALSE;
  FOR each row t2 in T2
      such that P1(t1,t2) AND (f1?C2(t2):TRUE) {
    BOOL f2:=FALSE;
    FOR each row t3 in T3
        such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) {
      IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) {
        t:=t1||t2||t3; OUTPUT t;
      }
      f2=TRUE;
      f1=TRUE;
    }
    IF (!f2) {
      IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) {
        t:=t1||t2||NULL; OUTPUT t;
      }
      f1=TRUE;
    }
  }
  IF (!f1 && P(t1,NULL,NULL)) {
      t:=t1||NULL||NULL; OUTPUT t;
  }
}

Generally, push-down predicates can be extracted from join conditions such as P1 (T1, T2) and P (T2, T3). In this case, the drop-down predicate is also protected by a flag that prevents checking the NULL supplementary row predicate generated by the corresponding external join operation.

If triggered by predicates in WHERE conditions, access from one internal table to another through keys in the same nested join is prohibited.

Reference:
https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html

I think this article is helpful to you. I can use the two-dimensional code scanned by Wechat to donate to the blogger. Thank you.

If you need to reproduce, please indicate the source: http://blog.csdn.net/itas109
QQ Technology Exchange Group: 12951803

Posted by eternalprophet on Mon, 07 Jan 2019 13:27:10 -0800