Union index (leftmost prefix principle)

Keywords: Python Database SQL

1) What is a federated index?

A federated index refers to indexing multiple columns in a table. In other words, multiple columns on the table add up to form an index for fast query.

2) How do I use federated indexes?

First, add an index to the table
Add to table when creating table:

CREATE TABLE t(
    a int,
    b int,
    primary key(a),
    key idx_a_b(a,b)
)

After creating a table, add to the table:

CREATE TABLE t(
    a int,
    b int,
    primary key(a)
)

To add an index statement to a table:

ALTER TABLE t ADD INDEX indx_a_b(a,b);

Then we use the joint index to query. Before that, we need to learn the leftmost prefix principle of the joint index.
The leftmost prefix principle specifies the query in which the joint index can take effect. The rules are as follows:
If you want to use a federated index, the leftmost column of the federated index must be used as a filter condition, otherwise the federated index will not take effect.
In other words, a federated index actually has the function of a single column index.
Let's insert data into the above table t and learn how to use the joint index correctly:

insert into t(a,b) values(1,4);
insert into t(a,b) values(2,5);
insert into t(a,b) values(3,6);

Leftmost prefix principle experiment:


3) Joint index usage scenario

So what are the advantages of joint indexing?

First, we need to look at the internal results of the joint index: in essence, the joint index is still a B + tree, but the number of key values of the joint index is not 1,
But greater than or equal to 2. For a two column joint index, assuming that the two key values are key1 and key2 respectively, the B + tree structure is as follows:

As can be seen from the above figure, all data can be read out logically and sequentially through leaf nodes:
(1,2) (1,2) (2,1) (2,4) (3,1) (3,2).
Therefore, for queries

SELECT * FROM TABLE WHERE key1=xxx and key2=xxx Obviously, it can be used( key1,key2)This joint index.
For a single column key1 Query of `SELECT * FROM TABLE WHERE key1=xxx` Joint indexes can also be used according to the leftmost prefix principle.

However, for the query of a single column key2, the B + tree index cannot be used:

SELECT * FROM TABLE WHERE key2=xxx;

When we look at the B + tree in the figure above, we can see that the values of key2 on the leaf node are 1, 2, 1, 4, 1, 2, which is obviously not sequential, so the joint index cannot be used for the query of key2 column.

To sum up, we can summarize one advantage of joint index: joint index is a good choice when the combination of two columns is a unique value.

Another benefit of a federated index is that the second key value can be sorted

For example, in many cases, we need to query the shopping status of a user and sort by time. At this time, using the joint index (userid,buy_date) can reduce one sorting operation, because the index itself has been sorted on the leaf node.
example:
First create a table

CREATE TABLE buy_log(
    userid int unsigned not null,
    buy_date DATE
)

Then add one or two indexes to the table:

ALTER TABLE buy_log ADD KEY (userid);
ALTER TABLE buy_log ADD KEY(userid,buy_date);

Finally, insert data into the table:

INSERT INTO buy_log (userid,buy_date) VALUES (1,'2021-01-01');
INSERT INTO buy_log (userid,buy_date) VALUES (2,'2021-01-01');
INSERT INTO buy_log (userid,buy_date) VALUES (3,'2021-01-01');
INSERT INTO buy_log (userid,buy_date) VALUES (1,'2021-02-01');
INSERT INTO buy_log (userid,buy_date) VALUES (3,'2021-02-01');
INSERT INTO buy_log (userid,buy_date) VALUES (1,'2021-03-01');

At this time, the B + tree structure of the joint index is as follows:

In this example, two indexes are created for comparison, one is the single column index of userid, and the other is the joint index of userid and buy_date.
If only the userid is queried, for example:

EXPLAIN SELECT * FROM buy_log WHERE userid=2;

It can be seen that the optimizer finally selects userid. It can be seen from this that when querying a row of data, when the single column index conflicts with the joint index, the single column index is preferred.

Next, suppose you want to retrieve the last three purchase records with userid 1:

EXPLAIN SELECT * FROM buy_log WHERE userid=1 ORDER BY buy_date DESC LIMIT 3;


From here, we can see that the optimizer has selected the joint index. Let's see the results of this query:

One advantage of the federated index mentioned above is that it can sort the second key value
Therefore, the following statement can get the sorting result without using ORDER BY additional sorting:

EXPLAIN SELECT * FROM buy_log WHERE userid=1 DESC LIMIT 3;


This is a huge advantage of using federated indexes!!!

Posted by Bit343 on Fri, 15 Oct 2021 14:42:37 -0700