How much does MySQL count know

Keywords: MySQL Stored Procedure less Database

Statistics of the amount of data in a table is a common requirement, but different table designs and different writing methods can lead to large differences in statistical performance. Here is a simple experiment to test (everyone should pay attention to the cache when testing, otherwise the test results will be affected).

1. Preparations

In order to proceed with the subsequent test, first prepare several tables and data for the test. In order to make the test data meaningful, it is recommended to have a large amount of data in the test table to avoid the query time being too small. Therefore, you can continue to use the continuous number generation method commonly used before, as follows:

/* Create Continuous Number Table */
CREATE TABLE nums(id INT primary key);

/* Generate a continuous number of stored procedures, optimized after */
DELIMITER $$
CREATE  PROCEDURE `sp_createNum`(cnt INT )
BEGIN
    DECLARE i INT  DEFAULT 1;
    TRUNCATE TABLE nums;
    INSERT INTO nums SELECT i;
    WHILE i < cnt DO
      BEGIN
        INSERT INTO nums SELECT id + i FROM nums WHERE id + i<=cnt;
        SET i = i*2;
      END;
    END WHILE;
END$$

DELIMITER ;

Generate data, this time prepare to generate 1 kW records

/* Call Stored Procedure */
mysql> call sp_createNum(10000000);
Query OK, 1611392 rows affected (32.07 sec)

If you go through the cycle one by one, it's quite a long time, so you can test it yourself and refer to the links. Continuous Integer Generation Method with 16,800-fold Efficiency Improvement

1.1 Create innodb table

Generate three tables, innodb, as follows:

nums_1 table only has string primary key fields

/*  Generate table nums_1 with a single string type field primary key */
mysql> create table  nums_1 (p1 varchar(32) primary key ) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/*  Import data, convert id to string through md5 function */
mysql> insert into  nums_1 select md5(id) from nums;
Query OK, 10000000 rows affected (1 min 12.63 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

The nums_2 table has five fields, where the primary key is p1 for the string type field, the other fields are integer id, non-empty c1, empty c2, and empty c3.

The contents of C1 and c2 fields are identical. The difference is that the field constraints are different (c1 can not be empty, c2 can be empty). The difference between C3 and c1,c2 is that the values at the beginning of aa in C1 are null in c3, and the other contents are the same.

/* Create table nums_2 */
mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (1.03 sec)

/*Import data */
mysql> insert into  nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(,left(md5(id),10) like 'aa%',null,,left(md5(id),10)) from nums;
Query OK, 10000000 rows affected (5 min 6.68 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

The contents of the nums_3 table are exactly the same as that of nums_2, except that the primary key fields are different, and the c3 table is an integer id

/*  Create table nums_3 */
mysql> create table nums_3(p1 varchar(32) ,id int primary key  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* Import directly from nums_2 because the content is identical */
mysql> insert into nums_3 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 18.81 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

1.2 Create MyISAM Engine Table

Create another table of MyISAM with the same structure and content as nums_2, except that the engine is MyISAM.

/* Create a nums_4 table for the MyISAM engine*/
mysql> create table nums_4(p1 varchar(32) not null  primary key ,id int  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM;
Query OK, 0 rows affected (0.00 sec)

/* Import data directly from the nums_2 table */
mysql> insert into nums_4 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 16.78 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

2. Method of querying data quantity of a table

There are several types of data to query a table:

Query the approximate amount of data, you can look up statistical information, 2.1 will introduce specific methods

Find the exact amount of data by counting (primary key field), count(*), count(1) [where 1 can be replaced with any constant]

2.1 Inexact Query

If you only look up the approximate amount of data in a table, especially if a very large table only queries the magnitude of the table (millions, millions, or hundreds of millions), you can directly query the statistical information in the following ways:

Query index information, where Cardinalit is the approximate amount of data (see the value of the PRIMARY row for the primary key, or the Cardinalit value for the last column if it is a composite primary key for multiple columns)

mysql> show index from nums_2;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| nums_2 |          0 | PRIMARY  |            1 | p1          | A         |     9936693 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

View the table status, where Rows is the approximate amount of data

mysql> show table status like  'nums_2';
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name   | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| nums_2 | InnoDB |      10 | Dynamic    | 9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)

Viewing a STATISTICS or TABLES table directly is similar to viewing index information or table status, where TABLE_ROWS has an approximate amount of data

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_2     | BASE TABLE | InnoDB |      10 | Dynamic    |    9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)

Be careful:

The tables caused by innodb can query the approximate amount of data for the corresponding tables in all three ways, and the results are the same because they are all derived from the same statistical information
The result of the MyISAM table is an exact value (table data volume, excluding other fields)

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_4';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_4     | BASE TABLE | MyISAM |      10 | Dynamic    |   10000000 |             75 |   759686336 | 281474976710655 |    854995968 |         0 |           NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)

2.2 Find exactly

Since the results of innodb's table query in 2.1 are all statistical values, not prepared values, and statistical precision is required in most cases in practice, there are several ways to query for accurate values, and all engine tables are applicable.

Count (primary key)

mysql> select count(p1) from nums_2;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.60 sec)

count(1)

One of these can be any constant, such as count(2),count('a'), etc.

mysql> select count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.45 sec)

count(*)

mysql> select count(*) from nums_2;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.52 sec)

3. Performance comparison of count

Compare count (primary key) count(1) count(*) count (non-empty field) count (nullable field) Performance comparison

3.1 MyISAM Engine Table

3.1.1 Query the entire table

If you want to precisely query the amount of data in a MyISAM table, using count (primary key) count(1) count(*) is consistently efficient and takes almost 0s to find out the exact result directly

mysql> select count(p1) from nums_4;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (0.00 sec)

mysql> select count(1) from nums_4;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from nums_4;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

Execution plans are also consistent, you can see that there is no statistics by primary key or other index scan

mysql> explain select count(*) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(p1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

Summary:

MyISAM's method of lookup table data volume efficiency is count (primary key) = count(1) = count(*)

3.1.2 Query partial data

When querying part of the data, you cannot get the statistics directly, so the time-consuming situation is roughly as follows:

mysql> select count(p1) from nums_4 where  p1 like 'aa%';
+-----------+
| count(p1) |
+-----------+
|     39208 |
+-----------+
1 row in set (0.14 sec)

mysql> select count(1) from nums_4 where  p1 like 'aa%';
+----------+
| count(1) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

mysql> select count(*) from nums_4 where p1 like 'aa%';
+----------+
| count(*) |
+----------+
| 39208 |
+----------+
1 row in set (0.13 sec)

Execution plans are the same:

mysql> explain select count(1) from nums_4 where  p1 like 'aa%';
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | nums_4 | NULL       | range | PRIMARY       | PRIMARY | 98      | NULL | 42603 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

Summary: MyISAM engine table statistics part of the data directly derived data volume, perhaps scan data statistics, several writing efficiency is similar.

3.2 innodb Engine Table

Because the innodb engine supports MVCC, it can not persist the entire table data volume. Each query needs to traverse statistics, but the efficiency of queries varies according to the writing method, and the different dimensions will be compared later.

3.2.1 Performance Comparison of Different Writing Methods

Compare query efficiency by count (primary key), count (1), count(*)

mysql> select count(p1) from nums_2  ;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.68 sec)

mysql> select count(1) from nums_2  ;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.37 sec)

mysql> select count(*) from nums_2  ;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.38 sec)

A simple comparison shows that the query performance result is count < count(1)count(*)

But looking at the execution plan is like this

mysql> explain select count(p1) from nums_2;
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | nums_2 | NULL       | index | NULL          | PRIMARY | 98      | NULL | 9936693 |   100.00 | Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec

However, the query efficiency is different because the statistical methods are different, as follows:

Count: The innodb engine traverses the entire table based on the corresponding index, taking out the primary key values for each row and returning them to the server layer.When the server layer gets the primary key field, it decides that it is not empty (it can be optimized here), and it adds up by line.
count(1): The entire table is also traversed, because the result of each row is 1 (not empty), so you can count directly without having to decide if it is empty.
count(*): The innodb engine is optimized, similar to count(1), by directly accumulating statistics by row

3.2.2 Performance comparison of primary key field types

nums_2 has the same content as nums_3, except that the primary key field of num_3 is an integer id field. Now compare the query performance of different primary key fields.

mysql> select /* SQL_NO_CACHE */count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (2.02 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.69 sec)

Tests have found that tables with the same content data have different primary keys, different performance, and better query efficiency when the primary key (index) field type is small.

Note: If an index is added to the id field of nums_2, the query will follow the id index because the primary key index (clustered index) is of type varchar(32) and the id is int. Indexes of different sizes will have less IO overhead if they are integer.

Therefore, MySQL's primary key is recommended to use self-increasing id as the primary key (the advantage is not only statistical, but also an opportunity to explain it).

3.2.3 Comparison of table sizes

The difference between nums_1 and nums_3 in preparation is that the primary keys are integer id s, but nums_3 has more fields, that is, larger tables, and a comparison of query efficiency is as follows:

mysql> select /* SQL_NO_CACHE */count(1) from nums_1;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.61 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.67 sec)

Query time is for reference only and depends on machine performance.

Therefore, the query efficiency varies with the size of the table, and the smaller the table, the more efficient the query.

3.2.4 count (common field)

Because the c2 field of the nums_3 table is allowed to be empty, but the contents are not empty, and the c3 field is allowed to be empty, but there are cases where the contents are empty.Now count the c2 and c3 fields of the nums_3 table separately to see the results (add an index first to improve query performance)

mysql> select  count(c2) from  nums_3 ;
+-----------+
| count(c2) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.69 sec)

mysql> select  count(c3) from  nums_3 ;
+-----------+
| count(c3) |
+-----------+
|   9960792 |
+-----------+
1 row in set (1.73 sec)

Because there are null values in the c3 field, rows with null values are ignored when the index counts the number of c3 rows.

4. Summary

By comparing the statistical efficiency of different writings between the MyISAM engine and InnoDB engine tables, the following conclusions can be drawn:

MyISAM tables are the most efficient way to count the number of rows in an entire table, but MyISAM tables do not support transactions
InnoDB table statistical efficiency count (primary key) < count(1)count(*)
MySQL recommends setting primary keys for self-increasing field types
The smaller the table size, the more efficient the query statistics are

In fact, through the preparation work of several tables can also do more tests, interested students can test themselves (verbose, pay attention to caching, haha), but also pay attention to WeChat Public Number [Database Dry Store] to enter the technical exchange group for timely communication, thank you.

Posted by rwfresh on Sun, 05 Apr 2020 14:41:25 -0700

Programmer Group