MySQL 8.0 Optimizer New Features: Cost Model, Histograms, and Optimizer Overhead Optimizations

Keywords: MySQL, SQL, Database, Session

MySQL has now been released as version 8.0. In the new version you can see that MySQL has made many changes to the optimizer, a component that drew plenty of criticism in the past. Since the author's previous working environment was 5.6 and I recently switched to the latest 8.0, this article covers the optimizer-related parts I am interested in, including the MySQL 5.7 cost model and the MySQL 8.0 histogram feature.

Based on the latest release, MySQL 8.0.12, this article mainly discusses how to use the cost model and histograms, along with the related code.

Cost Model

Configurable cost constants

Why do we need configurable cost model constants? MySQL has been developed for decades, yet the optimizer still uses hardcoded weights to measure the cost of I/O, CPU, and other resources. These weights are based on years, even decades, of experience. Consider how fast hardware has developed over the same period: servers with dozens or even hundreds of cores are widely used and no longer limited to a few large companies, SSDs have long been mainstream, NVMe is on the rise, and high-speed RDMA networks are becoming commonplace. All of this affects how a database system should be implemented and tuned. Clearly the hardcoded weights are outdated; we need to let users define them, or, going a step further, set them automatically and intelligently based on the hardware environment.

MySQL 5.7 introduced two new system tables that are exposed to users for updates:

root@(none) 04:05:24>select * from mysql.server_cost;
+------------------------------+------------+---------------------+---------+---------------+
| cost_name                    | cost_value | last_update         | comment | default_value |
+------------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost   |       NULL | 2018-04-23 13:55:20 | NULL    |            20 |
| disk_temptable_row_cost      |       NULL | 2018-04-23 13:55:20 | NULL    |           0.5 |
| key_compare_cost             |       NULL | 2018-04-23 13:55:20 | NULL    |          0.05 |
| memory_temptable_create_cost |       NULL | 2018-04-23 13:55:20 | NULL    |             1 |
| memory_temptable_row_cost    |       NULL | 2018-04-23 13:55:20 | NULL    |           0.1 |
| row_evaluate_cost            |       NULL | 2018-04-23 13:55:20 | NULL    |           0.1 |
+------------------------------+------------+---------------------+---------+---------------+
6 rows in set (0.00 sec)

// default_value is a generated column; its expression fixes the default value:

`default_value` float GENERATED ALWAYS AS (
(case `cost_name` 
when _utf8mb3'disk_temptable_create_cost' then 20.0 
when _utf8mb3'disk_temptable_row_cost' then 0.5 
when _utf8mb3'key_compare_cost' then 0.05 
when _utf8mb3'memory_temptable_create_cost' then 1.0 
when _utf8mb3'memory_temptable_row_cost' then 0.1 
when _utf8mb3'row_evaluate_cost' then 0.1 else NULL end)) VIRTUAL

root@(none) 04:05:35>select * from mysql.engine_cost;
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| engine_name | device_type | cost_name              | cost_value | last_update         | comment | default_value |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| default     |           0 | io_block_read_cost     |       NULL | 2018-04-23 13:55:20 | NULL    |             1 |
| default     |           0 | memory_block_read_cost |       NULL | 2018-04-23 13:55:20 | NULL    |          0.25 |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+

You can update the values with an ordinary UPDATE statement, for example:

root@(none) 04:05:52>update mysql.server_cost set cost_value = 40 where cost_name = 'disk_temptable_create_cost';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

root@(none) 04:07:13>select * from mysql.server_cost where cost_name = 'disk_temptable_create_cost';
+----------------------------+------------+---------------------+---------+---------------+
| cost_name                  | cost_value | last_update         | comment | default_value |
+----------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost |         40 | 2018-06-23 16:07:05 | NULL    |            20 |
+----------------------------+------------+---------------------+---------+---------------+
1 row in set (0.00 sec)


//Reload the in-memory cost constants with FLUSH OPTIMIZER_COSTS after the update.
//Note: existing sessions continue to use the old cost data.
root@(none) 10:10:12>flush optimizer_costs;
Query OK, 0 rows affected (0.00 sec)

As you can see, usage is very simple. There are two tables, server_cost and engine_cost, which configure the server-layer and engine-layer cost constants respectively.

Related code:

Global cache Cost_constant_cache

The global cache maintains the current cost model information. At lex_start(), each user thread checks whether its local pointer has been initialized, and if not, copies the global pointer into its local cache.

Initialize global cache:

Cost_constant_cache::init:

This creates a Cost_model_constants, which contains two kinds of information: the server-layer cost model and the engine-layer cost model. The class structure is as follows:


Cost_constant_cache ----> Cost_model_constants
                            ---> Server_cost_constants
                                // server_cost
                            ---> Cost_model_se_info
                                ---> SE_cost_constants
                                // engine_cost; if the storage engine provides the
                                // interface function get_cost_constants(), the
                                // constants are taken from the storage engine

The configuration is read from the system tables both at initialization and when FLUSH OPTIMIZER_COSTS updates the cache:

read_cost_constants()
|--> read_server_cost_constants
|--> read_engine_cost_constants

Since users can update the system tables dynamically, old versions of the constants may still be in use by some sessions after FLUSH OPTIMIZER_COSTS executes. Reference counting is therefore needed: an old version can only be released once its reference count drops to zero.
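
The lifecycle can be sketched roughly as follows (a minimal sketch with illustrative names such as ConstantCache/replace, not the actual MySQL classes):

#include <atomic>
#include <mutex>

// Minimal sketch of a reference-counted constant cache (illustrative names,
// not the actual MySQL classes).
struct CostConstants {
  std::atomic<int> ref_count{0};
  // ... the cost constant values read from server_cost/engine_cost ...
};

class ConstantCache {
 public:
  ConstantCache() : current_(new CostConstants()) {
    current_->ref_count.store(1);  // the cache's own reference
  }
  // Called by a session at lex_start(): pin the current version.
  CostConstants *acquire() {
    std::lock_guard<std::mutex> guard(lock_);
    current_->ref_count.fetch_add(1);
    return current_;
  }
  // Called when a session drops its pointer; the last holder frees the version.
  void release(CostConstants *c) {
    if (c->ref_count.fetch_sub(1) == 1) delete c;
  }
  // Called by FLUSH OPTIMIZER_COSTS: publish a new version.
  void replace(CostConstants *fresh) {
    std::lock_guard<std::mutex> guard(lock_);
    fresh->ref_count.store(1);   // the cache's reference to the new version
    CostConstants *old = current_;
    current_ = fresh;
    release(old);                // freed once the last session lets go
  }
 private:
  std::mutex lock_;
  CostConstants *current_;
};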

Thread cost model initialization

  • Cost_model_server

Each thread's THD carries a Cost_model_server object, THD::m_cost_model. At lex_start(), if the thread's m_cost_model has not been initialized, the global pointer is fetched and stored locally:

Cost_model_server::init

const Cost_model_constants *m_cost_constants = cost_constant_cache->get_cost_constants();
// Increments the reference count so the constants are not freed while in use

const Server_cost_constants *m_server_cost_constants = m_cost_constants->get_server_cost_constants();
// Again, only the global pointer is obtained

As you can see, the THD does not create its own copy of the cost model; it only holds pointers into the cache.

Table Cost Model

struct TABLE::m_cost_model, of type Cost_model_table

Its value is taken from the cost model object stored on the THD described above.

Cost_estimate

The unified type Cost_estimate stores computed cost results along four dimensions:

  double io_cost;      ///< cost of I/O operations
  double cpu_cost;     ///< cost of CPU operations
  double import_cost;  ///< cost of remote operations
  double mem_cost;     ///< memory used (bytes)
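
As a quick illustration of how such a container is used (a minimal sketch in the spirit of Cost_estimate, which lives in sql/handler.h; the accessors below are simplified assumptions):

// Sketch of a cost container in the spirit of Cost_estimate (the real class
// lives in sql/handler.h; these accessors are simplified assumptions).
struct CostEstimateSketch {
  double io_cost = 0.0;      // cost of I/O operations
  double cpu_cost = 0.0;     // cost of CPU operations
  double import_cost = 0.0;  // cost of remote operations
  double mem_cost = 0.0;     // memory used (bytes); a footprint, not a time cost

  void add_io(double c) { io_cost += c; }
  void add_cpu(double c) { cpu_cost += c; }

  // Time-like total used to compare plans; in this sketch mem_cost is
  // tracked separately rather than folded into the total.
  double total_cost() const { return io_cost + cpu_cost + import_cost; }
};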

Future

At present, a reasonable configuration can only be obtained through adequate testing against the actual workload. But how should it be configured, and what is a reasonable value? Ideally the configuration would be adjusted automatically; the key is to find the mapping between the configuration and the hardware environment. This is a direction worth working toward in the future.

References:

1. Cost Model official documentation
2. Official blog 1: The MySQL Optimizer Cost Model Project
3. Official blog 2: A new dimension to MySQL query optimizations
4. Optimizer Cost Model Improvements in MySQL 5.7.5 DMR
5. Slide: MySQL Cost Model

Related worklogs:
WL#7182: Optimizer Cost Model API
WL#7209: Handler interface changes for new cost model
WL#7276: Configuration data base for Optimizer Cost Model
WL#7315: Optimizer cost model: main memory management of cost constants
WL#7316: Optimizer cost model: Command for online updating of cost model constants

Histogram

Histograms are another long-awaited MySQL feature. They are common in other database products and guide the optimizer in choosing an execution path. A histogram stores the data distribution of a specified column. MariaDB has supported this feature since the early 10.0.2 release; MySQL begins supporting it in the new 8.0 release.

Usage

Histograms are created and dropped through the ANALYZE TABLE syntax:

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL]
    TABLE tbl_name
    UPDATE HISTOGRAM ON col_name [, col_name] ...
        [WITH N BUCKETS]
        
ANALYZE [NO_WRITE_TO_BINLOG | LOCAL]
    TABLE tbl_name
    DROP HISTOGRAM ON col_name [, col_name] ...

Take a simple example:

We use an ordinary sysbench table as an example:

root@sb1 05:16:33>show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=200001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.01 sec)


# Create histograms and store them in the data dictionary

root@sb1 05:16:38>ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k with 10 BUCKETS;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.55 sec)

root@sb1 05:17:03>ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k,pad with 10 BUCKETS;
+-------------+-----------+----------+------------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                       |
+-------------+-----------+----------+------------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'.   |
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'pad'. |
+-------------+-----------+----------+------------------------------------------------+
2 rows in set (7.98 sec)

//Delete the histogram on the pad column:
root@sb1 05:17:51>ANALYZE TABLE sbtest1 DROP HISTOGRAM ON pad;
+-------------+-----------+----------+------------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                       |
+-------------+-----------+----------+------------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics removed for column 'pad'. |
+-------------+-----------+----------+------------------------------------------------+
1 row in set (0.06 sec)

root@sb1 05:58:12>ANALYZE TABLE sbtest1 DROP HISTOGRAM ON k;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics removed for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.08 sec)


# If the number of buckets is not specified, the default is 100

root@sb1 05:58:27>ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.56 sec)

Histogram statistics are stored in the data dictionary and can be viewed through an information_schema table:

root@information_schema 05:34:49>SHOW CREATE TABLE INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
                View: COLUMN_STATISTICS
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`mysql.infoschema`@`localhost` SQL SECURITY DEFINER VIEW `COLUMN_STATISTICS` AS select `mysql`.`column_statistics`.`schema_name` AS `SCHEMA_NAME`,`mysql`.`column_statistics`.`table_name` AS `TABLE_NAME`,`mysql`.`column_statistics`.`column_name` AS `COLUMN_NAME`,`mysql`.`column_statistics`.`histogram` AS `HISTOGRAM` from `mysql`.`column_statistics` where can_access_table(`mysql`.`column_statistics`.`schema_name`,`mysql`.`column_statistics`.`table_name`)
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

As you can see from the view definition, the underlying data lives in a system table named mysql.column_statistics, which is hidden and not directly exposed.

Here is a simple example:

root@sb1 05:58:55>ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k WITH 4 BUCKETS;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.63 sec)

# Query the histogram information on the table

root@sb1 06:00:43>SELECT JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS WHERE SCHEMA_NAME='sb1' AND TABLE_NAME = 'sbtest1'\G
*************************** 1. row ***************************
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      38671,
      99756,
      0.249795,
      17002
    ],
    [
      99757,
      100248,
      0.500035,
      492
    ],
    [
      100249,
      100743,
      0.749945,
      495
    ],
    [
      100744,
      172775,
      1.0,
      16630
    ]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-09-22 09:59:30.857797",
  "sampling-rate": 1.0,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 4
}
1 row in set (0.00 sec)

As you can see from the JSON output, the histogram generated by the statement above contains four buckets, the data type is int, and the histogram type is equi-height (the other type is SINGLETON, described below). Each bucket records four pieces of information: the lower and upper bounds of the values it covers, the cumulative frequency, and the number of distinct values. With this information the optimizer obtains a more accurate picture of the data distribution and can choose a better execution plan.
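
To make the bucket layout concrete, here is a rough sketch of how a range predicate such as col <= v could be estimated from these four fields (illustrative only, not the actual Equi_height code):

#include <vector>

// Sketch: estimating the selectivity of "col <= v" from equi-height buckets.
// Each bucket stores [lower, upper, cumulative_frequency, distinct_values],
// as in the JSON above. Illustrative only.
struct Bucket {
  long long lower, upper;
  double cum_freq;   // fraction of rows with value <= upper
  long long ndv;     // number of distinct values in the bucket
};

double less_equal_selectivity(const std::vector<Bucket> &buckets, long long v) {
  double prev_cum = 0.0;
  for (const Bucket &b : buckets) {
    if (v < b.lower) return prev_cum;          // v falls before this bucket
    if (v <= b.upper) {
      // Inside the bucket: linearly interpolate within its value range.
      double fraction =
          static_cast<double>(v - b.lower + 1) / (b.upper - b.lower + 1);
      return prev_cum + (b.cum_freq - prev_cum) * fraction;
    }
    prev_cum = b.cum_freq;                     // bucket entirely below v
  }
  return 1.0;                                  // v beyond the last bucket
}

With the buckets above, less_equal_selectivity(buckets, 99756) returns roughly 0.25, matching the first bucket's cumulative frequency.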

If the column contains many duplicate values, MySQL may choose a singleton histogram instead. Continuing the previous example, we update column k so that half the rows have the value 10 and the other half 20. The histogram data then looks like this:

root@sb1 10:41:17>SELECT JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS WHERE SCHEMA_NAME='sb1' AND TABLE_NAME = 'sbtest1'\G
*************************** 1. row ***************************
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      10,
      0.499995
    ],
    [
      20,
      1.0
    ]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-09-22 14:41:17.312601",
  "sampling-rate": 1.0,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 100
}
1 row in set (0.00 sec)

For the SINGLETON type, each bucket contains only two values: a column value and its cumulative frequency (i.e., what fraction of the data is less than or equal to the value in the current bucket).
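
Equality estimation against a singleton histogram is then straightforward; a sketch under the same caveats:

#include <vector>

// Sketch: selectivity of "col = v" from a singleton histogram, where each
// bucket is (value, cumulative_frequency). Illustrative only.
struct SingletonBucket {
  long long value;
  double cum_freq;  // fraction of rows with value <= this value
};

double equal_selectivity(const std::vector<SingletonBucket> &buckets,
                         long long v) {
  double prev_cum = 0.0;
  for (const SingletonBucket &b : buckets) {
    if (b.value == v) return b.cum_freq - prev_cum;  // this value's own share
    if (b.value > v) break;                          // value not present
    prev_cum = b.cum_freq;
  }
  return 0.0;
}
// For the data above: equal_selectivity(buckets, 10) == 0.499995 and
// equal_selectivity(buckets, 20) == 1.0 - 0.499995.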

Note the sampling-rate field: a value of 1 means all rows in the table were read to build the statistics. For large tables we may not want to read that much data, since it can consume excessive memory, so MySQL provides the parameter histogram_generation_max_mem_size to cap memory usage.
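
Conceptually, the sampling fraction follows from the memory cap; a back-of-the-envelope sketch (the real computation lives inside the update_histogram code path shown later):

#include <algorithm>

// Sketch: deriving a sampling fraction from a memory cap (conceptual only).
double sampling_fraction(double max_mem_bytes, double table_rows,
                         double avg_row_bytes) {
  if (table_rows <= 0) return 1.0;
  double rows_that_fit = max_mem_bytes / avg_row_bytes;
  return std::min(1.0, rows_that_fit / table_rows);
}
// E.g. a 20M-row table with ~200-byte rows and a 100MB cap samples
// roughly 100e6 / 200 / 20e6 = 2.5% of the rows.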

If the table sees little DML, the histogram stays largely accurate; if it is written frequently, we may need to refresh the histogram regularly, because MySQL does not update it automatically.

The optimizer uses the histogram to estimate the selectivity of predicates on a column, and most predicate types are supported; see the official documentation for details.

For how histograms influence query planning, see this article and this blog post.

Related code

Code structure:
Taking MySQL 8.0.12 as an example, the main code is in the sql/histograms directory:

ls sql/histograms/
equi_height.cc
equi_height.h
equi_height_bucket.cc
equi_height_bucket.h
histogram.cc
histogram.h
singleton.cc
singleton.h
value_map.cc
value_map.h
value_map_type.h


//Class structure:

namespace histograms
|---> Histogram  // base class
        |--> Equi_height
        // equi-height histogram; a template class parameterized on the data
        // type, so each supported type must be explicitly instantiated
        // (see equi_height.cc)
        |--> Singleton
        // singleton histogram; stores only the values and their frequencies

Creating and storing histograms:

The related functions and call stack are as follows:

Sql_cmd_analyze_table::handle_histogram_command
|--> update_histogram  // update the histogram
   |--> histograms::update_histogram  // interface function in the namespace
        a. Checks on each column:
        // histograms::field_type_to_value_map_type: check whether the column type is supported
        // covered_by_single_part_index: if the column is a PK or UK, no histogram is created for it
        // if it is a generated column, find the columns it depends on and add them to the set
        b. Determine the sampling fraction, which is limited mainly by the parameter
           histogram_generation_max_mem_size; if it is set large enough, the full
           table is read for analysis (see the sampling sketch above)
        |--> fill_value_maps   // read the column data to be analyzed from the table
            |--> ha_sample_init
            |--> ha_sample_next
                |--> handler::sample_next  // read the next record, sampling with random numbers
            Value_map<T>::add_values  // add the data read into the map
            |--> ...
            |--> ha_sample_end

        |--> build_histogram  // create the histogram object
        a. Determine the histogram type: if the number of distinct values is not
           greater than the number of buckets, use Singleton, otherwise use
           Equi_height (see the sketch after this listing)
            |--> Singleton<T>::build_histogram
            |--> Equi_height<T>::build_histogram

        |--> Histogram::store_histogram  // store the histogram information in the mysql.column_statistics table
            |--> dd::cache::Dictionary_client::update<dd::Column_statistics>

|--> drop_histogram  // drop the histogram
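The type decision in step (a) of build_histogram can be illustrated like this (a sketch with assumed names, not the actual signatures):

#include <cstddef>

// Sketch of the type decision in build_histogram: if every distinct value
// fits in its own bucket, use Singleton; otherwise group values into
// equal-height ranges (Equi_height). Illustrative pseudo-interface.
enum class HistogramType { SINGLETON, EQUI_HEIGHT };

HistogramType choose_histogram_type(std::size_t distinct_values,
                                    std::size_t num_buckets) {
  return distinct_values <= num_buckets ? HistogramType::SINGLETON
                                        : HistogramType::EQUI_HEIGHT;
}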

Using the histogram

Usage is simpler.

First, the table object TABLE_SHARE gains a member m_histograms, an unordered map whose key is the field index and whose value is the corresponding histogram object.

The call stack for obtaining column selectivity is as follows:

get_histogram_selectivity
    |-->Histogram::get_selectivity
        |->get_equal_to_selectivity_dispatcher
        |->get_greater_than_selectivity_dispatcher
        |->get_less_than_selectivity_dispatcher
    |-->write_histogram_to_trace // Write to optimizer_trace

MySQL supports histogram-based estimation for a variety of predicate types, including:

col_name = constant
col_name <> constant
col_name != constant
col_name > constant
col_name < constant
col_name >= constant
col_name <= constant
col_name IS NULL
col_name IS NOT NULL
col_name BETWEEN constant AND constant
col_name NOT BETWEEN constant AND constant
col_name IN (constant[, constant] ...)
col_name NOT IN (constant[, constant] ...)

With a histogram, we can estimate the selectivity of conditions on a column and thereby choose a better execution plan. Before histograms existed, we had to index a column to obtain a reasonably accurate value distribution; but indexes are expensive to maintain, whereas histograms can be created flexibly on demand.

References

WL#5384: PERFORMANCE_SCHEMA, HISTOGRAMS
WL#8706: Persistent storage of Histogram data
WL#8707: Classes/structures for Histograms
WL#8943: Extend ANALYZE TABLE with histogram support
WL#9223: Using histogram statistics in the optimizer

Other

Optimizing rec_per_key

Relevant worklogs:
WL#7338: Interface for improved records per key estimates
WL#7339: Use improved records per key estimate interface in optimizer

MySQL estimates row counts through the rec_per_key interface (roughly, the number of records per distinct index key value). In earlier versions this number was an integer, so fractional values could not be represented exactly, which affected index choice. In 5.7 the value was changed to a float.
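
A small example of why the integer form hurts (hypothetical numbers):

#include <cmath>
#include <cstdio>

// Sketch: why an integer rec_per_key distorts row estimates. With 1,500,000
// rows and 1,000,000 distinct keys, the true records-per-key is 1.5.
int main() {
  double rows = 1500000.0, distinct_keys = 1000000.0;
  double rec_per_key_f = rows / distinct_keys;         // 1.5 (float, 5.7+)
  unsigned rec_per_key_i = static_cast<unsigned>(
      std::llround(rec_per_key_f));                    // 2 (old integer form, rounded)
  // A lookup of 1000 keys would be estimated at:
  std::printf("float estimate:   %.0f rows\n", 1000 * rec_per_key_f);  // 1500
  std::printf("integer estimate: %u rows\n", 1000 * rec_per_key_i);    // 2000
  return 0;
}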

Introducing data cache state into cost computation

Relevant worklogs:

WL#7168: API for estimates for how much of table and index data that is in memory buffer
WL#7170: InnoDB buffer estimates for tables and indexes
WL#7340: IO aware cost estimate function for data access

In previous versions the optimizer could not know whether data was cached in memory or had to be read from disk. Lacking this information, it uniformly assumed data lived on disk when computing cost, which could lead to inefficient execution plans.

Related code:

A new API is added at the server layer to estimate what fraction of a table's or index's data is cached in memory:

 handler::table_in_memory_estimate
 handler::index_in_memory_estimate

At the InnoDB layer, a global variable buf_stat_per_index (of type buf_stat_per_index_t) maintains, for each index, the number of leaf pages in memory. Internally it uses a lock-free hash structure whose key is (m_space_id << 32) | m_index_id. When a page is read into or created in memory, the count is incremented if the page is a leaf; when the page is removed from the page hash, the count is decremented.

To reduce the performance impact, the counters are stored in a lock-free hash structure, ut_lock_free_hash_t.
The basic idea: the hash is a fixed-length array whose elements are (key, val) pairs. A hash value is computed from the key and taken modulo the array size to find the slot; if the slot is occupied, probing continues to the right until a free slot is found.
When the array fills up, a new, larger array is created, and until the data has fully moved to the new hash, every search must consult both arrays. Once all records have been migrated to the new array and no thread is accessing the old one, the old hash can be deleted.
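
The probing and slot-claiming logic might look roughly like this (a simplified sketch in the spirit of ut_lock_free_hash_t; growth and migration are omitted):

#include <atomic>
#include <cstddef>
#include <cstdint>

// Sketch of a fixed-size lock-free counter hash: open addressing with linear
// probing, atomic CAS to claim a slot. Illustrative only.
struct Slot {
  std::atomic<uint64_t> key{0};    // 0 = empty (assumes real keys are nonzero)
  std::atomic<int64_t> count{0};
};

class LockFreeCounterHash {
 public:
  explicit LockFreeCounterHash(std::size_t n) : size_(n), slots_(new Slot[n]) {}
  ~LockFreeCounterHash() { delete[] slots_; }

  // Increment the counter for `key`, claiming a slot on first use.
  bool inc(uint64_t key) {
    for (std::size_t i = 0; i < size_; ++i) {
      Slot &s = slots_[(hash(key) + i) % size_];  // probe to the right
      uint64_t seen = s.key.load();
      if (seen == 0) {
        uint64_t expected = 0;
        if (s.key.compare_exchange_strong(expected, key))
          seen = key;        // we claimed the slot
        else
          seen = expected;   // another thread claimed it first
      }
      if (seen == key) { s.count.fetch_add(1); return true; }
    }
    return false;  // table full: the real structure migrates to a bigger array
  }

 private:
  static std::size_t hash(uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; }
  std::size_t size_;
  Slot *slots_;
};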

The counters stored in the hash also account for multi-core and NUMA architectures, to avoid the CPU cache-line invalidation caused by concurrent updates; this can become significant on machines with many cores. InnoDB encapsulates the count operations in the class ut_lock_free_cnt_t: the counter is maintained as an array, each update goes to a slot chosen by CPU number, and reading the counter sums the values in the array.
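
A sketch of the idea (per-CPU slots with cache-line padding; sched_getcpu() is a Linux-specific stand-in for however InnoDB derives the CPU number):

#include <atomic>
#include <cstdint>
#include <sched.h>   // sched_getcpu(), Linux-specific

// Sketch of a CPU-sharded counter: updates go to a per-CPU slot to avoid
// cache-line ping-pong, reads sum all slots. Illustrative only.
constexpr int kMaxCpus = 128;
constexpr int kCacheLine = 64;

struct alignas(kCacheLine) PaddedCounter {
  std::atomic<int64_t> v{0};
};

class ShardedCounter {
 public:
  void inc() {
    int cpu = sched_getcpu();
    if (cpu < 0) cpu = 0;  // fall back if the call fails
    shards_[cpu % kMaxCpus].v.fetch_add(1, std::memory_order_relaxed);
  }
  int64_t read() const {   // sum over all shards
    int64_t total = 0;
    for (const auto &s : shards_) total += s.v.load(std::memory_order_relaxed);
    return total;
  }
 private:
  PaddedCounter shards_[kMaxCpus];
};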

This lock-free hash is not a general-purpose hash structure: for example, collision handling can occupy slots belonging to other keys, and when the hash is full, the data must be migrated to a new array. In fact, MySQL itself implements an LF_HASH that does not need to migrate data when growing; it deserves a separate blog post when time permits.

You can read the number of cached pages per index from the information_schema.innodb_cached_indexes table.

With the interface defined and InnoDB providing the corresponding statistics, the optimizer can use this information when calculating cost:

  • Cost_model_table::page_read_cost
  • Cost_model_table::page_read_cost_index


Original link
This article is original content from the Yunqi Community and may not be reproduced without permission.
