16 day B + tree data structure and basic index structure.md

Keywords: data structure

Advantages and disadvantages of heap table and index organization table?
Advantages of heap table: there is no back table query in heap table, and its cost is the same no matter which index is used.
The disadvantage of heap table: its continuity is not very good. The range query of the primary key is not its strong point.

The fill factor is one sixteenth in MySQL,
Solve the user's pain points
MySQL fill factor

tbs@localhost:[(none)]>show variables like "innodb%fill%";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| innodb_fill_factor | 100   |
+--------------------+-------+
1 row in set (0.00 sec)

innodb_fill_factor
100 remaining 1k
10 means 90%. Generally, this parameter is not modified online.

How to use the b + tree index

tbs@localhost:[dbt3]>show create table orders\G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL,
  `o_custkey` int(11) DEFAULT NULL,
  `o_orderstatus` char(1) DEFAULT NULL,
  `o_totalprice` double DEFAULT NULL,
  `o_orderDATE` date DEFAULT NULL,
  `o_orderpriority` char(15) DEFAULT NULL,
  `o_clerk` char(15) DEFAULT NULL,
  `o_shippriority` int(11) DEFAULT NULL,
  `o_comment` varchar(79) DEFAULT NULL,
  `o_orderdate2` int(11) GENERATED ALWAYS AS ((to_days('2099-01-01') - to_days(`o_orderDATE`))) VIRTUAL,
  PRIMARY KEY (`o_orderkey`),
  KEY `i_o_orderdate` (`o_orderDATE`),
  KEY `i_o_custkey` (`o_custkey`),
  KEY `idx_a_b_c` (`o_custkey`,`o_orderDATE`,`o_orderstatus`),
  KEY `inx_cust_date_status` (`o_custkey`,`o_orderdate2`,`o_orderstatus`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

Index usage scenario 1
Using an index, you can use this column when querying. You can create an index on this column.

tbs@localhost:[dbt3]>explain select * from orders where o_orderkey=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Deleting indexes online is very fast. You only need to free up the corresponding space.

tbs@localhost:[dbt3]>alter table orders drop index i_o_orderdate;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query after index deletion

tbs@localhost:[dbt3]>explain select * from orders where o_orderdate = '1996-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1483839
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Online tuning uses slow query logs, and unused indexes are recorded.

log_throttle_queries_not_using_indexes=3

I spend most of my time watching slow_log

root@dell-PowerEdge-R740:/var/lib/mysql# mysqldumpslow dell-PowerEdge-R740-slow.log

mysqldumpslow tool
First copy the log, export 10000 lines, and then view it with MySQLdumpslow
taill
Clean up the log and rename the slow query log.
A problem occurred in MySQL version 5.6

-- The time is not formatted
tbs@localhost:[sys]>select * from  x$statement_analysis\G

x$statement_analysis this table is very important

-- How time has passed
tbs@localhost:[sys]>select * from  statement_analysis\G
USE sys;

CREATE ALGORITHM = MERGE DEFINER = `mysql.sys`@`localhost` SQL SECURITY INVOKER VIEW `statement_analysis` AS select
   `sys`.`format_statement` (
   	`performance_schema`.`events_statements_summary_by_digest`.`DIGEST_TEXT`
   ) AS `query`,
   `performance_schema`.`events_statements_summary_by_digest`.`SCHEMA_NAME` AS `db`,

if (
   (
   	(
   		`performance_schema`.`events_statements_summary_by_digest`.`SUM_NO_GOOD_INDEX_USED` > 0
   	)
   	or (
   		`performance_schema`.`events_statements_summary_by_digest`.`SUM_NO_INDEX_USED` > 0
   	)
   ),
   '*',
   ''
) AS `full_scan`,
`performance_schema`.`events_statements_summary_by_digest`.`COUNT_STAR` AS `exec_count`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_ERRORS` AS `err_count`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_WARNINGS` AS `warn_count`,
`sys`.`format_time` (
   `performance_schema`.`events_statements_summary_by_digest`.`SUM_TIMER_WAIT`
) AS `total_latency`,
`sys`.`format_time` (
   `performance_schema`.`events_statements_summary_by_digest`.`MAX_TIMER_WAIT`
) AS `max_latency`,
`sys`.`format_time` (
   `performance_schema`.`events_statements_summary_by_digest`.`AVG_TIMER_WAIT`
) AS `avg_latency`,
`sys`.`format_time` (
   `performance_schema`.`events_statements_summary_by_digest`.`SUM_LOCK_TIME`
) AS `lock_latency`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_SENT` AS `rows_sent`,
round(
   ifnull(
   	(
   		`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_SENT` / nullif(
   			`performance_schema`.`events_statements_summary_by_digest`.`COUNT_STAR`,
   			0
   		)
   	),
   	0
   ),
   0
) AS `rows_sent_avg`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_EXAMINED` AS `rows_examined`,
round(
   ifnull(
   	(
   		`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_EXAMINED` / nullif(
   			`performance_schema`.`events_statements_summary_by_digest`.`COUNT_STAR`,
   			0
   		)
   	),
   	0
   ),
   0
) AS `rows_examined_avg`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_AFFECTED` AS `rows_affected`,
round(
   ifnull(
   	(
   		`performance_schema`.`events_statements_summary_by_digest`.`SUM_ROWS_AFFECTED` / nullif(
   			`performance_schema`.`events_statements_summary_by_digest`.`COUNT_STAR`,
   			0
   		)
   	),
   	0
   ),
   0
) AS `rows_affected_avg`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_CREATED_TMP_TABLES` AS `tmp_tables`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_CREATED_TMP_DISK_TABLES` AS `tmp_disk_tables`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_SORT_ROWS` AS `rows_sorted`,
`performance_schema`.`events_statements_summary_by_digest`.`SUM_SORT_MERGE_PASSES` AS `sort_merge_passes`,
`performance_schema`.`events_statements_summary_by_digest`.`DIGEST` AS `digest`,
`performance_schema`.`events_statements_summary_by_digest`.`FIRST_SEEN` AS `first_seen`,
`performance_schema`.`events_statements_summary_by_digest`.`LAST_SEEN` AS `last_seen`
from
   `performance_schema`.`events_statements_summary_by_digest`
order by
   `performance_schema`.`events_statements_summary_by_digest`.`SUM_TIMER_WAIT` desc

The tables under the sys library are views for easy query.

MySQL 5.6 does not have a sys library
schema_index_statistics view the index usage, and you can see which index is more active.

MySQL 5.6 sys library is installed as the basic configuration.

Assignment 1
Query tables without primary keys?
Assignment 2
Query indexes that have not been used?

There are two tables under the sys library

  • Beginning with a letter: suitable for people to read. The display is a formatted number
  • Beginning with x $: it is suitable for tools to collect data and original class data

General introduction of each type of table

  • sys_ Start with the configuration table in the Library:
  • sys_config is used to configure the sys schema library

View:

host: statistics related to IP packets
InnoDB: InnoDB buffer related information
IO: information related to IO in different dimensions of data
Memory: displays the use of memory by IP, connection, user, allocated type, grouping and total occupation
Metrics: internal statistics of DB
processlist: thread related information (including internal threads and user connections)
ps_: some variables without tool Statistics (no value can be seen)
schema: information related to the table structure, such as auto increment, index, each field type in the table, waiting locks, etc
session: user connection related information
Statement: statement based statistics (repeat store)
Statements_: error statements, full table scanning, long running time and equal sorting (key points)
User: similar to the one starting with host, but based on user grouping statistics
Wait: wait for events. It's professional and difficult to understand.
waits: some delay events counted by IP and user groups, which has a certain reference value.

Common problems based on sys Library

What can sys library do? Let's take a look at the following questions first. Do you have the following questions about the database?

  1. Who uses the most resources? IP based or user based?
  2. Where do most of the connections come from and the SQL sent?
  3. What is a machine that executes multiple SQL statements?
  4. Which file produces the most IO and what is its IO mode?
  5. Which watch has the most IO?
  6. Which table has been visited most?
  7. Which statements are delayed seriously?
  8. Which SQL statements use disk temporary tables
  9. Which table occupies the most buffer pool
  10. How many buffer pools does each library occupy
  11. How much memory is allocated per connection?
  12. Multiple threads are running inside MySQL?

To answer the above questions, we first need to learn some basic knowledge. Now let's take a look at some important view structures:
host_summary

Field namesignificance
hostFrom which server is it connected? If it is NULL, it indicates the internal process
StatementsHow many statements have been executed by this server (counted from startup?)
Statement_latencyThe time this server sends to wait for the statement to execute
Statement_avg_latencyThe average time the server waits for a statement to execute
Table_scansThe number of times the server scanned the table (not all tables)
File_ioThe number of IO event requests for this server
File_io_latencyThe time the server requests to wait for IO
Current_connectionsThe current number of connections to this server
Total_connectionsHow many times does the server connect to the DB
Unique_userThere are several accounts with different user names connected to the server
Current_memoryThe memory occupied by the current connection, etc. on the server
Total_memory_allocatedTotal memory used by requests on this server

https://blog.csdn.net/yanzongshuai/article/details/73441158

Posted by Serpent_Guard on Tue, 26 Oct 2021 06:21:44 -0700