[TOC]
MySQL: Building a Robust Database Architecture — Notes
Database Monitoring
Content to monitor
- Monitor the availability of a database: instead of just monitoring the existence of a database process, connect to the database over the network and determine if it is available
- Monitor database performance:
    - QPS
    - TPS
    - Number of concurrent threads
    - InnoDB blocking and deadlocks
- Monitor master-slave replication:
    - Master-slave link status
    - Master-slave delay
    - Master-slave data consistency
- Monitor server resources:
    - Disk: a large disk does not by itself mean MySQL can use the space, because the partition MySQL writes to may not have been allocated enough of it
    - CPU usage
    - Memory usage
    - Swap partition usage
    - Network I/O usage
Database Availability Monitoring
Verify that the database is available for normal connections over the network
Note that being able to connect to MySQL locally (for example through the local socket) does not mean MySQL is also reachable externally over TCP/IP, because the external environment is more complex.
For example, the TCP connection backlog may be full, so the check must make real connection requests from a remote server:
- Use mysqladmin:
```
# If the database is alive, the command returns "mysqld is alive"
$ mysqladmin -u root -p -h localhost ping
Enter password:
mysqld is alive
```
- Use telnet (usually done manually)
```
# A telnet connection is considered successful as long as no connection-failure message appears;
# the server also replies with its handshake packet (mostly unreadable binary)
$ telnet localhost 3306
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
N
5.6.47-log ... mysql_native_password   (binary handshake data)
```
- Establish a database connection over the network from an application (recommended)
For example, with PHP we can use PDO: connect to MySQL and then run something like `select 1`.
```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', ''); // open the connection
$result = $pdo->query('select 1')->fetchAll();                  // run the check query
//$result = $pdo->query('select @@version;')->fetchAll();       // or something like this
$pdo = null;                                                    // release the connection
var_dump($result);
/**
[Running] php "/Users/liuhao/Desktop/tmp/pdo.php"
array(1) {
  [0]=>
  array(2) {
    [1]=>
    string(1) "1"
    [2]=>
    string(1) "1"
  }
}
[Done] exited with code=0 in 0.103 seconds
*/
```
- Verify that MySQL can read and write
Because the read_only parameter may have been switched on by mistake (misconfigured as on instead of off), in which case the database can only serve reads.
For a standalone server with no master-slave replication, read-write verification can usually be skipped.
To check this, we can create a simple monitoring table in MySQL with just two fields, an ID and a timestamp, and have the monitoring program write to and read from that table to verify that the database can both read and write.
Not shown in full here; just adapt the PDO code above — a sketch follows below.
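A minimal PHP sketch of this read-write check, assuming a hypothetical monitoring table `test.monitor_rw` (auto-increment `id`, `check_time` datetime) and a hypothetical non-SUPER account `monitor` (a SUPER account is not blocked by read_only); host, credentials and names are placeholders:
```php
<?php
// Read-write check: write one row into a dedicated monitoring table, then read it back.
// Assumes a hypothetical table test.monitor_rw(id INT AUTO_INCREMENT PRIMARY KEY, check_time DATETIME).
$pdo = new PDO('mysql:host=localhost;dbname=test', 'monitor', 'monitor_password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    // Write: fails if read_only is on for this non-SUPER account
    $pdo->exec("INSERT INTO monitor_rw (check_time) VALUES (NOW())");
    $id = $pdo->lastInsertId();

    // Read: fetch the row we just wrote
    $row = $pdo->query("SELECT id, check_time FROM monitor_rw WHERE id = " . (int)$id)
               ->fetch(PDO::FETCH_ASSOC);
    echo $row ? "read-write OK\n" : "write OK but read failed\n";

    // Keep the monitoring table small
    $pdo->exec("DELETE FROM monitor_rw WHERE id < " . (int)$id);
} catch (PDOException $e) {
    echo "read-write check failed: " . $e->getMessage() . "\n";
}
$pdo = null; // release the connection
```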
Monitor the number of connections to the database
Many situations can exhaust MySQL's connections, such as blocking or cache penetration, and they cause the number of connections to spike.
So the number of connections must be watched closely.
- Get the maximum number of MySQL connections
show variables like 'max_connections';
- Get the current number of MySQL connections
show global status like 'Threads_connected';
- Calculate alarm value
```
# Alert when the ratio of current connections to maximum connections exceeds the threshold.
# 0.8 is a common cut-off; in my experience it is better to alert above 0.5,
# because by the time the ratio passes 0.6 the connections are often effectively exhausted.
Threads_connected / max_connections > 0.8
```
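A minimal PHP sketch of this connection-usage check; the host and credentials are placeholders, and the 0.5 / 0.8 thresholds simply follow the note above:
```php
<?php
// Connection usage check: compare Threads_connected against max_connections.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$max = (int)$pdo->query("SHOW VARIABLES LIKE 'max_connections'")->fetch(PDO::FETCH_ASSOC)['Value'];
$cur = (int)$pdo->query("SHOW GLOBAL STATUS LIKE 'Threads_connected'")->fetch(PDO::FETCH_ASSOC)['Value'];

$ratio = $max > 0 ? $cur / $max : 1.0;
printf("Threads_connected=%d max_connections=%d ratio=%.2f\n", $cur, $max, $ratio);

if ($ratio > 0.8) {
    echo "CRITICAL: connection usage above 80%\n";
} elseif ($ratio > 0.5) {
    echo "WARNING: connection usage above 50%\n";
}
$pdo = null;
```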
Database Performance Monitoring
Performance monitoring serves a different purpose from availability monitoring: it is mainly about understanding performance trends over time.
Compared with availability monitoring, performance monitoring is less urgent to act on.
Status variables collected for performance monitoring
The counters below — Queries, Com_insert, Com_update, Com_delete and Uptime_since_flush_status — can all be obtained with SHOW GLOBAL STATUS LIKE 'xxx'. Uptime_since_flush_status is the number of seconds since the last FLUSH STATUS; FLUSH STATUS is mainly used to reset counters such as Max_used_connections (the maximum number of connections used simultaneously since the server started).
- Calculate QPS (queries per second)
QPS = (Queries2 - Queries1) / (Uptime_since_flush_status2 - Uptime_since_flush_status1)
MySQL's QPS does not count only SELECT statements; it covers all SQL statements. We therefore take two samples of Queries and compute QPS from the difference between them.
- Calculate TPS (transactions per second)
TPS = ((Com_insert2 + Com_update2 + Com_delete2) - (Com_insert1 + Com_update1 + Com_delete1)) / (Uptime_since_flush_status2 - Uptime_since_flush_status1)
This is also why I don't use the calculation often seen online that relies on Com_commit: applications that run with autocommit and never issue explicit COMMIT statements barely increment that counter, so it is unreliable.
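A minimal PHP sketch of the QPS/TPS sampling described by the formulas above; host and credentials are placeholders, and the 60-second sampling interval is an arbitrary choice:
```php
<?php
// QPS/TPS sketch: sample the counters twice and divide the difference by the interval.
function sampleStatus(PDO $pdo, array $names): array {
    $sample = [];
    foreach ($names as $name) {
        $row = $pdo->query("SHOW GLOBAL STATUS LIKE '$name'")->fetch(PDO::FETCH_ASSOC);
        $sample[$name] = (int)$row['Value'];
    }
    return $sample;
}

$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$names = ['Queries', 'Com_insert', 'Com_update', 'Com_delete', 'Uptime_since_flush_status'];

$s1 = sampleStatus($pdo, $names);
sleep(60);                       // sampling interval
$s2 = sampleStatus($pdo, $names);

$interval = $s2['Uptime_since_flush_status'] - $s1['Uptime_since_flush_status'];
if ($interval > 0) {
    $qps = ($s2['Queries'] - $s1['Queries']) / $interval;
    $tps = (($s2['Com_insert'] + $s2['Com_update'] + $s2['Com_delete'])
          - ($s1['Com_insert'] + $s1['Com_update'] + $s1['Com_delete'])) / $interval;
    printf("QPS=%.2f TPS=%.2f\n", $qps, $tps);
}
$pdo = null;
```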
Monitor database concurrency
Database performance usually degrades as the number of concurrently processed requests increases, and CPU usage rises with it, so both should be watched together.
- Database concurrency (the number of threads currently running in the database)
show global status like 'Threads_running';
The number of concurrently running threads is usually much smaller than the number of threads connected to the database at the same time.
Note that heavy blocking, for example, can make concurrency surge, and such a surge can fill up the available connections almost instantly.
Monitor InnoDB blocking
MyISAM uses table-level locks and is not considered here; MyISAM blocking is instead found by monitoring the slow query log and optimizing the offending table queries.
Here's how to view it:
Note that querying these information_schema tables requires sufficient privileges (e.g. root). This approach only works for InnoDB, not for MyISAM.
The blocking thread's ID can always be captured, but the blocking thread's SQL may not be, because that thread may have executed several statements; by the time we run the query, the statement that acquired the lock may already have finished, so it can no longer be captured.
The threshold of > 10 seconds below can be adjusted to your business needs; personally I find 30 seconds already quite long, so 30 or 60 seconds is a reasonable choice.
```sql
SELECT b.trx_mysql_thread_id AS 'Blocked Threads',
       b.trx_query           AS 'Blocked SQL',
       c.trx_mysql_thread_id AS 'Blocking Threads',
       c.trx_query           AS 'block SQL',
       (unix_timestamp() - unix_timestamp(c.trx_started)) AS 'Blocking time'
FROM `information_schema`.INNODB_LOCK_WAITS AS a
JOIN `information_schema`.INNODB_TRX AS b ON a.requesting_trx_id = b.trx_id
JOIN `information_schema`.INNODB_TRX AS c ON a.blocking_trx_id = c.trx_id
WHERE (unix_timestamp() - unix_timestamp(c.trx_started)) > 10;
```
Verify that the blocking-detection query is accurate
Required SQL
```sql
# View the ID of the current session
select connection_id();
# Set the time (in seconds) an InnoDB transaction waits for a row lock before giving up
set global innodb_lock_wait_timeout=120;
# Select the database
use test;
# Open a transaction
begin;
# Lock the rows
select * from user for update;
# Roll back
rollback;
```
- Execute lock in session1 window
```
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|             191 |
+-----------------+
1 row in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> set global innodb_lock_wait_timeout=120;
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from user for update;
+----+--------+-----+
| id | name   | age |
+----+--------+-----+
|  1 | liuhao |   0 |
|  2 |        |   0 |
+----+--------+-----+
2 rows in set (6.09 sec)
```
- Lock in session2; session2 should now be blocked by session1
```
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|             192 |
+-----------------+
1 row in set (0.00 sec)

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from user for update;
# The current window is blocked at this point
```
- Query blocking in session3
```
mysql> SELECT b.trx_mysql_thread_id AS 'Blocked Threads',
    ->        b.trx_query AS 'Blocked SQL',
    ->        c.trx_mysql_thread_id AS 'Blocking Threads',
    ->        c.trx_query AS 'block SQL',
    ->        (unix_timestamp() - unix_timestamp(c.trx_started)) AS 'Blocking time'
    -> FROM `information_schema`.INNODB_LOCK_WAITS AS a
    -> join `information_schema`.INNODB_TRX AS b ON a.requesting_trx_id = b.trx_id
    -> join `information_schema`.INNODB_TRX AS c ON a.blocking_trx_id = c.trx_id
    -> WHERE (unix_timestamp() - unix_timestamp(c.trx_started)) > 10;
+-----------------+-------------------------------+------------------+-----------+---------------+
| Blocked Threads | Blocked SQL                   | Blocking Threads | block SQL | Blocking time |
+-----------------+-------------------------------+------------------+-----------+---------------+
|             192 | select * from user for update |              191 | NULL      |           127 |
+-----------------+-------------------------------+------------------+-----------+---------------+
1 row in set (0.00 sec)
```
You can see that the blocking is captured correctly, which matches the experiment we set up.
To emphasize again: the blocking thread's ID is captured reliably, but the blocking thread's SQL may not be (note the NULL in the 'block SQL' column above), because that thread may have executed several statements; by the time we run the query the statement that acquired the lock has already finished, so it can no longer be captured.
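To turn this into monitoring rather than a manual check, something like the following PHP sketch could run the blocking query on a schedule; the host, credentials and the 10-second threshold are placeholders taken from this section:
```php
<?php
// Blocking check sketch: run the InnoDB blocking query above and print any long waits.
$sql = "
SELECT b.trx_mysql_thread_id AS blocked_thread,
       b.trx_query           AS blocked_sql,
       c.trx_mysql_thread_id AS blocking_thread,
       c.trx_query           AS blocking_sql,
       (unix_timestamp() - unix_timestamp(c.trx_started)) AS blocking_time
FROM information_schema.INNODB_LOCK_WAITS AS a
JOIN information_schema.INNODB_TRX AS b ON a.requesting_trx_id = b.trx_id
JOIN information_schema.INNODB_TRX AS c ON a.blocking_trx_id = c.trx_id
WHERE (unix_timestamp() - unix_timestamp(c.trx_started)) > 10";

$pdo = new PDO('mysql:host=localhost;dbname=information_schema', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach ($pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // blocking_sql may be NULL if the blocking statement already finished (see the note above)
    printf("thread %d blocked by thread %d for %d seconds (blocking SQL: %s)\n",
           $row['blocked_thread'], $row['blocking_thread'],
           $row['blocking_time'], $row['blocking_sql'] ?? 'N/A');
}
$pdo = null;
```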
MySQL Master-Slave Replication Monitoring
Monitor master-slave replication links
Check the status of the replication link on the slave
```sql
# View slave status
show slave status;
# If either of the following two fields is not "Yes", the replication link is considered broken
Slave_IO_Running
Slave_SQL_Running
```
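A minimal PHP sketch of this link check, run against the slave; 'slave-host' and the credentials are placeholders:
```php
<?php
// Replication link check: run SHOW SLAVE STATUS on the slave and verify both threads are running.
$pdo = new PDO('mysql:host=slave-host;dbname=test', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$status = $pdo->query('SHOW SLAVE STATUS')->fetch(PDO::FETCH_ASSOC);
if ($status === false) {
    echo "not a slave (SHOW SLAVE STATUS returned no rows)\n";
} elseif ($status['Slave_IO_Running'] !== 'Yes' || $status['Slave_SQL_Running'] !== 'Yes') {
    echo "ALERT: replication link is down "
       . "(IO: {$status['Slave_IO_Running']}, SQL: {$status['Slave_SQL_Running']})\n";
} else {
    echo "replication link OK\n";
}
$pdo = null;
```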
Monitor delays in master-slave replication
- A simple way to get the master-slave delay:
This only works when the network between the servers is completely stable.
It is not fully reliable, because the value is derived from the time difference between the timestamp of the binlog event on the master and the moment the slave replays that event.
When there is a problem with the network, it becomes inaccurate.
```sql
# Run on the slave to check its replication status relative to the master
show slave status;
# Judge by the field below; the value is in seconds
Seconds_Behind_Master
```
- A more reliable way to get the master-slave delay:
This requires a multi-threaded (or multi-process) monitoring program that reads the binlog file name and binlog offset on both the master and the slave, as the sketch after this list shows.
- Master Library Monitoring
```sql
# View master status
show master status;
# Judge by the following fields
File: mysql-bin.000001    # binlog file name
Position: 300000          # binlog offset
```
- Slave library monitoring
View the name and offset of the master's binlog that is currently being transferred to the slave
```sql
# View slave status
show slave status;
Master_Log_File: mysql-bin.000001    # binlog file name on the master
Read_Master_Log_Pos: 300000          # offset read from the master's binlog
```
- Name and offset of the master's binlog events that have already been executed on the slave
```sql
Exec_Master_Log_Pos: 300000    # offset of the last master binlog event executed on the slave
Relay_Log_Space: 300001        # combined size of all relay log files
```
Now compare the three results above: if the binlog file names are the same and the offsets are the same, there is no replication delay between the master and the slave.
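A minimal PHP sketch combining both approaches: it reads Seconds_Behind_Master for the quick check, then compares the master's binlog file/position with what the slave has read and executed; 'master-host', 'slave-host' and the credentials are placeholders:
```php
<?php
// Master-slave delay sketch: quick check via Seconds_Behind_Master,
// then a position comparison between master and slave.
$master = new PDO('mysql:host=master-host;dbname=test', 'root', '');
$slave  = new PDO('mysql:host=slave-host;dbname=test',  'root', '');
$master->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$slave->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$m = $master->query('SHOW MASTER STATUS')->fetch(PDO::FETCH_ASSOC);
$s = $slave->query('SHOW SLAVE STATUS')->fetch(PDO::FETCH_ASSOC);
if ($m === false || $s === false) {
    exit("could not read master or slave status\n");
}

// Simple check: seconds reported by the slave (can be inaccurate under network problems)
echo "Seconds_Behind_Master: " . ($s['Seconds_Behind_Master'] ?? 'NULL') . "\n";

// More reliable check: compare binlog file names and offsets
$inSync = $m['File'] === $s['Master_Log_File']
       && (int)$m['Position'] === (int)$s['Read_Master_Log_Pos']
       && (int)$m['Position'] === (int)$s['Exec_Master_Log_Pos'];

echo $inSync ? "no replication delay\n" : "slave is behind the master\n";
$master = null;
$slave  = null;
```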
Master-Slave Data Consistency Check
Note: I haven't verified this myself in a test; these are just study notes.
If a master-slave replication problem occurs, we need to check master-slave data consistency after repairing it.
Use the pt-table-checksum tool for automatic checking
```bash
# Run on the master only; it automatically discovers all slaves of the master
# and checks the specified data on every slave
pt-table-checksum u=<username>,p='<password>' \
    --databases <database name> \
    --replicate test.checksums   # create the checksums table in the test database and write the results to it
```
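Once the check has run, the results sit in test.checksums on every slave. Below is a hedged PHP sketch that runs the difference query (adapted from the pt-table-checksum documentation) against one slave; 'slave-host' and the credentials are placeholders:
```php
<?php
// After pt-table-checksum has run on the master, query test.checksums on each slave
// to list tables whose checksums differ from the master.
$pdo = new PDO('mysql:host=slave-host;dbname=test', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM test.checksums
WHERE (master_cnt <> this_cnt
       OR master_crc <> this_crc
       OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl";

$diffs = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
if (!$diffs) {
    echo "master and slave data are consistent\n";
} else {
    foreach ($diffs as $d) {
        echo "inconsistent table: {$d['db']}.{$d['tbl']} "
           . "({$d['chunks']} chunks differ, {$d['total_rows']} rows checked)\n";
    }
}
$pdo = null;
```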