Smart use of Events to discover problems

Keywords: Database MySQL github mysqlbinlog

Welcome to part 32 of my "In-depth Understanding of MySQL Principles" series.

With the understanding of Events from the previous sections, we can put them to work. After learning these commonly used Events, I wrote a tool in C to parse them. I called it 'infobin', meaning "extract information from a binary log". As far as I know, a few friends are using this tool, and bugs have been rare. I am not promoting my tool here, but rather sharing this way of thinking. I built the tool around problems I encountered in my own work. Its main functions are as follows:

  • Analyze whether the binary log contains long-running uncommitted transactions; a transaction that stays open for a long time leads to more lock contention.
  • Analyze whether the binary log contains large transactions; committing a large transaction may block other transactions.
  • Analyze how many DML Events each table in the binary log generates, so you know which table is modified the most.
  • Analyze the speed at which Events are generated in the binary log, so you know which time period produced the most Events.

Here is the address of the tool for your reference (I haven't updated it for a long time): https://github.com/gaopengcarl/infobin

The help information for this tool is as follows:

[root@gp1 infobin]#  ./infobin 
USAGE ERROR!
[Author]: gaopeng [QQ]:22389860 [blog]:http://blog.itpub.net/7728585/ 
--USAGE:./infobin binlogfile pieces bigtrxsize bigtrxtime [-t] [-force]
[binlogfile]:binlog file!
[piece]:how many piece will split,is a Highly balanced histogram,
        find which time generate biggest binlog.(must:piece<2000000)
[bigtrxsize](bytes):larger than this size trx will view.(must:trx>256(bytes))
[bigtrxtime](sec):larger than this sec trx will view.(must:>0(sec))
[[-t]]:if [-t] no detail is print out,the result will small
[[-force]]:force analyze if unkown error check!!

Next, let's take a look at how each of these functions is roughly implemented.

1. Analyzing long-term uncommitted transactions

As I have mentioned several times before, a manually committed transaction has the following characteristics. Taking 'Insert' statements as an example:

  1. GTID_LOG_EVENT and XID_EVENT carry the time at which the 'COMMIT' command was issued.
  2. QUERY_EVENT carries the time at which the first 'Insert' command was issued.
  3. MAP_EVENT/WRITE_ROWS_EVENT carry the time at which each 'Insert' command was issued.

In fact, we can subtract (2) from (1) to get the time elapsed between the first 'DML' command being issued and the 'COMMIT' command being issued. Then a user-supplied parameter defines how long a transaction must stay uncommitted before we call it a long uncommitted transaction; my tool takes bigtrxtime as this input. Let's take the following statements as an example:

Statement                          Time
begin                              T1
insert into testrr values(20);     11:25:22
insert into testrr values(30);     11:25:26
insert into testrr values(40);     11:25:28
commit;                            11:25:30

The sequence and timestamps of the Events are as follows:

Event                              Time
GTID_LOG_EVENT                     11:25:30
QUERY_EVENT                        11:25:22
MAP_EVENT (1st insert)             11:25:22
WRITE_ROWS_EVENT (1st insert)      11:25:22
MAP_EVENT (2nd insert)             11:25:26
WRITE_ROWS_EVENT (2nd insert)      11:25:26
MAP_EVENT (3rd insert)             11:25:28
WRITE_ROWS_EVENT (3rd insert)      11:25:28
XID_EVENT                          11:25:30

If we subtract the time of QUERY_EVENT from the time of the last XID_EVENT, we get how long the transaction took from its first statement to 'COMMIT'. One thing to note is that the 'BEGIN' command is not actually recorded as an Event; it only sets a flag so that the transaction does not enter the commit process automatically. Here is my short article on what the 'BEGIN' command does: https://www.jianshu.com/p/6de1e8071279
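To make this concrete, here is a minimal sketch of the idea in C (it is not the actual infobin source). It walks the 19-byte common header of each Event in a binlog v4 file (timestamp, type code, server id, event size, end position, flags), remembers the timestamp of the QUERY_EVENT that opens each transaction, and prints the transaction when the matching XID_EVENT arrives more than a given number of seconds later. The type codes QUERY_EVENT=2 and XID_EVENT=16 are the standard binlog v4 values.

/* Sketch: report transactions whose XID_EVENT timestamp is more than
 * `threshold` seconds after the QUERY_EVENT timestamp (binlog format v4). */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define QUERY_EVENT 2
#define XID_EVENT   16

/* read a little-endian uint32 */
static uint32_t u32(const unsigned char *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s binlogfile seconds\n", argv[0]);
        return 1;
    }
    long threshold = atol(argv[2]);
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    unsigned char hdr[19];            /* common event header          */
    uint32_t begin_ts = 0;            /* timestamp of QUERY_EVENT     */
    long begin_pos = 0, pos = 4;      /* skip the 4-byte magic number */

    fseek(fp, pos, SEEK_SET);
    while (fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr)) {
        uint32_t ts   = u32(hdr);     /* event timestamp (seconds)    */
        uint8_t  type = hdr[4];       /* event type code              */
        uint32_t size = u32(hdr + 9); /* event size including header  */

        if (type == QUERY_EVENT) {    /* in row format: BEGIN of the trx */
            begin_ts = ts;
            begin_pos = pos;
        } else if (type == XID_EVENT && begin_ts) {   /* COMMIT */
            long secs = (long)ts - (long)begin_ts;
            if (secs > threshold)
                printf("long trx: %ld sec, begin_pos=%ld end_pos=%ld\n",
                       secs, begin_pos, pos + size);
            begin_ts = 0;
        }
        pos += size;
        fseek(fp, pos, SEEK_SET);     /* jump to the next event header */
    }
    fclose(fp);
    return 0;
}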

2. Analyzing large transactions

This part is simple to implement. For each transaction we only need to scan all the Events between GTID_LOG_EVENT and XID_EVENT and add up their sizes to get the amount of Event data the transaction generates (for compatibility it is better to sum the Events between QUERY_EVENT and XID_EVENT). Another user-supplied parameter defines how large a transaction must be before we call it a large transaction; my tool takes bigtrxsize as this input.
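A minimal sketch of this accumulation, under the same binlog v4 header assumptions as the previous example (again, not the real infobin code):

/* Sketch: sum the sizes of all Events from QUERY_EVENT (BEGIN) through
 * XID_EVENT (COMMIT) and flag transactions larger than `bytes`. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define QUERY_EVENT 2
#define XID_EVENT   16

static uint32_t u32(const unsigned char *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s binlogfile bytes\n", argv[0]);
        return 1;
    }
    unsigned long threshold = strtoul(argv[2], NULL, 10);
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    unsigned char hdr[19];
    unsigned long trx_size = 0;       /* bytes accumulated for this trx */
    long trx_begin = 0, pos = 4;      /* skip the 4-byte magic number   */
    int in_trx = 0;

    fseek(fp, pos, SEEK_SET);
    while (fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr)) {
        uint8_t  type = hdr[4];
        uint32_t size = u32(hdr + 9);

        if (type == QUERY_EVENT) {             /* transaction starts   */
            in_trx = 1;
            trx_begin = pos;
            trx_size = size;
        } else if (in_trx) {
            trx_size += size;                  /* MAP/ROWS/... events  */
            if (type == XID_EVENT) {           /* transaction ends     */
                if (trx_size > threshold)
                    printf("big trx: %lu bytes, begin_pos=%ld end_pos=%ld\n",
                           trx_size, trx_begin, pos + size);
                in_trx = 0;
            }
        }
        pos += size;
        fseek(fp, pos, SEEK_SET);
    }
    fclose(fp);
    return 0;
}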

If the parameter binlog_row_image is set to 'FULL' (the default), we can roughly estimate the size of the log generated by each row modification for a given table:

  • 'Insert' and 'Delete': because the row Event contains only the before_image or the after_image, a 100-byte row plus roughly 10 bytes of overhead comes to about 110 bytes per row. So if the large-transaction threshold is set at 100M, such a transaction modifies roughly 1 million rows (100M / 110 bytes).
  • 'Update': because the row Event contains both the before_image and the after_image, the 110 bytes above roughly double to about 220 bytes per row. So with a 100M threshold, such a transaction modifies roughly 500,000 rows.

Personally I think 20M is a reasonable definition of a large transaction, but you should set this according to your own needs.

3. Analyzing the speed at which Events are generated in the binary log

This is also simple to implement. We just need to slice the binary log according to the input parameter and, for each slice, compute the time difference between its last Event and its first Event to see how long that slice took to generate; my tool takes piece as the slicing parameter. From this analysis we can roughly tell which period of time generated more Events, which also reflects how busy the database was.
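A rough sketch of the slicing, still assuming the binlog v4 common header (not the infobin source): divide the file body into equal byte ranges and record the first and last Event timestamp seen in each range.

/* Sketch: split the binlog body into `pieces` equal byte ranges and report
 * how many seconds each range spans, i.e. where Events were generated fastest. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t u32(const unsigned char *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s binlogfile pieces\n", argv[0]);
        return 1;
    }
    long pieces = atol(argv[2]);
    if (pieces < 1) pieces = 1;
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    fseek(fp, 0, SEEK_END);
    long fsize = ftell(fp);
    long piece_size = (fsize - 4) / pieces + 1;   /* bytes per slice */

    uint32_t *first = calloc((size_t)pieces, sizeof(uint32_t));
    uint32_t *last  = calloc((size_t)pieces, sizeof(uint32_t));
    if (!first || !last) { fclose(fp); return 1; }

    unsigned char hdr[19];
    long pos = 4;                                 /* skip the magic number */
    fseek(fp, pos, SEEK_SET);
    while (fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr)) {
        uint32_t ts   = u32(hdr);
        uint32_t size = u32(hdr + 9);
        long idx = (pos - 4) / piece_size;        /* which slice this event falls in */

        if (first[idx] == 0) first[idx] = ts;     /* first event of the slice */
        last[idx] = ts;                           /* last event seen so far   */

        pos += size;
        fseek(fp, pos, SEEK_SET);
    }
    for (long i = 0; i < pieces; i++)
        if (first[i])
            printf("piece %ld: start=%u end=%u span=%u sec\n",
                   i + 1, first[i], last[i], last[i] - first[i]);

    free(first); free(last);
    fclose(fp);
    return 0;
}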

4. Analyzing how many DML Events each table generates

This function is also very useful; from this analysis we can learn which table in the database is modified the most. The implementation mainly scans each MAP_EVENT in the binary log together with the DML Events that follow it, obtains the table name from the table id, adds the size of those DML Events to that table's entry in a linked list, and finally sorts and prints the result. However, as we said before, the table id can change even within a single transaction, which I did not take into account at first, so the tool has some issues here, although it works well in most cases.
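Below is a simplified sketch of this bookkeeping (not the infobin source). It assumes MySQL 5.6+ v2 row events, where TABLE_MAP_EVENT is type 19 and WRITE/UPDATE/DELETE_ROWS_EVENT are types 30/31/32, and that the table map body starts with a 6-byte table id, 2 bytes of flags, and then the length-prefixed database and table names. For brevity it uses a small array instead of a linked list and keys on the table id, which, as just noted, is not fully reliable.

/* Sketch: map table_id -> "db.table" from each TABLE_MAP_EVENT and add the
 * size of the following WRITE/UPDATE/DELETE_ROWS events to that table. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define TABLE_MAP_EVENT   19
#define WRITE_ROWS_EVENT  30
#define UPDATE_ROWS_EVENT 31
#define DELETE_ROWS_EVENT 32
#define MAX_TABLES 256

struct tab { uint64_t id; char name[130]; unsigned long bytes, times; };
static struct tab tabs[MAX_TABLES];
static int ntabs;

static uint32_t u32(const unsigned char *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t u48(const unsigned char *p)       /* little-endian 6-byte table id */
{
    uint64_t v = 0;
    for (int i = 5; i >= 0; i--) v = v << 8 | p[i];
    return v;
}
static struct tab *find(uint64_t id)
{
    for (int i = 0; i < ntabs; i++)
        if (tabs[i].id == id) return &tabs[i];
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s binlogfile\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    unsigned char hdr[19], body[512];
    long pos = 4;                                 /* skip the magic number */
    fseek(fp, pos, SEEK_SET);
    while (fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr)) {
        uint8_t  type = hdr[4];
        uint32_t size = u32(hdr + 9);

        if (type == TABLE_MAP_EVENT) {
            /* body: table_id(6), flags(2), db_len(1), db, 0x00, tbl_len(1), tbl, 0x00 */
            memset(body, 0, sizeof(body));
            size_t n = fread(body, 1, sizeof(body), fp);
            if (n >= 12 && (size_t)(9 + body[8] + 2) < n) {
                uint64_t id = u48(body);
                struct tab *t = find(id);
                if (!t && ntabs < MAX_TABLES) {
                    t = &tabs[ntabs++];
                    t->id = id; t->bytes = 0; t->times = 0;
                    int db_len  = body[8];
                    int tbl_len = body[9 + db_len + 1];
                    snprintf(t->name, sizeof(t->name), "%.*s.%.*s",
                             db_len, (char *)(body + 9),
                             tbl_len, (char *)(body + 9 + db_len + 2));
                }
            }
        } else if (type == WRITE_ROWS_EVENT || type == UPDATE_ROWS_EVENT ||
                   type == DELETE_ROWS_EVENT) {
            unsigned char tid[6];
            if (fread(tid, 1, 6, fp) == 6) {      /* first 6 bytes = table_id */
                struct tab *t = find(u48(tid));
                if (t) { t->bytes += size; t->times++; }
            }
        }
        pos += size;
        fseek(fp, pos, SEEK_SET);
    }
    for (int i = 0; i < ntabs; i++)
        printf("%s: %lu bytes, %lu DML events\n",
               tabs[i].name, tabs[i].bytes, tabs[i].times);
    fclose(fp);
    return 0;
}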

5. Tool demonstration

Finally, let me demonstrate the features described above. I performed the following operations:

mysql> flush binary logs;
Query OK, 0 rows affected (0.51 sec)

mysql> select count(*) from tti;
+----------+
| count(*) |
+----------+
|    98304 |
+----------+
1 row in set (0.06 sec)

mysql> delete from tti;
Query OK, 98304 rows affected (2.47 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tti values(1,'gaopeng');
Query OK, 1 row affected (0.00 sec)

mysql> select sleep(20);
+-----------+
| sleep(20) |
+-----------+
|         0 |
+-----------+
1 row in set (20.03 sec)

mysql> commit;
Query OK, 0 rows affected (0.22 sec)

mysql> insert into tpp values(10);
Query OK, 1 row affected (0.14 sec)

In this example, I switched to a new binary log and then executed three transactions:

  • Deleted a total of 98,304 rows from the tti table.
  • Inserted one row into the tti table, then waited more than 20 seconds before committing.
  • Inserted one row into the tpp table.

Now let's use the tool to analyze the binary log; the statistical output is as follows:

./infobin mysql-bin.000005 3 1000000 15 -t > log.log
more log.log 

...
-------------Total now--------------
Trx total[counts]:3
Event total[counts]:125
Max trx event size:8207(bytes) Pos:420[0X1A4]
Avg binlog size(/sec):9265.844(bytes)[9.049(kb)]
Avg binlog size(/min):555950.625(bytes)[542.921(kb)]
--Piece view:
(1)Time:1561442359-1561442383(24(s)) piece:296507(bytes)[289.558(kb)]
(2)Time:1561442383-1561442383(0(s)) piece:296507(bytes)[289.558(kb)]
(3)Time:1561442383-1561442455(72(s)) piece:296507(bytes)[289.558(kb)]
--Large than 500000(bytes) trx:
(1)Trx_size:888703(bytes)[867.874(kb)] trx_begin_p:299[0X12B] 
trx_end_p:889002[0XD90AA]
Total large trx count size(kb):#867.874(kb)
--Large than 15(secs) trx:
(1)Trx_sec:31(sec)  trx_begin_time:[20190625 14:00:08(CST)] 
trx_end_time:[20190625 14:00:39(CST)] trx_begin_pos:889067
trx_end_pos:889267 query_exe_time:0 
--Every Table binlog size(bytes) and times:
Note:size unit is bytes
---(1)Current Table:test.tpp::
   Insert:binlog size(40(Bytes)) times(1)
   Update:binlog size(0(Bytes)) times(0)
   Delete:binlog size(0(Bytes)) times(0)
   Total:binlog size(40(Bytes)) times(1)
---(2)Current Table:test.tti::
   Insert:binlog size(48(Bytes)) times(1)
   Update:binlog size(0(Bytes)) times(0)
   Delete:binlog size(888551(Bytes)) times(109)
   Total:binlog size(888599(Bytes)) times(110)
---Total binlog dml event size:888639(Bytes) times(111)

We can see that everything we did shows up in the statistics:

  • One transaction larger than the 500,000-byte threshold, about 867K in size, which is the result of deleting 98,304 rows from the tti table:
--Large than 500000(bytes) trx:
(1)Trx_size:888703(bytes)[867.874(kb)] trx_begin_p:299[0X12B] 
trx_end_p:889002[0XD90AA]

  • One transaction that stayed uncommitted for 31 seconds, caused by my deliberately waiting more than 20 seconds before committing:
--Large than 15(secs) trx:
(1)Trx_sec:31(sec)  trx_begin_time:[20190625 14:00:08(CST)]
 trx_end_time:[20190625 14:00:39(CST)] trx_begin_pos:889067
 trx_end_pos:889267 query_exe_time:0 

  • Modifications to two tables, tti and tpp. The tti table has both 'Delete' and 'Insert' operations while the tpp table has only 'Insert' operations, and the log volume generated for each is listed:
--Every Table binlog size(bytes) and times:
Note:size unit is bytes
---(1)Current Table:test.tpp::
   Insert:binlog size(40(Bytes)) times(1)
   Update:binlog size(0(Bytes)) times(0)
   Delete:binlog size(0(Bytes)) times(0)
   Total:binlog size(40(Bytes)) times(1)
---(2)Current Table:test.tti::
   Insert:binlog size(48(Bytes)) times(1)
   Update:binlog size(0(Bytes)) times(0)
   Delete:binlog size(888551(Bytes)) times(109)
   Total:binlog size(888599(Bytes)) times(110)
---Total binlog dml event size:888639(Bytes) times(111)

Well, what I want to tell you here is that once you have learned about Events, you can try parsing binary logs yourself in whatever language you like, and you may well write better tools that do even more.

Of course, you can also parse the binary log with mysqlbinlog and do the counting with shell/python, but a dedicated tool like this is much faster.

Author: gaopengtttt

Links: https://www.jianshu.com/p/45c22e788754
