A Brief Analysis of Logfile Sync Waiting for Super High Case

Keywords: Oracle Database SQL

Monitoring tool DPA found a DB Commit Time indicator alarm on an Oracle database server overseas, which exceeded the red alarm line (about 40 milliseconds, yellow alarm 10 milliseconds, red alarm line 20 milliseconds). As shown in the following screenshot, the AWR report for the corresponding period was generated, and log file sync waited inside Top 5 Timed Events The average wait time for an event is 37 milliseconds, and the average wait time for a log file parallel write event is 40 milliseconds

 

 

 

 

If you are familiar with Tanel Poder:: Understanding LGWR, Log File Sync Waits and Commit Performance s (classic picture below), then by waiting for the event log file sync and log file parallel write Avg Wait(ms) indicator: 37ms & 40ms, you can basically tell where the redo log is locatedDisk I/O on

 

 

 

 

The detailed statistics are then analyzed using the lfsdiag.sql script as follows:

 

 

 

 

HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
 
APPROACH: Look at the wait distribution for log file sync waits
by looking at "wait_time_milli". Look at the high wait times then
see if you can correlate those with other related wait events.
 
   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file sync                                          2        835
         1 log file sync                                          4       1650
         1 log file sync                                          8       4937
         1 log file sync                                         16     146252
         1 log file sync                                         32     606674
         1 log file sync                                         64     263377
         1 log file sync                                        128     253254
         1 log file sync                                        256          2
         1 log file switch completion                             1        124
         1 log file switch completion                             2          9
         1 log file switch completion                             4         19
         1 log file switch completion                             8         21
         1 log file switch completion                            16         35
         1 log file switch completion                            32         97
         1 log file switch completion                            64        133
         1 log file switch completion                           128        326
         1 log file switch completion                           256       1736
         1 log file switch completion                           512       3042
         1 log file switch completion                          1024       2020
         1 log file parallel write                                1          0
         1 log file parallel write                                2          0
         1 log file parallel write                                4         80
         1 log file parallel write                                8       2142
         1 log file parallel write                               16     170987
         1 log file parallel write                               32     779205
         1 log file parallel write                               64     311463
         1 log file parallel write                              128      79688
         1 log file parallel write                              256      42763
         1 log file parallel write                              512      13052
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2
         1 LGWR wait for redo copy                                1       8516
         1 LGWR wait for redo copy                                2         20
         1 LGWR wait for redo copy                                4         19
         1 LGWR wait for redo copy                                8         20
         1 LGWR wait for redo copy                               16         11
 
ORDERED BY WAIT_TIME_MILLI
 
   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 log file sync                                          1       4363
         1 log file switch completion                             1        124
         1 log file parallel write                                1          0
         1 LGWR wait for redo copy                                1       8516
         1 log file sync                                          2        835
         1 log file switch completion                             2          9
         1 log file parallel write                                2          0
         1 LGWR wait for redo copy                                2         20
         1 log file sync                                          4       1650
         1 log file switch completion                             4         19
         1 log file parallel write                                4         80
         1 LGWR wait for redo copy                                4         19
         1 log file sync                                          8       4937
         1 log file switch completion                             8         21
         1 log file parallel write                                8       2142
         1 LGWR wait for redo copy                                8         20
         1 log file sync                                         16     146252
         1 log file switch completion                            16         35
         1 log file parallel write                               16     170987
         1 LGWR wait for redo copy                               16         11
         1 log file sync                                         32     606674
         1 log file switch completion                            32         97
         1 log file parallel write                               32     779205
         1 log file sync                                         64     263377
         1 log file switch completion                            64        133
         1 log file parallel write                               64     311463
         1 log file sync                                        128     253254
         1 log file switch completion                           128        326
         1 log file parallel write                              128      79688
         1 log file sync                                        256          2
         1 log file switch completion                           256       1736
         1 log file parallel write                              256      42763
         1 log file switch completion                           512       3042
         1 log file parallel write                              512      13052
         1 log file switch completion                          1024       2020
         1 log file parallel write                             1024      20468
         1 log file parallel write                             2048      14020
         1 log file parallel write                             4096        921
         1 log file parallel write                             8192         96
         1 log file parallel write                            16384         18
         1 log file parallel write                            32768         18
         1 log file parallel write                            65536          8
         1 log file parallel write                           131072          2
 
REDO WRITE STATS
 
"redo write time" in centiseconds (100 per second)
11.1: "redo write broadcast ack time" in centiseconds (100 per second)
11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
 
VERSION              INST_ID NAME                                                     VALUE        MILLISECONDS
----------------- ---------- ---------------------------------------- --------------------- -------------------
10.2.0.5.0                 1 redo write time                                        9551524        95515240.000
10.2.0.5.0                 1 redo writer latching time                                   51
10.2.0.5.0                 1 redo writes                                            1434931
 
AWR WORST AVG LOG FILE SYNC SNAPS:

 

As you can see from the above data, the maximum number of log file sync waiting events is in the range of 32ms, and the maximum number of log file parallel writes waiting events is in the range of 32ms. In fact, this value is far more than 7ms, which is extremely abnormal.The log file parallel write event is a wait event exclusive to the LGWR process, which occurs when LGWR writes the redo log information from the log_buffer to a member file of the online redo log file group, on which LGWR waits for the completion of the write process.The event took too long to wait, indicating that the disk on which the log file resides is slow or contentious.Log file sync and log file parallel write are related to each other.In other words, assuming that the log file parallel write takes a very long time, it must cause the log file sync to wait longer.If log file parallel write waits very high, it may generally be a physical disk I/O problem

 

In addition, we also checked the frequency of redo log switching. As shown below, redo log switching is not frequent and the amount of archive logs generated is not large.

 

 

 

 

 

 

Then we found a machine (server with normal index above) to simply test the speed of IO. This method is very simple, that is, to see how long it takes to generate a large file, to simply test the I/O performance (without considering cache, etc., test sampling is not exhaustive), but the comparison data is also basically validated.Certification, feedback disk IO problems).

 

 

 

Problem Server:

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 88.271 seconds, 12.2 MB/s

 

real    1m28.273s

user    0m0.010s

sys     0m0.655s

 

 

 

Compare servers (normal servers):

 

 

 

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 2.48344 seconds, 432 MB/s

 

real    0m2.485s

user    0m0.004s

sys     0m0.386s

 

 

As shown in the above comparison, the two servers generate the same size file, which takes a lot of time and has very different I/O performance. This fully verifies that the alerted server is located in a storage I/O performance problem, but the company division of labor is very clear, and the DBA does not know what is wrong with the underlying storage, so it can only feedback this problemOut and wait for replies from colleagues overseas responsible for maintaining systems and storage.

Posted by Pi_Mastuh on Sat, 07 Sep 2019 09:40:50 -0700