SQL SERVER - Talking about the Solution of Deadlock Monitoring and Analysis
Keywords:
SQL Server
Session
SQL
xml
less
1 background
1.1 Alarm
Recently, I have been sorting out my notes and intend to migrate them all to EVERNOTE. Tidy up to the lock part, there happens to be a case recorded by myself, reorganize and share it to you.
One day at noon, received alarm messages, DB deadlock abnormal, one minute deadlock 120.
The deadlock xml file is as follows:
1 <deadlock-list>
2 <deadlock victim="process810b00cf8">
3 <process-list>
4 <process id="process810b00cf8" taskpriority="0" logused="0" waitresource="RID: 13:1:1541136:62" waittime="7682" ownerId="3396587959" transactionname="UPDATE" lasttranstarted="2016-01-08T12:03:51.067" XDES="0xa99746d08" lockMode="U" schedulerid="41" kpid="17308" status="suspended" spid="108" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2016-01-08T12:03:51.067" lastbatchcompleted="2016-01-08T12:03:51.067" lastattention="1900-01-01T00:00:00.067" clientapp="Microsoft SQL Server Management Studio - query" hostname="test-server" hostpid="1433" loginname="xinysu" isolationlevel="read committed (2)" xactid="3396587959" currentdb="13" lockTimeout="4294967295" clientoption1="671098976" clientoption2="390200">
5 <executionStack>
6 <frame procname="adhoc" line="7" stmtstart="214" stmtend="484" sqlhandle="0x020000003acf4f010561e479685209fb09a7fd15239977c60000000000000000000000000000000000000000">
7 UPDATE FinanceReceiptNoRule SET NowSeqValue=@ReturnNum,ISRUNNING='0',LastWriteTime=GETDATE() WHERE IsRunning='1' AND SeqCode=@SeqCode </frame>
8 </executionStack>
9 <inputbuf>
10 declare @SeqCode varchar(60)
11 declare @ReturnNum bigint
12 set @SeqCode='CGJS20160106'
13 while(1=1)
14 begin
15 UPDATE FinanceReceiptNoRule SET NowSeqValue=@ReturnNum,ISRUNNING='0',LastWriteTime=GETDATE() WHERE IsRunning='1' AND SeqCode=@SeqCode
16 end </inputbuf>
17 </process>
18 <process id="process18fd5d8cf8" taskpriority="0" logused="248" waitresource="KEY: 13:72057594040090624 (b3ade7c5980c)" waittime="4" ownerId="3396522828" transactionname="user_transaction" lasttranstarted="2016-01-08T12:03:05.310" XDES="0x18c1db63a8" lockMode="U" schedulerid="57" kpid="16448" status="suspended" spid="161" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2016-01-08T12:03:58.737" lastbatchcompleted="2016-01-08T12:03:33.847" lastattention="2016-01-08T12:03:33.850" clientapp="Microsoft SQL Server Management Studio - query" hostname="test-server" hostpid="1433" loginname="xinysu" isolationlevel="read committed (2)" xactid="3396522828" currentdb="13" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
19 <executionStack>
20 <frame procname="adhoc" line="6" stmtstart="210" stmtend="400" sqlhandle="0x020000001b4f23368af7bba99098c10dec46585804f1b4ce0000000000000000000000000000000000000000">
21 Update dbo.FinanceReceiptNoRule Set [IsRunning]='1' where SeqCode=@SeqCode and IsRunning='0' </frame>
22 </executionStack>
23 <inputbuf>
24 declare @SeqCode varchar(60)
25 declare @ReturnNum bigint
26 set @SeqCode='CGJS20160106'
27 while(1=1)
28 begin
29 Update dbo.FinanceReceiptNoRule Set [IsRunning]='1' where SeqCode=@SeqCode and IsRunning='0'
30 end
31 </inputbuf>
32 </process>
33 </process-list>
34 <resource-list>
35 <ridlock fileid="1" pageid="1541136" dbid="13" objectname="fin_test.dbo.FinanceReceiptNoRule" id="lock51e8a3980" mode="X" associatedObjectId="72057594040025088">
36 <owner-list>
37 <owner id="process18fd5d8cf8" mode="X" />
38 </owner-list>
39 <waiter-list>
40 <waiter id="process810b00cf8" mode="U" requestType="wait" />
41 </waiter-list>
42 </ridlock>
43 <keylock hobtid="72057594040090624" dbid="13" objectname="fin_test.dbo.FinanceReceiptNoRule" indexname="PK_FINANCERECEIPTNORULE" id="lock7b2c6bc80" mode="U" associatedObjectId="72057594040090624">
44 <owner-list>
45 <owner id="process810b00cf8" mode="U" />
46 </owner-list>
47 <waiter-list>
48 <waiter id="process18fd5d8cf8" mode="U" requestType="wait" />
49 </waiter-list>
50 </keylock>
51 </resource-list>
52 </deadlock>
53 </deadlock-list>
The table structure and simulation data are as follows:
1 --Relevant forms:
2 CREATE TABLE [dbo].[FinanceReceiptNoRule](
3 [SeqCode] [varchar](60) NOT NULL,
4 [NowSeqValue] [bigint] NULL,
5 [SeqDate] [varchar](14) NOT NULL,
6 [IsRunning] [varchar](1) NULL,
7 [LastWriteTime] [datetime] NULL,
8 [Prefix] [varchar](4) NULL
9 ) ON [PRIMARY]
10 GO
11 --Data simulation
12 INSERT [dbo].[FinanceReceiptNoRule] ([SeqCode], [NowSeqValue], [SeqDate], [IsRunning], [LastWriteTime], [Prefix]) VALUES (N'TEST20150108', 1469, N'20150108', N'0', CAST(N'2015-01-08 05:05:49.163' AS DateTime), N'TEST')
13 GO
14 INSERT [dbo].[FinanceReceiptNoRule] ([SeqCode], [NowSeqValue], [SeqDate], [IsRunning], [LastWriteTime], [Prefix]) VALUES (N'TEST20150109', 1377, N'20150109', N'0', CAST(N'2015-01-09 04:50:26.610' AS DateTime), N'TEST')
15 GO
16
17 ALTER TABLE [dbo].[FinanceReceiptNoRule] ADD CONSTRAINT [pk_FinanceReceiptNoRule] PRIMARY KEY NONCLUSTERED
18 (
19 [SeqCode] ASC
20 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
21 GO
1.2 How to monitor
There are many ways to capture deadlocks. Here we introduce two ways: the SQL SERVER Profiler tool and Extended Events. Profiler is relatively resource-intensive, but because it only monitors deadlock, its performance impact is not great, and its visual interface is easy to use; Extended Events consumes less resources, and records the penultimate deadlock in real time, and requires SQL statements to analyze the query record file.
How to use Profiler monitoring?
Open SSMS, click Tools, and select SQL Server Profiler, as shown below.
Log in to the DB instance that needs to be monitored and fill in the corresponding tracking attributes. First, the < General > page, as shown below. Here we pay attention to two aspects: first, choose the < TSQL-Locks > template, which can be used to monitor deadlocks, also can be used to observe the lock application and release situation, very detailed, you can use SELECT UPDATE DELETE and other statements to see the lock application and release situation; second, monitoring resu lt s storage, the proposal can be stored in a form to facilitate regular points. Analysis and statistics.
Then fill in the < Event Selection > item, just select < deadlock graph > Events, no other ticks are needed, and finally click on the run to start monitoring.
A common example for ten thousand years can be used to check whether the monitoring is normal, open three query windows, and execute in the following order will result in resource occupation and application mutually exclusive deadlock. When the fifth step is completed, the deadlock will occur while waiting for 1-3 seconds. The script is provided as follows:
1 --session 1
2 CREATE TABLE Test_DL(
3 id int not null primary key ,
4 name varchar(100));
5
6 INSERT INTO Test_DL(id,name) select 1,'a';
7 INSERT INTO Test_DL(id,name) select 2,'b';
8
9 --session2 2 2 2 2 2 2 2 2 2
10 BEGIN TRANSACTION
11 UPDATE Test_DL SET Name='a-test' WHERE ID=1
12
13 --session3 3 3 3 3 3 3 3 3 3
14 BEGIN TRANSACTION
15 UPDATE Test_DL SET Name='b-test' WHERE ID=2
16
17 --session2 2 2 2 2 2 2 2 2 2
18 SELECT * FROM Test_DL WHERE ID=2
19
20 --session3 3 3 3 3 3 3 3 3 3
21 SELECT * FROM Test_DL WHERE ID=1
Simulated deadlock SQL
The deadlock Interface monitored is as follows:
How to use Extended Events for monitoring?
The script for establishing extended event monitoring is as follows: (Extended event is very good, the 2012 version supports visual operation, and you can learn about it on MSDN if you are interested: https://msdn.microsoft.com/zh-cn/library/bb630282.aspx, this article will not analyze the grammar and other knowledge points)
1 CREATE EVENT SESSION [DeadLock] ON SERVER
2 ADD EVENT sqlserver.xml_deadlock_report
3 ADD TARGET package0.event_file(SET filename=N'F:\events\deadlock\deadlock.xel',max_file_size=(20)),
4 ADD TARGET package0.ring_buffer(SET max_events_limit=(100),max_memory=(10240),occurrence_number=(50))
5 WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
6 GO
Query SQL as follows, here we need to note: query is based on buffer or file analysis, the number of buffer storage is generally limited, such as the above we only allocated 4M storage, file analysis is complete, but it depends on the number of retained files. Here we give buffer's query SQL as follows. File's query is interesting and can be written down by hand.
DECLARE @deadlock_xml XML
SELECT @deadlock_xml=(
SELECT
(
SELECT
CONVERT(XML, target_data)
FROM sys.dm_xe_session_targets st
JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
WHERE s.name = 'deadlock' AND st.target_name = 'ring_buffer'
) AS [x]
FOR XML PATH('') , TYPE
)
SELECT
dateadd(hour,+6,tb.col.value('@timestamp[1]','varchar(max)')) TimePoint,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[1]','VARCHAR(MAX)') statement_parameter_k,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[2]','VARCHAR(MAX)') statement_k,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[3]','VARCHAR(MAX)') statement_parameter,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[4]','VARCHAR(MAX)') [statement],
tb.col.value('(data/value/deadlock/process-list/process/@waitresource)[1]','VARCHAR(MAX)') waitresource_k,
tb.col.value('(data/value/deadlock/process-list/process/@waitresource)[2]','VARCHAR(MAX)') waitresource,
tb.col.value('(data/value/deadlock/process-list/process/@isolationlevel)[1]','VARCHAR(MAX)') isolationlevel_k,
tb.col.value('(data/value/deadlock/process-list/process/@isolationlevel)[2]','VARCHAR(MAX)') isolationlevel,
tb.col.value('(data/value/deadlock/process-list/process/@waittime)[1]','VARCHAR(MAX)') waittime_k,
tb.col.value('(data/value/deadlock/process-list/process/@waittime)[2]','VARCHAR(MAX)') waittime,
tb.col.value('(data/value/deadlock/process-list/process/@clientapp)[1]','VARCHAR(MAX)') clientapp_k,
tb.col.value('(data/value/deadlock/process-list/process/@clientapp)[2]','VARCHAR(MAX)') clientapp,
tb.col.value('(data/value/deadlock/process-list/process/@hostname)[1]','VARCHAR(MAX)') hostname_k,
tb.col.value('(data/value/deadlock/process-list/process/@hostname)[2]','VARCHAR(MAX)') hostname
FROM @deadlock_xml.nodes('//event') as tb(col)
This SQL can query very detailed resource contention situation, if you want to effectively use extended events, we recommend that you take a detailed look at the xml grammar of the official website (SQL SERVER support for xml is also great, looking forward to the 2016 version of json support)
Is it very clear, clear at a glance, with this can be analyzed!?
2 Analysis
Depending on the content of the xml file or the monitoring content of the extended event, you can organize the following information (the deadlock analysis at the beginning):
Look at the execution plans for transaction 1 and transaction 2 as follows:
Combining tables and implementation plans, deadlock processes can be roughly inferred:
Conversation 1:
- Find Index_Page of index page where the key value is located according to the primary key SeqCode, find Index_key of key hashvalue key line above the page, hold IU lock for Index_Page and U lock for Index_key;
- Because the table is a heap table, bookmark lookup is searched by RID, that is, by line identifier, to find the data page where the row data corresponding to RID is located, and then find the row data pointing to the slot number on the page, holding a U lock on the row data;
- At this time, we have found the row data that need to be updated. We can upgrade the IU lock on the data page Data_Page to the IX lock. The row data pointed by RID is upgraded from the U lock to the X lock. After the upgrade, we release the IU lock and the U lock on the index page and the key value row.
- At this point, session 1 holds IX locks on Data_Page and X locks on RID rows.
During this process, Session 2 happens to apply for such a lock:
- Find Lock Resources in Transaction 2According to sys.partitions, you can see that 72057594038910976 is the primary key pk_FinanceReceiptNoRule and the primary key column is SeqCode.
- According to the primary key SeqCode, find Index_Page of index page where the key value is located, find the key line Index_key above the page, hold IU locks for Index_Page and U locks for Index_key.
- Because the table is a heap table, bookmark lookup is searched by RID, that is, by line identifier, to find the data page where the row data corresponding to RID is located, and then find the row data pointing to the slot number on the page, prepare the row data to hold the U lock, but find that session 1 holds the X lock on the RID row, resulting in its application for the U lock Timeout.
- Session 2 holds IU locks on Index_Page, U locks on Index_key, and IU locks on Data_Page, requesting U locks on RID rows.
Suppose that at this point, Session 1 performs an update operation (in the same transaction):
- According to the primary key SeqCode, find the index page Index_Page where the key value is located, find the key line Index_key on the page, hold the IU lock on Index_Page, prepare to hold the U lock on Index_key, but find that Index_key is held by session 2.
Then a deadlock occurs (see figure below for details):
- Session 1 holds IX locks on Data_Page, X locks on RID rows, and applies for U locks on Index_key (waiting for Session 2 to release)
- Session 2 holds IU locks on Index_Page, U locks on Index_key, and IU locks on Data_Page, requesting U locks on RID rows (waiting for Session 1 to release)
3 solution
Without RID lookup, the idea of finding data directly by index will not lead to this deadlock, that is, to re-establish the aggregated index on the primary key and discard the original non-aggregated index primary key. Because this excludes the application and holding of U-lock in RID, it keeps X-lock until the end of the transaction, and can modify the data page where the key value is located directly according to the primary key, thus reducing the time of inquiring rows in RID.
The revised implementation plan is as follows:
The process of applying for release of the lock is as follows (see screenshot for details):
- Find Index_Page of index page where the key value is located according to the primary key SeqCode, find Index_key of key hashvalue key line above the page, hold IU lock for Index_Page and U lock for Index_key;
- Since the table is already an aggregated index table and the page where the primary key is located contains row data, the IU lock can be upgraded to IX lock directly for Index_Page, and the U lock for Index_key can be upgraded to X lock, thus avoiding the lock application of RID for row-by-row data.