C # program memory has been leaking. It turned out to be an asynchronous loop!

Keywords: .NET

1: Background

1. Tell a story

Last month, a friend came to me and said that there was a memory leak in his program. I don't know how to further analyze it. The screenshot is as follows:

My friend, this paragraph has been very concise and comprehensive. Let's talk to windbg.

2: Windbg analysis

1. What is the leakage

According to my friend's description, after the program runs for a period of time, the memory explodes. There should be no casualties, otherwise you won't wx chat with me. Here you can use. Time to see how long the current process has been running.

0:000> .time
Debug session time: Thu Oct 21 14:54:39.000 2021 (UTC + 8:00)
System Uptime: 6 days 4:37:27.851
Process Uptime: 0 days 0:40:14.000
  Kernel time: 0 days 0:01:55.000
  User time: 0 days 0:07:33.000

It can be seen that this dump was caught after the program ran for 40 minutes. Next, let's compare the memory and gc heap ratio of process to see which piece is leaking.

0:000> !address -summary

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                327     7dfc`c665a000 ( 125.987 TB)           98.43%
MEM_RESERVE                             481      201`e91a2000 (   2.007 TB)  99.74%    1.57%
MEM_COMMIT                             2307        1`507f4000 (   5.258 GB)   0.26%    0.00%

0:000> !eeheap -gc
Number of GC Heaps: 2
------------------------------

GC Allocated Heap Size:    Size: 0x139923528 (5260850472) bytes.
GC Committed Heap Size:    Size: 0x13bf23000 (5300695040) bytes.

It is easy to know from the hexagram that this is a complete managed heap memory leak.

2. What occupies so much memory

Knowing that it is the leakage of the hosting layer, I feel happy all at once. Next, use it! dumpheap -stat to see if there are any large objects to dig.

0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ffdeb1fc400  5362921    128710104 xxxBLLs.xxx.BundleBiz+<>c__DisplayClass20_0
00007ffdeaeff140  5362929    171613728 System.Collections.Generic.List`1[[xxx.xxx, xxx]]
00007ffdeaeff640  5362957    171615272 xxx.BLLs.Plan.Dto.xxx[]
00007ffde8171e18 16146362    841456072 System.String
00007ffdeb210098  5362921   1415811144 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
00007ffdea9ca260  5362921   2359685240 xxx.Bundle             

From the output, the memory is mainly consumed by two types of objects: xxx.Bundle and AsyncTaskMethodBuilder, with a quantity of 536w. Here is a very interesting place. If you know asynchrony, I believe you can see what AsyncTaskMethodBuilder + void taskresult is. According to experience, this friend should have mistakenly entered asynchronous infinite recursion, How do you dig? Then look down.

3. Find the problem code

See XXX. BundleBiz + < distributionbundle > d above__ 20? This is the class and method involved in asynchronous operation. Next, ILSpy is used to reflect the anonymous class < distributionbundle > D under BundleBiz__ 20, as shown in the figure below:

Although the source code is found, the code is an asynchronous state machine decompiled by ILSpy. The next question is how to find await and async codes in reverse according to the state machine code? There is a used by function in ILSpy, which can be used here.

Double click used by to see the real calling code, which is simplified as follows:

public async Task DistributionBundle(List<Bundle> list, List<xxx> bwdList, xxx item, List<xxx> sumDetails, List<xxx> details, BundleParameter bundleParameter, IEnumerable<dynamic> labels)
{
    int num = 0;
    foreach (xxx detail in sumDetails)
    {
        IEnumerable<xxx> woDetails = details.Where((xxx w) => w.Size == detail.Size && w.Color == detail.Color);
        foreach (xxx item2 in woDetails)
        {
            xxx
        }
        woDetails = woDetails.OrderBy((xxx s) => s.Seq).ToList();
        num++;
        xxx
        Bundle bundle = new Bundle();
        Bundle bundle2 = bundle;
        bundle2.BundleId = await _repo.CreateBundleId();
        
        foreach (xxx item3 in woDetails)
        {
            item3.TaskQty = item3.WoQty + Math.Ceiling(item3.WoQty * item3.OverCutRate);
            decimal value = default(decimal);
        }

        await DistributionBundle(list, bwdList, item, sumDetails, details, bundleParameter, labels);
    }
}

Take a closer look at the above code. I'll go to await distribution bundle (list, bwdlist, item, sumdetails, details, bundleparameter, labels); It calls itself again, which seems to fall into a dead recursion under some conditions...

Some friends may ask, in addition to experience, can you analyze it from dump? Of course, take one from 500w + and have a look at it! gcroot is enough.

0:000> !DumpHeap /d -mt 00007ffdeb210098
         Address               MT     Size
000001a297913a68 00007ffdeb210098      264     
000001a297913b70 00007ffdeb210098      264  

0:000> !gcroot 000001a297913a68
Thread 5ac:
    000000470B1EE4E0 00007FFE45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
        rbp+10: 000000470b1ee550
            ->  000001A297A25D88 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
            ->  000001A29796D8C0 Microsoft.Extensions.Hosting.Internal.Host
            ...
            ->  000001A298213248 System.Data.SqlClient.TdsParserStateObjectNative
            ->  000001A32E6AB700 System.Threading.Tasks.TaskFactory`1+<>c__DisplayClass38_0`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient],[System.Data.CommandBehavior, System.Data.Common]]
            ->  000001A32E6AB728 System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
            ->  000001A32E6ABB18 System.Threading.Tasks.StandardTaskContinuation
            ->  000001A32E6ABA80 System.Threading.Tasks.ContinuationTaskFromResultTask`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
            ->  000001A32E6AB6C0 System.Action`1[[System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]], System.Private.CoreLib]]
            ->  000001A32E6AB428 System.Data.SqlClient.SqlCommand+<>c__DisplayClass130_0
            ...
            ->  000001A32E6ABC08 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[Dapper.SqlMapper+<QueryRowAsync>d__34`1[[System.String, System.Private.CoreLib]], Dapper]]
            ->  000001A32E6ABD20 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[xxx.DALs.xxx.BundleRepo+<CreateBundleId>d__12, xxx]]
            ->  000001A32E6ABD98 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E6A6BD8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433250520 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E69E0F8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433247D28 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433246330 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E69A568 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433245408 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ...

From the call stack, the code seems to fall into an endless loop in the process of reading records from the database.

4. Why is there no stack overflow

As soon as I see the infinite loop, I believe many friends must ask why there is no stack overflow. After all, the default thread stack space is only 1M, from! From the perspective of gcroot, these references are hung on the 5ac thread, that is, the main thread output below, and the main process stack is also very clean.

0:000> !t
ThreadCount:      30
UnstartedThread:  0
BackgroundThread: 24
PendingThread:    0
DeadThread:       5
Hosted Runtime:   no
                                                                                                            Lock  
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1      5ac 000001A29752CDF0  202a020 Preemptive  0000000000000000:0000000000000000 000001a29754c570 0     MTA 
   4    2     1e64 000001A29752A490    2b220 Preemptive  0000000000000000:0000000000000000 000001a29754c570 0     MTA (Finalizer) 
...


0:000> !clrstack 
OS Thread Id: 0x5ac (0)
        Child SP               IP Call Site
000000470B1EE1D0 00007ffe5eb30544 [GCFrame: 000000470b1ee1d0] 
000000470B1EE318 00007ffe5eb30544 [HelperMethodFrame_1OBJ: 000000470b1ee318] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
000000470B1EE440 00007ffe45103c25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
000000470B1EE4E0 00007ffe45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
000000470B1EE550 00007ffe451032cf System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2861]
000000470B1EE5D0 00007ffe45121b04 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 143]
000000470B1EE600 00007ffe4510482d System.Runtime.CompilerServices.TaskAwaiter.GetResult() [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 106]
000000470B1EE630 00007ffe4de36595 Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(Microsoft.Extensions.Hosting.IHost) [/_/src/Hosting/Abstractions/src/HostingAbstractionsHostExtensions.cs @ 49]
000000470B1EE660 00007ffde80f3b4b xxx.Program.Main(System.String[])
000000470B1EE8B8 00007ffe47c06c93 [GCFrame: 000000470b1ee8b8] 
000000470B1EEE50 00007ffe47c06c93 [GCFrame: 000000470b1eee50] 

If you know a little about asynchronous play, you should know that there is a concept of IO completion port, which can realize the binding of handle and ThreadPool. Infinite recursion is just entered into the waiting callback queue of IO completion port. Theoretically, it has nothing to do with stack space, so there will be no stack overflow.

3: Summary

This memory leak accident is mainly due to the carelessness of the programmer, perhaps the long-term 996 has been in a trance 😂😂😂, With this information, I believe it will be very simple to correct.

Posted by savetime on Wed, 01 Dec 2021 10:38:05 -0800