A very simple string resident pool: do you really understand it?

Keywords: C# Windows

Yesterday, I saw a discussion about the string resident pool in C# in a group chat, and it was a hot topic. After several rounds of discussion there were plenty of theories, but little hard evidence. Although the topic is very basic, giving a good answer is not so easy. In this article I will share what I know, to the best of my ability.

1: Ubiquitous pools

After so many years in development, I believe you are all familiar with the concept of a "pool". Connection pools, thread pools, object pools, and the resident pool discussed here all exist for reuse and sharing. As the saying goes, joy shared is better than joy alone: creating and destroying a string costs both space and time, so it is better to keep one copy around and share it.

1. The phenomenon

We usually assume that if we define several string variables, several string objects will be allocated on the heap. In fact, under the hood there is a mechanism called the resident pool (officially, the intern pool): if two string literals have the same content, only one string object is allocated on the heap and its reference is assigned to both variables, which can greatly reduce memory usage when the same content appears many times. In code, it looks like this:

        public static void Main(string[] args)
        {
            var str1 = "nihao";
            var str2 = "nihao";

            var b = string.ReferenceEquals(str1, str2);
            Console.WriteLine(b);
        }

----------- output -----------
True

2. Implementation principle

How does this work? When the CLR has the JIT compile your MSIL into machine code, each string literal from your metadata that an ldstr instruction refers to is looked up in a private internal dictionary: the key is the string content, and the value is the reference to the string object allocated on the heap. If the content is already there, the existing reference is reused; otherwise a new string is allocated and added. This dictionary is the so-called resident pool. If that is still unclear, let me draw a picture.
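To make the dictionary idea concrete, here is a toy model. It is only an illustration under my own assumptions: ToyInternPool is a hypothetical class built on Dictionary<string, string>, not the CLR's real implementation, which lives inside the runtime's native code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of an intern pool (NOT the CLR's implementation):
// key = string content, value = the single shared instance with that content.
class ToyInternPool
{
    private readonly Dictionary<string, string> _table =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the pooled instance for str's content; if the content is new, str itself becomes the pooled instance.
    public string Intern(string str)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (_table.TryGetValue(str, out var existing))
            return existing;   // hit: hand back the shared instance
        _table[str] = str;     // miss: remember this instance for later lookups
        return str;
    }
}

static class Program
{
    static void Main()
    {
        var pool = new ToyInternPool();

        // Two distinct heap objects with the same content.
        string a = new string(new[] { 'n', 'i', 'h', 'a', 'o' });
        string b = new string(new[] { 'n', 'i', 'h', 'a', 'o' });

        Console.WriteLine(object.ReferenceEquals(a, b));                           // False
        Console.WriteLine(object.ReferenceEquals(pool.Intern(a), pool.Intern(b))); // True
    }
}
```

The design choice mirrors the description above: lookups are by content (ordinal comparison), while the stored value is a reference, so every caller with equal content ends up holding the same object.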

3. WinDbg verification

You can use WinDbg to check whether str1 and str2 on the stack both point to the same object on the heap.

Run ~0s to switch to the main thread, then !clrstack -l to find the local variables str1 and str2 on its stack:

0:000> ~0s
ntdll!ZwReadFile+0x14:
00007ff8`fea4aa64 c3              ret
0:000> !clrstack -l
OS Thread Id: 0x1c1c (0)
        Child SP               IP Call Site

000000ac0b7fed00 00007ff889e608e9 *** WARNING: Unable to verify checksum for ConsoleApp2.exe
ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 30]
    LOCALS:
        0x000000ac0b7fed38 = 0x0000024a21f22d48
        0x000000ac0b7fed30 = 0x0000024a21f22d48

000000ac0b7fef48 00007ff8e9396c93 [GCFrame: 000000ac0b7fef48] 

In the LOCALS section above, 0x000000ac0b7fed38 = 0x0000024a21f22d48 and 0x000000ac0b7fed30 = 0x0000024a21f22d48: both local variables hold the reference 0x0000024a21f22d48, so they point to the same heap object. Next, dump that object on the heap with !do:

0:000> !do 0x0000024a21f22d48
Name:        System.String
MethodTable: 00007ff8e7a959c0
EEClass:     00007ff8e7a72ec0
Size:        36(0x24) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      nihao
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e7a985a0  4000281        8         System.Int32  1 instance                5 m_stringLength
00007ff8e7a96838  4000282        c          System.Char  1 instance               6e m_firstChar
00007ff8e7a959c0  4000286       d8        System.String  0   shared           static Empty
                                 >> Domain:Value  0000024a203d41c0:NotInit  <<

As you can see, it is a System.String object with the content nihao, which is consistent with my figure.

2: Verifying the resident pool

1. Verifying with string.Intern

Unfortunately, my skills are limited. The resident pool's dictionary lives inside the CLR, neither on the managed heap nor on the stack, and I don't know how to use WinDbg to print its contents. It can still be verified with string.Intern, whose metadata looks like this:

        //
        // Summary:
        //     Retrieves the system's reference to the specified System.String.
        //
        // Parameters:
        //   str:
        //     A string to search for in the intern pool.
        //
        // Returns:
        //     The system's reference to str, if it is interned; otherwise, a new reference
        //     to a string with the value of str.
        //
        // Exceptions:
        //   T:System.ArgumentNullException:
        //     str is null.
        [SecuritySafeCritical]
        public static String Intern(String str);

As the comments show, if a string with the same content as str already exists in the resident pool, the method returns the heap reference held in the pool; if not, it adds str to the pool and returns that reference. Here is the code:

        public static void Main(string[] args)
        {
            var str1 = "nihao";
            var str2 = "nihao";

            //Verify that nihao is already in the resident pool; if so, str3 is the same reference as str1 and str2
            var str3 = string.Intern("nihao");

            //Verify that new string content enters the resident pool
            var str4 = string.Intern("cnblogs");
            var str5 = string.Intern("cnblogs");

            Console.ReadLine();
        }

Next, check in WinDbg whether str3 is the same reference as str1 and str2, and whether str4 and str5 share one pooled reference.

ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 37]
    LOCALS:
        0x00000047105fea58 = 0x0000018537312d48
        0x00000047105fea50 = 0x0000018537312d48
        0x00000047105fea48 = 0x0000018537312d48
        0x00000047105fea40 = 0x0000018537312d70
        0x00000047105fea38 = 0x0000018537312d70

The five addresses show that nihao is shared by str1, str2 and str3, and that cnblogs is shared by str4 and str5 through the resident pool. (Strictly speaking, "cnblogs" is itself a literal, so it is already in the pool by the time string.Intern sees it.)
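The same checks can also be made without WinDbg by comparing references in code. This sketch is my own, not part of the original session: it uses object.ReferenceEquals and string.IsInterned, and it assumes a Guid-based string as a stand-in for content that is certainly not in the pool yet (unlike "cnblogs", which is a literal).

```csharp
using System;

static class Program
{
    static void Main()
    {
        var str1 = "nihao";
        var str2 = "nihao";
        var str3 = string.Intern("nihao");

        // Same conclusions as the WinDbg session, via reference comparison.
        Console.WriteLine(object.ReferenceEquals(str1, str2)); // True
        Console.WriteLine(object.ReferenceEquals(str1, str3)); // True

        // A string built at runtime whose content is not in the pool yet.
        var fresh = Guid.NewGuid().ToString("N");
        Console.WriteLine(string.IsInterned(fresh) == null);   // True: not pooled yet

        var pooled = string.Intern(fresh);                     // adds this content to the pool
        var copy = new string(fresh.ToCharArray());            // distinct object, same content
        Console.WriteLine(object.ReferenceEquals(copy, fresh));                 // False
        Console.WriteLine(object.ReferenceEquals(string.Intern(copy), pooled)); // True
    }
}
```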

2. Do identical strings created at runtime enter the resident pool?

There is a pitfall here. The identical strings discussed above are all known at compile time, but do identical strings created at runtime also enter the resident pool? This is a question worth some curiosity. Let's read the input nihao from the console while the program is running and see whether it shares a reference with str1 and str2.

        public static void Main(string[] args)
        {
            var str1 = "nihao";
            var str2 = "nihao";

            var str3 = Console.ReadLine();

            Console.WriteLine("Input completed!");
            Console.ReadLine();
        }

0:000> !clrstack -l
000000f6d35fee50 00007ff889e7090d *** WARNING: Unable to verify checksum for ConsoleApp2.exe
ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 33]
   LOCALS:
       0x000000f6d35fee98 = 0x000002cb1a552d48
       0x000000f6d35fee90 = 0x000002cb1a552d48
       0x000000f6d35fee88 = 0x000002cb1a555f28
0:000> !do 0x000002cb1a555f28
Name:        System.String
MethodTable: 00007ff8e7a959c0
EEClass:     00007ff8e7a72ec0
Size:        36(0x24) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      nihao
Fields:
             MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e7a985a0  4000281        8         System.Int32  1 instance                5 m_stringLength
00007ff8e7a96838  4000282        c          System.Char  1 instance               6e m_firstChar
00007ff8e7a959c0  4000286       d8        System.String  0   shared           static Empty
                                >> Domain:Value  000002cb18ad39f0:NotInit  <<


As the output above shows, the string read by Console.ReadLine is at 0x000002cb1a555f28. Although its content is the same, it does not come from the resident pool: literals are interned when the JIT resolves them, whereas a string produced at runtime is a brand-new heap object, so it cannot benefit from reuse. To reuse the pooled instance, wrap Console.ReadLine() in string.Intern, as shown below:

        public static void Main(string[] args)
        {
            var str1 = "nihao";
            var str2 = "nihao";

            var str3 = string.Intern(Console.ReadLine());

            Console.WriteLine("Input complete!");
            Console.ReadLine();
        }

ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 33]
    LOCALS:
        0x0000008fac1fe9c8 = 0x000001ff46582d48
        0x0000008fac1fe9c0 = 0x000001ff46582d48
        0x0000008fac1fe9b8 = 0x000001ff46582d48

You can see that str1, str2 and str3 share a memory address 0x000001ff46582d48 at this time.
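To reproduce this without typing input, the sketch below builds "nihao" at runtime with a StringBuilder, an assumed stand-in for Console.ReadLine() so the demo runs unattended. It also shows that concatenating compile-time constants does not produce a runtime string, because the C# compiler folds the expression into a single literal.

```csharp
using System;
using System.Text;

static class Program
{
    static void Main()
    {
        var str1 = "nihao";

        // Constant concatenation is folded at compile time into the literal "nihao".
        const string ni = "ni";
        var folded = ni + "hao";
        Console.WriteLine(object.ReferenceEquals(str1, folded));   // True

        // A string assembled at runtime (stand-in for Console.ReadLine()) is a new heap object.
        var runtime = new StringBuilder("ni").Append("hao").ToString();
        Console.WriteLine(runtime == str1);                          // True: same content
        Console.WriteLine(object.ReferenceEquals(str1, runtime));   // False: different object

        // Wrapping it in string.Intern returns the pooled literal instead.
        var interned = string.Intern(runtime);
        Console.WriteLine(object.ReferenceEquals(str1, interned));  // True
    }
}
```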

3: Summary

The resident pool is really powerful. It is a good solution to repeatedly allocating identical strings on the heap and can noticeably reduce heap memory consumption. However, keep in mind that strings produced at runtime, such as IO input, do not share the resident pool automatically; wrap them in string.Intern if you want them to.

That's all for now. I hope it helps!


Posted by stride-r on Tue, 28 Apr 2020 17:46:35 -0700