Understanding the preclean phase of the CMS collector

The book "In-Depth Understanding of the Java Virtual Machine: Advanced Features and Best Practices of the JVM (2nd edition)" describes how the CMS collector works as follows:

The CMS collector is based on the mark-sweep algorithm. Its operation is more complex than that of the collectors described earlier, and the whole process is divided into four steps: CMS initial mark, CMS concurrent mark, CMS remark, and CMS concurrent sweep.

Many readers may only have read this description (which is really just the author's summary) and conclude that the CMS collector has only these four phases. Take a look at this gc log:

0.245: [GC (CMS Initial Mark) [1 CMS-initial-mark: 32776K(53248K)] 41701K(99328K), 0.0061676 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
0.251: [CMS-concurrent-mark-start]
0.270: [CMS-concurrent-mark: 0.004/0.020 secs] [Times: user=0.08 sys=0.01, real=0.02 secs]
0.270: [CMS-concurrent-preclean-start]
0.272: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
0.272: [CMS-concurrent-abortable-preclean-start]
0.291: [CMS-concurrent-abortable-preclean: 0.004/0.019 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
0.291: [GC (CMS Final Remark) [YG occupancy: 17928 K (46080 K)]0.291: [Rescan (parallel) , 0.0082702 secs]0.299: [weak refs processing, 0.0000475 secs]0.299: [class unloading, 0.0002451 secs]0.299: [scrub symbol table, 0.0003183 secs]0.300: [scrub string table, 0.0001611 secs][1 CMS-remark: 49164K(53248K)] 67093K(99328K), 0.0091462 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
0.300: [CMS-concurrent-sweep-start]
0.300: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
0.300: [CMS-concurrent-reset-start]
0.300: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

Note that the log contains three additional phases: "concurrent preclean", "concurrent abortable preclean" and "concurrent reset". Concurrent reset is where CMS resets its internal data structures to prepare for the next gc. This article shares my understanding of the other two phases, concurrent preclean and concurrent abortable preclean.

concurrent-preclean

The ideal state of generations

We know that the jvm applies divide and conquer by splitting the heap into an old generation and a young generation. Ideally, objects in different generations would hold no references to one another, so each generation could be handled in complete isolation.

When a gc starts, the gc thread must traverse object references from the gc roots to mark live objects (reachability analysis). To see a consistent set of references, the application threads have to be suspended, so traversing objects is expensive. Each generational collector therefore prefers to mind its own business and mark only the live objects of its own generation. For example, when young gc encounters a reference into the old generation, the gc thread stops traversing there, because it is only responsible for reclaiming young generation memory and has no need to visit old generation objects.

In practice, however, the young generation and the old generation are not 100% isolated from each other. Some objects refer to objects in the other generation; this is called a cross-generation reference. If the young gc thread only followed references inside the young generation, it would miss references from the old generation into the young generation, and young objects that are still referenced by live old generation objects would be reclaimed, breaking the application. The following figure shows the actual object references. The red arrows represent cross-generation references, which must be marked during young gc:

Card Marking

To find cross-generation references during young gc, there are several possible approaches:

  1. When a reference path leads into the old generation, keep traversing the old generation objects to find cross-generation references
  2. Scan the old generation objects linearly and mark cross-generation references, replacing random reads with sequential reads
  3. From program start, keep a collection that records every cross-generation reference as it is created; young gc then scans this collection for references into the young generation

The first two approaches require young gc to traverse old generation objects. Because the old generation holds many live objects, the workload would be too large, so the jvm uses the third approach.

First, consider how cross-generation references come into being. A reference (a -> b) from the old generation to the young generation can arise in two ways. In the first, the gc thread moves object a from the young generation to the old generation. In the second, a is already an old generation object and an application thread changes one of a's references to point to the young generation object b. (For references from the young generation to the old generation, only the second case applies.)

A cross-generation reference created by the gc thread itself can be recorded directly by the gc thread at the moment it is created, so the problem becomes: how do we record the cross-generation references created when an application thread modifies an object reference?

The jvm applies divide and conquer once more and splits the old generation into many cards (similar to linux memory pages). Whenever an application thread modifies an object reference inside a card, that card is marked dirty. Young gc then scans only the old generation memory that corresponds to dirty cards and records the cross-generation references it finds. This technique is called card marking.

The jvm uses a write barrier to observe reference modifications made by application threads and to mark the corresponding card. The write barrier works much like the proxy pattern: whenever a reference assignment instruction is executed, an extra instruction that updates the card table is added. Take the simplest setFoo(Object bar) method as an example. The assembly instructions generated by the jvm are shown below. The first line is the assignment itself, and the following lines mark the card containing the modified reference as dirty, that is, CARD_TABLE[this address >> 9] = 0:

; rsi is 'this' address
; rdx is setter param, reference to bar
; JDK6:
mov    QWORD PTR [rsi+0x20],rdx  ; this.foo = bar
mov    r10,rsi                   ; r10 = rsi = this
shr    r10,0x9                   ; r10 = r10 >> 9;
mov    r11,0x7ebdfcff7f00        ; r11 is base of card table, imagine byte[] CARD_TABLE
mov    BYTE PTR [r11+r10*1],0x0  ; Mark 'this' card as dirty, CARD_TABLE[this address >> 9] = 0

Summary: the jvm uses card marking so that young gc does not have to scan all live objects in the old generation. The price is the extra instructions executed by the write barrier each time a reference is modified, plus the extra memory needed to hold the card table.
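
To make the card table concrete, here is a minimal sketch in Java of the same idea as the assembly listing. It is a toy model rather than HotSpot code: the class and method names, the heapBase parameter and the explicit subtraction of the base address are assumptions made for illustration. Only the shift by 9 and the dirty value 0 are taken from the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a card table; names and sizes here are illustrative only.
class CardTableSketch {
    static final int CARD_SHIFT = 9;        // 2^9 = 512-byte cards, as in the listing above
    static final byte DIRTY = 0;            // the barrier stores 0 to mark a card dirty
    static final byte CLEAN = (byte) 0xFF;  // assumed "clean" value for this sketch

    private final long heapBase;            // hypothetical start address of the old generation
    private final byte[] cards;

    CardTableSketch(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) ((heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Index of the card that covers the given address.
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    // What the write barrier does after a reference field of the object at 'address' is written.
    void writeBarrier(long address) {
        cards[cardIndex(address)] = DIRTY;
    }

    // What young gc consults instead of scanning the whole old generation.
    List<Integer> dirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(0x1000, 4096); // 8 cards
        table.writeBarrier(0x1000 + 0x20);   // falls in card 0
        table.writeBarrier(0x1000 + 1030);   // falls in card 2
        System.out.println(table.dirtyCards()); // prints [0, 2]
    }
}
```

Note that the barrier marks the card of the object being written to, not the card of the object it now points to, so young gc still has to scan the dirty card to discover which references actually cross generations.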

What preclean does

Now back to the cms collector. Old generation gc also uses card marking, but here the purpose is not to find cross-generation references (references from the young generation into the old generation are found by traversing objects from the gc roots). Instead, it is used to find the object references that the application modified during the preceding concurrent mark phase.

The preclean phase cleans the dirty cards produced by this card marking. The cms gc thread scans the memory that corresponds to each dirty card, updates the reference information that has become stale since it was recorded, and clears the dirty mark on the card, as shown in the following figure:

After preclean has run, the dirty cards are clean again and the reference information that changed during concurrent mark is up to date.
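
The effect of preclean can be sketched with a small Java model. This is only an illustration under simplifying assumptions: each object is assigned to one hypothetical card, the mark state is a plain set, and all names are made up for the sketch. It shows why an object that becomes reachable after concurrent mark has passed its holder is picked up once the dirty card is rescanned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of preclean: rescan objects on dirty cards and mark what they now reference.
class PrecleanSketch {
    static final class Obj {
        final String name;
        final int card;                        // hypothetical: the card this object lives on
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, int card) { this.name = name; this.card = card; }
    }

    final Set<Obj> marked = new HashSet<>();   // mark state built by concurrent mark
    final Set<Integer> dirtyCards = new HashSet<>();
    final List<Obj> oldGen = new ArrayList<>();

    Obj allocate(String name, int card) {
        Obj o = new Obj(name, card);
        oldGen.add(o);
        return o;
    }

    // Application thread stores a reference; the write barrier dirties the holder's card.
    void store(Obj holder, Obj target) {
        holder.refs.add(target);
        dirtyCards.add(holder.card);
    }

    // Mark everything reachable from 'start' that is not marked yet.
    void markFrom(Obj start) {
        Deque<Obj> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (marked.add(o)) {
                o.refs.forEach(stack::push);
            }
        }
    }

    // Preclean: clear the dirty marks, then rescan the marked objects on those cards.
    int preclean() {
        Set<Integer> toScan = new HashSet<>(dirtyCards);
        dirtyCards.clear();
        for (Obj o : oldGen) {
            if (toScan.contains(o.card) && marked.contains(o)) {
                for (Obj ref : o.refs) {
                    markFrom(ref);
                }
            }
        }
        return toScan.size();
    }

    public static void main(String[] args) {
        PrecleanSketch heap = new PrecleanSketch();
        Obj a = heap.allocate("a", 0);
        Obj b = heap.allocate("b", 1);
        heap.markFrom(a);      // concurrent mark has already visited a
        heap.store(a, b);      // the application then makes a point to b
        System.out.println("b marked before preclean: " + heap.marked.contains(b));
        int cards = heap.preclean();
        System.out.println("rescanned " + cards + " card(s), b marked: " + heap.marked.contains(b));
    }
}
```

In this model b would be treated as garbage if the collector went straight from concurrent mark to sweep, which is why the cards dirtied during concurrent mark have to be revisited, either here or in final remark.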

concurrent-abortable-preclean

Like concurrent preclean, the concurrent abortable preclean phase scans and cleans dirty cards in order to lighten the load on the final remark phase. The difference is that abortable preclean runs iteratively until an exit condition is met. But concurrent preclean has already dealt with the dirty cards, so why does the jvm run another, similar phase?

Continuous STW

First, consider this situation: if a young gc (such as ParNew) happens right before final remark starts, the application pauses for the young gc and then immediately pauses again for final remark, producing back-to-back pauses that feel like one long pause. In addition, because the young gc thread moves surviving objects and therefore changes the references to them, it creates many objects that need to be rescanned, which increases the workload of final remark.
Therefore, concurrent abortable preclean not only cleans cards but also schedules the start of final remark. The cms collector considers the ideal moment for final remark to be when the young generation is 50% full, which is midway between the end of the previous young gc (0%) and the start of the next one (100%), as shown in the figure:

Configuration parameters

The abort condition of abortable preclean is configured with -XX:CMSScheduleRemarkEdenPenetration=50 (the default value). When eden occupancy reaches 50%, abortable preclean is aborted and final remark is executed. The corresponding jvm source fragment is as follows:

// When eden occupancy exceeds the configured percentage, _abort_preclean is set to true
if ((_collectorState == AbortablePreclean) && !_abort_preclean) {
    size_t used = get_eden_used();
    size_t capacity = get_eden_capacity();
    assert(used <= capacity, "Unexpected state of Eden");
    if (used >  (capacity/100 * CMSScheduleRemarkEdenPenetration)) {
      _abort_preclean = true;
    }
  }
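
The arithmetic of this check can be tried out in isolation. The following Java sketch mirrors the comparison from the fragment above. The class name, the method name and the 40 MB eden size are assumptions chosen for illustration.

```java
// Mirrors the arithmetic of the check above; the names are made up for this sketch.
class RemarkScheduleSketch {
    // True once eden usage passes the configured percentage of eden capacity.
    static boolean shouldAbortPreclean(long edenUsed, long edenCapacity, int penetrationPercent) {
        // Same operation order as the HotSpot fragment: divide first, then multiply.
        return edenUsed > (edenCapacity / 100 * penetrationPercent);
    }

    public static void main(String[] args) {
        long capacity = 40L * 1024 * 1024;   // assumed 40 MB eden, for illustration only
        System.out.println(shouldAbortPreclean(19L * 1024 * 1024, capacity, 50)); // prints false
        System.out.println(shouldAbortPreclean(21L * 1024 * 1024, capacity, 50)); // prints true
    }
}
```

Dividing before multiplying keeps the intermediate value small, at the cost of rounding the threshold down by a few bytes.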

The trigger condition of abortable preclean is configured with -XX:CMSScheduleRemarkEdenSizeThreshold=2m (the default value). Abortable preclean runs only when eden occupancy exceeds 2 MB; below that, the phase is not needed.

The active exit conditions of abortable preclean are configured with -XX:CMSMaxAbortablePrecleanTime=5000 (milliseconds) and -XX:CMSMaxAbortablePrecleanLoops. They exist because, if young generation occupancy grows slowly, abortable preclean would run for a long time. Preclean may then fail to keep up with the rate at which the application creates dirty cards, so dirty cards keep accumulating, and it is better to go ahead with final remark. The corresponding jvm source fragment is as follows:

// Try and schedule the remark such that young gen
// occupancy is CMSScheduleRemarkEdenPenetration %.
// (The original comment above is kept because it states the purpose of abortable_preclean)
void CMSCollector::abortable_preclean() {
  // Check the trigger condition
  if (get_eden_used() > CMSScheduleRemarkEdenSizeThreshold) {

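    // Note: the phase-accounting object `pa` used below is declared in a part of the function omitted from this excerpt.
    // The original comment follows; its author admits the current exit conditions are crude (FIX ME!!!)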
    // One, admittedly dumb, strategy is to give up
    // after a certain number of abortable precleaning loops
    // or after a certain maximum time. We want to make
    // this smarter in the next iteration.
    // XXX FIX ME!!! YSR
    size_t loops = 0, workdone = 0, cumworkdone = 0, waited = 0;
    // should_abort_preclean() checks whether _abort_preclean (set in the fragment above) is true
    while (!(should_abort_preclean() ||
             ConcurrentMarkSweepThread::should_terminate())) {
      workdone = preclean_work(CMSPrecleanRefLists2, CMSPrecleanSurvivors2);
      cumworkdone += workdone;
      loops++;
      // Active stop
      if ((CMSMaxAbortablePrecleanLoops != 0) &&
          loops >= CMSMaxAbortablePrecleanLoops) {
        if (PrintGCDetails) {
          gclog_or_tty->print(" CMS: abort preclean due to loops ");
        }
        break;
      }
      if (pa.wallclock_millis() > CMSMaxAbortablePrecleanTime) {
        if (PrintGCDetails) {
          gclog_or_tty->print(" CMS: abort preclean due to time ");
        }
        break;
      }
      // If this iteration found little work to do, pause for a while
      if (workdone < CMSAbortablePrecleanMinWorkPerIteration) {
        // Sleep for some time, waiting for work to accumulate
        stopTimer();
        cmsThread()->wait_on_cms_lock(CMSAbortablePrecleanWaitMillis);
        startTimer();
        waited++;
      }
    }
    // Print statistics
    if (PrintCMSStatistics > 0) {
      gclog_or_tty->print(" [%d iterations, %d waits, %d cards)] ",
                          loops, waited, cumworkdone);
    }
  }
  return;
}
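
The three ways out of this loop can be modelled in a few lines of Java. This is a simplified sketch, not the HotSpot implementation: the enum, the functional parameters and the fake clock are assumptions introduced so the logic can run on its own, and the sleep taken when an iteration does little work is left out.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Simplified model of the loop's exit conditions; not the HotSpot implementation.
class AbortablePrecleanLoopSketch {
    enum Exit { EDEN_THRESHOLD, MAX_LOOPS, MAX_TIME }

    static Exit run(BooleanSupplier shouldAbort,   // stands in for should_abort_preclean()
                    IntSupplier precleanWork,      // stands in for preclean_work(...)
                    LongSupplier elapsedMillis,    // stands in for pa.wallclock_millis()
                    int maxLoops,                  // CMSMaxAbortablePrecleanLoops, 0 = no limit
                    long maxTimeMillis) {          // CMSMaxAbortablePrecleanTime
        int loops = 0;
        while (!shouldAbort.getAsBoolean()) {
            precleanWork.getAsInt();
            loops++;
            if (maxLoops != 0 && loops >= maxLoops) {
                return Exit.MAX_LOOPS;
            }
            if (elapsedMillis.getAsLong() > maxTimeMillis) {
                return Exit.MAX_TIME;
            }
        }
        return Exit.EDEN_THRESHOLD;
    }

    public static void main(String[] args) {
        // Fake clock: every iteration "takes" 100 ms, and eden never fills up.
        long[] clock = {0};
        Exit exit = run(() -> false, () -> 0, () -> clock[0] += 100, 0, 5000);
        System.out.println(exit + " after " + clock[0] + " ms"); // prints MAX_TIME after 5100 ms
    }
}
```

With no loop limit and an eden that never reaches the threshold, the time limit is the only thing that ends the phase.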

Practical verification

The test code is as follows:

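import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DumbObj {
    // Byte[] is an array of references, not of primitive bytes, so each
    // array occupies more than sizeM MB on the heap.
    private Byte[] data;
    private DumbObj next;

    public DumbObj(int sizeM, DumbObj next) {
        this.data = getM(sizeM);
        this.next = next;
    }

    private Byte[] getM(int m) {
        return new Byte[1024 * 1024 * m];
    }

    // Setter called from main(); it was missing from the original listing.
    public void setNext(DumbObj next) {
        this.next = next;
    }

    // Live objects that stay reachable for the whole run
    private static List<DumbObj> liveObjs = new ArrayList<>(5);

    public static void main(String[] args) throws InterruptedException {
        // Allocate new objects to trigger gc
        for (int i = 0; i < 25; i++) {
            DumbObj dumb = new DumbObj(1, null);
            if (liveObjs.size() < 5) {
                liveObjs.add(new DumbObj(1, dumb));
            } else {
                dumb.setNext(liveObjs.get(i % 5));
            }
        }
        // Wait for the gc threads to do their work
        TimeUnit.SECONDS.sleep(20);
    }
}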

Add -XX:PrintCMSStatistics=1 to the jvm parameters, and the gc log will show what the cms collector does in the preclean phases:

-Xms101m
-Xmn50m
-Xmx101m
-verbose:gc
-XX:MetaspaceSize=1m
-XX:+UseConcMarkSweepGC
-Xloggc:/tmp/gc.log
-XX:+PrintGCCause
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:PrintCMSStatistics=1
-XX:CMSScheduleRemarkEdenPenetration=50
-XX:CMSScheduleRemarkEdenSizeThreshold=2m
-XX:CMSMaxAbortablePrecleanTime=5000
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50

Run the program and check the gc log. At 0.303, concurrent preclean starts and re-scans 5 cards. At 0.304, abortable preclean starts and iterates repeatedly, scanning for dirty cards. At 5.324, about 5 seconds later, it reaches the maximum running time, exits on its own initiative, and the remark phase begins.

0.303: [CMS-concurrent-mark: 0.010/0.010 secs] (CMS-concurrent-mark yielded 0 times)
 [Times: user=0.02 sys=0.00, real=0.01 secs]
0.303: [CMS-concurrent-preclean-start]
 (cardTable: 5 cards, re-scanned 5 cards, 1 iterations)
0.304: [CMS-concurrent-preclean: 0.000/0.000 secs] (CMS-concurrent-preclean yielded 0 times)
 [Times: user=0.00 sys=0.00, real=0.00 secs]
0.304: [CMS-concurrent-abortable-preclean-start]
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 (cardTable: 0 cards, re-scanned 0 cards, 1 iterations)
 ... (further identical lines omitted)
 CMS: abort preclean due to time  [50 iterations, 49 waits, 0 cards)] 5.324: [CMS-concurrent-abortable-preclean: 0.012/5.020 secs] (CMS-concurrent-abortable-preclean yielded 0 times)
 [Times: user=0.02 sys=0.00, real=5.02 secs]
5.324: [GC (CMS Final Remark) [YG occupancy: 17157 K (46080 K)]5.324: [Rescan (parallel)  (Survivor:0chunks) Finished young gen rescan work in 4th thread: 0.000 sec
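
The statistics in the "abort preclean due to time" line can be extracted and sanity-checked with a short Java sketch. The regular expression is written against the log line shown above. The 100 ms wait per iteration is an assumption (it is believed to be the default of CMSAbortablePrecleanWaitMillis, which this article does not state), so treat the result as a rough plausibility check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls the loop statistics out of an abortable-preclean log line.
class PrecleanLogSketch {
    private static final Pattern STATS =
            Pattern.compile("\\[(\\d+) iterations, (\\d+) waits, (\\d+) cards\\)\\]");

    // Returns {iterations, waits, cards}, or null if the line carries no statistics.
    static int[] parseStats(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = " CMS: abort preclean due to time  [50 iterations, 49 waits, 0 cards)] "
                + "5.324: [CMS-concurrent-abortable-preclean: 0.012/5.020 secs]";
        int[] stats = parseStats(line);
        long waitMillis = 100;  // assumed value of CMSAbortablePrecleanWaitMillis
        System.out.println("iterations=" + stats[0] + " waits=" + stats[1] + " cards=" + stats[2]);
        System.out.println("time spent waiting: about " + stats[1] * waitMillis + " ms");
    }
}
```

Under that assumption, 49 waits account for roughly 4.9 of the 5.02 seconds the phase took, which fits the log: the loop found no dirty cards and spent almost all of its time sleeping until the time limit was hit.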

Now change the setting to -XX:CMSScheduleRemarkEdenSizeThreshold=50m, the same size as the young generation. Observe the gc log again: the concurrent abortable preclean phase no longer occurs:

2.296: [CMS-concurrent-mark: 0.010/0.010 secs] (CMS-concurrent-mark yielded 0 times)
 [Times: user=0.02 sys=0.00, real=0.01 secs]
2.296: [CMS-concurrent-preclean-start]
 (cardTable: 1 cards, re-scanned 1 cards, 1 iterations)
2.296: [CMS-concurrent-preclean: 0.000/0.000 secs] (CMS-concurrent-preclean yielded 0 times)
 [Times: user=0.00 sys=0.00, real=0.00 secs]
2.296: [GC (CMS Final Remark) [YG occupancy: 17157 K (46080 K)]2.296: [Rescan (parallel)  (Survivor:0chunks) Finished young gen rescan work in 4th thread: 0.000 sec

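Summary

Cross-generation references and card marking: card marking lets young gc find references from the old generation into the young generation without scanning the whole old generation, at the cost of a write barrier and a card table

Preclean: cleans the dirty cards recorded by card marking and updates the reference records

Abortable preclean: adjusts the timing of the final remark phase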

Reference articles:

Brian Goetz's article: https://www.ibm.com/developerworks/library/j-jtp11253/index.html

Introduction to card mark: http://psy-lob-saw.blogspot.com/2014/10/the-jvm-write-barrier-card-marking.html

Understanding preclean: https://stackoverflow.com/questions/44182733/can-someone-explain-what-happens-in-the-concurrent-abortable-preclean-phase-of

jvm source code: https://github.com/JetBrains/jdk8u_hotspot

Posted by Muncey on Thu, 25 Jun 2020 03:21:57 -0700