An In-Depth Analysis of How Java Reference Types Work

Keywords: Java jvm JDK

There are four commonly cited reference types in Java (there are in fact others, such as FinalReference): strong, soft, weak, and phantom references (the last are sometimes translated as "virtual references"). A strong reference is the ordinary kind we use all the time, as in Object a = new Object(); it has no corresponding Reference class in Java.

This article analyzes the implementation of soft, weak, and phantom references. All three types extend Reference, and most of the logic lives in Reference itself.
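As a quick orientation, here is a minimal sketch of how the three Reference subclasses are created and how get() behaves while the referent is still strongly reachable. The class name ReferenceKinds and the helper probe are illustrative names, not part of the JDK.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

// Illustrative class; the name is an assumption made for this sketch.
class ReferenceKinds {

    // Returns { soft.get() == target, weak.get() == target, phantom.get() == null }.
    static boolean[] probe(Object target) {
        // Soft and weak references can be created with or without a queue.
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);

        // PhantomReference only offers a constructor that takes a queue.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        // 'target' is still strongly reachable here, so soft and weak return it;
        // PhantomReference.get() returns null by design.
        return new boolean[] {
            soft.get() == target,
            weak.get() == target,
            phantom.get() == null
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probe(new Object()))); // [true, true, true]
    }
}
```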

Questions

Before the analysis, let's pose a few questions.

1. Most articles online describe soft references like this: they are reclaimed when memory is insufficient. But how is "insufficient memory" defined? What exactly counts as insufficient?

2. Most articles online describe phantom references like this: a phantom reference does not affect the life cycle of its object and is mainly used to track when the object is reclaimed by the garbage collector. Is that really true?

3. In which scenarios does the JDK itself use phantom references?

Reference

Let's first look at several fields in Reference.java:

public abstract class Reference<T> {
    // The referenced object (the referent)
    private T referent;
    // The reference queue, supplied by the user in the Reference constructor
    volatile ReferenceQueue<? super T> queue;
    // Once the reference is enqueued, this field points to the next element in the queue, forming a linked list
    volatile Reference next;
    // During GC, the JVM maintains an internal linked list called DiscoveredList that holds Reference objects;
    // discovered points to the next element in that list and is set by the JVM
    transient private Reference<T> discovered;
    // Lock object used for thread synchronization
    static private class Lock { }
    private static Lock lock = new Lock();
    // Head of the list of Reference objects waiting to be enqueued. The JVM sets it during GC, and a
    // Java-level thread (ReferenceHandler) keeps taking elements off pending and adding them to their queues
    private static Reference<Object> pending = null;
}

The life cycle of a Reference object is mainly divided into two parts: the native layer and the Java layer.

The native layer adds the Reference objects whose referents are to be reclaimed to the DiscoveredList (the code is in the process_discovered_references method of referenceProcessor.cpp), and then moves the elements of the DiscoveredList to the PendingList (the code is in the enqueue_discovered_ref_helper method of referenceProcessor.cpp). The head of the PendingList is the pending field of the Reference class.

Now look at the Java-layer code:

private static class ReferenceHandler extends Thread {
    ...
    public void run() {
        while (true) {
            tryHandlePending(true);
        }
    }
}

static boolean tryHandlePending(boolean waitForNotify) {
    Reference<Object> r;
    Cleaner c;
    try {
        synchronized (lock) {
            if (pending != null) {
                r = pending;
                // If it is a Cleaner, remember it for the special handling below
                c = r instanceof Cleaner ? (Cleaner) r : null;
                // Advance pending to the next element of the PendingList
                pending = r.discovered;
                r.discovered = null;
            } else {
                // pending is null, so wait; the JVM calls notify when it adds objects to the PendingList
                if (waitForNotify) {
                    lock.wait();
                }
                // retry if waited
                return waitForNotify;
            }
        }
    }
    ...

    // For a Cleaner, call clean() to release the associated resource
    if (c != null) {
        c.clean();
        return true;
    }
    // Add the Reference to its ReferenceQueue; developers can learn that the object
    // was reclaimed by polling the queue
    ReferenceQueue<? super Object> q = r.queue;
    if (q != ReferenceQueue.NULL) q.enqueue(r);
    return true;
}

The process is fairly simple: the thread keeps taking elements off the PendingList and adding them to their ReferenceQueue, and developers can learn that an object has been reclaimed by polling that ReferenceQueue.
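The consumer side of this hand-off can be sketched in a few lines. To keep the example deterministic, it calls Reference.enqueue() directly instead of waiting for a real GC; this is only a stand-in for the GC plus ReferenceHandler path described above. The class and method names are illustrative.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Illustrative class; the name is an assumption made for this sketch.
class QueuePollingSketch {

    // Returns true if the reference taken from the queue is the one we registered.
    static boolean enqueueAndPoll(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // Nothing has been enqueued yet, so poll() returns null without blocking.
        if (queue.poll() != null) {
            throw new IllegalStateException("queue should be empty at this point");
        }

        // Stand-in for the GC + ReferenceHandler path: enqueue explicitly.
        ref.enqueue();

        // poll() is non-blocking; remove() would block until an element arrives.
        Reference<?> polled = queue.poll();
        return polled == ref;
    }

    public static void main(String[] args) {
        System.out.println(enqueueAndPoll(new Object())); // true
    }
}
```

In real code the reference would arrive on the queue only after a GC has found the referent unreachable, so a consumer would normally call remove() on a dedicated thread or poll() periodically.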

Note also that objects of type Cleaner (a subclass of PhantomReference) get extra handling: when the object a Cleaner points to is reclaimed, its clean method is called. This method is mainly used to release the corresponding resource. DirectByteBuffer uses a Cleaner to free its off-heap memory, which is a typical application of phantom references in Java.
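The Cleaner discussed above is the JDK-internal sun.misc.Cleaner, which application code should not use. As a rough illustration of the same idea, the sketch below uses the public java.lang.ref.Cleaner API, which exists only on Java 9 and later and is a different class. NativeBlock and the counter are hypothetical stand-ins for an off-heap allocation and its release routine.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative class; requires Java 9+ for java.lang.ref.Cleaner.
class CleanerSketch {
    // One shared Cleaner; it owns a daemon thread that runs cleanup actions.
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical resource that stands in for a block of off-heap memory.
    static final class NativeBlock implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;

        NativeBlock(AtomicInteger releaseCounter) {
            // The action must not capture 'this', otherwise the object
            // could never become phantom reachable.
            this.cleanable = CLEANER.register(this, releaseCounter::incrementAndGet);
        }

        @Override
        public void close() {
            // Runs the cleanup action at most once, however often it is called.
            cleanable.clean();
        }
    }

    // Closes the same block twice and reports how often the action ran.
    static int releaseTwice() {
        AtomicInteger released = new AtomicInteger();
        NativeBlock block = new NativeBlock(released);
        block.close();
        block.close(); // second call is a no-op
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println(releaseTwice()); // 1
    }
}
```

If close() is never called, the cleanup action runs on the Cleaner's thread some time after the NativeBlock becomes unreachable, which mirrors how a forgotten DirectByteBuffer eventually has its memory freed.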

Having looked at the implementation of Reference, let's see how each of the implementation classes differs.

SoftReference

public class SoftReference<T> extends Reference<T> {
    
    static private long clock;
    
    private long timestamp;
   
    public SoftReference(T referent) {
        super(referent);
        this.timestamp = clock;
    }
 
    public SoftReference(T referent, ReferenceQueue<? super T> q) {
        super(referent, q);
        this.timestamp = clock;
    }

    public T get() {
        T o = super.get();
        if (o != null && this.timestamp != clock)
            this.timestamp = clock;
        return o;
    }

}

The implementation of SoftReference is very simple; it only adds two fields, clock and timestamp. clock is a static variable that the JVM sets to the current time at every GC. timestamp is set to the value of clock each time get() is called (if the two differ and the referent has not been reclaimed).

What are these two fields for? And what do they have to do with soft references being reclaimed when memory is insufficient?

To answer that we have to look at the JVM source, because the decision whether an object needs to be reclaimed is made in the GC.

size_t
ReferenceProcessor::process_discovered_reflist(
  DiscoveredList               refs_lists[],
  ReferencePolicy*             policy,
  bool                         clear_referent,
  BoolObjectClosure*           is_alive,
  OopClosure*                  keep_alive,
  VoidClosure*                 complete_gc,
  AbstractRefProcTaskExecutor* task_executor)
{
 ...
   // Remember the DiscoveredList mentioned above? refs_lists is that DiscoveredList.
   // The DiscoveredList is processed in several phases; SoftReference handling happens in the first phase.
 ...
      for (uint i = 0; i < _max_num_q; i++) {
        process_phase1(refs_lists[i], policy,
                       is_alive, keep_alive, complete_gc);
      }
 ...
}

//The main purpose of this phase is to remove the corresponding SoftReference from the refs_list when memory is sufficient.
void
ReferenceProcessor::process_phase1(DiscoveredList&    refs_list,
                                   ReferencePolicy*   policy,
                                   BoolObjectClosure* is_alive,
                                   OopClosure*        keep_alive,
                                   VoidClosure*       complete_gc) {
  
  DiscoveredListIterator iter(refs_list, keep_alive, is_alive);
  // Decide which softly reachable refs should be kept alive.
  while (iter.has_next()) {
    iter.load_ptrs(DEBUG_ONLY(!discovery_is_atomic() /* allow_null_referent */));
    // Determine whether the referenced object is dead
    bool referent_is_dead = (iter.referent() != NULL) && !iter.is_referent_alive();
    // If the referent is dead, ask the ReferencePolicy whether it should be reclaimed at this point
    if (referent_is_dead &&
        !policy->should_clear_reference(iter.obj(), _soft_ref_timestamp_clock)) {
      if (TraceReferenceGC) {
        gclog_or_tty->print_cr("Dropping reference (" INTPTR_FORMAT ": %s"  ") by policy",
                               (void *)iter.obj(), iter.obj()->klass()->internal_name());
      }
      // Remove Reference object from list
      iter.remove();
      // Make the Reference object active again
      iter.make_active();
      // keep the referent around
      iter.make_referent_alive();
      iter.move_to_next();
    } else {
      iter.next();
    }
  }
 ...
}

refs_lists holds the references of a given type discovered by this GC (phantom, soft, weak, and so on). The job of process_discovered_reflist is to remove from refs_lists the references whose referents need not be reclaimed. The elements that remain in refs_lists at the end are all ones that are to be reclaimed, and finally the first element of refs_lists is assigned to the Reference.java pending field mentioned above.

ReferencePolicy has four implementations: NeverClearPolicy, AlwaysClearPolicy, LRUCurrentHeapPolicy, and LRUMaxHeapPolicy. NeverClearPolicy always returns false, meaning a SoftReference is never reclaimed; the JVM does not use this class. AlwaysClearPolicy always returns true; the setup method in referenceProcessor.hpp can set the policy to AlwaysClearPolicy. When exactly AlwaysClearPolicy is used is left for interested readers to explore.

The should_clear_reference methods of LRUCurrentHeapPolicy and LRUMaxHeapPolicy are identical:

bool LRUMaxHeapPolicy::should_clear_reference(oop p,
                                             jlong timestamp_clock) {
  jlong interval = timestamp_clock - java_lang_ref_SoftReference::timestamp(p);
  assert(interval >= 0, "Sanity check");

  // The interval will be zero if the ref was accessed since the last scavenge/gc.
  if(interval <= _max_interval) {
    return false;
  }

  return true;
}

timestamp_clock is the static clock field of SoftReference, and java_lang_ref_SoftReference::timestamp(p) is the timestamp field. If SoftReference#get was called after the last GC, interval is 0; otherwise it is the time elapsed across the intervening GCs.

_max_interval is a threshold, and it is computed differently by LRUCurrentHeapPolicy and LRUMaxHeapPolicy:

void LRUCurrentHeapPolicy::setup() {
  _max_interval = (Universe::get_heap_free_at_last_gc() / M) * SoftRefLRUPolicyMSPerMB;
  assert(_max_interval >= 0,"Sanity check");
}

void LRUMaxHeapPolicy::setup() {
  size_t max_heap = MaxHeapSize;
  max_heap -= Universe::get_heap_used_at_last_gc();
  max_heap /= M;

  _max_interval = max_heap * SoftRefLRUPolicyMSPerMB;
  assert(_max_interval >= 0,"Sanity check");
}

SoftRefLRUPolicyMSPerMB defaults to 1000. The former derives the threshold from the free heap size after the last GC, while the latter derives it from (maximum heap size - heap usage at the last GC).
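To make the arithmetic concrete, here is a small Java transcription of the two setup methods and of should_clear_reference. It is only a model of the C++ shown above: the class name, method names, and the sample heap sizes and clock values are assumptions chosen for illustration.

```java
// Illustrative model of the HotSpot soft-reference policies; not JVM code.
class SoftRefPolicySketch {
    static final long M = 1024 * 1024;

    // Models LRUCurrentHeapPolicy::setup: free heap (in MB) times ms-per-MB.
    static long maxIntervalCurrentHeap(long heapFreeAtLastGcBytes, long msPerMb) {
        return (heapFreeAtLastGcBytes / M) * msPerMb;
    }

    // Models LRUMaxHeapPolicy::setup: (max heap - used heap) in MB times ms-per-MB.
    static long maxIntervalMaxHeap(long maxHeapBytes, long heapUsedAtLastGcBytes, long msPerMb) {
        return ((maxHeapBytes - heapUsedAtLastGcBytes) / M) * msPerMb;
    }

    // Models should_clear_reference: clear only if the reference has been idle too long.
    static boolean shouldClear(long clock, long timestamp, long maxInterval) {
        long interval = clock - timestamp;
        return interval > maxInterval;
    }

    public static void main(String[] args) {
        long msPerMb = 1000; // default value of SoftRefLRUPolicyMSPerMB

        // 100 MB free after the last GC gives a threshold of 100,000 ms.
        long maxInterval = maxIntervalCurrentHeap(100 * M, msPerMb);
        System.out.println(maxInterval); // 100000

        // Last get() was 50 s before the latest GC clock: kept.
        System.out.println(shouldClear(500_000, 450_000, maxInterval)); // false
        // Last get() was 150 s before the latest GC clock: cleared.
        System.out.println(shouldClear(500_000, 350_000, maxInterval)); // true

        // 1024 MB max heap with 924 MB used yields the same 100,000 ms threshold.
        System.out.println(maxIntervalMaxHeap(1024 * M, 924 * M, msPerMb)); // 100000
    }
}
```

In other words, with the default setting each free megabyte of heap buys a soft reference one more second of idle time before it becomes eligible for clearing.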

By now you can see that when a SoftReference is reclaimed depends on the policy in use (LRUCurrentHeapPolicy by default on the client VM; the server VM defaults to LRUMaxHeapPolicy), the amount of free heap, and the last time get() was called on the SoftReference.

WeakReference

public class WeakReference<T> extends Reference<T> {

    public WeakReference(T referent) {
        super(referent);
    }

    public WeakReference(T referent, ReferenceQueue<? super T> q) {
        super(referent, q);
    }

}

As you can see, WeakReference simply extends Reference at the Java level without changing anything. So when is the referent field set to null? To answer this, let's look again at the process_discovered_reflist method mentioned above:

size_t
ReferenceProcessor::process_discovered_reflist(
  DiscoveredList               refs_lists[],
  ReferencePolicy*             policy,
  bool                         clear_referent,
  BoolObjectClosure*           is_alive,
  OopClosure*                  keep_alive,
  VoidClosure*                 complete_gc,
  AbstractRefProcTaskExecutor* task_executor)
{
 ...

  // Phase 1: remove from refs_lists all soft references whose referents are unreachable but should not be reclaimed yet (policy is non-null only when refs_lists holds soft references)
  if (policy != NULL) {
    if (mt_processing) {
      RefProcPhase1Task phase1(*this, refs_lists, policy, true /*marks_oops_alive*/);
      task_executor->execute(phase1);
    } else {
      for (uint i = 0; i < _max_num_q; i++) {
        process_phase1(refs_lists[i], policy,
                       is_alive, keep_alive, complete_gc);
      }
    }
  } else { // policy == NULL
    assert(refs_lists != _discoveredSoftRefs,
           "Policy must be specified for soft references.");
  }

  // Phase 2:
  // Remove all references whose referents are still alive
  if (mt_processing) {
    RefProcPhase2Task phase2(*this, refs_lists, !discovery_is_atomic() /*marks_oops_alive*/);
    task_executor->execute(phase2);
  } else {
    for (uint i = 0; i < _max_num_q; i++) {
      process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
    }
  }

  // Phase 3:
  // Depending on the value of clear_referent, decide whether the dead referent is reclaimed
  if (mt_processing) {
    RefProcPhase3Task phase3(*this, refs_lists, clear_referent, true /*marks_oops_alive*/);
    task_executor->execute(phase3);
  } else {
    for (uint i = 0; i < _max_num_q; i++) {
      process_phase3(refs_lists[i], clear_referent,
                     is_alive, keep_alive, complete_gc);
    }
  }

  return total_list_count;
}

void
ReferenceProcessor::process_phase3(DiscoveredList&    refs_list,
                                   bool               clear_referent,
                                   BoolObjectClosure* is_alive,
                                   OopClosure*        keep_alive,
                                   VoidClosure*       complete_gc) {
  ResourceMark rm;
  DiscoveredListIterator iter(refs_list, keep_alive, is_alive);
  while (iter.has_next()) {
    iter.update_discovered();
    iter.load_ptrs(DEBUG_ONLY(false /* allow_null_referent */));
    if (clear_referent) {
      // NULL out referent pointer
      // Set the Reference's referent field to null so that the referent is reclaimed by GC
      iter.clear_referent();
    } else {
      // keep the referent around
      // Mark the referenced object as alive; it will not be reclaimed in this GC
      iter.make_referent_alive();
    }
	...
  }
    ...
}

For weak references and every other reference type alike, the referent field is set to null in process_phase3, and what actually happens is determined by the value of clear_referent. That value in turn depends on the reference type.

ReferenceProcessorStats ReferenceProcessor::process_discovered_references(
  BoolObjectClosure*           is_alive,
  OopClosure*                  keep_alive,
  VoidClosure*                 complete_gc,
  AbstractRefProcTaskExecutor* task_executor,
  GCTimer*                     gc_timer) {
  NOT_PRODUCT(verify_ok_to_handle_reflists());
	...
  // The third argument of process_discovered_reflist is clear_referent
  // Soft references
  size_t soft_count = 0;
  {
    GCTraceTime tt("SoftReference", trace_time, false, gc_timer);
    soft_count =
      process_discovered_reflist(_discoveredSoftRefs, _current_soft_ref_policy, true,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  update_soft_ref_master_clock();

  // Weak references
  size_t weak_count = 0;
  {
    GCTraceTime tt("WeakReference", trace_time, false, gc_timer);
    weak_count =
      process_discovered_reflist(_discoveredWeakRefs, NULL, true,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  // Final references
  size_t final_count = 0;
  {
    GCTraceTime tt("FinalReference", trace_time, false, gc_timer);
    final_count =
      process_discovered_reflist(_discoveredFinalRefs, NULL, false,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }

  // Phantom references
  size_t phantom_count = 0;
  {
    GCTraceTime tt("PhantomReference", trace_time, false, gc_timer);
    phantom_count =
      process_discovered_reflist(_discoveredPhantomRefs, NULL, false,
                                 is_alive, keep_alive, complete_gc, task_executor);
  }
	...
}

As you can see, for soft and weak references clear_referent is passed as true, which matches our expectation: once the object is unreachable, the referent field is set to null and the object is then reclaimed. (For soft references, if memory is sufficient, the relevant references are removed from refs_list in phase 1, so refs_list is already empty by phase 3.)
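The three phases can be condensed into a toy Java model. This is not how HotSpot is written; the Ref class, its boolean flags, and the process method are hypothetical simplifications that only mirror the control flow of process_discovered_reflist shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the three processing phases; all names are illustrative.
class RefProcessingModel {

    // Hypothetical stand-in for a discovered Reference object.
    static final class Ref {
        Object referent;
        final boolean referentAlive; // outcome of the GC's is_alive check
        final boolean policyKeeps;   // soft refs only: policy says "do not clear yet"

        Ref(Object referent, boolean referentAlive, boolean policyKeeps) {
            this.referent = referent;
            this.referentAlive = referentAlive;
            this.policyKeeps = policyKeeps;
        }
    }

    // Returns the references that would remain on the list and become pending.
    static List<Ref> process(List<Ref> discovered, boolean hasPolicy, boolean clearReferent) {
        List<Ref> list = new ArrayList<>(discovered);

        // Phase 1 (soft references only): drop refs whose dead referent the policy keeps.
        if (hasPolicy) {
            list.removeIf(r -> !r.referentAlive && r.policyKeeps);
        }

        // Phase 2: drop refs whose referent is still reachable.
        list.removeIf(r -> r.referentAlive);

        // Phase 3: either null out the referent or leave it in place (kept alive).
        for (Ref r : list) {
            if (clearReferent) {
                r.referent = null;
            }
        }
        return list;
    }

    public static void main(String[] args) {
        Ref live = new Ref(new Object(), true, false);
        Ref dead = new Ref(new Object(), false, false);

        // Weak-style processing: no policy, clear_referent = true.
        List<Ref> pending = process(Arrays.asList(live, dead), false, true);
        System.out.println(pending.size() + " " + (dead.referent == null)); // 1 true
    }
}
```

Calling process with clearReferent set to false models the final and phantom cases: the reference still ends up pending, but its referent field is left untouched.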

For final and phantom references, however, clear_referent is passed as false. In the JDK code shown here, this means that unless something extra is done, the objects referenced by these two types are not reclaimed as long as the Reference object itself survives. Final references, which are tied to whether a class overrides finalize, are outside the scope of this article. Let's look at phantom references.

PhantomReference

public class PhantomReference<T> extends Reference<T> {
 
    public T get() {
        return null;
    }
 
    public PhantomReference(T referent, ReferenceQueue<? super T> q) {
        super(referent, q);
    }

}

As you can see, get() on a phantom reference always returns null. Let's look at a demo.

public static void demo() throws InterruptedException {
    Object obj = new Object();
    ReferenceQueue<Object> refQueue = new ReferenceQueue<>();
    PhantomReference<Object> phanRef = new PhantomReference<>(obj, refQueue);

    Object objg = phanRef.get();
    // Prints null
    System.out.println(objg);
    // Make obj eligible for garbage collection
    obj = null;
    System.gc();
    Thread.sleep(3000);
    // After the GC, phanRef is added to refQueue
    Reference<? extends Object> phanRefP = refQueue.remove();
    // Prints true
    System.out.println(phanRefP == phanRef);
}

As the code above shows, a phantom reference delivers a "notification" when the object it points to becomes unreachable (in fact, every class that extends Reference can do this). Note that after the GC completes, phanRef.referent still points to the Object created earlier; in other words, that Object has not been reclaimed!

The reason was given at the end of the previous section: for final and phantom references clear_referent is passed as false, which means the objects referenced by these two types are not reclaimed by GC unless something extra is done.

For a phantom reference, once you have obtained the reference object from refQueue.remove(), you can call clear() to sever the link between the reference and its referent, so that the object can be reclaimed at the next GC.
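A minimal sketch of that drain-and-clear pattern follows. The class PhantomDrainSketch and its methods are illustrative names. The example enqueues the reference by hand so that it does not depend on GC timing. As far as I know, JDK 9 and later clear phantom references automatically when they are enqueued, so the explicit clear() matters mainly for the JDK 8 behavior analyzed here and is harmless elsewhere.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

// Illustrative tracker; the class and method names are assumptions for this sketch.
class PhantomDrainSketch {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the PhantomReference objects themselves reachable until they are processed.
    private final Set<Reference<?>> tracked = new HashSet<>();

    PhantomReference<Object> track(Object referent) {
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);
        tracked.add(ref);
        return ref;
    }

    // Drains the queue without blocking; returns how many references were processed.
    int drain() {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ref.clear();         // sever the link so the referent can be reclaimed
            tracked.remove(ref); // drop our own strong reference to the Reference
            processed++;
        }
        return processed;
    }

    int trackedCount() {
        return tracked.size();
    }

    public static void main(String[] args) {
        PhantomDrainSketch sketch = new PhantomDrainSketch();
        PhantomReference<Object> ref = sketch.track(new Object());
        ref.enqueue(); // stand-in for the GC enqueueing the reference
        System.out.println(sketch.drain() + " " + sketch.trackedCount()); // 1 0
    }
}
```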

Summary

With the analysis done, we can now answer the questions raised at the beginning of the article:

1. Most articles online describe soft references like this: they are reclaimed when memory is insufficient. But how is "insufficient memory" defined? What exactly counts as insufficient?

Soft references are indeed reclaimed when memory is insufficient. Whether memory counts as insufficient depends on the last time get() was called on the reference and on the amount of free memory in the heap. The formula is given above.

2. Most articles online describe phantom references like this: a phantom reference does not affect the life cycle of its object and is mainly used to track when the object is reclaimed by the garbage collector. Is that really true?

Strictly speaking, a phantom reference can affect the life cycle of its object. If nothing is done, the referenced object is never reclaimed as long as the phantom reference itself is not reclaimed. So in general, after obtaining a PhantomReference from the ReferenceQueue, if the PhantomReference object will not itself be reclaimed (for example, because it is referenced by another object reachable from a GC root), you need to call clear() to remove the link between the PhantomReference and its referent.

3. In which scenarios does the JDK itself use phantom references?

DirectByteBuffer uses Cleaner.java, a subclass of PhantomReference, to reclaim off-heap memory. A follow-up article will cover off-heap memory in detail.


Posted by ca87 on Fri, 09 Aug 2019 01:11:07 -0700