Principle and application of thread-local storage: ThreadLocal

Keywords: Java

Summary

A variable that several threads can reach, such as a static or instance field, can normally be read and modified by any of them. What if each thread needs its own private copy? The ThreadLocal class in the JDK exists to solve this problem. ThreadLocal binds each thread to its own value. It can be likened to a box in which every thread stores its private data. Because each thread works only on its own data, nothing is shared and thread safety is preserved.

How to use it?

SimpleDateFormat is well known to be thread-unsafe, so what should we do if we want to use it in concurrent scenarios?

The simplest way is to give each thread its own SimpleDateFormat through ThreadLocal. This removes the sharing altogether and makes SimpleDateFormat safe to use concurrently. The following code shows the idea:

import java.text.SimpleDateFormat;
import java.util.Random;

public class ThreadLocalExample implements Runnable {

    // SimpleDateFormat is not thread safe, so each thread should have its own independent copy
    private static final ThreadLocal<SimpleDateFormat> formatter = ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMdd HHmm"));

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalExample obj = new ThreadLocalExample();
        for(int i=0 ; i<10; i++){
            Thread t = new Thread(obj, ""+i);
            Thread.sleep(new Random().nextInt(1000));
            t.start();
        }
    }

    @Override
    public void run() {
        System.out.println("Thread Name= "+Thread.currentThread().getName()+" default Formatter = "+formatter.get().toPattern());
        try {
            Thread.sleep(new Random().nextInt(1000));
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // This thread replaces its own formatter; other threads do not see the change
        formatter.set(new SimpleDateFormat());

        System.out.println("Thread Name= "+Thread.currentThread().getName()+" formatter = "+formatter.get().toPattern());
    }

}

In this code, we create a ThreadLocal object named formatter before any thread starts. The supplier passed to withInitial creates a SimpleDateFormat for each thread the first time that thread calls get(). At run time, each thread later replaces its own SimpleDateFormat, yet the objects seen by the other threads are unaffected. A sample run prints the following (the interleaving of threads varies between runs, and the default pattern on the "formatter" lines depends on the JDK version and locale):

Thread Name= 0 default Formatter = yyyyMMdd HHmm
Thread Name= 0 formatter = y/M/d ah:mm
Thread Name= 1 default Formatter = yyyyMMdd HHmm
Thread Name= 1 formatter = y/M/d ah:mm
Thread Name= 2 default Formatter = yyyyMMdd HHmm
Thread Name= 3 default Formatter = yyyyMMdd HHmm
Thread Name= 2 formatter = y/M/d ah:mm
Thread Name= 4 default Formatter = yyyyMMdd HHmm
Thread Name= 4 formatter = y/M/d ah:mm
Thread Name= 5 default Formatter = yyyyMMdd HHmm
Thread Name= 3 formatter = y/M/d ah:mm
Thread Name= 6 default Formatter = yyyyMMdd HHmm
Thread Name= 5 formatter = y/M/d ah:mm
Thread Name= 6 formatter = y/M/d ah:mm
Thread Name= 7 default Formatter = yyyyMMdd HHmm
Thread Name= 7 formatter = y/M/d ah:mm
Thread Name= 8 default Formatter = yyyyMMdd HHmm
Thread Name= 8 formatter = y/M/d ah:mm
Thread Name= 9 default Formatter = yyyyMMdd HHmm
Thread Name= 9 formatter = y/M/d ah:mm
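
To make the "one copy per thread" behaviour concrete, here is a minimal sketch that counts how often the withInitial supplier runs. The class name LazyInitDemo and the counter are made up for illustration; only ThreadLocal.withInitial, get() and the standard thread API come from the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitDemo {
    // Counts how many times the initial-value supplier has run (illustrative only).
    static final AtomicInteger created = new AtomicInteger();

    static final ThreadLocal<StringBuilder> buffer =
        ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new StringBuilder();
        });

    // Starts n threads that each call get() twice, then returns the supplier count.
    static int runThreads(int n) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                buffer.get().append("a"); // first get() on this thread runs the supplier
                buffer.get().append("b"); // second get() returns the same per-thread object
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("suppliers run: " + runThreads(3));
    }
}
```

Each thread triggers the supplier once, on its first get(), so the counter grows by exactly the number of threads started and not by the number of get() calls.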

How does it work?

From the example code, ThreadLocal looks easy to use: after initialization, call get() to read the value and set() when it needs to change. But if we had to design the ThreadLocal class ourselves, how would we implement it?

A natural first idea is the following:

Let ThreadLocal hold a ConcurrentHashMap internally, where the key is the thread object and the value is the value stored in ThreadLocal. A sketch of this idea:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MyThreadLocal<T> {
  Map<Thread, T> locals = 
    new ConcurrentHashMap<>();
  //Get thread variable  
  T get() {
    return locals.get(
      Thread.currentThread());
  }
  //Setting thread variables
  void set(T t) {
    locals.put(
      Thread.currentThread(), t);
  }
}

These dozen or so lines appear to give us the ThreadLocal we want, but is this really how it is designed?

Let's take a look at how ThreadLocal is actually designed in the JDK:

First, open the source code to see how the set() method of the ThreadLocal class is implemented (get() obtains its map in the same way, through getMap()):

public void set(T value) {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        map.set(this, value);
    else
        createMap(t, value);
}
ThreadLocalMap getMap(Thread t) {
    return t.threadLocals;
}

The source code shows that when ThreadLocal stores an object, it does not put it into a Map of its own. Instead, it obtains a map from the current Thread.

Let's look at the source code of Thread and explore what's going on with this ThreadLocalMap:

public class Thread implements Runnable {
    //......
    //The ThreadLocal value associated with this thread. Maintained by ThreadLocal class
    ThreadLocal.ThreadLocalMap threadLocals = null;

    //The InheritableThreadLocal value associated with this thread. Maintained by the InheritableThreadLocal class
    ThreadLocal.ThreadLocalMap inheritableThreadLocals = null;
    //......
}

From the source code, we can see that ThreadLocalMap is a static nested class of ThreadLocal (not a subclass), but it is held by the Thread object. The Thread class holds two ThreadLocalMap references. The first, threadLocals, stores the ThreadLocal values belonging to this thread. The second, inheritableThreadLocals, stores values of InheritableThreadLocal variables, which a child thread inherits from the thread that creates it.
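
The difference between the two maps can be observed from the outside. The sketch below, with the made-up class name InheritDemo, sets a plain ThreadLocal and an InheritableThreadLocal in a parent thread and reports what a newly created child thread sees.

```java
class InheritDemo {
    static final ThreadLocal<String> plain = new ThreadLocal<>();
    static final InheritableThreadLocal<String> inherited = new InheritableThreadLocal<>();

    // Returns what a child thread sees: [value of plain, value of inherited].
    static String[] valuesSeenByChild() {
        plain.set("parent-plain");
        inherited.set("parent-inherited");
        String[] seen = new String[2];
        // Inheritable values are copied when the child Thread object is created.
        Thread child = new Thread(() -> {
            seen[0] = plain.get();     // not inherited: the child has no value of its own
            seen[1] = inherited.get(); // inherited from the creating thread
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Clean up the values set on the calling thread.
        plain.remove();
        inherited.remove();
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = valuesSeenByChild();
        System.out.println("plain=" + seen[0] + ", inherited=" + seen[1]);
    }
}
```

The child reads null from the plain ThreadLocal and the parent's value from the InheritableThreadLocal.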

We can summarize the relationship between Thread and ThreadLocal as follows. Java's implementation also uses a map, called ThreadLocalMap, but the ThreadLocalMap is held by Thread and not by ThreadLocal. The Thread class has a package-private field threadLocals whose type is ThreadLocalMap. In each entry, the key is the ThreadLocal object and the value is the stored object (such as the SimpleDateFormat object in the example).
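
The ownership described above can be modelled in a few lines. This is only a sketch of the idea, not the JDK code: CarrierThread and SketchLocal are hypothetical stand-ins for Thread and ThreadLocal, and a plain HashMap with strong keys replaces the real ThreadLocalMap.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadOwnedMapSketch {
    // Hypothetical stand-in for Thread: the thread owns the map, like Thread.threadLocals.
    static class CarrierThread extends Thread {
        final Map<SketchLocal<?>, Object> locals = new HashMap<>();
        CarrierThread(Runnable task) { super(task); }
    }

    // Hypothetical stand-in for ThreadLocal: it stores nothing itself and only acts as the key.
    static class SketchLocal<T> {
        private static Map<SketchLocal<?>, Object> currentMap() {
            // Works only on a CarrierThread; the real JDK reads a field of Thread directly.
            return ((CarrierThread) Thread.currentThread()).locals;
        }
        void set(T value) { currentMap().put(this, value); }
        @SuppressWarnings("unchecked")
        T get() { return (T) currentMap().get(this); }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Thread a sets and reads the value; thread b only reads and finds nothing.
    static String run() {
        SketchLocal<String> name = new SketchLocal<>();
        String[] seen = new String[2];
        CarrierThread a = new CarrierThread(() -> {
            name.set("A");
            seen[0] = name.get();
        });
        CarrierThread b = new CarrierThread(() -> {
            seen[1] = name.get();
        });
        a.start();
        b.start();
        join(a);
        join(b);
        return seen[0] + "," + seen[1];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because each map is touched only by the thread that owns it, the sketch needs no synchronization, which is also one practical benefit of letting the thread hold the map.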

When two ThreadLocal objects are used in the same Thread, both values are stored in the single ThreadLocalMap held by that Thread, under different keys. Each key is the corresponding ThreadLocal object.
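
A small sketch of this case, using the real ThreadLocal API with made-up names (TwoLocalsDemo, user, requestId): two ThreadLocal objects are set on the same thread, and removing one leaves the other untouched because they are separate keys.

```java
class TwoLocalsDemo {
    static final ThreadLocal<String> user = new ThreadLocal<>();
    static final ThreadLocal<Integer> requestId = new ThreadLocal<>();

    // Sets both variables on one thread, removes one, and reports what is left.
    static String run() {
        String[] out = new String[1];
        Thread t = new Thread(() -> {
            user.set("alice");
            requestId.set(42);
            user.remove(); // removes only the entry keyed by `user`
            out[0] = user.get() + "/" + requestId.get();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```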

At this point, we may have a question. Why should Thread hold the ThreadLocalMap object? Why doesn't ThreadLocal hold it?

There are two main reasons:

First, from the perspective of data affinity, the data stored through ThreadLocal belongs to a particular Thread, so it is more natural to store it inside the Thread. Second, from the perspective of garbage collection, when a Thread is reclaimed, the objects it stored through ThreadLocal should be reclaimed with it. If ThreadLocal held the data directly in a Map keyed by Thread, then as long as the ThreadLocal object existed, the Thread objects in the Map could never be reclaimed. A ThreadLocal often lives longer than the threads that use it, so that design can easily lead to memory leaks.
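
The retention problem of the map-inside-ThreadLocal design can be shown directly. The sketch below reuses the naive design from earlier under the made-up name NaiveThreadLocal and counts the entries left behind after all worker threads have finished.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NaiveLocalLeakDemo {
    // The naive design: the "thread local" itself owns a map keyed by Thread.
    static class NaiveThreadLocal<T> {
        final Map<Thread, T> locals = new ConcurrentHashMap<>();
        void set(T value) { locals.put(Thread.currentThread(), value); }
        T get() { return locals.get(Thread.currentThread()); }
    }

    // Runs n short-lived threads and returns how many entries are still held afterwards.
    static int entriesAfterThreadsFinish(int n) {
        NaiveThreadLocal<byte[]> local = new NaiveThreadLocal<>();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> local.set(new byte[1024]));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Every finished Thread and its value is still strongly reachable from the map.
        return local.locals.size();
    }

    public static void main(String[] args) {
        System.out.println("entries retained: " + entriesAfterThreadsFinish(5));
    }
}
```

All n entries remain after the threads have terminated, so neither the Thread objects nor their values can be collected while the NaiveThreadLocal is alive.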

In the JDK implementation, the Thread holds the ThreadLocalMap, and each key in the ThreadLocalMap is a weak reference to a ThreadLocal. Therefore, once the Thread object is reclaimed, the ThreadLocalMap it holds can be reclaimed as well.

But in practice, there is still a risk of memory leaks. Why?

A thread in a thread pool lives for a long time, often as long as the program itself, so the ThreadLocalMap held by that thread is not reclaimed. The Entry in the ThreadLocalMap refers to its ThreadLocal key through a WeakReference, so the key can be reclaimed as soon as the ThreadLocal is no longer strongly referenced elsewhere. The value, however, is strongly referenced by the Entry, so even when the value is no longer needed it cannot be reclaimed, which results in a memory leak. ThreadLocalMap does clear such stale entries when it comes across them during later set(), get(), or remove() calls, but nothing guarantees when, or whether, that happens.
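
The weak-key, strong-value shape of an entry can be modelled as follows. This is a simplified sketch: the Entry class here only imitates the shape of the JDK's ThreadLocalMap.Entry, and clear() is called explicitly to simulate the garbage collector clearing the weak key, because real collection timing is not predictable.

```java
import java.lang.ref.WeakReference;

class StaleEntryDemo {
    // Modelled on the shape of ThreadLocalMap.Entry: weak reference to the key, strong value field.
    static class Entry extends WeakReference<Object> {
        Object value;
        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    // Returns true if the key is gone while the value is still strongly held.
    static boolean valueOutlivesKey() {
        Entry entry = new Entry(new Object(), new byte[1024]);
        entry.clear(); // simulates the GC clearing the weakly referenced key
        boolean keyGone = entry.get() == null;
        boolean valueStillHeld = entry.value != null;
        return keyGone && valueStillHeld;
    }

    public static void main(String[] args) {
        System.out.println("value outlives key: " + valueOutlivesKey());
    }
}
```

As long as something keeps the entry itself reachable, which in the real case is the map of a long-lived pool thread, the value stays in memory even though nobody can look it up any more.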

How to solve this problem?

Since the JVM cannot release the value automatically, the obvious fix is to remove it from the ThreadLocal ourselves once we are done with it. For example, we can use try {} finally {}:

// Assumes es and tl are initialized elsewhere and obj is the value to bind
ExecutorService es;
ThreadLocal<Object> tl;
es.execute(()->{
  //ThreadLocal add variable
  tl.set(obj);
  try {
    // Omit business logic code
  }finally {
    //Manually clean up ThreadLocal 
    tl.remove();
  }
});
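
To see why the cleanup matters on pooled threads, the following sketch runs two tasks one after the other on a single-thread executor and reports what the second task finds. PoolReuseDemo, context and the cleanUp flag are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolReuseDemo {
    static final ThreadLocal<String> context = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread and returns what the second task sees.
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                context.set("task-1 data");
                try {
                    // business logic would go here
                } finally {
                    if (cleanUp) {
                        context.remove(); // the pattern recommended above
                    }
                }
            };
            Callable<String> second = context::get;
            pool.submit(first).get();         // wait until the first task has finished
            return pool.submit(second).get(); // runs on the same worker thread
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondTaskSees(false));
        System.out.println("with remove():    " + secondTaskSees(true));
    }
}
```

Without remove(), the second task inherits the first task's value because the worker thread and its map are reused. With remove() in the finally block, the second task sees null. Besides retaining memory, a leftover value can therefore also leak data from one task into the next.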

Summary

ThreadLocal avoids the concurrency problems caused by shared data by giving each thread its own storage. In use, the data stored through ThreadLocal is bound to a thread, and each Thread has its own independent copy. In the implementation, each Thread holds a ThreadLocalMap, a static nested class of ThreadLocal, whose keys are weak references to ThreadLocal objects. When the Thread is reclaimed, its ThreadLocalMap is reclaimed with it. One problem remains: if the Thread lives for a long time, the ThreadLocal behind a weak key can be reclaimed, but the corresponding value is strongly referenced and cannot, which may cause a memory leak. The solution is simple: after using the object, remove it from the ThreadLocal manually.

Posted by arun_desh on Thu, 02 Sep 2021 13:57:07 -0700