Java source code - core source code analysis of TreeMap and LinkedHashMap

Without further ado, let's start!

Introduction

After getting familiar with HashMap, this section looks at TreeMap and LinkedHashMap: how TreeMap keeps its entries sorted by key, and how LinkedHashMap supports two iteration orders (insertion order and access order).

1. Background knowledge

Before looking at TreeMap, let's review the two ways of sorting that we use in daily work; they are the groundwork for this section. The code for both is as follows:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
// Also required, judging from usage (assumption): Lombok's @Data, JUnit's @Test,
// fastjson's JSON, and an SLF4J-style logger named log

public class TreeMapDemo {

  @Data
  // DTO is the object we want to sort
  class DTO implements Comparable<DTO> {
    private Integer id;
    public DTO(Integer id) {
      this.id = id;
    }

    @Override
    public int compareTo(DTO o) {
      // Natural order is ascending; Integer.compare avoids the overflow risk of id - o.getId()
      return Integer.compare(id, o.getId());
    }
  }

  @Test
  public void testTwoComparable() {
    // First sort: ascending, using the compareTo method implemented from Comparable
    List<DTO> list = new ArrayList<>();
    for (int i = 5; i > 0; i--) {
      list.add(new DTO(i));
    }
    Collections.sort(list);
    log.info(JSON.toJSONString(list));

    // Second sort: descending, using an external Comparator
    Comparator<DTO> comparator = (o1, o2) -> Integer.compare(o2.getId(), o1.getId());
    List<DTO> list2 = new ArrayList<>();
    for (int i = 5; i > 0; i--) {
      list2.add(new DTO(i));
    }
    // Sort list2 (the original sorted list here by mistake)
    Collections.sort(list2, comparator);
    log.info(JSON.toJSONString(list2));
  }
}

The first sort is ascending, and the output is: [{id: 1}, {id: 2}, {id: 3}, {id: 4}, {id: 5}];

The second sort is descending, and the output is the reverse: [{id: 5}, {id: 4}, {id: 3}, {id: 2}, {id: 1}].

The two approaches above sort through Comparable and Comparator respectively, and TreeMap orders its keys on the same principle. Let's take a look at it together.
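To connect the two sorting approaches to TreeMap, here is a minimal sketch. The class and method names are hypothetical, chosen only for illustration. It shows a TreeMap relying on the natural order of its keys (Comparable) and another relying on a Comparator passed to the constructor:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original article's code
public class TreeMapOrderingSketch {

    // Keys are ordered by their natural order (Integer implements Comparable<Integer>)
    static List<Integer> naturalOrderKeys() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(3, "c");
        map.put(1, "a");
        map.put(2, "b");
        return new ArrayList<>(map.keySet());
    }

    // Keys are ordered by the external Comparator given to the constructor
    static List<Integer> reverseOrderKeys() {
        TreeMap<Integer, String> map = new TreeMap<>(Comparator.<Integer>reverseOrder());
        map.put(3, "c");
        map.put(1, "a");
        map.put(2, "b");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(naturalOrderKeys()); // [1, 2, 3]
        System.out.println(reverseOrderKeys()); // [3, 2, 1]
    }
}
```

The insertion order (3, 1, 2) plays no role in either result; only the comparison rule does.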

2. Overall TreeMap architecture

The underlying data structure of TreeMap is a red-black tree, the same structure that HashMap uses for its treeified buckets.

The difference is that TreeMap relies on the search-tree property of the red-black tree: keys in a node's left subtree are smaller than the node's key, and keys in its right subtree are larger. Each entry is inserted at the position determined by comparing keys, so the tree always maintains the ordering of the keys. This makes TreeMap suitable for scenarios where keys need to be kept sorted.

Because the red-black tree is balanced, the time complexity of containsKey, get, put, remove and similar methods is O(log n).

2.1. Properties

Common TreeMap attributes are:

// Comparator. If one is passed in from outside, it takes priority
// If no external comparator is given (null), the Comparable#compareTo method implemented by the key is used
// Both approaches match the sorting demo in section 1
private final Comparator<? super K> comparator;
 
// Root node of the red-black tree
private transient Entry<K,V> root;
 
// Number of entries currently in the red-black tree
private transient int size = 0;
 
// Count of structural modifications to the tree, used for fail-fast behavior during iteration
private transient int modCount = 0;
 
// Red-black tree node
static final class Entry<K,V> implements Map.Entry<K,V> {}

2.2. Add node

Let's take a look at the steps of adding nodes to TreeMap:

First, check whether the root of the red-black tree is null. If it is, the new node becomes the root directly. The code is as follows:

Entry<K,V> t = root;
// The root of the red-black tree is null, so create the root directly
if (t == null) {
    // compare(key, key) checks the key's type; with natural ordering it also rejects a null key
    compare(key, key); // type (and possibly null) check
    // The new node becomes the root
    root = new Entry<>(key, value, null);
    size = 1;
    modCount++;
    return null;
}

Next, following the red-black tree rule of smaller keys on the left and larger keys on the right, walk down the tree to find the parent under which the new node should be added. The code is as follows:

int cmp;
Entry<K,V> parent;
Comparator<? super K> cpr = comparator;
if (cpr != null) {
    // Loop down the tree to find where the key belongs, i.e. which node becomes its parent
    do {
        // When the loop ends, parent is the last node that was compared
        parent = t;
        // Compare the keys through the comparator
        cmp = cpr.compare(key, t.key);
        // key is smaller than t.key: move to t's left child, where the smaller keys are, and keep comparing
        if (cmp < 0)
            t = t.left;
        // key is larger than t.key: move to t's right child, where the larger keys are, and keep comparing
        else if (cmp > 0)
            t = t.right;
        // Keys are equal: overwrite the existing value and return the old one
        else
            return t.setValue(value);
        // t is null: we have gone past a leaf, so parent is where the new node attaches
    } while (t != null);
}
// When comparator is null, the same loop runs using the key's Comparable#compareTo (omitted here)

Finally, insert the new node as the left or right child of the parent. The code is as follows:

// e is the new node, created with parent as its parent
Entry<K,V> e = new Entry<>(key, value, parent);
// cmp is the result of the last comparison. Less than 0: e becomes the left child of parent
if (cmp < 0)
    parent.left = e;
// Greater than 0: e becomes the right child of parent. The equal case was already handled in the second step
else
    parent.right = e;
// Rebalance and recolor so that the tree keeps its red-black properties
fixAfterInsertion(e);

From the source code, we can see that:

When adding a new node, TreeMap follows the rule of smaller keys on the left and larger keys on the right, walking down from the root until it reaches a null child, which means it has gone past a leaf and found the insertion point;
If the key is found to already exist during the search, its value is overwritten directly;
With natural ordering, TreeMap does not allow a null key; with an external comparator, whether null is allowed depends on that comparator.
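The null-key behavior is easy to check directly. The sketch below uses a hypothetical class name and assumes a standard JDK 8 or later. It shows that a TreeMap using natural ordering throws NullPointerException for a null key, while one built with a null-tolerant comparator such as Comparator.nullsFirst accepts it:

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original article's code
public class TreeMapNullKeySketch {

    // Returns true if a natural-order TreeMap rejects a null key
    static boolean naturalOrderRejectsNull() {
        TreeMap<String, Integer> map = new TreeMap<>();
        try {
            map.put(null, 1); // compare(key, key) calls compareTo on null
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Returns true if a TreeMap with a null-tolerant comparator stores a null key
    static boolean nullTolerantComparatorAcceptsNull() {
        TreeMap<String, Integer> map =
            new TreeMap<>(Comparator.nullsFirst(Comparator.<String>naturalOrder()));
        map.put("b", 2);
        map.put(null, 0);
        map.put("a", 1);
        // nullsFirst orders null before every other key
        return map.containsKey(null) && map.firstKey() == null;
    }

    public static void main(String[] args) {
        System.out.println(naturalOrderRejectsNull());           // true
        System.out.println(nullTolerantComparatorAcceptsNull()); // true
    }
}
```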

Lookup in TreeMap works on the same principle. Interested readers can view the source code on GitHub.
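To make the descent concrete, here is a simplified sketch of both insertion and lookup. It is not the JDK implementation: the class SimpleTreeSketch and its Node type are hypothetical, it always uses a comparator, and it leaves out the red-black rebalancing entirely, so it only illustrates the "smaller left, larger right" walk:

```java
import java.util.Comparator;

// Hypothetical simplified tree: same descent as TreeMap, but with no rebalancing
public class SimpleTreeSketch<K, V> {

    // The real TreeMap.Entry also stores a parent reference and a color
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> left, right;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Comparator<? super K> comparator;
    private Node<K, V> root;
    private int size;

    public SimpleTreeSketch(Comparator<? super K> comparator) {
        this.comparator = comparator;
    }

    // Returns the previous value for the key, or null if the key is new
    public V put(K key, V value) {
        if (root == null) {
            root = new Node<>(key, value);
            size = 1;
            return null;
        }
        Node<K, V> t = root;
        Node<K, V> parent;
        int cmp;
        do {
            parent = t;
            cmp = comparator.compare(key, t.key);
            if (cmp < 0) {
                t = t.left;           // smaller keys live on the left
            } else if (cmp > 0) {
                t = t.right;          // larger keys live on the right
            } else {
                V old = t.value;      // key exists: overwrite the value
                t.value = value;
                return old;
            }
        } while (t != null);
        Node<K, V> e = new Node<>(key, value);
        if (cmp < 0) parent.left = e; else parent.right = e;
        size++;
        return null;
    }

    // Lookup follows exactly the same walk as put
    public V get(K key) {
        Node<K, V> t = root;
        while (t != null) {
            int cmp = comparator.compare(key, t.key);
            if (cmp < 0) t = t.left;
            else if (cmp > 0) t = t.right;
            else return t.value;
        }
        return null;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        SimpleTreeSketch<Integer, String> tree =
            new SimpleTreeSketch<>(Comparator.<Integer>naturalOrder());
        tree.put(5, "five");
        tree.put(3, "three");
        System.out.println(tree.get(3)); // three
    }
}
```

Without rebalancing, inserting already-sorted keys would degrade this sketch into a linked list; avoiding that is the job of the red-black fix-up step in the real TreeMap.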

2.3. Summary

TreeMap is relatively simple. Its red-black tree is similar to the one in HashMap. The essential points are that keys are compared through compare, and that the smaller-left, larger-right property of the red-black tree gives each key its own position, which keeps the keys in sorted order.

3. LinkedHashMap overall architecture

HashMap is unordered, and TreeMap sorts by key. Is there a Map that maintains insertion order? Next, let's take a look at LinkedHashMap.

LinkedHashMap extends HashMap, so it has all the features of HashMap. On top of that, it provides two more features:

Iteration in insertion order;
A least-recently-used eviction policy, whose purpose is to automatically delete keys that have not been accessed for a long time.

Next, let's look at the above two features.

3.1. Access in insertion order

3.1.1. LinkedHashMap linked list structure

Let's take a look at the fields that LinkedHashMap adds to implement the linked list structure:

// Head of the linked list
transient LinkedHashMap.Entry<K,V> head;
 
// Tail of the linked list
transient LinkedHashMap.Entry<K,V> tail;
 
// Extends HashMap.Node, adding before and after references to every entry
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}
 
// Field that selects between the two iteration orders. The default is false
// true: access order, recently accessed keys are moved to the tail of the list
// false: insertion order
final boolean accessOrder;

From these new fields we can see that the data structure of LinkedHashMap looks like a LinkedList whose elements have been replaced by HashMap nodes, a combination of the two. Because of these additions, the entries of the Map are chained together into a linked list, and the linked list preserves order, so the order in which entries were inserted can be maintained.

3.1.2. How to add in order

When a LinkedHashMap is created, accessOrder defaults to false, meaning iteration follows insertion order. Insertion uses the put method of the parent class HashMap, but LinkedHashMap overrides the newNode/newTreeNode and afterNodeAccess methods that put calls while it runs.

The newNode/newTreeNode methods append each new node to the tail of the linked list, which is what guarantees the insertion order. Let's take the newNode source code as an example:

// Add a new node and append it to the end of the linked list
Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    // New node
    LinkedHashMap.Entry<K,V> p =
        new LinkedHashMap.Entry<K,V>(hash, key, value, e);
    // Append to the end of the linked list
    linkNodeLast(p);
    return p;
}
// link at the end of list
private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
    LinkedHashMap.Entry<K,V> last = tail;
    // The new node becomes the tail
    tail = p;
    // last is null: the list was empty, so the new node is both head and tail
    if (last == null)
        head = p;
    // The list already has data: link the new node and the previous tail to each other
    else {
        p.before = last;
        last.after = p;
    }
}

LinkedHashMap adds head and tail references to the map and before and after references to every node. Each new node is appended to the tail when it is created, so by the time insertion finishes, the linked list already reflects the insertion order.
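A short sketch of the visible effect, using a hypothetical class name. Because nodes are linked only when they are created, putting an existing key again replaces the value but does not change its position in the iteration order:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original article's code
public class InsertionOrderSketch {

    static List<String> keysAfterReput() {
        Map<String, Integer> map = new LinkedHashMap<>(); // accessOrder defaults to false
        map.put("banana", 1);
        map.put("apple", 2);
        map.put("cherry", 3);
        // Existing key: no new node is created, so the position is kept
        map.put("banana", 10);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterReput()); // [banana, apple, cherry]
    }
}
```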

3.1.3. Access in sequence

In the source analyzed here, LinkedHashMap only supports one-way traversal, from head to tail in insertion order. It cannot be traversed in both directions the way LinkedList can.

Access goes mainly through iterators. When an iterator is initialized, it starts at the head node, and during iteration it keeps moving to the after node of the current node.

Map provides iteration over keys, values and entries (nodes). To iterate over entries we can call LinkedHashMap.entrySet().iterator(), which returns an iterator built on LinkedHashIterator. Each call to next on that iterator uses the nextNode method to get the next node. The source code of the iterator is as follows:

// During initialization, the default is to access from the beginning node
LinkedHashIterator() {
    // The head node acts as the first accessed node
    next = head;
    expectedModCount = modCount;
    current = null;
}
 
final LinkedHashMap.Entry<K,V> nextNode() {
    LinkedHashMap.Entry<K,V> e = next;
    if (modCount != expectedModCount)// check
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    current = e;
    next = e.after; // Find the node of the next iteration through the after structure of the linked list
    return e;
}

When adding new nodes, we have maintained the insertion order between elements, so iterative access is very simple. We only need to continuously access the next node of the current node.
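The modCount check in nextNode is what makes the iterator fail fast. Here is a minimal single-threaded sketch of that behavior, with a hypothetical class name: adding a new key while iterating changes modCount, so the next call to the iterator throws ConcurrentModificationException.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original article's code
public class FailFastSketch {

    // Returns true if adding a key during iteration triggers the fail-fast check
    static boolean structuralChangeDuringIterationFails() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                if (key.equals("a")) {
                    // Structural change: modCount no longer matches expectedModCount
                    map.put("d", 4);
                }
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDuringIterationFails()); // true
    }
}
```

To remove entries while iterating, the iterator's own remove method is the safe route, since it keeps expectedModCount in sync.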

3.2. Least recently used deletion policy

3.2.1. Demo

This strategy is also called LRU (least recently used). Roughly speaking, recently accessed elements are moved to the tail of the queue, so elements that are not accessed naturally drift toward the head. We can then set a deletion policy, for example: when the number of Map elements exceeds a given limit, delete the head node. Here is a demo to make this concrete. The complete code can be viewed on GitHub:

public void testAccessOrder() {
  // New LinkedHashMap
  LinkedHashMap<Integer, Integer> map = new LinkedHashMap<Integer, Integer>(4,0.75f,true) {
    {
      put(10, 10);
      put(9, 9);
      put(20, 20);
      put(1, 1);
    }
 
    @Override
    // Override the method of deleting the policy. We set that when the number of nodes is greater than 3, the header node will be deleted
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
      return size() > 3;
    }
  };
 
  log.info("initialization:{}",JSON.toJSONString(map));
  Assert.assertNotNull(map.get(9));
  log.info("map.get(9): {}",JSON.toJSONString(map));
  Assert.assertNotNull(map.get(20));
  log.info("map.get(20): {}",JSON.toJSONString(map));
 
}

The printed results are as follows:

initialization:{9:9,20:20,1:1}
map.get(9): {20:20,1:1,9:9}
map.get(20): {1:1,9:9,20:20}

As we can see, we put four elements into the map during initialization, but only three remain: 10 is missing. This is because we overrode the removeEldestEntry method so that when the number of elements in the map is greater than 3, the element at the head of the queue is deleted. When put(1, 1) is executed, the 10 at the head of the queue is deleted, which shows that the head node is removed automatically once the deletion policy we set is triggered.

When we call map.get(9), element 9 moves to the tail of the queue. When we call map.get(20), element 20 moves to the tail of the queue. This shows that recently accessed nodes are moved to the tail.

This example illustrates the least recently used deletion strategy well. Next, let's look at how it works.

3.2.2. Moving elements to the tail of the queue

Let's first look at why an element is moved to the tail of the queue on get:

public V get(Object key) {
    Node<K,V> e;
    // Look up the node with HashMap's getNode method
    if ((e = getNode(hash(key), key)) == null)
        return null;
    // If access order (the LRU policy) is enabled
    if (accessOrder)
        // Move the accessed node to the tail of the queue
        afterNodeAccess(e);
    return e.value;
}

From the source code above, we can see that the accessed node is moved to the tail of the queue by the afterNodeAccess method. This does not happen only in get: the getOrDefault, compute, computeIfAbsent, computeIfPresent and merge methods do the same. Because accessed nodes keep being moved to the tail, the nodes near the head are naturally the ones that have not been accessed for the longest time.
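The reordering can be observed from the outside. The sketch below uses a hypothetical class name. It builds an access-ordered map and tracks the key order after get, getOrDefault, and a put on an existing key, which also counts as an access:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original article's code
public class AccessOrderSketch {

    static List<Integer> keysAfterAccesses() {
        // Third constructor argument true selects access order
        Map<Integer, String> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");            // order: 1, 2, 3
        map.get(1);                 // order: 2, 3, 1
        map.getOrDefault(2, "?");   // order: 3, 1, 2
        map.put(1, "A");            // existing key, treated as an access: 3, 2, 1
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccesses()); // [3, 2, 1]
    }
}
```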

3.2.3. Deletion strategy

In the demo above, the head element was deleted when we executed put. LinkedHashMap does not implement put itself; it uses the put method of HashMap. However, HashMap's put calls the afterNodeInsertion hook, which LinkedHashMap overrides, and this is where the deletion happens. Let's look at the source code:

// Deletes the eldest entry (the head of the list); called from the put method of HashMap
void afterNodeInsertion(boolean evict) { 
    // Head node of the linked list
    LinkedHashMap.Entry<K,V> first;
    // removeEldestEntry controls the deletion policy. If the list is not empty and the policy allows deletion, the head node is removed
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        // removeNode deletes the head node
        removeNode(hash(key), key, null, false, true);
    }
}
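Putting access order and removeEldestEntry together gives a small reusable LRU cache. This is only a sketch: the class name LruCacheSketch and its capacity parameter are hypothetical, and like LinkedHashMap itself it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache built on LinkedHashMap
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true: keep entries in access order
        this.capacity = capacity;
    }

    // Called by afterNodeInsertion after each insertion; returning true evicts the head
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", 3);   // over capacity: "b", now at the head, is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Compared with the anonymous subclass in the demo, passing the limit through the constructor lets the same class serve caches of different sizes.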

3.3. Summary

LinkedHashMap provides two interesting features: iteration in insertion order and a policy for deleting the least recently used elements. Both are implemented simply through the linked list structure, and the design is very ingenious.

Summary

This section discussed the data structures of TreeMap and LinkedHashMap and analyzed the core source code of both. Both make full use of the characteristics of their underlying data structures. TreeMap uses the smaller-left, larger-right property of the red-black tree to keep keys sorted. LinkedHashMap simply adds a linked list on top of HashMap to give its nodes an order, which is very clever. When reading source code, it is worth thinking about the design ideas behind it; you may come away with new insights.


Posted by spxmgb on Fri, 17 Sep 2021 10:58:37 -0700

Programmer Group