Performance optimization: Trove collection library

Keywords: JDK Java less Maven

1 First look at Trove

Yesterday on Startup News I saw an article, "Optimizing Tips Sharing: Reduce Memory Consumption to 1/20". It describes a case of reducing memory consumption in a Java application and summarizes the author's optimization process:

  • Started by storing 1.3M Person objects, consuming 1.5GB of heap space
  • Then changed to java.util.HashMap

2 Using Trove

  • If you use Maven, add the following dependency:
<dependency>
    <groupId>net.sf.trove4j</groupId>
    <artifactId>trove4j</artifactId>
    <version>3.0.3</version>
</dependency>
  • The common methods are the same as those of the JDK collection classes, so migration is easy.
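// imports: gnu.trove.map.TIntObjectMap, gnu.trove.map.hash.TIntObjectHashMap
TIntObjectMap<String> ints = new TIntObjectHashMap<String>();
ints.put(100, "John"); // the key is a primitive int, no Integer boxing
ints.put(101, "Tom");
System.out.println(ints.get(100)); // prints John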

Trove essentially re-implements the JDK collection classes for primitive types such as int. Commonly used classes include
TIntList, TIntObjectMap, TObjectIntMap, and TIntSet. You can imagine how much work it takes to maintain Trove.

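Trove also provides Map, Set, and LinkedList implementations for object types, with the Map and Set built on open addressing. The article "Enhance Collection Performance with this Treasure Trove" suggests a factory for switching between the JDK and Trove implementations; the approach looks like this: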

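import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

import gnu.trove.list.TLinkable;
import gnu.trove.list.linked.TLinkedList;
import gnu.trove.map.hash.THashMap;
import gnu.trove.set.hash.THashSet;

public class CollectionFactory {
    // Set to false to fall back to the JDK implementations
    static boolean useTrove = true;

    /**
     *  Return a hash map: THashMap if useTrove is set, otherwise HashMap
     */
    public static <K, V> Map<K, V> getHashMap() {
        if ( useTrove ) return new THashMap<K, V>();
        else            return new HashMap<K, V>();
    }

    /**
     *  Return a hash set: THashSet if useTrove is set, otherwise HashSet
     */
    public static <E> Set<E> getHashSet() {
        if ( useTrove ) return new THashSet<E>();
        else            return new HashSet<E>();
    }

    /**
     *  Return a linked list: TLinkedList if useTrove is set, otherwise LinkedList.
     *  TLinkedList only accepts elements that implement TLinkable.
     */
    public static <T extends TLinkable<T>> List<T> getLinkedList() {
        if ( useTrove ) return new TLinkedList<T>();
        else            return new LinkedList<T>();
    }
}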
  • Iterating over the elements of a collection

Trove recommends forEach callbacks over the JDK's entryXX style of iteration (entrySet and the like).
The code reads more cleanly, and there is a memory advantage too, because the entryXX style needs to create a new array.

TIntObjectMap<String> ints = new TIntObjectHashMap<String>();
ints.put(100, "John");
ints.put(101, "Tom");
ints.forEachEntry(new TIntObjectProcedure<String>() {
    public boolean execute(int a, String b) {
        System.out.println("key: " + a + ", val: " + b);
        return true;
    }
});
ints.forEachKey(new TIntProcedure() {
    public boolean execute(int value) {
        System.out.println("key: " + value);
        return true;
    }
});
ints.forEachValue(new TObjectProcedure<String>() {
    public boolean execute(String object) {
        System.out.println("val: " + object);
        return true;
    }
});
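
One detail the snippet above does not exercise: the boolean returned by execute controls the loop. Returning true continues with the next element, returning false stops the iteration early. The JDK-only sketch below mimics that contract. IntProcedure and forEach here are hypothetical stand-ins written for illustration, not Trove's own classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the callback contract of Trove's procedures.
final class CallbackIterationSketch {

    /** Callback invoked once per element; return false to stop the iteration. */
    interface IntProcedure {
        boolean execute(int value);
    }

    /**
     * Calls the procedure for each element in order, without creating
     * an iterator or a copy of the data.
     * Returns false if the procedure stopped the loop early, true otherwise.
     */
    static boolean forEach(int[] values, IntProcedure procedure) {
        for (int value : values) {
            if (!procedure.execute(value)) {
                return false;
            }
        }
        return true;
    }

    /** Collects elements until the first negative value is seen. */
    static List<Integer> takeUntilNegative(int[] values) {
        final List<Integer> seen = new ArrayList<Integer>();
        forEach(values, new IntProcedure() {
            public boolean execute(int value) {
                if (value < 0) {
                    return false; // stop iterating
                }
                seen.add(value);
                return true;      // keep going
            }
        });
        return seen;
    }

    public static void main(String[] args) {
        // prints [100, 101]
        System.out.println(takeUntilNegative(new int[] {100, 101, -1, 102}));
    }
}
```
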
  • Custom hashing strategy

With the JDK collection classes, the hash function sometimes cannot be customized. String is one example: the class is final, so its hashCode cannot be overridden.
Trove, by contrast, lets you plug in your own hashing strategy, so you can tune it to the characteristics of your data.

public static void main(String[] args) {
    char[] foo = new char[]{'a', 'b', 'c'};
    char[] bar = new char[]{'a', 'b', 'c'};
    TCustomHashMap<char[], String> ch = new TCustomHashMap<char[], String>(new CharArrayStrategy());
    ch.put(foo, "John");
    ch.put(bar, "Tom");
}

class CharArrayStrategy implements HashingStrategy<char[]> {
    public int computeHashCode(char[] c) {
        // use the shift-add-xor class of string hashing functions
        // cf. Ramakrishna and Zobel, "Performance in Practice
        // of String Hashing Functions"
        int h = 31; // seed chosen at random
        for (int i = 0; i < c.length; i++) { // could skip invariants
            h = h ^ ((h << 5) + (h >> 2) + c[i]); // L=5, R=2 works well for
                                                  // ASCII input
        }
        return h;
    }

    public boolean equals(char[] c1, char[] c2) {
        if (c1.length != c2.length) { // could drop this check for fixed-length
                                      // keys
            return false;
        }
        for (int i = 0, len = c1.length; i < len; i++) { // could skip
                                                         // invariants
            if (c1[i] != c2[i]) {
                return false;
            }
        }
        return true;
    }
}
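
To see why a strategy is needed for char[] keys, recall that Java arrays inherit identity-based hashCode and equals from Object, so a plain java.util.HashMap treats foo and bar as two different keys. A minimal JDK-only sketch (the class name ArrayKeyDemo is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: the JDK behaviour that a custom hashing strategy works around.
final class ArrayKeyDemo {

    /** Puts two char[] keys with equal contents into a HashMap and returns the map. */
    static Map<char[], String> build() {
        char[] foo = new char[] {'a', 'b', 'c'};
        char[] bar = new char[] {'a', 'b', 'c'};
        Map<char[], String> map = new HashMap<char[], String>();
        map.put(foo, "John");
        map.put(bar, "Tom"); // does not replace "John": arrays are compared by identity
        return map;
    }

    public static void main(String[] args) {
        Map<char[], String> map = build();
        System.out.println(map.size());                          // prints 2
        System.out.println(map.get(new char[] {'a', 'b', 'c'})); // prints null
    }
}
```

With the TCustomHashMap and CharArrayStrategy shown above, the two arrays count as the same key, so the second put replaces the first.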

3 Inside Trove

Trove is designed to reduce memory consumption while maintaining performance. Here is a brief look at how it is implemented.
For further reading, see another article: Performance observation: Trove collection class

  • Use primitive types directly instead of wrapper types

JDK 5's autoboxing lets us temporarily ignore the difference between primitive types and wrapper types. But autoboxing and unboxing are only syntactic sugar; they do nothing for efficiency.
Using primitive types directly instead of wrapper types clearly takes less memory and runs faster. For collections of primitive types, Trove provides equivalent collection classes.
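
As a rough illustration of what a primitive collection looks like inside, here is a minimal growable int list backed by an int[]. It is a sketch written for this article, not Trove's implementation; the class name IntListSketch and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a primitive-backed list; not Trove's actual code.
final class IntListSketch {
    private int[] data = new int[4]; // ints are stored inline, no Integer objects
    private int size;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    /** Returns the element at index; no unboxing is involved. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Sums all elements with plain int reads from the array. */
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        IntListSketch list = new IntListSketch();
        for (int i = 1; i <= 10; i++) {
            list.add(i);
        }
        // prints "10 elements, sum = 55"
        System.out.println(list.size() + " elements, sum = " + list.sum());
    }
}
```

A List<Integer> holding the same values would store a reference per slot and, for values outside the Integer cache, a separate Integer object per element.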

  • Use open addressing rather than separate chaining

Most of the JDK's hash-based collection classes are implemented with separate chaining, which needs a bucket array plus links between the elements. Trove uses open addressing instead.
Although open addressing has to keep enough free slots (a load factor below 0.5), it needs no linked-list nodes, so the overall memory footprint is smaller and performance is better.
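
The idea can be sketched with a tiny int-to-int map that keeps keys and values in parallel arrays and resolves collisions by stepping to the next slot (linear probing, chosen here for brevity). This is an assumption-laden illustration, not Trove's code: the class IntIntOpenMap, the NO_VALUE sentinel, and the growth rule are all made up, removal is left out, and Trove's own probing scheme is not reproduced.

```java
// Hypothetical sketch of an open-addressing int -> int map with linear probing.
final class IntIntOpenMap {
    static final int NO_VALUE = -1; // returned by get() for a missing key (sketch-only convention)

    private int[] keys = new int[11];
    private int[] values = new int[11];
    private boolean[] used = new boolean[11];
    private int size;

    /** Finds the slot holding key, or the free slot where it would be stored. */
    private int findSlot(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length; // collision: probe the next slot
        }
        return i;
    }

    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) { // keep the load factor at or below 0.5
            resize();
        }
        int i = findSlot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(int key) {
        int i = findSlot(key);
        return used[i] ? values[i] : NO_VALUE;
    }

    int size() {
        return size;
    }

    /** Allocates larger arrays and re-inserts every stored entry. */
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        int newLength = oldKeys.length * 2 + 1;
        keys = new int[newLength];
        values = new int[newLength];
        used = new boolean[newLength];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }

    public static void main(String[] args) {
        IntIntOpenMap ages = new IntIntOpenMap();
        ages.put(100, 30);
        ages.put(111, 41); // 100 and 111 both map to slot 1 of the 11-slot table
        // prints "30 41 -1"
        System.out.println(ages.get(100) + " " + ages.get(111) + " " + ages.get(5));
    }
}
```

Each entry costs two int slots and a flag, with no node object and no next pointer, which is where the memory saving over chaining comes from.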

  • HashSet is no longer backed by an internal HashMap

The JDK's HashSet is implemented on top of an internal HashMap, so the space for each value is wasted.
Trove's THashSet and the primitive-type hash sets do not work this way; they store their elements directly in an open-addressing table.
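
A minimal sketch of a set that stores its elements directly in a single array, with no backing map and no per-entry value slot. DirectSetSketch is a hypothetical name; the code uses linear probing, omits removal and null elements, and is not Trove's THashSet.

```java
// Hypothetical sketch: a hash set whose only storage is one Object[] table.
final class DirectSetSketch<T> {
    private Object[] table = new Object[11]; // null marks a free slot
    private int size;

    /** Returns the index of element in slots, or of the free slot where it belongs. */
    private static int findSlot(Object element, Object[] slots) {
        int i = (element.hashCode() & 0x7fffffff) % slots.length;
        while (slots[i] != null && !slots[i].equals(element)) {
            i = (i + 1) % slots.length;
        }
        return i;
    }

    /** Adds a non-null element; returns false if it was already present. */
    boolean add(T element) {
        if (element == null) {
            throw new NullPointerException("null elements are not supported in this sketch");
        }
        if ((size + 1) * 2 > table.length) {
            grow();
        }
        int i = findSlot(element, table);
        if (table[i] != null) {
            return false;
        }
        table[i] = element;
        size++;
        return true;
    }

    boolean contains(Object element) {
        return element != null && table[findSlot(element, table)] != null;
    }

    int size() {
        return size;
    }

    /** Moves every element into a larger table. */
    private void grow() {
        Object[] bigger = new Object[table.length * 2 + 1];
        for (Object e : table) {
            if (e != null) {
                bigger[findSlot(e, bigger)] = e;
            }
        }
        table = bigger;
    }

    public static void main(String[] args) {
        DirectSetSketch<String> names = new DirectSetSketch<String>();
        names.add("John");
        names.add("Tom");
        names.add("John"); // duplicate, ignored
        // prints "2 true"
        System.out.println(names.size() + " " + names.contains("Tom"));
    }
}
```
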

  • Arrays of prime length

To avoid hash collisions as far as possible, Trove keeps the load factor small and also uses arrays of prime length. See gnu.trove.impl.PrimeFinder for details.
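
The effect of a prime length is easy to demonstrate with keys that share a common stride. The sketch below is an illustration only: PrimeCapacity is a hypothetical helper that finds primes by trial division, and it makes no claim about how gnu.trove.impl.PrimeFinder chooses its sizes.

```java
// Hypothetical helper showing why a prime table length spreads patterned keys.
final class PrimeCapacity {

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;
        }
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the smallest prime that is greater than or equal to n. */
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    /** Counts how many different slots the keys occupy in a table of the given length. */
    static int distinctSlots(int[] keys, int length) {
        boolean[] hit = new boolean[length];
        int count = 0;
        for (int key : keys) {
            int slot = (key & 0x7fffffff) % length;
            if (!hit[slot]) {
                hit[slot] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {0, 16, 32, 48}; // all multiples of 16
        System.out.println(distinctSlots(keys, 16));            // prints 1: every key lands in slot 0
        System.out.println(distinctSlots(keys, nextPrime(16))); // prints 4: length 17 spreads them out
    }
}
```
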

  • Maintenance using code generation

This has nothing to do with performance, but maintaining so many primitive-type collection classes means a great deal of repetitive logic that cannot be reused, which is a real headache.
Trove generates this large number of classes from code templates, which greatly reduces the maintenance workload.
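
To give a feel for the approach, here is a toy generator that expands one template once per primitive type. Everything in it is an assumption made for illustration: the placeholder names #e# and #E#, the template text, and the class TemplateSketch are not taken from Trove's build.

```java
// Hypothetical sketch of template-based code generation.
final class TemplateSketch {
    // #E# stands for the capitalized type name, #e# for the primitive type itself.
    static final String TEMPLATE =
        "public interface T#E#List {\n" +
        "    boolean add(#e# value);\n" +
        "    #e# get(int index);\n" +
        "}\n";

    /** Expands the template for one primitive type, e.g. "int" gives TIntList. */
    static String generate(String primitive) {
        String capitalized = Character.toUpperCase(primitive.charAt(0)) + primitive.substring(1);
        return TEMPLATE.replace("#E#", capitalized).replace("#e#", primitive);
    }

    public static void main(String[] args) {
        // One template, three generated interfaces.
        for (String primitive : new String[] {"int", "long", "double"}) {
            System.out.print(generate(primitive));
        }
    }
}
```
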

4 Summary

As general-purpose collections, the JDK classes will be the first choice in most cases. In some performance-sensitive areas, however, Trove may be the better option.
As a dependable Java developer, you should keep Trove in your toolbox alongside Apache Commons and Google Guava.

Posted by nemxu on Tue, 26 Mar 2019 03:39:29 -0700