Why you should override hashCode when you override equals

Keywords: Java JDK Programming

I use JDK 1.8. hashCode is a native method in the Object class, which means that every object inherits it. Without further ado, let's first look at how the Object class documents the hashCode method:

    /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
    public native int hashCode();

The opening of this comment states an important reason the method exists: it supports hash tables such as java.util.HashMap.

In fact, one of the most important jobs of the hashCode method is to support all hash-based collections. In a collection such as HashMap, both put and get locate data using the key's hashCode. Rather than describe this further, let's look at the source code of the put and get methods in HashMap. To learn more about HashMap, see: Implementation of jdk1.8 in the bottom layer of Hashmap.

put method

Here is part of the source code of the put method, together with the hash method. When putting (key, val), the first step is to compute the index in the hash table as (n - 1) & hash(key), where n is the current table length. For a given table, the index therefore depends only on hash(key), and the return value of hash is derived from key.hashCode(). So key.hashCode() determines where val is inserted.

The source code also shows what happens when the target position already holds an element. HashMap does not allow duplicate keys, so it has to decide whether the key being inserted is the same as the key already stored there. That decision does not look at val; it relies on the key's hash value (from hashCode) and on its equals method. Note the k == key shortcut: obj1 == obj2 means the two references point to the same object, so obj1.equals(obj2) must return true. See: The difference between == and equals.

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
        // Here, hash is the return value of hash(key).
        /* (n - 1) & hash gives the index in the table where the key belongs, and p
           points to the Node currently stored at that index. If p is null, create a new Node and put it there. */
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            /* Use the key's hash value and equals() to decide whether p holds the same key as the entry being inserted. */
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;        // Later code overwrites p's value with the new value
            else {...}

        }
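
To make the index calculation concrete, here is a minimal sketch that reproduces the two steps outside of HashMap. The class name BucketIndexDemo and the helper names spread and indexFor are hypothetical, chosen only for illustration; the two expressions come from the source above, and the table length 16 is an assumed value (it matches HashMap's default initial capacity).

```java
class BucketIndexDemo {
    // Same expression as HashMap.hash(): folds the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Same expression as in put/get; it only works as a modulo when n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // assumed table length
        for (String key : new String[] {"a", "b", "Zhang San"}) {
            int hash = spread(key);
            System.out.println(key + " -> hash " + hash + ", bucket " + indexFor(hash, n));
        }
    }
}
```

Two keys with different hashCode values can still share a bucket, which is why the put code compares p.hash and calls equals instead of trusting the index alone.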

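So if you override equals() in a class (such as User) without also overriding hashCode(), the following code will insert two entries into the hash table. The two keys are equal according to equals(), but their inherited hash codes almost always differ, so they usually land in different positions, and even in the same position the p.hash == hash check fails.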

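// put takes a key and a value; the values here are placeholders
Map<User, String> map = new HashMap<>();
map.put(new User("Zhang San"), "first");
map.put(new User("Zhang San"), "second");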
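
A runnable version of this scenario might look like the sketch below. The User class is not shown in the original, so the one here (a single name field, equals overridden, hashCode deliberately left alone) is an assumption, as are the class name EqualsOnlyDemo and the values "first" and "second".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EqualsOnlyDemo {
    // Hypothetical User: equals() compares names, hashCode() is NOT overridden,
    // so it still falls back to Object.hashCode().
    static final class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return Objects.equals(name, ((User) o).name);
        }
    }

    public static void main(String[] args) {
        User a = new User("Zhang San");
        User b = new User("Zhang San");

        Map<User, String> map = new HashMap<>();
        map.put(a, "first");
        map.put(b, "second");

        System.out.println("a.equals(b): " + a.equals(b));
        System.out.println("same hashCode: " + (a.hashCode() == b.hashCode()));
        System.out.println("map size: " + map.size());
    }
}
```

On a typical JVM the two hash codes differ, so the map ends up with two entries even though a.equals(b) is true. Only in the rare case where the two inherited hash codes happen to collide would the second put overwrite the first.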

get method

The following is part of the source code of HashMap's get method. For map.get(key), HashMap first finds the position in the table via (n - 1) & hash(key), then walks the collision list at that position, comparing each entry using the key's hash value and its equals method (the traversal is omitted here). Clearly, with the test code above, map.get(new User("Zhang San")) will not find the value.

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
 
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Continue only if the table is non-empty and the bucket for this hash holds a node
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            //If the first key is the same as the parameter key, return first
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;

            ......

        }
        return null;
    }
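
The lookup logic can be seen end to end in a much smaller table. The following is a simplified sketch, not HashMap's actual implementation: TinyHashTable is a hypothetical class with a fixed 16-slot array, no resizing, no tree bins, and new nodes are prepended to the chain. It keeps only the parts discussed above: the index calculation, the hash comparison, and the equals check.

```java
class TinyHashTable<K, V> {
    // One entry in a bucket's linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size, never resized

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public void put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                e.value = value; // same key: overwrite the old value
                return;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // new key: prepend to the chain
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            // The cheap hash comparison runs first; equals() is called only on a hash match.
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHashTable<String, Integer> t = new TinyHashTable<>();
        t.put("a", 1); // "a" hashes to 97, "q" to 113: both fall into bucket 1 of 16
        t.put("q", 2);
        System.out.println(t.get("a") + " " + t.get("q") + " " + t.get("missing"));
    }
}
```

A key whose hashCode does not match the one used at insertion time fails the e.hash == hash test (or lands in a different bucket), so equals is never even consulted.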

How to override the hashCode method

To save time, it is tempting to write:

@Override
public int hashCode(){
    return 520;
}

However, Effective Java advises against this, because it gives every object of the class the same hash code. As explained above, HashMap uses key.hashCode() to decide where val is stored in the table, so when such objects are used as keys, every key-value pair maps to the same hash bucket and the hash table degenerates into a linked list. As a result, programs that should run in linear time run in quadratic time instead. For a large hash table, this is the difference between working and not working.
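
One way to observe the cost is to count how many equals() calls the lookups need. The sketch below is only an illustration under assumptions: ConstantHashDemo, Key, and lookupCost are hypothetical names, the key type is not Comparable, and the exact counts depend on the JDK's HashMap implementation, so no numbers are quoted here.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    static long equalsCalls = 0; // counts every Key.equals() invocation

    // Hypothetical key type whose hashCode() is either constant or derived from id.
    static final class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return constantHash ? 520 : Integer.hashCode(id);
        }
    }

    // Fills a map with n keys, then looks each one up with a fresh but equal key
    // and returns how many equals() calls those lookups needed.
    static long lookupCost(int n, boolean constantHash) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i, constantHash), i);
        }
        equalsCalls = 0; // ignore the calls made while inserting
        for (int i = 0; i < n; i++) {
            if (map.get(new Key(i, constantHash)) == null) {
                throw new IllegalStateException("key " + i + " not found");
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        int n = 2000;
        System.out.println("distinct hash codes: " + lookupCost(n, false) + " equals() calls");
        System.out.println("constant hash code:  " + lookupCost(n, true) + " equals() calls");
    }
}
```

Both variants return correct results, since a constant hashCode does not violate the contract. The difference is purely in how much work each lookup has to do.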

The book also describes a recipe for writing hashCode():

  1. Declare an int variable named result and initialize it to a nonzero constant, such as 17.

  2. For each significant field in the class, that is, each field that affects the object's value and is compared in the equals method, do the following: a. compute the hash value of the field, fieldHashValue = field.hashCode(); b. combine it into the result, result = 31 * result + fieldHashValue;

  3. Return result.
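
Applied to a small class, the recipe might look like the sketch below. The User class with name and age fields is an assumption made for illustration (the original never shows User's fields), and HashCodeRecipeDemo is a hypothetical wrapper name. The primitive field uses Integer.hashCode(int), since a primitive has no hashCode() method of its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashCodeRecipeDemo {
    // Hypothetical User with two significant fields; both are compared in equals().
    static final class User {
        private final String name;
        private final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            User other = (User) o;
            return age == other.age && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            int result = 17;                                              // initial value
            result = 31 * result + (name == null ? 0 : name.hashCode());  // field: name
            result = 31 * result + Integer.hashCode(age);                 // field: age
            return result;
        }
    }

    public static void main(String[] args) {
        Map<User, String> map = new HashMap<>();
        map.put(new User("Zhang San", 30), "first");
        map.put(new User("Zhang San", 30), "second"); // equal key: replaces "first"

        System.out.println(map.size());
        System.out.println(map.get(new User("Zhang San", 30)));
    }
}
```

If writing the steps by hand feels tedious, java.util.Objects.hash(name, age) from the standard library computes a field-based hash in one call and also satisfies the contract.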

The general contract of hashCode

The comment on the hashCode method quoted at the top lists several general conventions. We should follow them in everyday development; a class that violates them may not work properly with hash-based collections.

  • During one execution of an application, as long as no information used in the object's equals comparisons is modified, calling hashCode on the same object multiple times must always return the same value.

  • If two objects are equal according to the equals method, then calling hashCode on each of them must produce the same integer result.

  • If two objects are not equal according to the equals method, calling hashCode on each of them is not required to produce different results. However, producing different hash values for unequal objects may improve the performance of hash tables.
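
The three rules can be checked against java.lang.String, whose hashCode is specified by its documentation. This is a small sketch; HashContractDemo and its consistent helper are hypothetical names introduced for illustration.

```java
class HashContractDemo {
    // True when the pair satisfies the second rule: equal objects must have equal hash codes.
    static boolean consistent(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String s1 = new String("hash");
        String s2 = new String("hash"); // a distinct object with the same contents

        // Rule 1: repeated calls on an unmodified object return the same value.
        System.out.println(s1.hashCode() == s1.hashCode());

        // Rule 2: equal objects produce equal hash codes.
        System.out.println(s1.equals(s2) && s1.hashCode() == s2.hashCode());

        // Rule 3: unequal objects may still collide; "Aa" and "BB" both hash to 2112.
        System.out.println(!"Aa".equals("BB") && "Aa".hashCode() == "BB".hashCode());
    }
}
```

All three lines print true. The third one is the reason HashMap cannot rely on hash values alone and still has to call equals.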


Posted by vexx on Mon, 13 Jan 2020 02:06:07 -0800