JVM interview question series: if two objects are equal (x.equals(y) == true), can their hash codes still be different?

What the interviewer is testing

This question probes basic knowledge at the JVM level. The interviewer's view is that only a solid grasp of the fundamentals lets us write robust, stable code.

Technical knowledge involved

The expression x.equals(y) == true looks simple, but it touches several low-level details. Let's start with the equals method itself.

Every object has an equals method, inherited from Object. Take String as an example; its implementation (the char[]-based version found in JDK 8) is as follows:

public boolean equals(Object anObject) {
  if (this == anObject) { 
    return true;
  }
  if (anObject instanceof String) { // Check whether the argument is a String
    String anotherString = (String)anObject; // Cast to String
    int n = value.length;
    if (n == anotherString.value.length) { // Equal strings must have equal length
      // Walk both char arrays and compare them element by element
      char v1[] = value;
      char v2[] = anotherString.value;
      int i = 0;
      while (n-- != 0) {
        if (v1[i] != v2[i]) // Any mismatching char means the strings differ
          return false;
        i++;
      }
      return true; // All chars matched
    }
  }
  return false;
}

First, look at the opening check: it tests whether the argument is the very same object as this. If so, it returns true immediately.

  if (this == anObject) { 
    return true;
  }

How is == itself implemented?

Understanding the == comparison

In Java, == on objects is commonly described as a comparison of references. There is a bit more to it than that.

The compiler (javac) emits different bytecode instructions for == depending on the types of its operands.

For operands of type boolean, byte, short, char and int, the compiler emits an if_icmp<cond> instruction such as if_icmpne, which compares two int values. (long operands are compared with lcmp followed by ifne instead.) The instruction is described in the Java Virtual Machine Specification, Chapter 6, The Java Virtual Machine Instruction Set. Its implementation in the bytecodeInterpreter source of the HotSpot VM is as follows:

#define COMPARISON_OP(name, comparison)                                      
CASE(_if_icmp##name): {                                                
  int skip = (STACK_INT(-2) comparison STACK_INT(-1))                
    ? (int16_t)Bytes::get_Java_u2(pc + 1) : 3;             
  address branch_pc = pc;                                            
  UPDATE_PC_AND_TOS(skip, -2);                                       
  DO_BACKEDGE_CHECKS(skip, branch_pc);                               
  CONTINUE;                                                          
}

In essence, the two int values at offsets -1 and -2 on the operand stack are compared using the given comparison operator.

If the operands are object references, the compiler emits an if_acmpne instruction instead. Compared with if_icmpne, the i (int) becomes a (reference). It is also described in Chapter 6 of the JVM specification (The Java Virtual Machine Instruction Set). The corresponding implementation in the HotSpot VM is:

COMPARISON_OP(name, comparison)                                        
  CASE(_if_acmp##name): {                                                
  int skip = (STACK_OBJECT(-2) comparison STACK_OBJECT(-1))          
    ? (int16_t)Bytes::get_Java_u2(pc + 1) : 3;            
  address branch_pc = pc;                                            
  UPDATE_PC_AND_TOS(skip, -2);                                       
  DO_BACKEDGE_CHECKS(skip, branch_pc);                               
  CONTINUE;                                                          
}

The expression STACK_OBJECT(-2) comparison STACK_OBJECT(-1) shows that what is actually compared are the two object pointers on the operand stack, that is, references into the heap.

Readers with some JVM background will know that when x == y is true, x and y hold the same memory address. They are the same object, which is why equals can return true right away.

So the conclusion is: for objects, == compares the memory addresses held by the two references. If == returns true, the addresses are identical.
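To see the instruction selection for yourself, a small class like the one below can be compiled and disassembled with javap -c. This is only a sketch: CompareDemo and its method names are made up for illustration, and the instructions named in the comments are what javac typically emits, so the exact listing may vary between compiler versions.

```java
// Hypothetical demo class. Compile it, then run `javap -c CompareDemo`
// to inspect the bytecode generated for each method.
class CompareDemo {
    // int operands: javac typically emits if_icmpne here
    static boolean sameInt(int a, int b) {
        return a == b;
    }

    // long operands: javac typically emits lcmp followed by ifne
    static boolean sameLong(long a, long b) {
        return a == b;
    }

    // reference operands: javac typically emits if_acmpne
    static boolean sameRef(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(sameInt(1, 1));            // true
        System.out.println(sameLong(1L, 2L));         // false
        System.out.println(sameRef(o, o));            // true
        System.out.println(sameRef(o, new Object())); // false
    }
}
```

The last two calls show the point made above: two references compare equal with == only when they denote the same object.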

String.equals source code

Now for the rest of equals. The remaining logic is:

Compare the lengths of the two strings. If they differ, return false immediately.
Take the char[] array backing each String and compare the characters one by one in order. If any pair differs, return false as well.

public boolean equals(Object anObject) {
  // ... (reference check omitted)
  if (anObject instanceof String) { // Check whether the argument is a String
    String anotherString = (String)anObject; // Cast to String
    int n = value.length;
    if (n == anotherString.value.length) { // Equal strings must have equal length
      // Walk both char arrays and compare them element by element
      char v1[] = value;
      char v2[] = anotherString.value;
      int i = 0;
      while (n-- != 0) {
        if (v1[i] != v2[i]) // Any mismatching char means the strings differ
          return false;
        i++;
      }
      return true; // All chars matched
    }
  }
  return false;
}

== and equals

From the analysis above we know that, for objects, == compares where they are stored in memory. Object is the root class in Java and every class inherits from it by default. If a class does not override Object's equals method, calling equals also tells you only whether the two references denote the same object, because the default implementation simply uses ==.

public boolean equals(Object obj) {
  return (this == obj);
}

"The same" here means that the two references denote the same object, that is, their addresses in memory are equal. Sometimes, however, we want to know whether the contents of two objects are the same. In that case the class has its own notion of "logical equality", and we do not care whether the references point to the same object.

For example, take String a = "Hello" and String b = new String("Hello"). There are two different questions we might ask: are a and b the same object (same memory address), or do they have the same contents? How do we express which one we mean?

== checks whether they are the same object in memory. String's superclass is Object, whose default equals also compares memory addresses, so String has to override equals, as shown in the String source code:

First compare the memory addresses.
Then compare the contents.

public boolean equals(Object anObject) {
  // ... (reference check omitted)
  if (anObject instanceof String) { // Check whether the argument is a String
    String anotherString = (String)anObject; // Cast to String
    int n = value.length;
    if (n == anotherString.value.length) { // Equal strings must have equal length
      // Walk both char arrays and compare them element by element
      char v1[] = value;
      char v2[] = anotherString.value;
      int i = 0;
      while (n-- != 0) {
        if (v1[i] != v2[i]) // Any mismatching char means the strings differ
          return false;
        i++;
      }
      return true; // All chars matched
    }
  }
  return false;
}

So a == b tells us whether a and b are the same object, while a.equals(b) tells us whether the contents of a and b are the same.
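The difference can be checked with the two strings from the example above. The class name StringEqualityDemo is made up for this sketch; everything else is standard java.lang.String behavior.

```java
// Hypothetical demo class contrasting == and equals on String.
class StringEqualityDemo {
    public static void main(String[] args) {
        String a = "Hello";             // literal, taken from the string pool
        String b = new String("Hello"); // explicitly allocates a new object

        System.out.println(a == b);          // false: two different objects
        System.out.println(a.equals(b));     // true: same characters
        System.out.println(a == b.intern()); // true: intern() returns the pooled instance
        System.out.println(a.hashCode() == b.hashCode()); // true: String derives hashCode from its contents
    }
}
```

Note the last line: because String overrides hashCode together with equals, two equal strings always produce the same hash code even though they are distinct objects.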

In the JDK, String is not the only class that overrides equals. The wrapper types Integer, Long, Double, Float and so on do so too. So when Long or Integer values are used as business parameters and need to be compared, remember to use equals rather than ==.

Using == here leads to surprising pitfalls. Several wrapper types keep an internal cache of instances, such as IntegerCache and LongCache. When a value within a certain range (-128 to 127 by default) is boxed, the cached instance is returned instead of a new object, so == can appear to work for small values and then fail for larger ones.

If you do want to use ==, convert the wrappers to primitive types first, because == on primitives compares values. Be careful, though: unboxing a null wrapper throws a NullPointerException (NPE).
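A short sketch of these pitfalls follows. IntegerCacheDemo and safeEquals are names invented for the example. The result of c == d is not annotated as certain because the upper bound of the Integer cache can be raised by JVM configuration; on a default setup it prints false.

```java
import java.util.Objects;

// Hypothetical demo class for wrapper comparison pitfalls.
class IntegerCacheDemo {
    // Null-safe comparison by value (helper name is illustrative).
    static boolean safeEquals(Integer x, Integer y) {
        return Objects.equals(x, y);
    }

    public static void main(String[] args) {
        Integer a = 127, b = 127;   // boxed via Integer.valueOf, inside the cached range
        Integer c = 1000, d = 1000; // outside the default cached range

        System.out.println(a == b);      // true: both refer to the same cached instance
        System.out.println(c == d);      // compares references; false with default settings
        System.out.println(c.equals(d)); // true: compares values
        System.out.println(c.intValue() == d.intValue()); // true: primitive comparison

        Integer missing = null;
        try {
            int n = missing; // unboxing null throws
            System.out.println(n);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while unboxing null");
        }
        System.out.println(safeEquals(missing, c)); // false, and no exception
    }
}
```

Objects.equals is a convenient choice when either operand may be null, since it avoids both the reference comparison and the unboxing NPE.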

Understanding hashCode

Now let's return to the interview question: if two objects are equal (x.equals(y) == true), can they have different hash codes?

Assume x and y are Strings. x.equals(y) returning true means that one of two things holds:

x and y refer to the same memory address, or
the contents of x and y are the same.

Neither statement says anything about the hash code yet. So how does equality, or sharing the same reference, relate to hashCode?

In Java, every class derives from Object, and Object declares a native method hashCode().

public native int hashCode();

Why hashCode?

Almost every language that ships container data structures has some notion of a hash code. In Java its main purpose is to support hash-based collections such as HashSet, Hashtable, ConcurrentHashMap and HashMap.

When adding an element to such a collection, you first need to determine whether the element is already present (duplicates are not allowed). The obvious approach is to call equals against each existing element in turn. That works, but with 10000 elements or more in the collection, comparing against every element with equals becomes very inefficient.

This is where hashCode comes in. When a new object is added, the collection first calls the object's hashCode method to obtain its hash code. HashMap, for instance, keeps a table (an array of buckets) whose index is derived from the hash code:

If the bucket for that hash code is empty, the element can be stored directly without any comparison.
If the bucket already holds an entry with the same hash code, equals is called to compare it with the new element. If they are equal, the element is already present and is not added again (a map replaces the old value). If they are not equal, this is a hash collision that has to be resolved, for example by linking the new entry into the same bucket. Either way, the number of actual equals calls drops sharply.

Put simply, the hashCode method maps information about an object (such as its fields, or an identity value by default) to an int according to some rule; that int is called the hash value. The following is the put method of java.util.HashMap as it appeared in older JDKs (before the JDK 8 rewrite):

 public V put(K key, V value) {
   if (key == null)
     return putForNullKey(value);
   int hash = hash(key.hashCode());
   int i = indexFor(hash, table.length);
   for (Entry<K,V> e = table[i]; e != null; e = e.next) {
     Object k;
     if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
       V oldValue = e.value;
       e.value = value;
       e.recordAccess(this);
       return oldValue;
     }
   }

   modCount++;
   addEntry(hash, key, value, i);
   return null;
 }

Therefore, through hashCode, the number of query comparisons is reduced, the query efficiency is optimized, and the query time is reduced.
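The idea of "hash code first, equals second" can be sketched with a deliberately tiny hash set. TinyHashSet is a hypothetical teaching class, not the JDK implementation: it has a fixed number of buckets, never resizes, and does not accept null elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hash set using separate chaining.
class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Pick the bucket from the hash code; floorMod keeps the index non-negative.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // Returns false if an equal element is already present.
    boolean add(E e) {
        List<E> bucket = bucketFor(e);
        for (E existing : bucket) {
            // Cheap int comparison first; equals only runs when the hash codes match.
            if (existing.hashCode() == e.hashCode() && existing.equals(e)) {
                return false;
            }
        }
        bucket.add(e); // empty bucket or collision: chain the element in this bucket
        return true;
    }

    boolean contains(Object o) {
        for (E existing : bucketFor(o)) {
            if (existing.hashCode() == o.hashCode() && existing.equals(o)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(16);
        System.out.println(set.add("mic"));             // true: first insertion
        System.out.println(set.add(new String("mic"))); // false: equal element already present
        System.out.println(set.contains("tom"));        // false
    }
}
```

Only the elements in one bucket are ever examined, which is why a lookup stays cheap even when the collection is large. It also shows why equal objects must return equal hash codes: otherwise the lookup would search the wrong bucket.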

Implementation of hashCode method

The following is the complete Javadoc comment of the hashCode method.

 /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
public native int hashCode();

As the Javadoc says, hashCode returns a hash code value for the object, for the benefit of hash tables such as HashMap. As far as is reasonably practical, the hashCode method defined in Object returns distinct integers for distinct objects. The sentence that causes confusion and debate is "This is typically implemented by converting the internal address of the object into an integer", which says that the usual implementation technique is to convert the object's internal address into an integer.

Read superficially, this suggests that hashCode returns the object's memory address. To check, we need to look at the implementation, but hashCode is a native method, so its body is not written in Java and cannot be seen from the Java source. To read it you need the complete JDK source code. The Oracle JDK distribution does not include it, but OpenJDK (or another open-source JVM) provides the corresponding C/C++ code. In OpenJDK's Object.c file we can see that hashCode is bound to the JVM_IHashCode function.

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

JVM_IHashCode is defined in jvm.cpp as:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))  
  JVMWrapper("JVM_IHashCode");  
  // as implemented in the classic virtual machine; return 0 if object is NULL  
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;  
JVM_END

This is a ternary expression. The actual hash value comes from ObjectSynchronizer::FastHashCode, which is implemented in synchronizer.cpp. Some key fragments are excerpted below.

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {

  //Omit code snippets

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

The snippet shows that the hash code is actually computed by get_next_hash, part of which is defined as follows.

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

get_next_hash shows that, counting from 0, there are six schemes for computing the hash value, selected by the VM's hashCode flag: a global random number, a value derived from the memory address, a constant, an incrementing sequence, the raw address, and a thread-local xor-shift generator. In JDK 8 the default is the last one, which is a pseudo-random number generator. So the hash code may be related to the memory address, but it does not directly represent it; the details depend on the virtual machine version and settings.

That description is fairly involved. Stated directly, the conclusions are:

If hashCode is not overridden, an object's hash code is generated by the JVM's get_next_hash function. The value is not necessarily related to the memory address; by default it is a pseudo-random number. Two different objects may end up with the same hash code, which is known as a hash collision. HashMap resolves collisions by separate chaining, linking the colliding entries within the same bucket.
If == returns true, the two references denote the same object, so their hash codes are necessarily the same.
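These conclusions can be observed from plain Java through System.identityHashCode, which returns the value the default Object.hashCode would return even for classes that override hashCode. The sketch below uses an invented class name, IdentityHashDemo. The call to System.gc() is only a hint, so the example does not claim that any object actually moved.

```java
// Hypothetical demo class for the default (identity) hash code.
class IdentityHashDemo {
    public static void main(String[] args) {
        Object x = new Object();
        int first = x.hashCode();

        System.gc(); // only a request; objects may or may not be relocated

        // The default hash code is stable for the lifetime of the object.
        System.out.println(first == x.hashCode());               // true
        // For a class that does not override hashCode, both calls agree.
        System.out.println(first == System.identityHashCode(x)); // true

        String s = new String("Hello");
        // String overrides hashCode, so equal contents give equal hash codes.
        System.out.println(s.hashCode() == "Hello".hashCode());  // true
        // The identity hash codes belong to two distinct objects and usually differ.
        System.out.println(System.identityHashCode(s) + "," + System.identityHashCode("Hello"));
    }
}
```

The first line of output is the practical reason the value cannot simply be the current address: the hash code must stay the same while the object lives, even if a garbage collector relocates it.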

Answering the question

Question: if two objects are equal (x.equals(y) == true), can their hash codes still be different?

With the background above, we can now answer it.

If the equals method is not overridden, x.equals(y) == true means that x and y are the same object in memory, so their hashCode must be the same.

Is it possible for the hash codes to differ anyway? If you insist, yes. Consider the following example.

public class App 
{
    public static void main( String[] args ) {
        A a = new A();
        B b = new B();
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() + "," + b.hashCode());
    }
}
class A {
    @Override
    public boolean equals(Object obj) {
        return true;
    }
}

class B {
}

The output is as follows (the actual hash values vary from run to run):

true
692404036,1554874502

The output shows that equals returns true while the hash codes differ.

Although we managed to construct this case, it is wrong in principle: it violates the general contract of hashCode, and such a class will not work correctly with hash-based collections such as HashMap and HashSet. (This equals is also not symmetric, since b.equals(a) returns false.) The next example shows the consequence.

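import java.util.HashMap;
import java.util.Objects;

public class App {
    public static void main(String[] args) {
        Person p1 = new Person("mic", 18);
        Person p2 = new Person("mic", 18);

        HashMap<Person, String> hashMap = new HashMap<>();
        hashMap.put(p1, "mic");
        // p2 equals p1, but hashCode is not overridden, so the lookup misses
        System.out.println(hashMap.get(p2));
    }
}

class Person {
    private String name;
    private int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Setters omitted
    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    // Note: hashCode is deliberately NOT overridden in this example
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof Person) {
            Person other = (Person) obj;
            // Compare names by content rather than by reference
            return Objects.equals(this.getName(), other.getName())
                    && this.getAge() == other.getAge();
        }
        return false;
    }
}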

In the code above, equals is overridden but hashCode is not. Calling hashCode on a Person therefore invokes the implementation inherited from Object, which returns the JVM-generated (by default pseudo-random) integer. In equals, two Person objects are considered equal when their name and age match.

The main method creates two objects, p1 and p2, and uses a HashMap with the object as the key. p1 is stored in the map and the lookup is done with p2. Since p1 and p2 are equal, we would expect the lookup to succeed, but the actual output is:

null

Process finished with exit code 0

Readers familiar with how HashMap works will know that it is built from an array plus linked lists (and, since JDK 8, red-black trees for long chains). Because the hash codes of p1 and p2 differ, the two keys normally map to different array slots, and even in the same slot the stored hash would not match. The lookup by key therefore returns null.

This result is clearly unsatisfactory. p1 and p2 live at different memory addresses, but their logical contents are the same, so we expect them to be treated as the same key.

To avoid this kind of problem, there is an agreed rule: whenever you override equals, you must also override hashCode. This is part of the general contract of hashCode, which states the following.

Whenever hashCode is invoked on the same object more than once during an execution of an application, it must consistently return the same value, provided no information used in equals comparisons is modified.
If two objects are equal according to equals, calling hashCode on each of them must produce the same integer result.
If two objects are not equal according to equals, hashCode is not required to produce different results for them. However, producing distinct hash values for unequal objects may improve the performance of hash tables.

Overriding equals without overriding hashCode violates the second rule: equal objects must have equal hash codes. The rule is only a convention and the compiler does not enforce it, but if we ignore it and override equals without hashCode, the consequences can be serious.
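For comparison, here is a sketch of a Person that honors the contract by deriving hashCode from the same fields that equals compares. The wrapper class PersonFixedDemo is invented for the example, and Objects.hash is just one convenient way to combine the fields; any hashCode that is consistent with equals would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo: equals and hashCode overridden together.
class PersonFixedDemo {
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof Person)) {
                return false;
            }
            Person other = (Person) obj;
            return age == other.age && Objects.equals(name, other.name);
        }

        // Built from exactly the fields used in equals,
        // so equal objects always yield equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    public static void main(String[] args) {
        Person p1 = new Person("mic", 18);
        Person p2 = new Person("mic", 18);

        Map<Person, String> map = new HashMap<>();
        map.put(p1, "mic");

        System.out.println(p1.hashCode() == p2.hashCode()); // true
        System.out.println(map.get(p2));                    // mic
    }
}
```

With both methods overridden, p2 lands in the same bucket as p1 and the equals check then confirms the match, so the lookup succeeds.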

Summary

Putting the analysis together, the answer to the question is as follows.

Yes, two objects that are equal according to equals can have different hash codes. This happens when a class overrides equals without overriding hashCode.
Doing so is risky. In real development we must always override equals and hashCode together. Otherwise operations on Java's hash-based collections will misbehave, for example a lookup with an equal key returning null.


Posted by keefe007 on Tue, 26 Oct 2021 23:30:32 -0700
