How does HashSet guarantee element uniqueness?

Keywords: Java

In the Collection framework, the two most commonly used Set implementations are HashSet and TreeSet.
The elements of a Set are generally unordered, that is, the order in which they are retrieved is not necessarily the order in which they were stored, and no element can appear twice. HashSet is backed by a hash table, so each element's hash value determines where it is stored.
So how does HashSet guarantee element uniqueness?

First, let's look at the hash value intuitively.

 class Demo{
}

public class HashSetTest {

    public static void main(String[] args) {
        Demo d1=new Demo();
        Demo d2=new Demo();
        sop(d1);
        sop(d2);
    }
    public static void sop(Object obj){ // Shorthand for System.out.println, handy when there is a lot of output.
        System.out.println(obj);
    }
}

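Each object is printed as its class name, an @ sign, and its hash code in hexadecimal.

Where do these hash codes come from?
Every class inherits the method int hashCode() from Object. By default it returns an int that is typically different for each object, and the default toString() prints that int in hexadecimal after the @ sign.
To make this visible, we can override it directly.
Override the hashCode() method in the Demo class: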

 class Demo{
     public int hashCode()
     {
         return 60;
     }
}

public class HashSetTest {

    public static void main(String[] args) {
        Demo d1=new Demo();
        Demo d2=new Demo();
        sop(d1);
        sop(d2);
    }
    public static void sop(Object obj){
        System.out.println(obj);
    }
}

After overriding, hashCode() returns the same value in every case, 60 (3c in hexadecimal), so both objects are printed as Demo@3c.
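Where the 3c comes from can be checked by hand. The sketch below rebuilds the text that the default toString() produces, assuming a class whose hashCode() always returns 60. The names FixedHashDemo, ToStringSketch and describe are made up for this illustration.

```java
// Hypothetical class for illustration: every instance reports hash code 60.
class FixedHashDemo {
    @Override
    public int hashCode() {
        return 60;
    }
}

public class ToStringSketch {
    // Rebuilds what Object.toString() returns:
    // class name + "@" + hash code in hexadecimal.
    static String describe(Object obj) {
        return obj.getClass().getName() + "@" + Integer.toHexString(obj.hashCode());
    }

    public static void main(String[] args) {
        FixedHashDemo d = new FixedHashDemo();
        System.out.println(d);                        // default toString()
        System.out.println(describe(d));              // same text, built by hand
        System.out.println(Integer.toHexString(60));  // 60 written in hexadecimal
    }
}
```

The first two lines printed should be identical, because the class does not override toString().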

So the question arises: since a HashSet places its elements by hash value, what happens when two elements have the same hash value?

For this example we use a more realistic custom class, Person, which has two fields, String name and int age, the two basic getters getName() and getAge(), and an overridden hashCode() method. We then define a HashSet object hs.

import java.util.HashSet;
import java.util.Iterator;

class Person
{
    private String name;
    private int age;
    Person(String name,int age)
    {
        this.name=name;
        this.age=age;
    }

    public String getName()
    {
        return name;
    }
    public int getAge()
    {
        return age;
    }
    public int hashCode()
    {
        System.out.println(this.name+"......hashCode");
        return 60;
    }
}

public class HashSetDemo {

    public static void sop(Object obj)
    {
        System.out.println(obj);
    }
    public static void main(String[] args) {
        HashSet<Person> hs=new HashSet<>();

        hs.add(new Person("a1",11));
        hs.add(new Person("a2",12));
        hs.add(new Person("a2",12));
        hs.add(new Person("a3",13));

        Iterator<Person> it=hs.iterator();
        while(it.hasNext()){
            Person p=it.next();
            sop(p.getName()+"::"+p.getAge());
        }
    }

}
/*Approach:
 * 1. Describe a person by encapsulating the data in a Person object
 * 2. Define a container to store the Person objects
 * 3. Take the elements out again
*/

The first four lines of the output show that the overridden hashCode() method is called to calculate the hash value each time an element is added. The last four lines show that a1, a2 and a3 were all stored in the container, and that a2 was stored twice.

In other words, all the Person objects have the same hash code, yet the duplicate was still stored, so the uniqueness of the elements is not guaranteed yet.

That is because HashSet guarantees the uniqueness of its elements through two methods of the elements, hashCode() and equals().
If two elements have the same hash code, equals() is called to decide whether they are really the same element.
If their hash codes differ, equals() is not called.
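This rule can be observed with a small experiment. The sketch below counts equals() calls for a made-up key type whose hash code is chosen by the caller. CountingKey, LookupOrderSketch and equalsCallsFor are names invented for this illustration, not part of any library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type: the hash code is chosen by the caller and
// every equals() call is counted.
class CountingKey {
    static int equalsCalls = 0;
    private final String label;
    private final int hash;

    CountingKey(String label, int hash) {
        this.label = label;
        this.hash = hash;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof CountingKey && label.equals(((CountingKey) obj).label);
    }
}

public class LookupOrderSketch {
    // Adds two keys with different labels and returns how often equals() ran.
    static int equalsCallsFor(int firstHash, int secondHash) {
        CountingKey.equalsCalls = 0;
        Set<CountingKey> set = new HashSet<>();
        set.add(new CountingKey("x", firstHash));
        set.add(new CountingKey("y", secondHash));
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("different hash codes: " + equalsCallsFor(1, 2));
        System.out.println("same hash code: " + equalsCallsFor(60, 60));
    }
}
```

With different hash codes the counter should stay at zero. With the same hash code, equals() has to run before the second key can be stored.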

Person does not override equals() yet, so the version inherited from Object is used. It only checks whether two references point to the same object and knows nothing about the combination of name and age, which is why the duplicate element was stored.
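A minimal sketch of that inherited behavior, using a made-up class PlainPerson that overrides neither equals() nor hashCode():

```java
// Hypothetical class that does not override equals() or hashCode().
class PlainPerson {
    final String name;
    final int age;

    PlainPerson(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

public class DefaultEqualsSketch {
    public static void main(String[] args) {
        PlainPerson first = new PlainPerson("a2", 12);
        PlainPerson second = new PlainPerson("a2", 12);
        PlainPerson alias = first;

        System.out.println(first.equals(second)); // false: two distinct objects
        System.out.println(first.equals(alias));  // true: the same object
    }
}
```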

Let's override the equals method in Person:

public boolean equals(Object obj)
    {

        if(!(obj instanceof Person))
            return false;
        Person p=(Person)obj;
        System.out.println(this.name+"...."+p.name);
        return this.name.equals(p.name) && this.age == p.age;
    }

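With this method, two Person objects that have the same name and the same age are treated as duplicate elements. HashSet calls equals() automatically whenever the hash code of the new element matches the hash code of an element that is already stored. The output can be read line by line: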

Line 1: the hash value of a1 is calculated and a1 is stored.
Line 2: the hash value of a2 is calculated; it is 3c, the same as for a1.
Line 3: a2 is compared with a1 using equals(); they are different, so a2 is stored.
Line 4: the second a2 arrives and its hash value is calculated, again 3c.
Line 5: the second a2 equals the first a2, so it is not stored.
Line 6: the hash value of a3 is calculated, 3c.
Lines 7 and 8: a3 has the same hash value as a1 and a2, so it is compared with each of them using equals().
Lines 9, 10 and 11: the stored elements, without duplicates.
In fact, this is a rather awkward state. Because all elements have the same hash value, each new element may have to be compared with every element already stored: the second with the first, the third with 1 and 2, the fourth with 1, 2 and 3, the fifth with 1, 2, 3 and 4, and so on, which is inefficient.
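The cost described above can be made visible by counting equals() calls. The following is only a sketch: CountedItem, CollisionCostSketch and equalsCallsWhenAdding are hypothetical names, and the exact count for colliding elements depends on the HashSet implementation of the JDK in use.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical element type that counts equals() calls. When sameHash is
// true every instance returns 60, otherwise it returns its own id.
class CountedItem {
    static int equalsCalls = 0;
    private final int id;
    private final boolean sameHash;

    CountedItem(int id, boolean sameHash) {
        this.id = id;
        this.sameHash = sameHash;
    }

    @Override
    public int hashCode() {
        return sameHash ? 60 : id;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof CountedItem && ((CountedItem) obj).id == id;
    }
}

public class CollisionCostSketch {
    // Adds `count` distinct items to a fresh HashSet and returns
    // how many equals() calls that took.
    static int equalsCallsWhenAdding(int count, boolean sameHash) {
        CountedItem.equalsCalls = 0;
        Set<CountedItem> set = new HashSet<>();
        for (int id = 1; id <= count; id++) {
            set.add(new CountedItem(id, sameHash));
        }
        return CountedItem.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("same hash, 5 items: " + equalsCallsWhenAdding(5, true));
        System.out.println("distinct hashes, 5 items: " + equalsCallsWhenAdding(5, false));
    }
}
```

With distinct hash codes no equals() call should be needed at all, while the constant hash code forces comparisons on every insertion after the first.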

So, as an optimization, the return 60 in the hashCode() method is replaced by return name.hashCode()+age*37; (37 is an arbitrary multiplier).
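As a quick check of this formula, the sketch below evaluates it for the three sample persons. HashFormulaSketch and personHash are hypothetical names used only for this illustration.

```java
public class HashFormulaSketch {
    // The formula from the text: combine the name's hash code with the age.
    static int personHash(String name, int age) {
        return name.hashCode() + age * 37;
    }

    public static void main(String[] args) {
        System.out.println(personHash("a1", 11));
        System.out.println(personHash("a2", 12));
        System.out.println(personHash("a3", 13));
        // Equal fields always give the same value, as hashCode() requires.
        System.out.println(personHash("a2", 12) == personHash("a2", 12));
    }
}
```

The three values should all differ, so only the duplicate a2 still shares a hash code with a stored element. The standard method java.util.Objects.hash(name, age) would be another way to combine the fields; that is an alternative, not what the original code uses.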

Complete code

import java.util.*;

class Person
{
    private String name;
    private int age;
    Person(String name,int age)
    {
        this.name=name;
        this.age=age;
    }
    public int hashCode()
    {
        System.out.println(this.name+"......hashCode");
        return name.hashCode()+age*37;
    }

    public boolean equals(Object obj)
    {

        if(!(obj instanceof Person))
            return false;
        Person p=(Person)obj;
        System.out.println(this.name+"....equals...."+p.name);
        return this.name.equals(p.name) && this.age == p.age;
    }
    public String getName()
    {
        return name;
    }
    public int getAge()
    {
        return age;
    }
}

public class HashSetDemo {

    public static void sop(Object obj)
    {
        System.out.println(obj);
    }
    public static void main(String[] args) {
        HashSet<Person> hs=new HashSet<>();

        hs.add(new Person("a1",11));
        hs.add(new Person("a2",12));
        hs.add(new Person("a2",12));
        hs.add(new Person("a3",13));

        Iterator<Person> it=hs.iterator();
        while(it.hasNext()){
            Person p=it.next();
            sop(p.getName()+"::"+p.getAge());
        }
    }

}

Output:

Because different persons now get different hash values, the repeated comparisons are avoided: equals() is only needed for the duplicate a2.

Conclusion:

Set: elements are unordered (the order of retrieval is not necessarily the order of storage), and elements cannot be repeated.

  • HashSet: The underlying data structure is a hash table.

  • How does HashSet guarantee element uniqueness?
    It is accomplished by two methods of the elements, hashCode() and equals().
    If two elements have the same hash code, equals() is called to decide whether they are the same element.
    If their hash codes differ, equals() is not called.

Note that the hashCode() and equals() methods of the element are also what HashSet relies on to check whether an element exists (contains) and to delete one (remove).
The hash codes are compared first, and equals() is called only if they are the same.

By contrast, ArrayList checks for and deletes elements based on equals() only.
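The difference between the two collections can be sketched by counting hashCode() calls during a lookup. TrackedPoint, ContainsRemoveSketch and the helper methods below are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical element type that counts its hashCode() calls.
class TrackedPoint {
    static int hashCalls = 0;
    private final int x;
    private final int y;

    TrackedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        hashCalls++;
        return 31 * x + y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof TrackedPoint))
            return false;
        TrackedPoint p = (TrackedPoint) obj;
        return x == p.x && y == p.y;
    }
}

public class ContainsRemoveSketch {
    // hashCode() calls made by ArrayList.contains for an equal copy.
    static int hashCallsDuringListContains() {
        List<TrackedPoint> list = new ArrayList<>();
        list.add(new TrackedPoint(1, 2));
        TrackedPoint.hashCalls = 0;
        list.contains(new TrackedPoint(1, 2));
        return TrackedPoint.hashCalls;
    }

    // hashCode() calls made by HashSet.contains for an equal copy.
    static int hashCallsDuringSetContains() {
        Set<TrackedPoint> set = new HashSet<>();
        set.add(new TrackedPoint(1, 2));
        TrackedPoint.hashCalls = 0;
        set.contains(new TrackedPoint(1, 2));
        return TrackedPoint.hashCalls;
    }

    // Removes a stored element by passing an equal but distinct object.
    static boolean removeEqualCopy() {
        Set<TrackedPoint> set = new HashSet<>();
        set.add(new TrackedPoint(1, 2));
        return set.remove(new TrackedPoint(1, 2)) && set.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("ArrayList.contains hashCode calls: " + hashCallsDuringListContains());
        System.out.println("HashSet.contains hashCode calls: " + hashCallsDuringSetContains());
        System.out.println("HashSet.remove with an equal copy: " + removeEqualCopy());
    }
}
```

The list lookup should not touch hashCode() at all, while the set lookup needs it to locate the element before equals() is consulted.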

Posted by Obadiah on Thu, 04 Apr 2019 10:36:29 -0700