Writing Spark using Java Api If PairRDD's key value is a custom type, you need to override hashcode and equals methods, otherwise you will find that the same Key value is not aggregated.
For example: Use User type as Key
public class User { private String name; private String age; public String getName() { return name; } public void setName(String name) { this.name = name; } public String getAge() { return age; } public void setAge(String age) { this.age = age; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((age == null) ? 0 : age.hashCode()); result = prime * result + ((name == null) ? 0 : name.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; User other = (User) obj; if (age == null) { if (other.age != null) return false; } else if (!age.equals(other.age)) return false; if (name == null) { if (other.name != null) return false; } else if (!name.equals(other.name)) return false; return true; } }
In general, eclipse can automatically generate hashcode and equals methods of types without special treatment by itself.
If we encounter a special case, we can use the HashCodeBuilder and EqualsBuilder tool classes in the commons-lang3 package to generate the corresponding methods.
package run.aaa.spark; import org.apache.commons.lang3.builder.EqualsBuilder; import org.apache.commons.lang3.builder.HashCodeBuilder; public class User { private String name; private String age; public String getName() { return name; } public void setName(String name) { this.name = name; } public String getAge() { return age; } public void setAge(String age) { this.age = age; } @Override public int hashCode() { return HashCodeBuilder.reflectionHashCode(this); } @Override public boolean equals(Object obj) { return EqualsBuilder.reflectionEquals(this, obj); } }