HashSet not removing all duplicate entries

I am attempting to use a HashSet to make sure data I read in from a .txt file are unique.

Below is the sample data:

999990  bummer
999990  bummer
999990  bummer
999990  bummer
99999   bummer
999990  bummerr

This is read in using java.io.File and java.util.Scanner, and each line is stored as a Term object.

Reading in the terms:

while (rawTerms.hasNextLine()) {
    String[] tokens = rawTerms.nextLine().trim().split(delimiter);
    if (tokens.length == 2) {
        uniqueSet.add(new Term(Double.parseDouble(tokens[0]), tokens[1])); // add the term to the set
    }
    else {
        rawTerms.close();
        throw new Exception("Invalid member length: " + tokens.length);
    }
}

allTerms = new ArrayList<>(uniqueSet); // convert the set into an ArrayList

The Term class, using Guava:

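import com.google.common.base.Objects;

// Guava 18+; older releases exposed this as Objects.toStringHelper.
import static com.google.common.base.MoreObjects.toStringHelper;

public class Term {
    // Field declarations inferred from the constructor below.
    private final double weight;
    private final String theTerm;

    public Term(double weight, String theTerm) {
        this.weight = weight;
        this.theTerm = theTerm;
    }

    // Two terms are equal only if BOTH weight and term match.
    @Override
    public boolean equals(final Object obj) {
        if (obj instanceof Term) {
            final Term other = (Term) obj;
            return Objects.equal(this.weight, other.weight)
                    && Objects.equal(this.theTerm, other.theTerm);
        }
        else {
            return false;
        }
    }

    @Override
    public String toString() {
        return toStringHelper(this).addValue(weight)
                .addValue(theTerm).toString();
    }

    // Hashes the same two fields that equals compares.
    @Override
    public int hashCode() {
        return Objects.hashCode(this.weight, this.theTerm);
    }
}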

However, when I run a test to check the size of the list the entries are stored in, I get 3 entries instead of the 1 I am aiming for. I would like any new entry with either the same weight or term as previously added entries to be considered a duplicate.

All help is appreciated!

Matt

Jon Skeet

> I would like any new entry with either the same weight or term as previously added entries to be considered a duplicate.

That's not how equality works. Equality has to be transitive - so if x.equals(y) returns true, and y.equals(z) returns true, then x.equals(z) has to return true.

That's not the case in your desired relation.
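To make that concrete, here is a minimal sketch of the "same weight or same term" relation applied to three values taken from the sample data. The class `TransitivityDemo`, the nested `Entry` type, and the method `sameWeightOrTerm` are hypothetical names used only for illustration; they are not part of the question's code.

```java
class TransitivityDemo {
    // Hypothetical stand-in for the question's Term class.
    static final class Entry {
        final double weight;
        final String term;

        Entry(double weight, String term) {
            this.weight = weight;
            this.term = term;
        }
    }

    // The relation the question asks for: same weight OR same term.
    static boolean sameWeightOrTerm(Entry a, Entry b) {
        return a.weight == b.weight || a.term.equals(b.term);
    }

    public static void main(String[] args) {
        Entry x = new Entry(999990, "bummerr");
        Entry y = new Entry(999990, "bummer");
        Entry z = new Entry(99999, "bummer");

        System.out.println(sameWeightOrTerm(x, y)); // true: same weight
        System.out.println(sameWeightOrTerm(y, z)); // true: same term
        System.out.println(sameWeightOrTerm(x, z)); // false: nothing in common
    }
}
```

Here x relates to y and y relates to z, but x does not relate to z, so the relation cannot serve as an equals implementation.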

Note that it's also not what your equals method checks at the moment:

return Objects.equal(this.weight, other.weight)
    && Objects.equal(this.theTerm, other.theTerm);

That only returns true if both the weight and the term match, which is normal for an equality relation. That's why you're getting three entries in your set - because when viewed in that way, you do have three distinct entries.
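A small sketch that reproduces the count of three with the sample data. It assumes a stand-in class (`Entry`, a hypothetical name) whose equals and hashCode use both fields, and it uses java.util.Objects in place of Guava so it runs without extra dependencies.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class DistinctEntriesDemo {
    // Hypothetical stand-in for Term; java.util.Objects replaces Guava here.
    static final class Entry {
        final double weight;
        final String term;

        Entry(double weight, String term) {
            this.weight = weight;
            this.term = term;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Entry)) {
                return false;
            }
            Entry other = (Entry) obj;
            // Equal only when BOTH fields match, as in the question's equals.
            return Double.compare(weight, other.weight) == 0
                    && Objects.equals(term, other.term);
        }

        @Override
        public int hashCode() {
            return Objects.hash(weight, term);
        }
    }

    // Adds the six sample lines to a HashSet and returns its size.
    static int countDistinct() {
        Set<Entry> set = new HashSet<>();
        set.add(new Entry(999990, "bummer"));
        set.add(new Entry(999990, "bummer"));
        set.add(new Entry(999990, "bummer"));
        set.add(new Entry(999990, "bummer"));
        set.add(new Entry(99999, "bummer"));
        set.add(new Entry(999990, "bummerr"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct()); // prints 3
    }
}
```

The four identical lines collapse into one entry; the other two each differ in one field, so the set keeps them.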

Fundamentally, HashSet and all the other collections dealing with equality won't help you in a simple way. You'll need to have three separate collections:

  • A set of weights
  • A set of terms
  • A set (or list) of entries.

If the entry you're considering has a weight in the set of weights or a term in the set of terms, you should skip it - otherwise, you should add an entry to each of the three collections.
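The three-collection approach could be sketched roughly as follows. The class name `UniqueTerms`, the method `addIfNew`, and the simplified nested `Term` are assumptions made for this example, not code from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueTerms {
    // Simplified, hypothetical version of the question's Term class.
    static final class Term {
        final double weight;
        final String theTerm;

        Term(double weight, String theTerm) {
            this.weight = weight;
            this.theTerm = theTerm;
        }
    }

    private final Set<Double> seenWeights = new HashSet<>();
    private final Set<String> seenTerms = new HashSet<>();
    private final List<Term> entries = new ArrayList<>();

    // Adds the entry only if neither its weight nor its term has been seen.
    // Returns true if the entry was added, false if it was skipped.
    boolean addIfNew(double weight, String theTerm) {
        if (seenWeights.contains(weight) || seenTerms.contains(theTerm)) {
            return false;
        }
        seenWeights.add(weight);
        seenTerms.add(theTerm);
        entries.add(new Term(weight, theTerm));
        return true;
    }

    List<Term> entries() {
        return entries;
    }
}
```

In the reading loop, `uniqueSet.add(new Term(...))` would become something like `unique.addIfNew(Double.parseDouble(tokens[0]), tokens[1])`. With the sample data, only the first line is kept. Note that the result depends on input order, since which entry survives is decided by which one arrives first.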

See more on this question at Stackoverflow