I am attempting to use a HashSet to make sure data I read in from a .txt file are unique.
Below is the sample data:
999990 bummer
999990 bummer
999990 bummer
999990 bummer
99999 bummer
999990 bummerr
This is read in using java.io.File and java.util.Scanner, and each line is stored as a Term object.
Reading in the terms:
while (rawTerms.hasNextLine()){
    String[] tokens = rawTerms.nextLine().trim().split(delimiter);
    if (tokens.length == 2) {
        uniqueSet.add(new Term(Double.parseDouble(tokens[0]), tokens[1])); //add the term to set
    }
    else {
        rawTerms.close();
        throw new Exception("Invalid member length: "+ tokens.length);
    }
}
allTerms = new ArrayList<>(uniqueSet); //Convert set into an ArrayList
The Term class, using Guava:
public Term(double weight, String theTerm){
    this.weight = weight;
    this.theTerm = theTerm;
}
@Override
public boolean equals(final Object obj) {
    if (obj instanceof Term){
        final Term other = (Term) obj;
        return Objects.equal(this.weight, other.weight)
            && Objects.equal(this.theTerm, other.theTerm);
    }
    else {
        return false;
    }
}
@Override
public String toString(){
    return toStringHelper(this).addValue(weight)
        .addValue(theTerm).toString();
}
@Override
public int hashCode() {
    return Objects.hashCode(this.weight, this.theTerm);
}
However, when I run a test to check the size of the list the entries are stored in, I get 3 entries instead of the 1 I am aiming for. I would like any new entry with either the same weight or the same term as a previously added entry to be considered a duplicate.
All help is appreciated!
Matt
I would like any new entry with either the same weight or term as previously added entries to be considered a duplicate.
That's not how equality works. Equality has to be transitive: if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) has to return true.
That's not the case in your desired relation.
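To make the failure concrete, here is a minimal sketch of the "same weight or same term" relation written as a plain method. The class name TransitivityDemo and the helper looselyEqual are hypothetical and used only for illustration; they are not part of your code.

```java
public class TransitivityDemo {
    // Hypothetical helper: the desired relation, "same weight OR same term".
    static boolean looselyEqual(double w1, String t1, double w2, String t2) {
        return w1 == w2 || t1.equals(t2);
    }

    public static void main(String[] args) {
        // x = (1.0, "a"), y = (1.0, "b"), z = (2.0, "b")
        boolean xy = looselyEqual(1.0, "a", 1.0, "b"); // x and y share a weight
        boolean yz = looselyEqual(1.0, "b", 2.0, "b"); // y and z share a term
        boolean xz = looselyEqual(1.0, "a", 2.0, "b"); // x and z share nothing
        System.out.println(xy + " " + yz + " " + xz);  // prints "true true false"
    }
}
```

Here x matches y and y matches z, but x does not match z, so the relation cannot serve as an equals implementation.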
Note that it's also not what your equals method checks at the moment:
return Objects.equal(this.weight, other.weight)
&& Objects.equal(this.theTerm, other.theTerm);
That only returns true if both the weight and the term match, which is normal for an equality relation. That's why you're getting three entries in your set: viewed that way, you do have three distinct entries.
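Fundamentally, HashSet and all the other collections dealing with equality won't help you in a simple way. You'll need to have three separate collections: a list of the accepted entries, a set of the weights seen so far, and a set of the terms seen so far.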
If the entry you're considering has a weight in the set of weights or a term in the set of terms, you should skip it - otherwise, you should add an entry to each of the three collections.
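A rough sketch of that approach follows, under some assumptions: the class name UniqueTerms, the method filter, and the cut-down Term stand-in are all hypothetical, the input is passed as a list of lines rather than read through a Scanner, and the delimiter is taken to be whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueTerms {
    // Minimal stand-in for the question's Term class (fields assumed).
    static final class Term {
        final double weight;
        final String theTerm;

        Term(double weight, String theTerm) {
            this.weight = weight;
            this.theTerm = theTerm;
        }
    }

    // Keeps an entry only if neither its weight nor its term has been seen before.
    static List<Term> filter(List<String> lines, String delimiter) {
        List<Term> allTerms = new ArrayList<>();   // accepted entries
        Set<Double> seenWeights = new HashSet<>(); // weights of accepted entries
        Set<String> seenTerms = new HashSet<>();   // terms of accepted entries

        for (String line : lines) {
            String[] tokens = line.trim().split(delimiter);
            if (tokens.length != 2) {
                throw new IllegalArgumentException("Invalid member length: " + tokens.length);
            }
            double weight = Double.parseDouble(tokens[0]);
            String term = tokens[1];

            // Skip the entry if either part duplicates something already accepted.
            if (seenWeights.contains(weight) || seenTerms.contains(term)) {
                continue;
            }
            seenWeights.add(weight);
            seenTerms.add(term);
            allTerms.add(new Term(weight, term));
        }
        return allTerms;
    }

    public static void main(String[] args) {
        // Sample data from the question.
        List<String> lines = Arrays.asList(
                "999990 bummer", "999990 bummer", "999990 bummer",
                "999990 bummer", "99999 bummer", "999990 bummerr");
        System.out.println(filter(lines, "\\s+").size()); // prints 1
    }
}
```

Note that the result depends on input order: only the weights and terms of entries that were actually accepted are recorded, so an entry that was skipped does not block later ones.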