Although hash values must be equal for objects to be equal, the converse need not be true. Two objects hashing to the same value does not necessarily mean the objects are equal. For reasons of efficiency, object hash values of two objects are first compared to test for equality, before a more expensive comparison is made. If two objects do not hash to the same value, they cannot be equal, so the second, more expensive test can be avoided. Because most objects are not equal when compared, the quick hash comparison, based on the comparison of two longs, can be used to determine that two objects are not equal.
Two criteria must be met in order for two objects to be considered equal. First both objects must produce the same hash value. If they do not produce the same hash value they can not be equal. Second, the comparator calls its Compare function with each of the two objects being compared as arguments. The comparators Compare function in turn calls the Compare member function defined for the particular class of objects in the collection.
However, the comparator's Compare function is only called if the comparator's GetHash function has returned the same value for both objects. If Compare returns a value that indicates equality, the objects are considered equal. Thus if you subclass, TComparator, you must override both GetHash and Compare in a way that meets these two criteria. You must assure that objects which you consider equal will return identical hash values. Also, the hashing algorithm selected is important as an algorithm which provides good distribution of objects which should not be considered equal. Calls to a potentially costly implementation of Compare can be replaced by a simple comparison of the two long values returned by the hash function for each of the elements being compared.
Consider a program which requires a non-standard comparator for a collection. For example, you might want to use comparator to sort a collection of text elements in reverse lexical order. You could do this be creating a special class of comparator.
If hash values are
not equal, elements
are not equal
Defining a comparator
to reverse sort order
Whenever you subclass TComparator or one of its descendants you must override the definitions of Compare and GetHash. GetHash is implemented inline in this definition and returns the hash value defined by TStandardText::Hash for the specific elements being compared. Here's a way to implement compare so that it reverses the normal order for sorting a collection of TStandardText objects. class TReversedComparator : public TMOrderableCollectibleComparator<TStandardText> {
public:
virtual EComparisonResult Compare(const TStandardText& leftObj,
const TStandardText& rightObj) const;
virtual HashResult GetHash(const TStandardText& obj) const
{ return obj.Hash(); }
};
Note that TReversedComparator was not implemented as a Template class, though it could have been if you were required to use TReversedComparator with different types of MOrderableCollectible objects. Not making the comparator class a template simplifies its syntax when used in declarations. EComparisonResult TReversedComparator::Compare(const TStandardText& leftObj,
const TStandardText& rightObj) const
{
qprintf("ENTER comparator\n");
if (leftObj.IsLessThan(&rightObj))
return kGreaterThan;
if (leftObj.IsGreaterThan(&rightObj))
return kLessThan;
return kEqual;
}
Attaching this comparator to a TSortedSequenceOf<TStandardText> will cause collection elements to be maintained internally in reversed lexical order. TReversedComparator* comparator = new TReversedComparator;
Here's a complete program.
Sample program
using reversed-order comparator
The output shows the reversed order of TStandardText objects within the collection. // Copyright (c) 1994 Taligent, Inc. All Rights Reserved.
#ifndef Taligent_STANDARDTEXT
#include <StandardText.h>
#endif
#ifndef _H_STDIO
#include <stdio.h>
#endif
void printCollection(TCollectionOf<TStandardText>* collection)
{
int i = 0; // tracks order of elements
TIteratorOver<TStandardText>* iterator = collection->CreateIterator();
for (TStandardText* text = iterator->First();
text != NIL;
text = iterator->Next(), i++) {
////////////////////////////////////////////////////
// buffer must be large enough for
// longest possible string + zero terminator
////////////////////////////////////////////////////
static const int kBufferSize = 256;
static char extractBuffer[kBufferSize];
text->Extract(TTextRange::GetMaximumRange(),
extractBuffer,
kBufferSize - 1);
printf("\n%3d> %s", i, extractBuffer);
}
printf("\n");
delete iterator;
}
class TReversedComparator : public TMOrderableCollectibleComparator<TStandardText> {
public:
virtual EComparisonResult Compare(const TStandardText& leftObj, const TStandardText& rightObj) const;
virtual HashResult GetHash(const TStandardText& obj) const { return obj.Hash(); }
};
EComparisonResult TReversedComparator::Compare(const TStandardText& leftObj,
const TStandardText& rightObj) const
{
if (leftObj.IsLessThan(&rightObj))
return kGreaterThan;
if (leftObj.IsGreaterThan(&rightObj))
return kLessThan;
return kEqual;
}
int main()
{
TReversedComparator* comparator = new TReversedComparator;
TSortedSequenceOf<TStandardText>* collection =
new TSortedSequenceOf<TStandardText>(comparator, NIL);
collection->Add(new TStandardText("a"));
collection->Add(new TStandardText("b"));
collection->Add(new TStandardText("c"));
collection->Add(new TStandardText("a"));
collection->Add(new TStandardText("b"));
collection->Add(new TStandardText("c"));
printCollection(collection);
delete collection;
return 0;
}
1> c 2> b 3> b 4> a 5> a
In the following slightly contrived example, consider the comparator class that lets you monitor calls to the comparator's Compare member function.
If the definition of the hash function were changed to always return a value of 0, you'd have the worst kind of hash function possible, as shown in the following definition for GetHash.
Class for monitoring compare operations
The TIntComparator::Compare simply returns the value of one integer that is being compared against a second integer. This function provides the ideal hash for longs because there is a unique hash value for each of the possible values of a TCollectibleLong. Assuming we have a printCollection function that works with collections of TCollectibleLongs a main function might be built as follows. ////////////////////////////////////////////////////////////
// class TIntComparator (declaration)
////////////////////////////////////////////////////////////
class TIntComparator : public TMCollectibleComparator<TCollectibleLong> {
public:
virtual EComparisonResult Compare(const TCollectibleLong& leftObj,
const TCollectibleLong& rightObj) const;
virtual HashResult GetHash(const TCollectibleLong& obj) const
{ return obj.GetValue(); }
};
////////////////////////////////////////////////////////////
// class TIntComparator (definition)
////////////////////////////////////////////////////////////
EComparisonResult TIntComparator::Compare(const TCollectibleLong& leftObj,
const TCollectibleLong& rightObj) const
{
qprintf("ENTER comparator\n");
if (leftObj.IsEqual(&rightObj))
return kEqual;
else
return kNotEqual;
}
Most important is that three duplicate items are added to the set. The output from this program shows that the comparator's Compare function is called exactly twice. int main()
{
TIntComparator *comparator = new TIntComparator;
TSetOf<TCollectibleLong>* collection =
new TSetOf<TCollectibleLong>(comparator, NIL);
collection->Add(new TCollectibleLong(0));
collection->Add(new TCollectibleLong(1));
collection->Add(new TCollectibleLong(2));
collection->Add(new TCollectibleLong(1));
collection->Add(new TCollectibleLong(1));
printCollection(collection);
delete collection;
return 0;
}
This makes sense because only in the case of the second and third additions of TCollectibleLong(1) elements does GetHash produce values that collide. Only when hash values collide is your Compare function called to see if two elements are equal. ENTER comparator
ENTER comparator
0> 0
1> 1
2> 2
All elements will map to the same hash value and your TIntComparator::Compare must be called each time a new element is added to the set to check for duplicate members. With this change the above program will produce the following output. class TIntComparator : public TMCollectibleComparator<TCollectibleLong> {
public:
...
virtual HashResult GetHash(const TCollectibleLong& obj) const { return 0; }
};
When possible avoid the overhead associated with calls to Compare by defining GetHash to produce an evenly distributed range of hash values for your collection elements. ENTER comparator
ENTER comparator
ENTER comparator
ENTER comparator
ENTER comparator
ENTER comparator
ENTER comparator
0> 0
1> 2
2> 1
[Contents]
[Previous]
[Next]
Click the icon to mail questions or corrections about this material to Taligent personnel.