Comparators for identifying collection elements

As explained under "Types of collections", the primary division between collections is between unordered collections (sets and dictionaries) and ordered collections (deques, sorted sequences and arrays). It follows that there are two primary types of comparators: comparators to be used with unordered collections (TMCollectibleComparator) and comparators to be used with ordered collections(TMOrderableCollectibleComparator).

Dictionaries turn out to be a special case which requires a special comparator (TMCollectibleKeyValuePairComparator). This makes sense as the elements within dictionaries are key / value pairs. For the time being, ignore dictionaries and TMCollectibleKeyValuePairComparator.

TMCollectibleComparator for sequences not depending on order

The easiest type of comparator to use is an instance of TMCollectibleComparator. The Compare member function for TMCollectibleComparator returns an enumerated type with one of two possible values: kEqual, or kNotEqual. Compare receives two objects as arguments which are intended to be compared. Both will be objects from your collection. Refer to the first argument as the left object and the second object as the right argument. Compare invokes the member function IsEqual for the left object. IsEqual takes one argument which will be the right object. Thus the result of a given comparison depends on both the type of comparator specified (ordered or unordered) and on the way the elements of the collection define IsEqual. (Actually Compare is not even called unless both the left object and the right object produce the same hash values.) IsEqual returns a bool. So the actual statement which tests for comparison inside TMCollectibleComparator::Compare might look like this in pseudo code

          if (leftObject->IsEqual(rightObject)
              return kEqual;
          else
              return kNotEqual;
This is an oversimplification, but the algorithm holds true for the purpose of this discussion. From this, you can see that the primary concern of an instance of TMCollectibleComparator is to test for equality in the same way as the C++ operators ==, and !=.

Much of the time TMCollectibleComparator does the job. If you are using sets, order is undefined so TMCollectibleComparator is all you need. Deques, indexed sequences, and arrays are generally not concerned with the kind of internal ordering required by a sorted sequence, so in most cases TMCollectibleComparator will do the job for these kinds of collections as well.

Before looking at comparators for sorted sequences, consider the following example programs that illustrate the role of comparators in collection protocol for adding elements, searching for elements, and querying collection state.

Sets without comparators appear to allow duplicate elements

First consider what happens when you have a set of elements where no comparator is supplied. The example program uses instances of TStandardText for elements. The only special protocol that you need to be familiar with for TStandardText (which supports a Unicode representation of text) is that an instance of TStandardText is created by giving supplying a string argument to the TStandardText constructor. The following expression, for example create an instance TStandardText on the heap and returns a pointer to it.

        new TStandardText("one")
The contents of the resulting object will be the Unicode representation of the string "one". An instance of TStandardText can be created and added to a collection in a single statement.

        collection->Add(new TStandardText("one"));
You need to extract the text inside TStandardText instances in order to print, and identify the values inside of collection elements. For this purpose, TStandardText::Extract is used. Rather than explain the details of the Extract function, code which makes use of it is presented inside the generic (polymorphic) function printCollection which iterates over any kind of collection containing instances of TStandardText and prints the internal strings contained in those instances.

The following program creates a collection that is an instance of TSetOf and adds six elements to the collection. The second group of items would appear to be duplicates of the first set of three items. The output of the program shows however, that the set does not consider the second group to be duplicates of the first.

Sample program using no comparator

      // Copyright (c) 1994 Taligent, Inc. All Rights Reserved.
      
      #ifndef Taligent_STANDARDTEXT
      #include <StandardText.h>
      #endif
      
      #ifndef _H_STDIO
      #include <stdio.h>
      #endif
      
      void printCollection(TCollectionOf<TStandardText>* collection)
      {
          int i = 0;                          // tracks order of elements
          TIteratorOver<TStandardText>* iterator = collection->CreateIterator();
          for (TStandardText* text = iterator->First();
               text != NIL;
               text = iterator->Next(), i++) {
      
              ////////////////////////////////////////////////////
              // buffer must be large enough for
              // longest possible string + zero terminator
              ////////////////////////////////////////////////////
              static const int kBufferSize = 256;
              static char extractBuffer[kBufferSize];
      
              text->Extract(TTextRange::GetMaximumRange(),
                            extractBuffer,
                            kBufferSize - 1);
              printf("\n%3d> %s", i, extractBuffer);
          }
          printf("\n");
      
          delete iterator;
      }
      
      int main()
      {
          TSetOf<TStandardText>* collection =
              new TSetOf<TStandardText>(NIL, NIL);
      
          collection->Add(new TStandardText("one"));
          collection->Add(new TStandardText("two"));
          collection->Add(new TStandardText("three"));
      
          collection->Add(new TStandardText("one"));
          collection->Add(new TStandardText("two"));
          collection->Add(new TStandardText("three"));
      
          printCollection(collection);
      
          delete collection;
          return 0;
      }
Here's the program output.

      0> one
      1> two
      2> two
      3> three
      4> one
      5> three

Sets elements are not ordered

Two items are worth noting in the output. First of all, the elements of the collection do not appear in the same order in which they are inserted into the collection. This is a fundamental property of sets; they do not maintain any sort of order. If you need to maintain elements in order and you do not require random access to elements you probably want to use a deque. Deques can be used to represent queues, stacks, as well as double-ended queues.

Without a comparator, objects created with new can't be equal

The second item is that strings which would seem to compare as equals (strcmp would consider them equal) actually appear more than once. This is because each of the strings occupies a different region of heap memory, so each of the pointers that reference each of the six strings, has a different value. If you do not specify a comparator, the default Compare operation for collections is based on the implementation of IsSame as defined for collection elements. IsSame uses pointer addresses to determine if two objects are identical.

If the intention of the program were to eliminate duplicate strings defined as strings composed of identical characters, you would need to create a comparator and assign it to the collection when the collection is created. As the program now stands, no comparator is specified in the first argument to the constructor.

          TSetOf<TStandardText>* collection =
              new TSetOf<TStandardText>(NIL, NIL);

A comparator for text objects

Therefore the default comparison mechanism for
TSetOf is used by this collection. The default comparator for TSetOf calls the MCollectible::IsSame function when it needs to compare two elements for equality.

You can create a comparator for weeding out duplicate strings as follows.

          TComparator<TStandardText>* comparator =
              new TMCollectibleComparator<TStandardText>();
This code is inserted in the program just prior to the constructor. Now the constructor can be invoked with comparator as the first argument.

          TSetOf<TStandardText>* collection =
              new TSetOf<TStandardText>(comparator, NIL);
With these two changes, the resulting output now shows that duplicate strings are eliminated.

      0> one
      1> two
      2> three
This is closer to the behavior usually associated with sets. For example if you allowed a user to select a set of files from a dialog box with the intention of deleting them or copying them to another location, you might want to divide files within a directory into two sets which are shown in the dialog in a way that gives the user appropriate feedback as to which files have been selected. Gathering the files into a set not only allows you use an iterator to perform the deletion, but also assures that a file is not inadvertently deleted twice because it can't possible appear twice in the same set if equality files is properly defined for the collection.

TMCollectible Comparator for sets

The reason that program now works as expected is because the comparator created by the expression

        TMCollectibleComparator<TStandardText>()
automatically calls MCollectible::IsEqual (instead of IsSame) each time a comparison is required. For sets, this comparison may be required each time an element is added. To avoid unnecessary comparisons, use a hashing function to first determine whether or not a full comparison is required. Thus there is a relationship between the hash value returned by MCollectible objects and the Compare function defined for those objects. This relationship is explored further in the section titled "Hashing, equality, and comparison."


[Contents] [Previous] [Next]
Click the icon to mail questions or corrections about this material to Taligent personnel.
Copyright©1995 Taligent,Inc. All rights reserved.

Generated with WebMaker