LOGO

Hash Table Package

Title:
Hash Table Package
Language:
C++
Author:
Philip J. Erdelsky
Date:
March 3, 2001
Usage:
Public domain; no restrictions on use
Portability:
Any C++ compiler
Keywords:
hash, table, compiler
Abstract:
A hash table package that does not use templates or exceptions and does not make any assumptions about the nature of table items
Source:
htab.txt

Compilers and other similar applications often use large tables whose items must be found quickly. One way to do this is to break a large table down into smaller tables. This is effective only if it can be determined quickly which smaller table contains a particular item.

The package resides in the header file HTAB.H, which must be included in every module that calls the package, and the file HTAB.CPP, which must be compiled and linked to an application that uses the package. The two files are combined in HTAB.TXT. You will have to separate them before using them.

A standard technique is to compute a number called a hash code for each item and put it into the table corresponding to the hash code. That is what this package does.

This package makes no assumptions about the nature of the items in the table. Items are simply non-NULL pointers with two operations defined by the user.

The first operation determines whether two items match. The user supplies it, and the package calls it as follows:

     result = (*match)(p, q, pointer);

     htab_match match; pointer to function

     const void *p;    pointer to item

     const void *q;    pointer to item

     void *pointer;    user-defined pointer

     bool result;      true if the items match

The package defines "htab_match" to be a pointer to a funcion of this type.

Of course, the matching operation must have the usual properties of equality:

  1. SYMMETRY: Any item matches itself.
  2. REFLEXIVITY: If *p matches *q then *q matches *p.
  3. TRANSITIVITY: Items that match a third item also match each other.
Two examples might be as follows:
     // for a case-sensitive table
     bool match(const struct item *p, const struct item *q, void *pointer)
     {
       return strcmp(p->name, q->name) == 0 && p->type == q->type;
     }

     // for a case-insensitive table
     bool match(const struct item *p, const struct item *q, void *pointer)
     {
       return strcmpi(p->name, q->name) == 0 && p->type == q->type;
     }

The second operation computes a hash code. The user supplies it, and the package calls it as follows:

     n = (*code)(p, pointer);

     htab_code code;   pointer to function

     const void *p;    pointer to item

     void *pointer;    user-defined pointer

     unsigned n;       hash code of item

The hash code function must have the following properties:

  1. The hash code is in the range from 0 to 255, inclusive.
  2. Matching items produce the same hash code.

The most efficient hash code functions are those that assign roughly equal numbers of items to each hash code. The package supplies two functions that can be used for this purpose with named items:

     htab::code(const char *name);  // case-sensitive
     htab::codei(const char *name); // case-insensitive

The first will return equal hash codes for identical names. The second one will return equal hash codes for names that match except for case.

A hash table is initialized by calling its constructor. Typecasts are generally needed on the first two parameters, since the package expects the match() and code() functions to take (const void *) arguments, whereas the definitions of these functions typically take pointers to a user-defined structure. Here are two ways to create and initialize a hash table:

     htab h((htab_match) match, (htab_code) code, pointer);

     htab *ph = new htab((htab_match) match, (htab_code) code, pointer);

The third argument is a user-defined pointer which will be passed to the match() and code() functions without change. It can be useful if the match() and code() functions for two or more hash tables have a lot of common code and differ in only a few parameter values.

The functions need not be called match() and code(), but the names used in their definitions must match the names supplied to the constructor.

The following member function finds an item in the table which matches a specified item:

     q = h.first(p);

     htab h;           hash table

     const void *p;    pointer to specified item

     void *q;          pointer to matching item, or NULL if there is none

If two or more items in the table match the specified item, this function returns a pointer to one of them. It is not guaranteed to be the first one inserted.

The following member function can be called repeatedly to find other matching items:

     q = h.next();

     htab h;           hash table

     void *q;          pointer to item matching the specimen supplied
                       to the most recent call on h.first() for the
                       same table, or NULL if there are no more

Repeated calls on h.next() after a successful call on h.first() will return pointers to all other matching items. The order in which they are returned is undefined, but each will be returned once, and only once.

The following member function inserts an item into the table after h.first() and h.next() have been called. The item inserted is the one submitted to h.first().

     h.insert();

     htab h;           hash table

When the table is no longer needed, the items in it should be destroyed to prevent memory leaks. The following member function removes an item from the table:

     q = h.remove();

     htab h;           hash table

     void *q;          pointer to item, or NULL if there are no more items

Repeated calls on this function will remove all items in the table. The items are not removed in any particular order, but each of them will be returned once, and only once.

Finally, the destructor will release any memory allocated for the table by the package itself.