Hashtables, damn

Sun, 28 September, 2008

I've been working on a hashtable implementation in C because, well, everyone should do it once, right? And in principle, it should be easy, right? I've spent more than a few hours getting this far, and more than a few to go.

Some features:

  • The getHashCode function returns a char not an int. This keeps memory use low, and in my experience hash tables commonly have dozens, not thousands, of records.
  • Simple chaining is used to resolve collisions.
  • Since traversing the keys is a common thing to do in a hash table, I have an array to hold the keys.
  • For collision resolution, each value in the table is preceded by a pointer to the key in the key array.

The library includes some array helper functions for using re-sizable, null-terminated arrays. I was inspired by the dietlibc [pdf] philosophy of favoring arrays over linked lists and other more complicated data structures, and using realloc() for re-sizing the arrays. A side-effect of realloc, however, is that it may move your array to make enough room, which made life somewhat difficult for me.

I won't print the whole thing here, because:

  • I haven't implemented remove(), clear(), or free() yet.
  • I have a good, general-purpose getHashCode() function in mind, not yet implemented.
  • I don't care to be someone's homework helper. (Please e-mail me, address in the margin, and I'll be happy to send you the latest version).

Here's the hashPut function, though, just for the purpose of showing the amount of work involved.

 
void hashPut(hash * h, void * key, void * value) {

assert(key);
assert(value);
assert(h);

size_t szPtr = sizeof(void *);

if(h->keys) {
	// Couldn't find an existing equivalent key, so we add
	if(!arrSeek(h->keys, key, h->szKey, h->keyComparator)) {
		void * oldKeysArray = h->keys;
		h->keys = arrAdd(h->keys, key, h->szKey);

		// Since realloc (in arrAdd) may have moved the array,
		// we will have to correct all the pointers to the keys.
		off_t keyArrayMoved = h->keys - oldKeysArray;
		if(keyArrayMoved) {
			size_t szPtrPlusValue = szPtr + h->szValue;
			char *** oneValue = h->values;
			int i = 0;
			for(; i < 256; oneValue++, i++) {
				if(*oneValue) {
					**oneValue += keyArrayMoved;
				}
			}
		}
	}
// No existing key array, create it
} else {
	h->keys = arrNew(key, h->szKey);
	
	h->values = calloc(256 , szPtr);
}

void * keyInArray = arrSeek(h->keys, key, h->szKey, h->keyComparator);

char hashCode = h->hashFunc(key);
void ** table = h->values;
void * chain = table[(unsigned char)hashCode];

/* Chain data is stored as an array of an implied
 * struct type:
 * 	{
 * 		void * ptrToKeyInKeyArray;
 * 		char[szValue] value;
 * 	}
 *
 * Since values have variable size, the struct can't
 * be explicitly declared.  Just know that's what's going
 * on here.
 */

if(chain) {
	void * arr = chain;
	void * existing = arrSeek(arr, value, h->szValue, 
		h->valueComparator);

	// Replace an existing value in the chain
	if(existing) {
		memcpy(existing,  key, szPtr);
		memcpy(existing + szPtr, value, h->szValue);

	// A new item in the chain
	} else {

		char ptrPlusValue[szPtr + h->szValue];
		*ptrPlusValue = (int)keyInArray;
		memcpy(ptrPlusValue+szPtr, value, h->szValue);
		table[(unsigned char)hashCode] = 
			arrAdd(arr, &ptrPlusValue, szPtr + h->szValue);

	}
// No chain exists at this location in the values array
} else {

	char ptrPlusValue[szPtr + h->szValue];
	*ptrPlusValue = (int)keyInArray;
	void * copyValueHere = ptrPlusValue + szPtr;
	memcpy(copyValueHere, value, h->szValue);

	void * newChain = arrNew(&ptrPlusValue, szPtr + h->szValue);
	table[(unsigned char)hashCode] = newChain;
}

}

"A hash table? Simple!" I thought. "I could probably do it with macros," I thought.

About Me

Erik Mackdanz is a software developer in Austin, Texas, along with everybody else.

Links