Code archives/Miscellaneous/Hashtable type for BMax
This code has been declared by its author to be Public Domain code.
Download source code
| |||||
| The format for this hash table is: key: any string value: any object THashtable.Constructor( size:int ) is the only way to create a hash table. You'll get an error if you try to create one with new. So an example call to insert a value into a new hash table might be: myObject:object = new object myHash:THashtable = THashtable.Constructor( 50 ) myHash.insertEntry( "myKey", myObject ) I tried to document the major functions in the code itself, and clarify some of the usage in the test portion of the code. Be warned however that the test area is some quick and ugly code! :) | |||||
''Constructor( capacity:int ) - returns a new Hashtable object with a maximum capacity = to
'' the parameter provided. Utilize this function to create
'' any and all hashtable objects.
''
''
''InsertEntry(name:string,o:object) - add an object to the table, the name is used to
'' to generate an index 0-Capacity, selecting the list.
'' multiple entries with the same name are permitted using
'' this method. If you want to guarantee that only one
'' entry exists per a given name, use the
'' insertUniqueName( ) method
''
''insertUniqueName( name:string, o:object) - this method guarantees that the object specified
'' will be the only object for that particular name.
''
''insertUniqueNamedObject( name:string, obj:object) - this method guarantees that there will only be
'' one copy of the object specified by the obj parameter
'' in the hashtable. There may be multiple entries for
'' the name however. Using this instead of the
'' standard InsertEntry() method when you want to make
'' sure the same object (assuming it is the same name)
'' is not duplicated in the table.
''
''RemoveNamed(name:string) - Removes all entries with the name specified from the hash table.
'' Note, this does not return those objects.
''
''RemoveNamedObject(name:string, obj:object) - removes any entries from the bucket specified by the name
'' parameter whose object value matches the object specified
'' by the obj parameter. This method lets you quickly remove
'' a single object without removing all objects that share the
'' same name (such as RemoveNamed() does.)
''
''RemoveObject( obj:object ) - searches every bucket in the hash table for the specified object and removes
'' all traces of it from the hash table. Note: This method can be relatively slow
'' on large tables, and if possible you should use RemoveNamedObject() instead.
''
''GetEntry:object(name:string) - grab the first entry for a given name. You must cast it back to its original
'' type before using it,myobject=myType(GetEntry(name:string))
''
''GetMultipleEntries:object [](name:string) - returns an array of objects (entries) that match the
'' given name parameter. This is useful when you intend
'' to store a list of objects with a given name index
''
''Grow( growthsize:int ) - Increases the number of buckets in the hash array by growthsize.
''
''removeAllEntries() - removes all objects from the hash table
''
''removeNamed( name ) removes all entries with the given name from the hash table
''
''
'' Regarding Unique Entries
'' ------------------------
''If you want to ensure there is only one entry for a given name key, there are two approaches you can take.
''If you want or need to ensure there is no entry for a given name key before you do an insert - or if there
''is an entry already present you want to handle it in some manner you should
''call getEntry( name ). If this is null, it's safe to insert. If this returns an object calling InsertEntry()
''will insert another entry with the same name. While you can access multiple entries with the same name via
''the method GetMultipleEntries(name), doing so is not ideal if you simply want a single entry per name.
''So the first approach is to call getEntry, and deal with the object returned (if any) and then do your
''insertion.
''The second approach comes into play if you don't need access to any entry that might already be stored
''under the insert name, but simply want to guarantee the entry you are inserting is the only one for
''that name. Calling insertUniqueName(name) will erase any entries already stored under "name" and guarantees that
''there is only one unique entry for the name provided.
'''''''''''''''''''''''''''
''
''HASHTABLE OBJECT
''
''- M.Laurenson/Defoc8 2006
''- S.Hofslund/SculptureOfSoul 2006
''
''- major modifications: the introduction of a capacity parameter, getMultipleEntries(),
'' insertUniqueName(), insertUniqueNamedObject(), removeNamed(), removeNamedObject(), removeObject(), and
'' Grow() methods and the Constructor() function some error checking code, document modification
'' as well as a new and much more effective hash algorithm added by S.Hofslund (SculptureOfSoul)
''''''''''''''''''''''''''''
Strict
Type THashTable
Field _table:TList[]
Field _capacity:Int
?debug
Global _indirectconstruct:Int
?
Method New()
?debug
Assert _indirectconstruct, "Use THashtable.Constructor() to create a new hash table."
?
EndMethod
Function Constructor:THashTable( capacity:Int )
?debug
_indirectconstruct = True
?
Local retobj:THashTable = New THashTable
retobj._capacity=capacity
retobj._table=retobj._table[..capacity]
For Local n:Int=0 Until capacity
retobj._table[n]=New TList
Next
?debug
_indirectconstruct = False
?
Return retobj
EndFunction
Method Grow( growthsize:Int )
If growthsize <= 0 Return
Local oldtable:TList[]
oldtable = _table
_capacity = (_capacity + growthsize)
_table = New TList[_capacity]
For Local i:Int = 0 Until _capacity
_table[i] = New TList
Next
'regenerate indexes for and reinsert all of the old tables entries
For Local n:Int = 0 Until oldtable.length
For Local entry:gHashEntry = EachIn oldtable[n]
InsertEntry( entry.name, entry.obj )
Next
Next
EndMethod
Method genIndex(name:String)
Local val:Int=0
Local temp:Int
For Local n:Int=0 Until name.length
'the following is commented out because it is neither as fast or as efficient as the one used below
' val:+ (name[n])^2 + (name[n]*(n^2)) + (name[n]Mod 3 * name[n])
val= (val Shl 3) + val + name[n]
'commented out for the same reason
' val:+ (name[n])^(((name.length + 1 - n)Mod 2) + 2)
Next
'call Abs on the index because it might be negative (in the case of an integer overflow).
Return Abs(val Mod (_capacity ) )
EndMethod
Method insertEntry(name:String ,obj:Object)
Local index:Int=genIndex(name$)
Local entry:gHashEntry=New gHashEntry
entry.name=name
entry.obj=obj
entry.link=_table[index].AddLast(entry)
EndMethod
'object must have an overriden compare method!
Method insertSortedEntry(name:String,obj:Object,ascending:Int=True)
Local index:Int=genIndex(name$)
Local entry:gHashEntry=New gHashEntry
entry.name=name
entry.obj=obj
entry.link=_table[index].AddLast(entry)
_table[index].Sort(ascending)
EndMethod
Method sortEntriesNamed(name:String,ascending=True)
Local index:Int=genIndex(name)
_table[index].sort(ascending)
EndMethod
Method sortAll(ascending:Int=True)
For Local iter:Int=0 Until _table.length
_table[iter].sort(ascending)
Next
EndMethod
Method insertUniqueNamedObject( name:String, obj:Object )
Local index:Int = genIndex(name)
internal_removeNamedObject( obj, index )
Local entry:gHashEntry = New gHashEntry
entry.name = name
entry.obj = obj
entry.link = _table[index].Addlast(entry)
EndMethod
Method insertUniqueName(name:String,obj:Object)
Local index:Int=genIndex(name$)
internal_removeNamed( name, index )
Local entry:gHashEntry=New gHashEntry
entry.name=name
entry.obj=obj
entry.link=_table[index].AddLast(entry)
EndMethod
Method getEntry:Object(name:String)
Local index:Int=genIndex(name)
Local link:TLink = _table[index]._head
Local entry:Object
Rem
the old for eachin variety of the loop, replaced below by the hand rolled loop. Uncomment this and comment the
the below code to see for yourself the speed difference it makes (only noticeable when doing 1000's of operations though)
For Local entry:gHashEntry=EachIn _table[index]
If(entry.name=name)
Return(entry.obj)
EndIf
Next
EndRem
'Print "_table[index].count() =" + _table[index].count()
For Local iter = 0 Until _table[index].count()
'handrolling the loop as it turns out to be much faster than a For..Eachin
entry = link._succ._value
'do the keys match?
If gHashEntry(entry).name = name
Return gHashEntry(entry).obj
EndIf
link = link._succ
'Return(entry.obj)
'EndIf
Next
Return Null
EndMethod
Method getMultipleEntries:Object[](name:String)
Local index:Int = genIndex(name)
Local retarray:Object[] = New Object[getEntryCount(index)]
Local objectcount:Int
For Local entry:gHashEntry=EachIn _table[index]
If entry.name = name
retarray[objectcount] = entry.obj
objectcount:+ 1
EndIf
Next
'resize the array to objectcount elements.
retarray = retarray[..objectcount]
Return retarray
EndMethod
Rem
The below are leftovers from a variety of tests. I figured someone might find these of interest. Anyhow, except in
certain special situations, I found them to be slower than the code I'm using for getMultipleEntries. Feel free
to experiment though.
Method getMultipleEntries2:Object[](name:String)
Local index:Int = genIndex(name)
Local objectcount:Int
For Local entry:gHashEntry=EachIn _table[index]
If entry.name = name
objectcount:+ 1
EndIf
Next
Local retarray:Object[] = New Object[objectcount]
objectcount = 0
For Local entry:gHashEntry = EachIn _table[index]
If entry.name = name
retarray[objectcount] = entry.obj
objectcount:+ 1
EndIf
Next
'resize the array to objectcount elements.
'MemCopy( retarray, retarray, SizeOf(retarray[0]) * objectcount)
'retarray = retarray[..objectcount] 'above method is faster
Return retarray
EndMethod
EndRem
Rem
Method getMultipleEntries3:Objwrapper(name:String)
Local index:Int = genIndex(name)
Local retarray:Object[] = New Object[getEntryCount(index)]
Local objectcount:Int
Local wrapper:objwrapper = New objwrapper
For Local entry:gHashEntry=EachIn _table[index]
If entry.name = name
retarray[objectcount] = entry.obj
objectcount:+ 1
EndIf
Next
'resize the array to objectcount elements.
'MemCopy( retarray, retarray, SizeOf(retarray[0]) * objectcount)
wrapper.objarray = retarray
wrapper.length = retarray.length
Return wrapper
EndMethod
EndRem
'simply a slightly faster version of removeNamed. Faster because the index is provided
'and doesn't need to be generated. This method is called by insertUniqueName
Method internal_removeNamed( name:String, index:Int )
For Local entry:gHashEntry = EachIn _table[index]
If(entry.name = name)
entry.link.Remove()
EndIf
Next
EndMethod
Method removeNamed:Int( name:String ) 'returns the # of objects removed
Local index:Int = genIndex(name)
Local remove_count:Int = 0
For Local entry:gHashEntry = EachIn _table[index]
If(entry.name = name)
entry.link.Remove()
remove_count:+ 1
EndIf
Next
Return remove_count
EndMethod
Method internal_removeNamedObject( obj:Object, index:Int )
For Local entry:gHashEntry = EachIn _table[index]
If(entry.obj = obj)
entry.link.Remove()
EndIf
Next
EndMethod
Method removeNamedObject:Int( name:String, obj:Object ) 'returns # removed
Local index:Int = genIndex(name)
Local remove_count:Int = 0
For Local entry:gHashEntry = EachIn _table[index]
If(entry.obj = obj)
entry.link.Remove()
remove_count:+ 1
EndIf
Next
Return remove_count
EndMethod
'this loops through every bucket in the hash table
Method removeObject:Int( obj:Object ) 'returns # removed
Local remove_count:Int = 0
For Local iter = 0 Until _table.length
For Local entry:gHashEntry = EachIn _table[iter]
If entry.obj = obj
entry.link.remove()
remove_count:+ 1
EndIf
Next
Next
Return remove_count
EndMethod
Method removeAllEntries()
For Local n:Int=0 Until _capacity
_table[n].clear()
Next
EndMethod
Method getEntryCount:Int(index:Int)
If (index >= 0) And (index < _capacity)
Return _table[index].count()
EndIf
EndMethod
EndType
Type gHashEntry
Field name:String
Field obj:Object
Field link:TLink
Method Compare(pHashEntry:Object)
'Try
?debug
Print "gHashEntry.compare called"
?
' Print "Compare result: " + obj.Compare2(gHashEntry(pHashEntry).obj)
Return obj.Compare(gHashEntry(pHashEntry).obj)
'Catch ex:Object
'EndTry
EndMethod
EndType
'--------------------------------------------'Test Section Below--------------------------------------------
'these are included just to demonstrate the usage and speed of the table. Comment them out at your leisure.
'-----------------------------------------------------------------------------------------------------------
Global HashTable:THashTable = THashTable.Constructor( 1024 )
Global testint:intobj = New intobj
Global tempint:Int
Global str1$
Global str2$
Global obj:Object
'this type is used as a wrapper for ints allowing them to be treated as objects (and thus stored in the hash table.)
Type intobj
Field val:Int
EndType
'str1 = our key string, str2 = our object to be inserted.
str1 = "Strength"
str2 = "3"
testint.val = 55
Hashtable.insertEntry( str1, str2)
'----------------------Test: 900,000 lookups and assignments
Local starttime:Int = MilliSecs()
For Local iter = 0 To 900000
obj = Hashtable.getentry( str1 )
'the following assignment works because strings are objects. If you wanted to assign a numeric 4 you'd have to wrap it
'in an object (as is done below with the intobj type.)
obj = "4"
Next
Local endtime:Int = MilliSecs()
Print "~n"
Print "*********************************************************************************************"
Print "Millisecs to execute 900,000 lookups and assignments:" + (endtime - starttime)
Print "*********************************************************************************************"
'-------------------------------------------------------------------
'clear the table for the next test.
HashTable.RemoveAllEntries()
'---------------Test: Insert 1024 entries into a hash table of size 1024 to see then print out the
'---------------distribution list to see how well the hash algorithm is working
Print "~n"
Print "*********************************************************************************************"
Print "The following is a list of the # of entries per hash table element given 1024 entries in a hash table with 1024 elements."
Print "The string used as a key is the following: ~qc:\documents And settings\random\testing\directory\~q plus up to 8 random letters."
Print "This test should prove that the distribution, even given extremely similar strings, is still quite uniform."
Print "The format of the list below is N:x where X is the element # in the hash table (0-1024 here since the table has 1024 elements."
Print "And the entry count is the # of entries stored at that hash element (ideally there will be as few duplicates as possible.)"
Print "If you are interested try altering the hash algorithm and see how it affects the distribution and duplication count."
Print "Also note that I'm not currently seeding the random generator anywhere, so you'll get the same results each time even though"
Print "in a real world situation the results will vary (possibly quite a bit) with a different seed in place, so feel free to seed the random"
Print "generator to get more real world results. That, or try altering the test string that is used in the key - this can lead to drastically different"
Print "results as well, and a wide range of tests are needed to verify that the algorithm is efficient with it's distribution given a wide range of inputs."
Print "My tests have shown the algorithm above to be better than those I've commented out (and many others I tried). YMMV."
Print "*********************************************************************************************"
Global teststr:String
'generate and insert the entries. We start with the base string and add up to 8 random characters before inserting.
For Local iter = 0 Until 1024
teststr = "c:\documents and settings\random\testing\directory\"
'generate up to 8 random characters to add to the test string
For Local iter2 = 0 Until 8
If Rand(2) = 1
teststr:+ Chr(Rand( 65,125 ))
EndIf
Next
HashTable.insertEntry( teststr, testint )
Next
'this loops through the entire hash table, counting the # of objects stored at each index and also the total # of
'duplicates. A duplicate is equivalent to a hash collision - a duplicate ID generated from different inputs.
Local duplicatecount:Int
For Local iter6 = 0 Until hashtable._table.length
Print "N:" + iter6 + "....entry count:" + HashTable._table[iter6].count()
If HashTable._table[iter6].count() > 1
duplicatecount:+ HashTable._table[iter6].count() -1
EndIf
Next
Print "number of duplicates:" + duplicatecount
'clear the table for the next test.
HashTable.RemoveAllEntries()
'Insert 1024 entries with the key "testXX" where XX is a number from 0-49. The object inserted is the string "entryXX" where
'XX is again a number from 0-49.
Print "~n"
Print "*********************************************************************************************"
Print "Inserting 1024 entries with the key ~qtestXX~q where XX is a number from 1-50. The Object inserted is the String ~qentryXX~q where"
Print "XX is again a number from 1-50 that may or may not match the key number."
Print "*********************************************************************************************"
Print "~n"
'insert 1024 entries with a key of testXX where XX = a number from 1-50. Obviously, there will be multiple entries
'for any given key.
For Local iter = 0 Until 1024
tempint = Rand(50)
HashTable.insertentry("test" + tempint, "entry" + Rand(50) )
Next
Print "*********************************************************************************************"
Print "Retrieving multiple entries for the key ~qtest25~q. The entry #'s were randomly generated during the insertion and are from 1-50."
Print "*********************************************************************************************"
'this is how you retrieve multiple entries for a given key! The entries are returned as an array of
'objects that you'll need to cast before using.
Local objarray:Object[]
objarray = HashTable.getMultipleEntries( "test25" )
For Local objiter = 0 Until objarray.length
'since we know the object stored is a string, we cast to string and then print.
Print String(objarray[objiter])
Next
Print "~n"
Print "*********************************************************************************************"
Print "The following is the time it took to call ~qgetMultipleEntries~q 900,000 times."
Print "*********************************************************************************************"
'This just demonstrates the speed of "getMultipleEntries." It is considerably slower than getEntry() - especially
'when there are many duplicate entries, but is obviously necessary if you need to retrieve more than one entry.
starttime = MilliSecs()
For Local iter5 = 0 Until 900000
objarray = HashTable.getMultipleEntries( "test25" )
Next
Print "Millisecs:" + (MilliSecs() - starttime)
Print
'And now to prove that InsertUniqueName() works, we'll call it with the key "test25" which we've already shown
'above to have multiple entries stored. The entries that were stored with the key "test25" will be deleted and
'only the entry inserted with "InsertUniqueName()" will exist.
Print "*********************************************************************************************"
Print "Now calling Hashtable.InsertUniqueName( ~qtest25~q, ~qthis is the only entry that exists now!~q)."
Print "This function will guarantee that the key provided - in this case ~qtest25~q will only have one unique entry"
Print "We know from the above test that there are currently multiple entries stored under the key ~qtest25~q - they"
Print "will be deleted by this function and replaced by the object provided in this function call."
Print "*********************************************************************************************"
Print "~n"
Hashtable.insertUniqueName( "test25", "this is the only entry that exists now!" )
objarray = HashTable.getMultipleEntries( "test25" )
For Local iter4 = 0 Until objarray.length
Print "The object at key ~qtest25~q:" + String(objarray[iter4])
Next
Print "So it is clear that insertUniqueName does indeed work."
Print "~n~n"
Print "***Scroll to the top to read all the test results!***"
'------------------------------------------------deprecated--------------------------------------------
'commented the lines below out because the objwrapper type is only used in a version of getMultipleEntries that I was testing
'but decided not to go with.
Rem
Global wrapper:objwrapper = New objwrapper
Type Objwrapper
Field objarray:Object[]
Field length:Int
EndType
'Version 3 is faster when there are few duplicates
Rem
Print "Time b. v3:" + MilliSecs()
For Local iter = 0 Until 900000
wrapper = HashTable.getMultipleEntries3( "test25" )
Next
Print "Time a. v3:" + MilliSecs()
'version 1 is faster when there are multiple duplicates
Rem
Print "Time b. v1:" + MilliSecs()
For Local iter = 0 Until 900000
objarray = HashTable.getMultipleEntries( "test25" )
Next
Print "Time a. v1:" + MilliSecs()
EndRem
'--------------------------------------------------------------------------------------------------------- |
Comments
None.
Code Archives Forum