Interesting map performance 1 big vs many small

Monkey Forums/Monkey Programming/Interesting map performance 1 big vs many small

Skn3	(Posted 2014) [#1]

I wanted to know about the performance of a single large map vs multiple smaller maps.

Import mojo

Const ITEM_TOTAL:Int = 100000
Const FETCH_TOTAL:Int= 10000

Function Main:Int()
	RunTest(ITEM_TOTAL,1,10000)
	RunTest(ITEM_TOTAL,10,10000)
	RunTest(ITEM_TOTAL,20,10000)
End

Function RunTest:Void(itemTotal:Int,groupTotal:Int,fetchTotal:Int)
	Local store:= New StringMap<StringMap<Item>>
	Local itemId:String
	Local item:Item
	Local itemIndex:Int
	Local groupId:String
	Local lastGroupId:string
	Local group:StringMap<Item>
	Local groupIndex:Int
	
	Local perGroupTotal:Int = itemTotal / groupTotal
	
	'create data
	Local timestamp:Int = Millisecs()
	
	Print "---"
	Print "running test: fetching "+fetchTotal+" items from "+groupTotal+" groups each containing "+perGroupTotal+" items (total items: "+itemTotal+")"
	Print "creating data..."
	For groupIndex = 0 Until groupTotal
		'create the group
		groupId = "group_"+groupIndex
		group = New StringMap<Item>
		store.Insert(groupId,group)
		
		'create teh groups items
		For itemIndex= 0 Until perGroupTotal
			itemId = "item_" + itemIndex
			group.Insert(itemId, New Item(itemId))
		Next
	Next
	Print "took " + (Millisecs() -timestamp) + " ms"
	Print ""
	
	'fetch data
	Print "fetching " + fetchTotal + " items..."
	timestamp = Millisecs()
	For Local index:= 0 Until fetchTotal
		groupId = "group_" + Int(Rnd(0, groupTotal - 1))
		itemId = "item_" + Int(Rnd(0, perGroupTotal - 1))
		
		'can we reuse the previous group?
		If groupId <> lastGroupId
			group = store.ValueForKey(groupId)
		Endif
		
		'fetch item from group
		item = group.ValueForKey(itemId)
		lastGroupId = groupId
	Next
	Print "took " + (Millisecs() -timestamp) + " ms"
	Print ""
	Print ""
End

Class Item
	Field id:String
	
	Method New(id:String)
		Self.id = id
	End
End

I tested this on Windows, iPhone5 and HTML5

On Windows:

---
running test: fetching 10000 items from 1 groups each containing 100000 items (total items: 100000)
creating data...
took 96 ms

fetching 10000 items...
took 15 ms

---
running test: fetching 10000 items from 10 groups each containing 10000 items (total items: 100000)
creating data...
took 85 ms

fetching 10000 items...
took 14 ms

---
running test: fetching 10000 items from 20 groups each containing 5000 items (total items: 100000)
creating data...
took 81 ms

fetching 10000 items...
took 14 ms

On iOS:

---
running test: fetching 10000 items from 1 groups each containing 100000 items (total items: 100000)
creating data...
took 1304 ms

fetching 10000 items...
took 146 ms

---
running test: fetching 10000 items from 10 groups each containing 10000 items (total items: 100000)
creating data...
took 1103 ms

fetching 10000 items...
took 146 ms

---
running test: fetching 10000 items from 20 groups each containing 5000 items (total items: 100000)
creating data...
took 1052 ms

fetching 10000 items...
took 148 ms

On HTML5:

---
running test: fetching 10000 items from 1 groups each containing 100000 items (total items: 100000)
creating data...
took 197 ms

fetching 10000 items...
took 18 ms

---
running test: fetching 10000 items from 10 groups each containing 10000 items (total items: 100000)
creating data...
took 122 ms

fetching 10000 items...
took 15 ms

---
running test: fetching 10000 items from 20 groups each containing 5000 items (total items: 100000)
creating data...
took 114 ms

fetching 10000 items...
took 15 ms

It seems that generally speaking (on these targets at least) that 1 large map is the way to go for ease of use vs performance.

computercoder

(Posted 2014) [#2]

That is pretty interesting and nice to know SKN3 :) Thanks for posting!

SLotman

(Posted 2014) [#3]

The bonus of having 1 large resource is to avoid texture swapping, which kills the frame rate on everything 3d. Drawing should be much faster with 1 big map, than several small ones.

Nobuyuki

(Posted 2014) [#4]

This makes sense to me. The way lookups are performed on a Map is a red-black tree binary search operation. One big map means that the data is ordered the way the data structure wants it to be for fastest access. By splitting the data up into multiple smaller Maps, what you're doing is essentially forcing multiple lookups.

Insertion to a Map has the same cost as pulling from a map, because of the way a Map is organized. That cost is O(log n). So maybe you're speeding up the insertion time by using multiple smaller maps, but why would you want to do that?

Skn3	(Posted 2014) [#5]

Hmm well the results seem to favour multiple smaller lists fractionally, so Id imagine in some cases it might be more suitable. Considering that it is doing two requests per lookup instead of one, that is interesting that it still is quicker.... also interesting that the difference is negligable enough not to worry about either way for most things.

Speeding up anything related to potentially large chunks of data is important. If you shave even just 16 ms by using multiple maps you are giving yourself and extra 1fps to do other stuff with...

maltic

(Posted 2014) [#6]

How is this surprising? I would expect them to always perform within a constant factor of each other. Think about it in terms of asymptotics: You're adding a constant factor (lets call it 'm') which is the number of maps. Each map has exactly n/m items in it (here n is the total number of items added to all maps). Hence your lookup is tightly bounded O(lg m + lg n/m), removing all constant factors trivially reduces to O(lg n). Insertion is no different. It only seems better because in this synthetic example you are removing the lg m lookup (still a constant factor mind you), since you create groups with complete foreknowledge of the contents--you never need to 'find' which map to insert into.

Remember that if you really need to shave off a few milliseconds, you could try using a Hashtable for large maps.

muddy_shoes

(Posted 2014) [#7]

I agree with maltic. What you're seeing isn't about the maps. You're seeing the effects of other decisions. Primarily I think you're seeing the impact of the way you've defined your keys. Your keys are strings, formatted like this:

"group_"+groupIndex
"item_"+itemIndex

The first thing to point out is that if you insist on using string keys you'd see a far greater performance boost than your splitting is getting you simply by ensuring that the bit that is different is at the front: groupIndex+"_group" is going to be much faster in comparison than what you have. Using an Int is going to be much faster again and if you need speed then forgoing human readable keys for numbers should be the first choice.

The second thing is that if you create a string with numbers from 0 to 10 they are going to have fewer characters on average than strings using numbers from 0 to 10000. By splitting your maps you're also reducing the length of the keys. This makes the comparisons generally faster too.

You also introduce a layer of caching for your group lookup, which is a legitimate possibility with the split scheme (although probably of minor benefit), but doesn't make for a straight comparison.

I took your code, changed the keys to ints and set up 100,000 items in 1, 10, 100, 1000, 10,000 and 100,000 groups, removed the caching bit and then ran the test 10 times and took the average fetch times:

Here's a normal HTML5 (chrome) run:

Avg for 1 groups: 101.4
Avg for 10 groups: 113.6
Avg for 100 groups: 113.4
Avg for 1000 groups: 119.5
Avg for 10000 groups: 103.3
Avg for 100000 groups: 136.6

Flash:

Avg for 1 groups: 217.3
Avg for 10 groups: 250.5
Avg for 100 groups: 227.8
Avg for 1000 groups: 259.5
Avg for 10000 groups: 244.5
Avg for 100000 groups: 279.8

GLFW:

Avg for 1 groups: 62.100000000000001
Avg for 10 groups: 57.700000000000003
Avg for 100 groups: 62.399999999999999
Avg for 1000 groups: 72.200000000000003
Avg for 10000 groups: 79.099999999999994
Avg for 100000 groups: 87.400000000000006

I doubt the other platforms are much different.

skid	(Posted 2014) [#8]

Surely the Find pass has to search all the buckets? Searching a smaller group at random does not compute.

muddy_shoes

(Posted 2014) [#9]

That's not what Skn3 is doing. He's imagining a scenario where you assign both a bucket key and an individual key. Of course if you really wanted to do that you'd be better off assigning a bucket index and just having an array of maps. That might actually give you a speedup.

Skn3	(Posted 2014) [#10]

Well it was interesting to me as I assumed that performing lookups on a very large singular map would be much slower. I also assumed that performing lookups on smaller maps would be quicker, but when adding in the second lookup it would be slower.

The cache thing I had was simply so that the code I wrote would work with both single map and split map examples. The cache effectively means we lookup the map once and then never again because map "group_"+Rnd(0,TOTAL_GROUPS-1) will always be "group_0" (.. for the TOTAL_GROUPS = 1 example)

I try not to use maps so my experience with them is fairly limited (hence trying to figure things out). I am using them to load images by filepath and cache by the same filepath. I will then skip loading them again if the filepath exists in the cache. Very simple stuff, but it is for a generic cached object type. I wanted to investigate if it was worthwhile having 1 global cache pool for all classes that extend the generic cached object or to have 1 map per extended class.

I didn't know about the order of the key string actually making a difference to lookup speed. Seems obvious now you said it but very helpful!

removing all constant factors trivially reduces to O(lg n). Insertion is no different. It only seems better because in this synthetic example you are removing the lg m lookup (still a constant factor mind you), since you create groups with complete foreknowledge of the contents--you never need to 'find' which map to insert into.

Not sure what you mean here, if I do groups.ValueForKey("group_3") it still has to actually find that group within the map? Should I be testing invalid keys instead to force a complete rescan of the groups map each iteration?

muddy_shoes

(Posted 2014) [#11]

Maltic is pointing out that the split gives you a performance advantage in insertion because you never look for the bucket via the map, you just assign it directly. That gives an obvious advantage as it's just skipping work that the single map has to do. As I mentioned above if you actually had a real situation where you could just assign a bucket id you'd use an array, not a map, to hold your buckets.

Skn3	(Posted 2014) [#12]

Oh right I see what you mean.

So the idea is that I want to have one object caching/lookup system to rule them all. I would ideally create a base object class that can be extended. The base class would have functions for caching and retrieving objects.

So what would be the most efficient way with the following factors:
- infinite number of extended class types that can be inserted into a cache/lookup by "id"
- id is always a string but is user defined. So might be a filepath, a number, a level name, etc
- generally you would expect there not to be 10's of thousands of objects.. at a push a thousand per cache
- need to sometimes lookup indexes say for example when you load a file
- need to sometimes lookup regularly but hopefully not often
- the cached items can sometimes be very static but for another extended class would be changing often

Essentially I am looking for a one size fits all idea...

What would you suggest?

Nobuyuki

(Posted 2014) [#13]

What would you suggest?

A solution tailored to the situation ;)

I feel like knowing how and where to use maps in general would reduce most people's need for an uber-generic object lookup system that would just use maps underneath anyway. Now, if you were to use some other sorta thing (splay tree? Hash map?), that might be different.

Skn3	(Posted 2014) [#14]

Hah very helpful ;)

Well my main usage for the maps is to cache file paths so I am not sure the implications of stuff like hash collisions and limits to the length of the key. Don't really have knowledge of that kind of stuff. Anyone care to enlighten me :D ?

The advantage of wrapping the map in a generic object is that you are free to change how the map stores/retrieves data in the future. If the maps end up being a bottleneck then swap it out for something better. Could even have multiple data storage methods wrapped under one container class, that would be quite handy...

Local cache := new Cache<Item>(CACHE_HASHMAP)
Local cache := new Cache<Item>(CACHE_STRINGMAP)
Local cache := new Cache<Item>(CACHE_ARRAY)
Local cache := new Cache<Item>(CACHE_LIST)
Local cache := new Cache<Item>(CACHE_ETC)
cache.Add(item)
...

Skn3	(Posted 2014) [#15]

Local cache := new Cache<Item>(CACHE_HASHMAP)
Local cache := new Cache<Item>(CACHE_STRINGMAP)
Local cache := new Cache<Item>(CACHE_ARRAY)
Local cache := new Cache<Item>(CACHE_LIST)
Local cache := new Cache<Item>(CACHE_ETC)
cache.Add(item)
...

muddy_shoes

(Posted 2014) [#16]

need to sometimes lookup indexes say for example when you load a file

What index? Do you mean the map key?

the cached items can sometimes be very static but for another extended class would be changing often

What do you mean here? Do you mean the item will be removed and another item added with the same key?

Your example usage code doesn't really make sense. You repeatedly define cache and then assign an item to it. I don't think that's what you intend.

Skn3	(Posted 2014) [#17]

Yes map key, not index.

I mean one type of object lets say MyImageSource might use the cache where the items within the cache change rarely. Another type of object say MyDatabaseRecord might use a separate cache where the items within change often. So I was trying to just test out some aspects of the map to see if it would be worthwhile storing all types of object in one giant universal cache, or have an individual cache per class.

The reason for this is I want to be able to write new runtime object classes that extend from this cached base class. Each of my classes has a custom MyClass.Create() initialiser function which handles stuff like fetching "new" objects from the dead pool, reusing previously cached object, reference counting the number of times the object is used or just flat out going crazy and creating a brand new instance of the class.

e.g.

Class ImageSource Extends SharedObject
	Function Create:ImageSource(path:String)
		' --- create image source ---
		'check to see if a source already exists
		Local source:= cache.Find(path)
		
		'if source exists we can return that
		If source
			'ref count the source
			source.Use()
			
			'return
			Return source
		EndIf
		
		'ok we need to create a new image source
		'this will make a "new" object or reuse a dead one
		source = pool.Make()
		
		'setup
		source.OnSetupImageSource(path)
		
		'perform creation
		source.OnCreate()
		
		'return
		Return source
	End
End

Id be interested to know if in this particular scenario its worth storing the "source" objects in a hashmap, stringmap or something else?

muddy_shoes

(Posted 2014) [#18]

For something with a natural unique key like a URL then a StringMap is fine, assuming a reasonable number of resources and that you don't do something silly like search for the image for every object in every frame.

The idea of a single cache for everything with free form keys isn't a good idea from a performance standpoint (or of any practical purpose?). Any map will take longer to search the larger the population of elements. The point made above isn't that smaller maps aren't faster it's that splitting the search space across multiple maps stored in another map isn't going to be faster. I'm not sure I see the value in creating a generalised caching class to be extended rather than just adding a suitable map to classes that need it but if you're doing it then a separate cache per class is the way to go.

Nobuyuki

(Posted 2014) [#19]

@muddy_shoes
took a long time to come back around to that point, didn't it? :D

on a semi-unrelated note about caching, I still think that Monkey could use a good splay tree implementation. I'll have to think about putting that on the list of things to do when bored....

maltic

(Posted 2014) [#20]

Splay tree are amazing things. Splay sort can give you almost transcendental performance on any kind of partially ordered data, and when I say ordered, I don't mean sorted. 1, 99, 2, 98... etc can be sorted in guaranteed O(n) by Splay sort! They are great for database, or anything needing to cache regularly accessed items. Splay trees, however, are NOT good for real time applications because they are amortized data structures in the truest sense. This means that over any number of insertions or queries, they will always average O(lg n). However, a single query can take up too O(n). Not great if you don't want to drop frames every now and again. This is not a matter of chance or luck. Splay trees will ALWAYS have O(n) look-ups or insertions, and thus your game will ALWAYS have latency spikes. If you want to use one while loading, however, I'd say its a great idea. I just wouldn't use one in my main loop.

For look-ups in your main loop Monkey's Map is great. It may be a bit slow, but it always performs consistently. and predictably..Having a constant but lower framerate looks better than having a high but variable one. Hashmaps can perform much faster, and indeed do not get slower as you add more and more items (they always average O(1) look-ups). They have a nasty risk in their probabilistic underpinnings though. It is possible to get slow O(n) look-ups if you aren't careful with your hash functions.

If you are doing only String look ups, I think you might find that a radix trie will handily beat pretty much anything, unless you have a very optimized Hashtable implementation (which incidentally isn't possible in monkey due to its high level nature).

Just my 2 'cents'.

Skn3	(Posted 2014) [#21]

took a long time to come back around to that point, didn't it? :D

probably not helped in part by my lazy use of terminology confusing matters.. "key" .. "index"... "lettuce"... which is it??

I didn't even know there were so many solutions to map style data storage!

http://en.wikipedia.org/wiki/Radix_tree

This does sound very good...

muddy_shoes

(Posted 2014) [#22]

I'd encourage you not to get hung up on finding a perfect map datastructure. Real world radix tree performance, for example, won't be enormously different to the existing StringMap implementation as long as you don't create a load of keys that have a common prefix (like you did, ahem). Many of the more esoteric data structures are esoteric for a reason and the academic CS concept of algorithmic efficiency doesn't necessarily translate to actual measured implementation speed.

@maltic: What you say about the potential for splay trees to have performance spikes is true, however it's not true that it's inevitable that you'll get an n-depth tree at some point. The shape of the tree depends on the access pattern. If your accesses are mostly hitting the same items (which is why you'd use it) then it may never happen. It occurs if you run through the data in an ordered manner. I'd also suggest that for a tree of a size used in a game it's probably unlikely that an occasional linear search would be much of a big deal anyway, which is a two-edged sword as it also brings into question whether you'd even notice the benefit from the splay tree working well in the first place.

maltic

(Posted 2014) [#23]

muddy_shoes is correct on all counts. I dissembled slightly (but in the best interest of the game programmer) when I talked about Splay trees. Let me elaborate: you are guaranteed to see that O(n) performance splay operation IFF (if and only if) you access every element in the Splay Tree. The use of a magic potential function in the amortized proof by Sleator and Tarjan implies this. Basically for the math to work out, and the cost of k splays to be O(k lg n), you need a linear time splay in there. The order doesn't matter, since nothing happens to a Splay Tree between queries anyway, so any access pattern is sequential.

However, as muddy_shoes says, I don't think there is any point trying to find a perfect Map. You might see a slight improvement from a Radix Trie or a Hashmap. But at the sizes used in your regular game atlas, we're talking about microseconds. May as well spend your time making something Monkey really needs (you've already made some excellent Modules, some of which I use regularly!), than shaving a few microseconds off an already competent built-in data structure.

Nobuyuki

(Posted 2014) [#24]

We had a good conversation.