DrawSubImageRect performance?

BlitzMax Forums/BlitzMax Programming/DrawSubImageRect performance?

GfK	(Posted 2012) [#1]

Just done some basic testing with this, comparing:

1. an image loaded with LoadAnimImage, drawn with DrawImage.
2. an image loaded with LoadImage, drawn with DrawSubImageRect.

I'm just wondering how DrawSubImageRect works, as the source code doesn't reveal an awful lot (to me).

Is it considered safe/good practice to draw images of odd sizes (i.e. 153x67) with DrawSubImageRect? Or are they internally resized to base 2 dimensions anyway? Any known issues with old hardware?

[edit] speed-wise, there's very little in it. Initially I thought method 2 was quicker (til I spotted a bug in my fps code!). If this is a nice way of avoiding having to keep everything base 2 though, it's worth it.

Last edited 2012

Yan	(Posted 2012) [#2]

It's the same as the single surface stuff you've probably used with B3D.

Multiple quads sharing a single texture with their UVs adjusted to use only the relevant section (each sub-image).

This means that the size and shape of the quad (sub-image) is pretty much irrelevant. It's the size of the main texture that counts.

Google 'texture Atlas' for more info.

Before the built in stuff was available I, as many other have done I'm sure, rolled my own system and, TBH, it really didn't seem to make much difference. Bear in mind that I didn't exactly carry out extensive testing so your mileage may vary. Maybe you'd need to be throwing around thousands of 'sprites' for this to come into it's own?

Last edited 2012

Last edited 2012

GfK	(Posted 2012) [#3]

Right, so weirdy-shape safe, then!

Thanks.

CyBeRGoth

(Posted 2012) [#4]

Aye it basically stores the texture offsets and dimensions so when you draw you are drawing a small part of a large image, there are no texture swaps between draws which is what makes it quick, there is no overhead (in either speed or memory) if you want to draw non pow2 images that you have grabbed, as long as the actual spritesheet is a pow2 size.

To really test if a spritesheet is quicker you would have to load in something like 10 different sprites (seperate images not a sprite sheet) then draw hundreds or thousands of them (enough so that in starts to slow on your system) then repeat the same experiment with sprites grabbed from one spritesheet, you should be able to display more with the spritesheet without it slowing.

Last edited 2012

ImaginaryHuman

(Posted 2012) [#5]

It boils down to texture coordinates. When drawing a full texture `image` the texture coords will range from 0,0 to 1,1 ie covering the full texture, mapped onto the vertices of a quad of geometry (which is actually 2 triangles). When you draw a `sub image`, you are just adjusting the texture coordines so that the corners of your quad are referencing a smaller portion of that texture. And like others have said, there are VERY good reasons to do this. Without even testing it, it is highly recommended (as in Unity) that you put as many sprites/tiles into as few actual images/textures as possible because every time Blitz has to tell the hardware to switch (bind) a new image/texture there is significant overhead. You can only usually do that around a few hundred times per frame on desktop machines before it severely affects the framerate. Yet when you minimize texture swaps using larger texture sheets and `subimagerect` you should be able to draw a tonne more stuff. This is why people on the Unity forums have been having an absolute spaz about the built-in GUI being `slow` because it originally had no sprite-sheet-based back-end and instead introduced a new `draw call` for every GUI element, which became slow really quickly (now fixed in Unity 4).

Someone here came up with some kind of graphics driver thing that automatically batched sprites into sprite sheets, you may want to search for it. It's also useful to use some kind of vertex arrays because every time you draw an image or a portion of an image, it is working in `immediate mode` instructions, ie it issues lots of API function calls for every image you draw in sequence, and doesn't optimize at all. If your geometry is static/non animating or even if it is, you will see at least 200% speed increase on most systems using vertex arrays. Again someone came up with a driver thingy for this somewhere around here.

As far as OpenGL/DX goes, basically there is absolutely zero difference between drawing the full image versus drawing part of it. The only thing that varies are the texture coordinates. They should render exactly the same speed, if the caching and number of pixels etc are the same. If one is slower it's due to some blitz overhead.

Last edited 2012

Midimaster

(Posted 2012) [#6]

This sounds very interesting... but is it true?

Can somebody code me a sample, where I can see the advantage?. I coded this on the fly with 999 different "images" compared with one image.

I think 999 of such images in one scene is quit realistic for an avarage 2D game, or?

There is a difference, but only if I compare the first loop with the second. From the second to all further loops there is no significant difference.

SuperStrict

Graphics 800,600
Global Bild:TImage[1000], BildTotal:TImage

;creating 1000 pictures and one big one
BildTotal=CreateImage(1024,1024)
For Local I%=1 To 999
	Bild[i]=CreateImage(Rand(80),Rand(600))
Next

Global Z%,Sum%

;this to compensate the first loop:
DrawImage BildTotal,Rnd(800),Rnd(600)
For local i%=1 To 999
	DrawImage Bild[i],Rnd(800),Rnd(600)
Next

Repeat
	Local i%,Zeit%
	Cls
	Zeit=MilliSecs()
		For i=1 To 999
			DrawImage Bild[i],Rnd(800),Rnd(600)
			Local w%=Rand(80)
			Local h%=Rand(600)
			'DrawSubImageRect BildTotal,Rnd(800),Rnd(600),W,H,Rnd(800),Rnd(600),W,H
		Next
	
	Zeit=MilliSecs()-Zeit
	Sum=Sum+Zeit
	Z=Z+1
	Print Zeit + " " + Int(Sum/Z)
	Flip 0
Until KeyHit(Key_Escape)

ImaginaryHuman

(Posted 2012) [#7]

DrawSubImageRect in your example is for me somewhere around 4-5 times faster than DrawImage. All I did was comment the DrawImage line and uncomment the other one.

I don't know why your first loop number is different, except that when I did a Flip 1 before the second loop it introduced a huge delay that threw your calculation off.

Either way, you have to compile and run your example twice, once with each drawing type, to get numbers you can compare.

On average the DrawImage was giving me 5-7 ms, while DrawImageRect was like 1-2.

ImaginaryHuman

(Posted 2012) [#8]

You should actually see an even bigger performance gain the more items you render.

Yan	(Posted 2012) [#9]

To really test if a spritesheet is quicker you would have to load in something like 10 different sprites (seperate images not a sprite sheet) then draw hundreds or thousands of them (enough so that in starts to slow on your system) then repeat the same experiment with sprites grabbed from one spritesheet, you should be able to display more with the spritesheet without it slowing.

Which is pretty much what I did...AFAIR...It was several years ago though and my brain has a tendency to leak, these days. ;o)

In theory, obviously with less state changes and surfaces being used, it *should* have been quicker, which is why I went to all the trouble in the first place. I just didn't see any major increases in FPS. However, this was using only GL and the original version of the DX7 driver. My video hardware at the time may have been a major factor too.

Ignoring the speed thing for a moment. Another factor to consider, especially if you're using lots of non ^2 images, is the potential vid memory savings over loading individual images.

Last edited 2012

TaskMaster

(Posted 2012) [#10]

When I first wrote ifsoGUI, I was using separate images for all of the pieces of the GUI objects. When I switched it to one image and used DrawSubImageRect, it increased the speed by about 25% since it didn't need to swap images as often. I really believe if you swap images a lot it has a huge impact.

GfK	(Posted 2013) [#11]

Been reading up on using a texture atlas. Very little info here, so I checked out some examples on MonkeyCoder, and I'm confused.

I found two examples there - in both, the images are grabbed from the atlas with LoadImage. This struck me as odd, since isn't the whole point of using an atlas, that you're keeping everything in one texture? But, LoadImage is creating a new one?? So it's even less efficient than just using LoadImage by itself.

Anyway - what I thought about was parsing the atlas XML data into a TMap, and drawing it by referencing the TMap keys, i.e.

DrawAtlasRect(myTMap.ValueForKey("myImage.png"),x,y,whatever)

Are TMaps quick enough for this?

[edit] Just done a quick and dirty test - this code shows the equivalent of over 3 million iterations per second. Obviously this is just pulling random numbers so I'm not factoring in the actual drawing time. Is this encouraging?

Strict

Local map:TMap = New TMap
Local n:Int

For n = 1 To 1000
	map.Insert(String(n), String(n * 100))
Next

Local s:Int = MilliSecs()
For n = 1 To 10000
	Local s:String = String(map.ValueForKey(String(Rand(1, 1000))))
Next
s = MilliSecs() - s
Notify "time taken: " + s

Last edited 2013

ImaginaryHuman

(Posted 2013) [#12]

Yes an atlas should be one texture and multiple quads or whatever geometry should use that texture and simply modify texture coordinates to tell where to pull the texels from it. You want to do as much rendering using that one texture as you can before you switch to another one.