WritePixelFast... way too slow
BlitzPlus Forums / BlitzPlus Programming

Hi Blitzers! I've been doing some programming with the WritePixelFast command and basically it seems to be pretty slow. Drawing 19,200 dots uses about 10% of my processor time (running at 50 fps) on my P4 2.0 GHz with a GeForce FX 5600 graphics card, which seems pretty excessive to me. Basically, I was wondering if it's possible to write directly to video memory. I'm fairly certain it isn't possible with just Blitz+, but would it be possible to write a DLL in C++ providing some kind of replacement WritePixelFast function? I want to know whether it would be possible, and whether anybody has done anything like this before. How successful was it (in terms of speed increase or decrease)? Any help would be greatly appreciated! Ta, Mr Brine

Are you locking your graphics buffer first? Try this and see what happens. Also try it in full-screen mode and compare the differences. Regards

http://www.blitzbasic.com/Community/posts.php?topic=26589

I've also tried to get this command working at an acceptable speed. I find it laughable that a raycaster (Wolfenstein 3D) can run smoothly on a 25 MHz 386, yet my 550 struggles on my raycaster (and a lot of other people's) when ANY pixel-writing command is used. This happens even when the images are stored in a memory bank instead of being read with ReadPixelFast. Until this is fixed I'm moving to C++.

Please Mark, give us more 2D power! Fudge 3D! :P Pixel power to the max! Give us more speed, ungodly speed :D Ah well, it may never happen. They no longer care one dang bit about 2D; it's all about 3D these days, for shame :(

The LockedPixels command was supposed to keep you flatlanders happy. The bottleneck in the following is actually the slow PokeShort command in the UpdateBank function (move the call to UpdateBank outside the main loop to see the difference).
; fast bank to backbuffer refresh
; by skidracer
Const DWIDTH=640
Const DHEIGHT=480
Global tick
Function CopyBankToBuffer(bank,buffer)
; lock the image for byte transfer
	LockBuffer buffer
; test locked imagebuffer is 16 bits per pixel
	If LockedFormat(buffer)<>1
		UnlockBuffer buffer
		Notify("this test is 16 bit display mode only")
		End
	EndIf
; cache buffer variables	
	imagebank=LockedPixels(buffer)
	pitch=LockedPitch(buffer)
; copy bank to image line by line	
	For y=1 To DHEIGHT
		CopyBank bank,srcoff,imagebank,destoff,DWIDTH*2
		srcoff=srcoff+DWIDTH*2
		destoff=destoff+pitch
	Next
	UnlockBuffer buffer
End Function
Function UpdateBank(bank)
	argb=tick
	tick=tick+1
	o=0
	For y=1 To DHEIGHT
		For x=1 To DWIDTH
			argb=x+y+argb
			PokeShort bank,o,argb 
			o=o+2
		Next
	Next
End Function
Graphics DWIDTH,DHEIGHT,16
bank=CreateBank(DWIDTH*DHEIGHT*2)
While Not KeyHit(1)
; calc fps
	fcount=(fcount+1)
	If fcount=20
		t=MilliSecs()
		fps#=20000.0/(t-ftime)
		ftime=t
		fcount=0
	EndIf
; update screen
	UpdateBank(bank)
	CopyBankToBuffer(bank,BackBuffer())
	Text 0,0,"fps="+fps
	Flip 0
Wend
End

In my demo called Internal Disaster I do a Keftale routine that makes 307,200 WritePixelFast calls each frame at 40 fps on my 2.4 GHz with a GeForce MX. So it's not that slow. Here is a link so you can see it in action: http://zac-interactive.com/demos/internal-disaster-demo.zip If you feel like spending a few minutes, check my website for the other 2D demos I have made. One of them has a 512x512 pixel realtime rotozoom that runs at about 38 fps on my machine. Here's the website: http://www.zac-interactive.com

Thanks to everyone who posted a reply! I tried Skidracer's routine on my computer and it seems way faster than the routine I was using. It draws 307,200 pixels in less than 9% of available processor time without the UpdateBank function, or 27% with it!!! WOW :-) That's a serious speed improvement. The only problem is that my game runs in 32-bit color mode, but I'm sure with a little tinkering I'll solve that. So big up Skidracer!!!!! You're the dude!!! Thanks, Mr Brine

Here's the 32-bit version:
; fast bank to backbuffer refresh
; by skidracer
Const DWIDTH=640
Const DHEIGHT=480
Global tick
Function CopyBankToBuffer(bank,buffer)
; lock the image for byte transfer
	LockBuffer buffer
; test locked imagebuffer is 32 bits per pixel
	If LockedFormat(buffer)<>4 
		UnlockBuffer buffer
		Notify("this test is 32 bit display mode only")
		End
	EndIf
; cache buffer variables	
	imagebank=LockedPixels(buffer)
	pitch=LockedPitch(buffer)
; copy bank to image line by line	
	For y=1 To DHEIGHT
		CopyBank bank,srcoff,imagebank,destoff,DWIDTH*4		
		srcoff=srcoff+DWIDTH*4
		destoff=destoff+pitch
	Next
	UnlockBuffer buffer
End Function
Function UpdateBank(bank)
	argb=tick
	tick=tick+1
	o=0
	For y=1 To DHEIGHT
		For x=1 To DWIDTH
			argb=x+y+argb
			PokeInt bank,o,argb 
			o=o+4
		Next
	Next
End Function
Graphics DWIDTH,DHEIGHT,32
bank=CreateBank(DWIDTH*DHEIGHT*4)
While Not KeyHit(1)
; calc fps
	fcount=(fcount+1)
	If fcount=20
		t=MilliSecs()
		fps#=20000.0/(t-ftime)
		ftime=t
		fcount=0
	EndIf
; update screen
	UpdateBank(bank)
	CopyBankToBuffer(bank,BackBuffer())
	Text 0,0,"fps="+fps
	Flip 
Wend
End

Thanks Skidracer, if I ever finish me game I'll be sure to credit you for your help!

Hey, learn something new every day... I didn't even know about the LockedPitch etc. commands... sweet! Nice Amiga-style demo, Skidracer. I used to love downloading those ;) Skully

Great code, Skidracer.

> I've also tried to get this command working at an acceptable speed. I find it laughable that a raycaster (Wolfenstein 3D) can run smoothly on a 25 MHz 386, yet my 550 struggles on my raycaster (and a lot of other people's) when ANY pixel-writing command is used. This happens even when the images are stored in a memory bank instead of being read with ReadPixelFast. Until this is fixed I'm moving to C++.

If you need raw speed, what are you doing using a BASIC dialect anyway? Right tool for the right job, dude.

Whoa, Simon, nice!

> If you need raw speed, what are you doing using a BASIC dialect anyway?

Does that even make sense to you?

> Does that even make sense to you?

Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC?

> Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC?

Surely it depends on the *compiler*, not the language. I will argue that equally badly written compilers will produce equally slow code, regardless of the language. C is perceived as a 'fast' language because, for the better part of 20 years, everyone who has written a thesis in compiler design has used C as a point of reference (more specifically the UNIX cc or GNU gcc). It is not because the language is inherently 'faster'. Oh, and by the way, every modern BASIC I can think of is 'natively compiled'.

Actually, if you use an array, the FPS is almost as fast as when you take UpdateBank() out of the loop.
; fast bank to backbuffer refresh 
; by skidracer 
Const DWIDTH=800 
Const DHEIGHT=600 
Global tick 
Graphics DWIDTH,DHEIGHT,32 
Dim image(DWIDTH,DHEIGHT) 
While Not KeyHit(1) 
	; calc fps 
	fcount=(fcount+1) 
	If fcount=20 
		t=MilliSecs() 
		fps#=20000.0/(t-ftime) 
		ftime=t 
		fcount=0 
	EndIf 
	; update screen 
	UpdateArray() 
	CopyArrayToBuffer(BackBuffer()) 
	Text 0,0,"fps="+fps 
	Flip 
Wend 
End 
Function CopyArrayToBuffer(buffer) 
	; lock the image for byte transfer 
	LockBuffer buffer 
	; test locked imagebuffer is 32 bits per pixel 
	If LockedFormat(buffer)<>4 
		UnlockBuffer buffer 
		Notify("this test is 32 bit display mode only") 
		End 
	EndIf 
	; cache buffer variables 
	imagebank=LockedPixels(buffer) 
	pitch=LockedPitch(buffer) 
	; copy bank to image line by line 
	
	For y=0 To DHEIGHT-1 
		yoff=y*pitch 
		For x=0 To DWIDTH-1 
			PokeInt imagebank,yoff+(x*4),image(x,y) 
		Next 
	Next 
	
	UnlockBuffer buffer 
End Function 
Function UpdateArray() 
	argb=tick 
	tick=tick+1 
	For y=0 To DHEIGHT-1 
		For x=0 To DWIDTH-1 
			argb=x+y+argb 
			image(x,y)=argb 
		Next 
	Next 
End Function 

Arrays it is then! Now all we need is a userlib version of CopyArrayToBuffer...

Hmm, that array version is 30 FPS slower here (55 vs 85)... I feel compelled to state that I'm temporarily (honest) using a GF2MX here. The CPU's an Athlon 2600...

BlitzSupport: I've just noticed that in the array version I posted I left the resolution set to 800x600, whereas the original was 640x480. That could account for the difference you are seeing. That said, when I tested I got about 26 FPS with banks at 640x480 and 38 FPS with arrays at 800x600. This is on a laptop with a Rage Mobility and a 1 GHz P3.
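The two figures were measured at different resolutions, so raw FPS understates the difference. Below is a back-of-the-envelope normalisation by pixels per frame. It uses only the numbers quoted above and assumes each frame rewrites the whole screen:

```latex
\begin{aligned}
\text{banks, } 640\times480:&\quad 26 \times 307\,200 = 7\,987\,200 \approx 8.0\times10^{6}\ \text{pixels/s}\\
\text{arrays, } 800\times600:&\quad 38 \times 480\,000 = 18\,240\,000 \approx 1.8\times10^{7}\ \text{pixels/s}\\
\text{ratio:}&\quad 18\,240\,000 / 7\,987\,200 \approx 2.3
\end{aligned}
```

On that laptop, the array version moved roughly 2.3 times as many pixels per second. Fixed per-frame costs such as Text and Flip make this a rough comparison only.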

Good stuff, it should definitely be in the manual as well. No sense in hiding Blitz+'s best features, is there? I'm trying to figure out how to convert x and y to the location I need to plot into in the bank. I thought it would be very easy, but I was wrong. I must be missing something obvious. Can anyone clue me in and put me out of my misery?

Simon S: It's in this bit of code:
imagebank=LockedPixels(buffer)
pitch=LockedPitch(buffer)
; copy bank to image line by line
For y=0 To DHEIGHT-1
	yoff=y*pitch
	For x=0 To DWIDTH-1
		PokeInt imagebank,yoff+(x*4),image(x,y)
	Next
Next
For the "Y increment", use the pitch of the screen buffer (LockedPitch(buffer)) instead of GraphicsWidth()*4, as the two values can differ. This is the problem I ran into when I first experimented with LockedPixels.

If you are poking directly into the buffer's bank, you need to take LockedPitch into account:
PokeInt(lockedbank,y*pitch+x*4,argb)
If you are poking into the user bank, then use:
PokeInt(bank,(y*DWIDTH+x)*4,argb)
For 16 bit, change the *4's to *2 (and use PokeShort instead of PokeInt).

Sorry, Bryan, I didn't notice that it was at 800x600 (doh!). In fact, the two methods appear to be exactly the same speed here (hovering between 84 and 85 FPS)...

You could speed it up a little more by replacing the *4's with "x Shl 2" and the *2's with "x Shl 1", where x is the value you want to multiply.

On arrays, how about using fixed arrays (square brackets, not curvy ones)? Does that make it any faster? I've never actually seen a massive speed increase from converting to SHL. I've suspected Blitz might optimise that out for you...

Aren't Blitz arrays only available from within a type? If so, I think it would be quite a bit slower.

And as Mark T mentions, I'm sure Mark Sibly said the compiler automatically turns all power-of-2 multiplications and divisions into SHL/SHR operations. Can anyone else confirm this?

I did some tests of shift vs. multiplication/division, and I'd have to agree that Blitz optimizes for powers of 2. I performed a math operation on a variable 50,000 times, using a For...Next loop to control the loop, with the test running at 50 fps. (Please note there's another thread, 'force refresh rate', that disputes the way I force the fps.) Anyhow, on with the results:
cw = cw / 3 (13%)
cw = cw / 4 (5%)
cw = cw shr 2 (5%)
cw = cw sar 2 (5%)
cw = cw * 5 (5%)
cw = cw shl 2 (5%)
cw = cw * 3 (6%)

So all this time, converting powers of 2 over to Shl/Shr, etc., was just a big waste of time? And it made the code a little uglier and more cryptic, all for nothing!? Uh oh.

Yup. Well, no, actually I think SHR and SHL sometimes make good sense if you're in a binary kind of mindset at the time. But not as an optimisation.

Hmm, I'm not sure I follow. I've tried LockedPixels and LockedPitch in BlitzPlus and they still don't come close to the performance of WritePixelFast on locked buffers in Blitz2D. Where am I going wrong?

Yes, I know this topic is 2 years old, but I have to say: THIS IS FANTASTIC! I've had BlitzPlus for a while but only started using it recently, having become interested in 2D and lost interest in 3D. I never understood the limited documentation for LockedPixels (i.e. the lack of examples). My current isometric engine uses a z-buffer and could handle 30,000 pixels drawn per frame. With this method I can redraw a changing 800x600 screen 10 times per frame at a decent frame rate (50 fps). Boy, I feel silly for not finding this before. That is 4,800,000 pixels per frame, 160 times the number of pixels I could draw before. Certainly changes things for my program quite a bit...