Quick access to pixels - PBO
BlitzMax Forums / BlitzMax Programming / Quick access to pixels - PBO
| ||
I finally got some time on my hands, so I used it to try implementing a PBO in BMax. At first it was slow; I'm not sure why. I'm sure I missed something in the implementation, because it's not easy to remember the OpenGL details correctly, and I need to work on it some more. But it's a good update and a step forward, so I thought I'd share.

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewInit

' Init Variables
Local pixels:Int[w*h] , pointer:Byte Ptr , pbo:Int

' Init PBO. Note: this uses a single PBO, which makes the greatest difference;
' 2 PBOs give a bit more of a boost, and 3 PBOs seem to give no additional speed at all.
glGenBuffers 1,Varptr pbo
glBindBuffer GL_PIXEL_UNPACK_BUFFER,pbo
' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Varptr pixels[0],GL_STREAM_DRAW ' You could use other usage hints, such as GL_STATIC_DRAW or GL_DYNAMIC_DRAW.
glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_DYNAMIC_DRAW ' Allocate an empty buffer now to prevent a stall later.

Repeat
	' ---------------------------------------------------------------
	' TRIGGER DATA TRANSFER
	glBindBuffer GL_PIXEL_UNPACK_BUFFER,pbo
	' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Pointer(0),GL_STREAM_DRAW
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,Null
	' This returns immediately; it triggers an asynchronous DMA transfer.
	' Null should actually be an integer offset when used like this, but BMax doesn't seem to like that.
	' ---------------------------------------------------------------
	' Do a frame's worth of work here while the data is transferred.
	Delay 1 ; Flip 1
	' ---------------------------------------------------------------
	' WAY 1, ACCESSING DATA using a mapped pointer
	' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_STATIC_DRAW
	' pointer = glMapBuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY) ' GL_WRITE_ONLY is one of the access modes available.
	' a = a + 1 ; b = 64 * Sqr(a)*Cos(a)
	' For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pointer[x*4 + y*4*w] = c ; Next ; Next
	' glUnmapBuffer GL_PIXEL_UNPACK_BUFFER
	' ---------------------------------------------------------------
	' WAY 2, ACCESSING DATA using glBufferSubData
	a = a + 1 ; b = 64 * Sqr(a)*Cos(a)
	For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
	For x=0 To 511 ; For y=0 To 511 ; pixels[z+x + y*w] = 655350 ; Next ; Next ; z=z+1
	glBufferSubData GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
	' ---------------------------------------------------------------
	glBindBuffer GL_PIXEL_UNPACK_BUFFER,0
Until KeyHit(KEY_ESCAPE) |
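The note above about one versus two PBOs refers to the usual ping-pong scheme: while glDrawPixels reads out of one buffer, the CPU fills the other, so neither side waits. Leaving the GL calls aside, the index rotation is just modular arithmetic. A minimal language-agnostic sketch in Python (the names `pbo_indices`, `draw_index`, and `fill_index` are mine, not from the code above):

```python
# Ping-pong (multi-buffered) PBO indexing: each frame the GPU reads from
# one buffer while the CPU writes the next frame into another.

def pbo_indices(frame, num_pbos=2):
    """Return (draw_index, fill_index) for a given frame number."""
    draw_index = frame % num_pbos        # buffer glDrawPixels reads this frame
    fill_index = (frame + 1) % num_pbos  # buffer the CPU fills meanwhile
    return draw_index, fill_index

# With 2 PBOs the roles simply alternate every frame:
# frame 0 -> draw 0, fill 1 ; frame 1 -> draw 1, fill 0 ; and so on.
```

With `num_pbos=1` the draw and fill index coincide, which is why a single PBO still stalls when the CPU touches the buffer mid-transfer; two buffers remove that collision, and a third rarely adds anything, matching the observation in the post.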
| ||
Hi Casaber: It's definitely running smoothly. Have you benchmarked it against the classics: WritePixel(), Plot, setting array.pixels[], and company? |
| ||
Simple test to see the difference. Normal pixel writes first:

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewInit

' Init Variables
Local pixels:Int[w*h]

Repeat
	For temp=1 To 5000000 ; Next ' Payload
	a = a + 1 ; b = 64 * Sqr(a)*Cos(a)
	For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
	For x=0 To 511 ; For y=0 To 511 ; pixels[z+x + y*w] = 655350 ; Next ; Next ; z=z+1
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels ' Write the full screen 4 times
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
	Delay 1 ; Flip 1
Until KeyHit(KEY_ESCAPE)

Here's the same using a PBO; there should be a visible boost. This example is geared towards a 2010 iMac, but you can try it, and if you have a more powerful machine, just increase the payload and the number of screen writes in both versions until the normal pixel write dies. I personally get almost a 2x boost with this single PBO. Most of all, there's a much higher chance of staying smooth, as the OS won't interfere with anything as much. No jitter. A bonus is that you can use this with threads; updating graphics in a separate thread suits this perfectly. This is valuable, and I'm sure a lot of you know what I'm talking about. It would probably be the perfect match for Monkey to get smooth graphics.

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewInit

' Init Variables
Local pixels:Int[w*h] , pointer:Byte Ptr , pbo:Int

' Init PBO
glGenBuffers 1,Varptr pbo
glBindBuffer GL_PIXEL_UNPACK_BUFFER,pbo
' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Varptr pixels[0],GL_STREAM_DRAW ' You could use other usage hints, such as GL_STATIC_DRAW or GL_DYNAMIC_DRAW.
glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_DYNAMIC_DRAW ' Allocate an empty buffer now to prevent a stall later.

Repeat
	' ---------------------------------------------------------------
	' TRIGGER DATA TRANSFER
	glBindBuffer GL_PIXEL_UNPACK_BUFFER,pbo
	' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Pointer(0),GL_STREAM_DRAW
	glDrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,Null
	' This returns immediately; it triggers an asynchronous DMA transfer.
	' Null should actually be an integer offset when used like this, but BMax doesn't seem to like that.
	' ---------------------------------------------------------------
	' Do a frame's worth of work here while the data is transferred.
	For temp=1 To 5000000 ; Next ' Payload
	Delay 1 ; Flip 1
	' ---------------------------------------------------------------
	' WAY 1, ACCESSING DATA using a mapped pointer
	' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_STATIC_DRAW
	' pointer = glMapBuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY) ' GL_WRITE_ONLY is one of the access modes available.
	' a = a + 1 ; b = 64 * Sqr(a)*Cos(a)
	' For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pointer[x*4 + y*4*w] = c ; Next ; Next
	' glUnmapBuffer GL_PIXEL_UNPACK_BUFFER
	' ---------------------------------------------------------------
	' WAY 2, ACCESSING DATA using glBufferSubData
	a = a + 1 ; b = 64 * Sqr(a)*Cos(a)
	For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
	For x=0 To 511 ; For y=0 To 511 ; pixels[z+x + y*w] = 655350 ; Next ; Next ; z=z+1
	glBufferSubData GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0] ' Write the full buffer 4 times
	glBufferSubData GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
	glBufferSubData GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
	glBufferSubData GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
	' ---------------------------------------------------------------
	glBindBuffer GL_PIXEL_UNPACK_BUFFER,0
Until KeyHit(KEY_ESCAPE) |
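For a sense of scale, the benchmark moves a lot of memory per frame, which is why RAM speed comes up later in the thread. A back-of-the-envelope calculation (my own arithmetic, not from the thread):

```python
# Bytes moved per frame by the benchmark's full-screen uploads at full HD.
w, h = 1920, 1080
bytes_per_pixel = 4                    # GL_BGRA with GL_UNSIGNED_BYTE
frame_bytes = w * h * bytes_per_pixel  # one full-screen upload
print(frame_bytes)                     # 8294400 bytes, roughly 7.9 MB

# The benchmark uploads the screen 4 times per frame; at a 60 fps target
# this is the sustained rate the memory bus must carry for uploads alone:
per_second = frame_bytes * 4 * 60
print(per_second / 1e9)                # about 1.99 GB/s
```

Roughly 2 GB/s of sustained transfer just for the pixel uploads explains why the test is bus-bound on older machines and why DDR3 versus DDR2 makes a visible difference.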
| ||
If you want, make a benchmark; I think this is mighty interesting for everyone, especially for all the money-making app programmers who want smooth graphics. This could be your ticket. |
| ||
I'm not at all happy with WAY 1, though, so don't be alarmed if you try it; it has some bug in it. Dead slow, and not even visually what it should be, but I'm closing down for today. I've been debugging for hours; it's one of those stupid mistakes where you need a break to see it. |
| ||
Hi Casaber. Have a good rest. I've been busy working on this. I sat down and wrote a pretty comprehensive benchmark. Now, because you are using GLGraphicsDriver(), no normal plotting commands are available, so you can't see the frames per second in real time. I am calculating it, though: if you hit [ESC] you can see the final results. With this code, we can finally see what is what! I'm still trying to post this one bit of code, and I keep getting error messages from the server. Definitely something screwy with the server here. |
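Since GLGraphicsDriver() rules out DrawText() for a live readout, a benchmark like this has to tally frames during the run and report once on exit. The bookkeeping is trivial; a sketch of the idea in Python (the function and parameter names are mine, not taken from the benchmark source):

```python
# Average-FPS bookkeeping: count frames while the loop runs, then divide
# by the elapsed time when the user hits ESC.

def average_fps(frame_count, start_ms, end_ms):
    """Average frames per second over the interval [start_ms, end_ms]."""
    elapsed_s = (end_ms - start_ms) / 1000.0
    if elapsed_s <= 0:
        return 0.0                 # guard against a zero-length run
    return frame_count / elapsed_s

# For example, 300 frames over 10 seconds:
print(average_fps(300, 0, 10_000))   # 30.0
```

In BlitzMax the timestamps would come from MilliSecs(); the same division at program exit gives the final number the benchmark prints.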
| ||
I'm going to have to give up. Every time I try to paste the whole thing and hit UPDATE, I get the same server error; more than 14 times today so far! I have no idea what's going on. Find the source HERE: https://www.mediafire.com/?jmqb5s742xx3x9z |
| ||
Your post might be crashing the server. There are some BlitzMax code snippets which you cannot post here without triggering a 500 error... Do this three times (literally) and the server crashes, which leads to downtime of 20-30 minutes. It's surely known, but I doubt the webmasters really care (otherwise we would have JS-based syntax highlighting, dynamically adjusted widths of code boxes... standard stuff for more than just 5 years now). Bye Ron |
| ||
Lovely. Well, now I know. I get one 500 page and I'll be posting my code to Mediafire instead, Ron. Thanks for the info ... |
| ||
dw817, if it's just snippets, use GitHub. The gist section is perfect. Sign up for an account so you can edit them later. https://gist.github.com/ |
| ||
pastebin.com will do too (if you do not care about syntax highlighting). On GitHub you could even create a simple "mytests" project, and within "Issues" (create new issue) you can drop media files, which are uploaded automatically. After the upload you get a usable http(s) link for the media. You could use this to publish the media needed for the snippets you provide. (Saves the hassle of using imguri, abload, ...) Bye Ron |
| ||
Okay, I'm on GitHub. Here is the link: https://gist.githubusercontent.com/dw817/e75ebd49c9a3ab822a0d/raw/4152dc1297b0b7344f857d1e07d6baf3e7ce40cf/Benchmark%2520Fastest%2520Pixel%2520%28BlitzMAX%29 I'm curious now. Can you post ANY text there? What is the limit on length? Suppose it was just UUEncoded text; would GitHub complain about that? |
| ||
Dw817: Okay, you need text? I'll put it together with my VBO shader, because that one can use the GLMax2D driver easily. I think it would be the perfect mix: PBOs extended with VBOs and shaders. That would allow drawing primitives, quads, and images while still having quick pixel access (both CPU and GPU via shaders) everywhere. The important bit is the pixels, though. Unfortunately, that benchmark crashed my iMac when I downloaded it; I need to look into it. |
| ||
https://help.github.com/articles/what-is-my-disk-quota/ They are pretty lax. I'm sure you'll get an email if you get excessive. BTW, if you call it "Benchmark Fastest Pixel.bmx" instead of "Benchmark Fastest Pixel (BlitzMAX)" you should get syntax highlighting. Also, link to the gist page instead of the raw file. That way others might fork and improve it. https://gist.github.com/dw817/e75ebd49c9a3ab822a0d |
| ||
I tried your benchmark, dw817, but I'm not seeing stellar results, and I have a beast of a machine and GPU. It might be the resolution that kills it.

RES: 1920x1080
typ1 = 0 typ2 = 0 FPS: 1.177
typ1 = 0 typ2 = 1 FPS: 10.1
typ1 = 0 typ2 = 2 FPS: 12.88
typ1 = 0 typ2 = 3 FPS: 13.33
typ1 = 1 typ2 = 0 FPS: 1.166
typ1 = 1 typ2 = 1 FPS: 10
typ1 = 1 typ2 = 2 FPS: 12.88
typ1 = 1 typ2 = 3 FPS: 13.33

EDIT: Just to add, I don't really like the windowed "fullscreen". It doesn't play well with the taskbar on Windows 10; I suspect its size is hardcoded? |
| ||
Well, Grable, here's the point: I don't think it gets any faster than this. If you want to write a benchmark program to check your own resolution and you can get faster than 28.5714 frames per second (which is what I got), that would be of considerable interest. Your resolution is set pretty high; knock it down to 1024x768 (where I tested mine) and see if that helps to increase the FPS. And yes, plotting random dots all over the screen will ALWAYS slow down a system. That's why I wrote this program: to see which method does it the quickest. And if you know of a method faster than Casaber's glBufferSubData(), I am certainly willing to look at it. |
| ||
If all you want is random dots, you could use shaders. Run this and choose any picture. Use the up/down cursor keys to vary how much of the picture versus random dots shows. |
| ||
At 1024x768 I got as high as 35, which seems to be about the maximum I can get, seeing as the loop that sets random pixels takes roughly 30 milliseconds to complete. And there is only a 1 fps difference between TPixmap.Pixels and the glBufferSubData thing too... I tried an old DirectX 7 sample I had and it was even slower :/ It seems that to even approach 60 fps, one would need to use the GPU for everything, as the CPU just can't keep up at higher resolutions. TomToad's recent foray into shaders shares no such limitation :) |
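Grable's numbers are internally consistent: if the random-pixel loop alone takes about 30 ms, the frame rate is capped near 33 fps before any upload cost is paid. A quick check of the arithmetic (mine, not Grable's):

```python
# Frame-time budget: a fixed per-frame CPU cost puts a hard ceiling on the
# achievable frame rate, no matter how fast the upload path is.

def max_fps(frame_cost_ms):
    """Upper bound on frames per second given a per-frame cost in ms."""
    return 1000.0 / frame_cost_ms

print(max_fps(30))           # ~33.3 fps ceiling from the 30 ms loop alone
budget_60fps = 1000.0 / 60   # ~16.7 ms: the entire frame budget at 60 fps
```

So the measured 35 fps is right at the theoretical ceiling, and hitting 60 fps would require the whole frame, pixel loop plus upload, to fit inside roughly 16.7 ms, which supports the conclusion that the GPU has to take over the per-pixel work.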
| ||
That benchmark is not accurate. Usually I get 60 fps even on 1024x1024 pixmaps; it might stutter at times, but it's mostly 60 fps. In the benchmark I get 5-20 fps, and 40 fps tops on anything. It's not easy to make a good benchmark; that's why I usually use my eyes. I think I should build a number-crunching benchmark, though I doubt I would use it much. Right now I'm trying all kinds of full-HD screen fillers, and BMax on the CPU is capable of keeping up with most shaders on a 2012 machine. 1.5+ GHz and DDR3 are a must, though, or you will be shooting in the dark and hoping for the best. |
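Casaber's complaint has a real basis: an average FPS figure hides stutter, so a run that "feels" like 60 fps and a benchmark reporting something lower can both be right. Looking at the worst frame time as well as the mean tells you whether a run was actually smooth. A small sketch with made-up sample data (the function name and the numbers are illustrative, not measurements from the thread):

```python
# Average FPS hides stutter: compare the mean frame time with the worst one.

def frame_stats(frame_times_ms):
    """Return (average_fps, worst_frame_ms) for a list of frame times."""
    avg_ms = sum(frame_times_ms) / len(frame_times_ms)
    return 1000.0 / avg_ms, max(frame_times_ms)

# Two runs with the same average but very different smoothness:
smooth = [16.7] * 60
hitchy = [15.0] * 59 + [117.0]   # same average, one visible 117 ms hitch
print(frame_stats(smooth))       # ~ (59.9, 16.7)
print(frame_stats(hitchy))       # ~ (59.9, 117.0)
```

Both runs average close to 60 fps, but only the first would look smooth by eye, which is exactly the gap between Casaber's visual impression and the benchmark's numbers.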
| ||
Casaber, you may have a faster computer. Either that, or write the model that shows the different FPS numbers. This is the best I can do (my code above) with my current knowledge of programming. It is also possible that the random number generator is slowing it down. If you want to bench-test your graphics, you should take into account that someone somewhere is going to place non-uniform data on your screen; it can't always be a simple formula or calculation. You also have the added advantage of skipping the routine that displays the FPS, as your graphics mode won't allow me to use DrawText(), so you really are getting a bit more juice in your FPS calculation. And I wouldn't complain; so far your routine is definitely the fastest one out there, and you are to be commended for it! :D

"TomToad's recent foray into shaders shares no such limitation :)"

TomToad, if you would kindly post a model where I can use a For/Next loop to stuff random pixels into your 'screen' and display it, I can certainly add your method to the mix and determine its speed compared with the others. |
| ||
dw817: Boy, do I have something for you next time!! :) About the comparison of the normal pixel writer and the PBO: both should work on an average machine, as long as it has DDR3 or better memory for main RAM. Computers with integrated graphics are forced to have good RAM speed, so integrated chips are always PERFECT for software rendering. Maybe DDR2 memory will do, but not at high resolutions; that's the reason some Samsung phones have really bad pixel access (they lack the bus and memory speed; there's nothing wrong with their GPU). Samsung and other brands quickly changed to DDR3 on their new mobiles. The iPhone used software pixel blitting to get its smooth scrolling at the beginning, and it still uses a mix of hardware and software. Android copied them from Jelly Bean onward, I'm sure. A hardware/software mix is the way to go. :) And that's what I have for you ;) You'll love it. I have no doubt that this is the way to go, at least until Vulkan; from there on there may have to be a drastic change, or not. I just need to get some bugs out of the way, and perhaps a nice demo this time? I'm lazy about those things; I want to get to the essentials as quickly as possible. So I have CPU rendering + GPU shaders going on at the same time, with text, scaling, rotation, and primitives working. I guess I could write a quad for you as well, and then all the holes you mentioned would be filled. First, though, I need to consult the OpenGL bible about the bugs. I really need to try to get some kind of demo of this. I'm not sure what to do. |
| ||
Looking forward to seeing it, Casaber! I'm still experimenting with writing a 6-bit encryption routine; I need it for the 750k Carryall program. |