OpenGL code optimization

Sage
Posts: 1,403
Joined: 2005.07
Post: #1
What's faster?

glLoadIdentity()
dostuff()

or

glPushMatrix()
dostuff()
glPopMatrix()

They both do the same thing in the context of my program, but it will be done maybe 50 times a frame, so I want to do everything as fast as possible.
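
For a sense of what each option involves, here is a minimal sketch in plain C of a fixed-function-style matrix stack. It is not how any particular OpenGL driver implements things; `MatrixStack`, `stack_init`, `stack_load_identity`, `stack_push`, `stack_pop` and `stack_translate` are hypothetical names, and the stack depth is an assumed value. Under this model, loading the identity writes 16 floats and a push copies 16 floats, so both are small, fixed-cost operations.

```c
#include <assert.h>
#include <string.h>

#define STACK_DEPTH 32 /* assumed depth for this sketch */

typedef struct {
    float m[STACK_DEPTH][16]; /* column-major 4x4 matrices */
    int top;                  /* index of the current matrix */
} MatrixStack;

static const float IDENTITY[16] = {
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1
};

/* Starts with a single identity matrix on the stack. */
void stack_init(MatrixStack *s)
{
    s->top = 0;
    memcpy(s->m[0], IDENTITY, sizeof IDENTITY);
}

/* Like glLoadIdentity(): overwrites the current matrix. */
void stack_load_identity(MatrixStack *s)
{
    memcpy(s->m[s->top], IDENTITY, sizeof IDENTITY);
}

/* Like glPushMatrix(): duplicates the current matrix. */
void stack_push(MatrixStack *s)
{
    assert(s->top + 1 < STACK_DEPTH);
    memcpy(s->m[s->top + 1], s->m[s->top], sizeof s->m[0]);
    s->top++;
}

/* Like glPopMatrix(): the previous matrix becomes current again. */
void stack_pop(MatrixStack *s)
{
    assert(s->top > 0);
    s->top--;
}

/* Like glTranslatef(): post-multiplies the current matrix by a translation. */
void stack_translate(MatrixStack *s, float x, float y, float z)
{
    float *m = s->m[s->top];
    for (int i = 0; i < 4; i++)
        m[12 + i] += m[i] * x + m[4 + i] * y + m[8 + i] * z;
}
```

The practical difference in this sketch is behavioural rather than a matter of speed: after `stack_push` and `stack_pop` the earlier transform is restored, whereas `stack_load_identity` discards it.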

Sir, e^iπ + 1 = 0, hence God exists; reply!
Sage
Posts: 1,482
Joined: 2002.09
Post: #2
50 times a frame really isn't that much. If you were doing things 1000 times a frame, then you might want to worry.

Don't worry about speed until it is a problem. While it's a noble goal, not many people are going to care that it runs smoothly on a 6-year-old machine.
Sage
Posts: 1,403
Joined: 2005.07
Post: #3
I guess I'm just a speed freak!

Member
Posts: 28
Joined: 2005.04
Post: #4
unknown Wrote: I guess I'm just a speed freak!
Knowing when to optimize is an even greater skill than knowing how to optimize.
Sage
Posts: 1,403
Joined: 2005.07
Post: #5
Both would be better.
Does anyone know which is faster?

Moderator
Posts: 365
Joined: 2002.04
Post: #6
You're focusing on irrelevant issues of micro-efficiency, which won't have any impact on the speed of your game. Don't worry about it until it becomes an issue (which it won't).

Whether or not you use glPush/PopMatrix() depends largely on whether or not you need to use them. If you don't need to use the matrix stack because you're performing your own matrix manipulation and calling glLoadMatrix(), then by all means don't use glPush/PopMatrix(). If you're relying on GL's matrix stack, then you must call glPush/PopMatrix() wherever it's necessary to preserve the parent's transform.

Which is faster? Well, in either case the matrix calculations have got to happen somewhere, so you're not saving any time either way.

Again, don't worry about the efficiency of these basic functions and concentrate on things which do make a difference: the amount of geometry you're drawing, your fillrate, the complexity of your drawing process and so on.

Neil Carter
Nether - Mac games and comic art
Sage
Posts: 1,403
Joined: 2005.07
Post: #7
Actually, when drawing 300 sprites each frame, the frame rate drops considerably, and any optimization would probably provide a noticeable improvement. I wouldn't call choosing the fastest matrix operations micro-efficiency.

Moderator
Posts: 365
Joined: 2002.04
Post: #8
unknown Wrote: I wouldn't call choosing the fastest matrix operations micro-efficiency.
I would. Again, unless you're doing zillions of matrix push/pop operations every frame, the cost of the operations simply won't register in a profile.

Since you're doing sprites, you're probably hitting a fill rate limitation instead, in which case fiddling around with subtly different combinations of matrix operations won't do you much good.

How are you drawing your sprites? If you're doing a lot of blending with largely transparent images, you may be able to draw more quickly if you apply alpha testing at the same time (see glEnable(GL_ALPHA_TEST) and glAlphaFunc()). This helps the GPU to draw fewer pixels, which can speed up this kind of thing quite a lot. You only need to reject all totally transparent pixels to see a benefit.
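
To illustrate the effect, the following sketch counts in software how many pixels of an RGBA image would survive a `GL_GREATER` alpha test. It only emulates the comparison and is not a GL call; `count_pixels_passing_alpha` is a hypothetical helper, and 8-bit RGBA pixel data is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Counts pixels whose alpha exceeds the reference value, mirroring the
   comparison made by glAlphaFunc(GL_GREATER, ref).
   rgba holds pixel_count pixels of 4 bytes each (R, G, B, A). */
size_t count_pixels_passing_alpha(const unsigned char *rgba,
                                  size_t pixel_count, float ref)
{
    assert(rgba != NULL);
    size_t passed = 0;
    for (size_t i = 0; i < pixel_count; i++) {
        float alpha = rgba[i * 4 + 3] / 255.0f; /* normalise to 0..1 */
        if (alpha > ref)
            passed++;
    }
    return passed;
}
```

Running this over a sprite's texture data with a reference of 0 gives a rough idea of how much of the quad is actually visible, and therefore how many pixels the test could let the GPU skip.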

Also, since 300 sprites isn't a lot, I'm inclined to think there's something else afoot. Have you tried profiling in Shark? Are you spending lots of time in your own code instead of OpenGL? Also, what's the frame rate when drawing 300 sprites? If it's still over your target frame rate, there's no problem.

Incidentally, what CPU and graphics card do you have?

Sage
Posts: 1,403
Joined: 2005.07
Post: #9
I just got a school G4 that got thrown out, and my graphics card is the default nVidia GeForce2 MX with 32 MB VRAM, which was a pretty high-end card ... 5 years ago.

I'm doing glAlphaFunc(GL_GREATER, 0.1f); and then glEnable(GL_ALPHA_TEST) to draw textured quads from display lists (I'm still not sure if it's faster not using lists).

What is fill rate? I've never heard of that before.

Moderator
Posts: 365
Joined: 2002.04
Post: #10
unknown Wrote: I just got a school G4 that got thrown out, and my graphics card is the default nVidia GeForce2 MX with 32 MB VRAM, which was a pretty high-end card ... 5 years ago.
That card's probably a bit of a problem for extremely fill-rate-heavy stuff. In this instance, you can only deal with the speed issue by drawing fewer pixels, unfortunately.

Quote: I'm doing glAlphaFunc(GL_GREATER, 0.1f); and then glEnable(GL_ALPHA_TEST) to draw textured quads from display lists (I'm still not sure if it's faster not using lists).
Your use of alpha seems fine. If you're not using blending everywhere, turn it off wherever you don't need it; it's quite expensive.

With the display lists thing, having just one quad in a display list is a waste of time, and it'll probably be slower like that. You really want to draw a whole load of similar (same texture/attributes) quads in a batch with vertex arrays or something. See glDrawArrays() et al.
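
A batch for glDrawArrays() can be assembled on the CPU first. The sketch below fills separate position and texture-coordinate arrays for a set of axis-aligned sprites that share one texture. The `Sprite` struct and `build_quad_batch` are hypothetical names, and the GL calls appear only in a comment because they need a live context.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, w, h; } Sprite; /* hypothetical sprite rectangle */

/* Writes 4 vertices (x, y) and 4 texcoords (u, v) per sprite.
   Both output arrays must hold count * 8 floats.
   Returns the number of vertices written. */
size_t build_quad_batch(const Sprite *sprites, size_t count,
                        float *positions, float *texcoords)
{
    /* Corner order suitable for GL_QUADS. */
    static const float unit[8] = { 0, 0,  1, 0,  1, 1,  0, 1 };

    assert(sprites != NULL && positions != NULL && texcoords != NULL);
    for (size_t i = 0; i < count; i++) {
        for (size_t c = 0; c < 4; c++) {
            float u = unit[c * 2];
            float v = unit[c * 2 + 1];
            positions[i * 8 + c * 2]     = sprites[i].x + u * sprites[i].w;
            positions[i * 8 + c * 2 + 1] = sprites[i].y + v * sprites[i].h;
            texcoords[i * 8 + c * 2]     = u;
            texcoords[i * 8 + c * 2 + 1] = v;
        }
    }
    return count * 4;
}

/* With a current GL context and the shared texture bound, the whole
   batch could then be submitted in one call:

     glEnableClientState(GL_VERTEX_ARRAY);
     glEnableClientState(GL_TEXTURE_COORD_ARRAY);
     glVertexPointer(2, GL_FLOAT, 0, positions);
     glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
     glDrawArrays(GL_QUADS, 0, (GLsizei)vertex_count);
*/
```

The point of the design is that state changes and call overhead are paid once per batch rather than once per quad.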

Quote: What is fill rate? I've never heard of that before.
Fill rate is the speed at which your GPU can draw pixels. It's the total size of the area you're drawing that's significant in this case (rather than the complexity of the shapes). It's also affected by the complexity of the drawing operation, and things like blending reduce the fill rate further and make everything slower.
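
As a rough worked example of that definition, this sketch estimates the number of pixels drawn per second from sprite count, sprite size and frame rate. The function name is hypothetical, and it assumes every sprite is fully on screen and every pixel of its quad is drawn, so treat the result as an upper-bound estimate.

```c
#include <assert.h>

/* Estimated pixels written per second, assuming each sprite is fully
   visible and every pixel of its quad is drawn. */
double estimate_pixels_per_second(unsigned sprite_count,
                                  unsigned width_px, unsigned height_px,
                                  double frames_per_second)
{
    assert(frames_per_second >= 0.0);
    double pixels_per_frame = (double)sprite_count * width_px * height_px;
    return pixels_per_frame * frames_per_second;
}
```

For instance, 300 sprites of 40x40 pixels at 60 frames per second comes to 28.8 million pixels per second under these assumptions. Blending and overlapping sprites raise the real cost further.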

Sage
Posts: 1,403
Joined: 2005.07
Post: #11
That sheds light on quite a few things I didn't know, thanks.

Here are some results from running it:
24 fps avg with 300 sprites with lists
54 fps avg with 150 sprites with lists
22 fps avg with 300 sprites without lists
43 fps avg with 150 sprites without lists

That's seriously not good. I was hoping to put some particles in later too, but that's basically out of the question with such low frame rates.
Maybe I shouldn't be using OpenGL for 2D. What do you think?

Actually, I just thought of something: I'm using 128x128 textures, but they're only being drawn on quads that are about 40x40. Would it be a big speed increase to use several sizes of texture for scaling the sprites?

Moderator
Posts: 771
Joined: 2003.04
Post: #12
What size are those sprites? I was able to draw 50 Quake II models (about 200 triangles each) at 50 fps on a 16 MB card, in immediate mode(!)

Edit: Hmm, seems like you posted while I was typing... Yes, try using 64x64 textures and/or enabling mip mapping.
Moderator
Posts: 365
Joined: 2002.04
Post: #13
unknown Wrote: Here are some results from running it:
24 fps avg with 300 sprites with lists
54 fps avg with 150 sprites with lists
22 fps avg with 300 sprites without lists
43 fps avg with 150 sprites without lists

That's seriously not good. I was hoping to put some particles in later too, but that's basically out of the question with such low frame rates.
That's wacky. I'm surprised you're seeing any speed increase for using display lists. I've never seen any difference one way or the other.

Particles may not be as much of a problem as you think. They could be quite small, and you can draw them as a batch with glDrawArrays().

Quote: Maybe I shouldn't be using OpenGL for 2D. What do you think?
It'll most likely be slower if you do it with something like Quartz or QuickDraw.

Quote: Actually, I just thought of something: I'm using 128x128 textures, but they're only being drawn on quads that are about 40x40. Would it be a big speed increase to use several sizes of texture for scaling the sprites?
In my experience, when drawing with alpha testing (rather than blending), you only pay for the pixels which actually got drawn. With that in mind, drawing a 128x128 sprite which only has a 40x40 area containing opaque pixels would seem to be a bit silly, but shouldn't slow you down massively. It still makes sense to use a more optimal texture size if you can.

Of course, there are valid reasons for only wanting a small part of a larger texture; for example, if you want to use UV coordinates to extract just part of the texture as a sprite animation frame or background tile.
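
Extracting one animation frame or tile from a larger texture comes down to computing a UV rectangle. A minimal sketch, assuming the frames are equally sized and laid out in a regular grid, numbered left to right and then row by row; `UVRect` and `frame_uv` are hypothetical names:

```c
#include <assert.h>

typedef struct { float u0, v0, u1, v1; } UVRect;

/* Returns the UV rectangle of frame `index` in a sheet of
   cols x rows equally sized frames. */
UVRect frame_uv(int index, int cols, int rows)
{
    assert(cols > 0 && rows > 0);
    assert(index >= 0 && index < cols * rows);

    int col = index % cols;
    int row = index / cols;

    UVRect r;
    r.u0 = (float)col / (float)cols;
    r.v0 = (float)row / (float)rows;
    r.u1 = (float)(col + 1) / (float)cols;
    r.v1 = (float)(row + 1) / (float)rows;
    return r;
}
```

Whether row 0 ends up at the top or the bottom of the image depends on how the texture data was loaded, so the V values may need flipping in practice.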

Remember also that fill rate is about the size of the area you're drawing, so if you draw the sprites bigger than their normal size, or if you have a large screen resolution, you're increasing the area and potentially decreasing the speed. (This mainly applies to older hardware, although I notice my Radeon 9600 XT mysteriously gets slower at higher resolutions, even when it really shouldn't.)

I think you should look at using glDrawArrays()... it won't cure fill rate problems, but it might speed up some of the fundamental drawing operations a bit. Don't forget to draw batches instead of single quads with glDrawArrays().

If you want to see my attempt at a sprite/tile based game in OpenGL, take a look at Yoink. I wrote it a couple of years ago, and I do many of the things I said you shouldn't do. ;) However, it draws quite a lot of quads in a very silly, slow way, so if it's reasonably fast on your machine, it might give you a sense of whether it's the way you're drawing that's slow, or if it's just down to your GPU. (You can enable a frame rate counter in the Programmer's Tools dialog.)

Oldtimer
Posts: 834
Joined: 2002.09
Post: #14
Seriously, I don't think your drawing is the problem - that would be very strange indeed. El Ballo runs fine on a G4 with that GF2MX, and we push a lot more pixels than that. Have you tried Shark? That will probably shed more light on what is taking all the time.

To give you a benchmark here: El Ballo has roughly 400-500 sprites onscreen at all times, and they cover a lot of pixels. There's one that covers the entire screen, then decors that cover most of the screen again, and then about thirty enemies, bullets, powerups, particle and dust effects... and then even more overdraw for the front plane and the HUD. It chugs along just fine, and that's without display lists or vertex arrays, just immediate mode.

Please, run Shark and tell us what's taking most of the time. You'll be surprised (I hope). :)
Member
Posts: 509
Joined: 2002.05
Post: #15
Such a speed increase from display lists makes me think it could be a CPU limit rather than a GPU one. From what I understand, display lists just precompile a lot of the CPU work. 150-300 sprites is a relatively small number considering most games render thousands of triangles each frame. You do need to do more profiling before you do any micro-optimization, like Neil said.
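
Frame rates are easier to compare as frame times. This sketch converts fps figures such as those reported earlier in the thread to milliseconds and derives a cost per additional sprite. The helper names are hypothetical, and the model assumes cost grows linearly with sprite count, which may not hold in practice.

```c
#include <assert.h>

/* Milliseconds spent on one frame at the given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per added sprite between two measurements,
   assuming the cost scales linearly with the number of sprites. */
double ms_per_extra_sprite(double fps_few, unsigned few,
                           double fps_many, unsigned many)
{
    assert(many > few);
    return (frame_time_ms(fps_many) - frame_time_ms(fps_few))
           / (double)(many - few);
}
```

With the reported numbers, going from 150 to 300 sprites costs roughly 0.15 ms per extra sprite both with display lists (54 fps to 24 fps) and without (43 fps to 22 fps). A profiler such as Shark is still needed to say where that time actually goes.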