Fast OpenGL Sprites

Member
Posts: 142
Joined: 2002.11
Post: #1
I'm trying to find a way to optimize my sprite drawing in OpenGL.

Right now, for each sprite I bind a texture and draw a quad. It works and looks perfect, but I have a feeling I can make it much faster. The code is below:

Quote:
    - (void)drawFrame:(int)frame size:(NSSize)size repeatTexture:(BOOL)repeating {
        GLTexture *image = [framesArray objectAtIndex:frame];
        [image bind];

        // Move the origin for the centered coord mode. glTranslatef accumulates
        // into the current modelview matrix, so the matrix must be saved and
        // restored (e.g. glPushMatrix/glPopMatrix) around each sprite.
        if (coordMode == GLSPRITE_CENTER)
            glTranslatef(-size.width / 2.0f, -size.height / 2.0f, 0.0f);

        // Tile the texture across the quad, or stretch the image over it?
        NSSize texCoordSize = repeating ? size : [image size];

        // Power-of-two GL_TEXTURE_2D textures take normalized coordinates, so
        // divide by the image size; non-power-of-two rectangle textures take
        // pixel coordinates and need no adjustment. (GL_TEXTURE_2D is assumed
        // to be enabled elsewhere.)
        if (![image textureIsRectangle]) {
            NSSize imageSize = [image size];
            texCoordSize.width /= imageSize.width;
            texCoordSize.height /= imageSize.height;
            glDisable(GL_TEXTURE_RECTANGLE_EXT);
        } else {
            glEnable(GL_TEXTURE_RECTANGLE_EXT);
        }

        glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f);
        glVertex2f(0.0f, 0.0f);
        glTexCoord2f(texCoordSize.width, 0.0f);
        glVertex2f(size.width, 0.0f);
        glTexCoord2f(texCoordSize.width, texCoordSize.height);
        glVertex2f(size.width, size.height);
        glTexCoord2f(0.0f, texCoordSize.height);
        glVertex2f(0.0f, size.height);
        glEnd();
    }
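The texture-coordinate branch above is the subtle part: GL_TEXTURE_2D samples with normalized coordinates (1.0 spans the whole texture), while GL_TEXTURE_RECTANGLE_EXT samples with pixel coordinates. A minimal C sketch of the same computation follows; the Extent type and tex_coord_extent function are hypothetical names, and the caller is assumed to know whether the texture is a rectangle texture. One caveat worth knowing: rectangle textures do not support GL_REPEAT wrapping, so the repeating case only tiles correctly on a GL_TEXTURE_2D texture.

```c
#include <stdbool.h>

/* Hypothetical helper type: a width/height pair in pixels or texels. */
typedef struct {
    float width;
    float height;
} Extent;

/* Returns the far-corner texture coordinate for a quad of size `quad`
 * drawn from an image of size `image`, mirroring drawFrame:size:repeatTexture:.
 *  - repeating: tile across the quad, so start from the quad size
 *               instead of the image size.
 *  - is_rect:   GL_TEXTURE_RECTANGLE_EXT keeps pixel coordinates;
 *               GL_TEXTURE_2D needs them divided by the image size. */
Extent tex_coord_extent(Extent quad, Extent image, bool repeating, bool is_rect)
{
    Extent tc = repeating ? quad : image;
    if (!is_rect) {
        tc.width  /= image.width;
        tc.height /= image.height;
    }
    return tc;
}
```

For example, a 128x64 image tiled over a 256x64 quad on a GL_TEXTURE_2D texture gives (2.0, 1.0), i.e. two repeats horizontally; the same image drawn unrepeated from a rectangle texture gives (128, 64).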


Because textures can't be bound (nor translations applied) between glBegin() and glEnd(), I have to call glBegin()/glEnd() once per sprite, and according to OpenGL Profiler about 50% of my GL time is spent in glBegin(). That's pretty unacceptable. The next biggest consumers of GL time are CGLFlushDrawable at 23.28% and glVertex2f at 3.69%.

I tried glDrawPixels(), but it turned out much slower than drawing a textured quad, presumably because it reads the pixel data from system RAM on every call instead of using a texture already in VRAM.

Does anyone have any suggestions for how to make this faster? Keep in mind I don't want to get into a nitpicky discussion about which function is 1% faster or slower than another; I'm talking about major optimizations only.

P.S. Depth sorting is not an issue as I'm doing it manually and not using the depth buffer.
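Since the depth ordering is done by hand (painter's algorithm, drawing back to front), one cheap way to reduce texture rebinding without breaking that order is to sort by depth first and use the texture name only as a tie-breaker, so sprites that share a depth and a texture end up adjacent. The sketch below assumes a hypothetical Sprite struct where a larger depth means farther away and `texture` stands in for a GL texture name; count_binds reports how many glBindTexture calls a draw loop would make if it only rebinds when the texture changes.

```c
#include <stdlib.h>

/* Hypothetical per-sprite record; `texture` stands in for a GL texture name. */
typedef struct {
    float depth;            /* assumed: larger = farther from the viewer */
    unsigned int texture;
} Sprite;

/* qsort comparator: farthest first (painter's algorithm), then by texture
 * so equal-depth sprites sharing a texture are drawn consecutively. */
static int compare_sprites(const void *pa, const void *pb)
{
    const Sprite *a = pa, *b = pb;
    if (a->depth > b->depth) return -1;
    if (a->depth < b->depth) return 1;
    if (a->texture < b->texture) return -1;
    if (a->texture > b->texture) return 1;
    return 0;
}

void sort_sprites(Sprite *sprites, size_t count)
{
    qsort(sprites, count, sizeof *sprites, compare_sprites);
}

/* Binds a draw loop would issue if it skips rebinding an unchanged texture. */
size_t count_binds(const Sprite *sprites, size_t count)
{
    size_t binds = 0;
    for (size_t i = 0; i < count; i++)
        if (i == 0 || sprites[i].texture != sprites[i - 1].texture)
            binds++;
    return binds;
}
```

The tie-breaker only helps when several sprites share a depth; sprites at different depths keep their back-to-front order regardless of texture.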
Moderator
Posts: 916
Joined: 2002.10
Post: #2
Question: how many sprites are you drawing?
Member
Posts: 233
Joined: 2003.05
Post: #3
Have you tried display lists?

Member
Posts: 142
Joined: 2002.11
Post: #4
skyhawk Wrote:question, how many sprites are you drawing?

Usually between 20 and 80 each frame.

On a side note, the game runs waaaay slower on 10.3.5 than on 10.4 (yes, you heard me right; I think Apple really optimized some things).
Member
Posts: 142
Joined: 2002.11
Post: #5
aaronsullivan Wrote:Have you tried display lists?

*hits self in head*

Forgot about that one; I'll try it.
Sage
Posts: 1,234
Joined: 2002.10
Post: #6
How big (in on-screen pixels) are the sprites? What else is drawn per frame (background)? What does Shark say is the bottleneck?

Drawing, for example, a few dozen 16x16 sprites and nothing else is not very taxing even on an ancient iMac. Draw a few thousand and profile that instead: you could be CPU bound, submission bound, or fillrate bound, and you can't tell which from your description yet.

In general though, you want to avoid:
* immediate mode
* state changes (texture binding)
* obj-C method overhead

Alternate approaches include
* display lists (but you'll have to make one list per size, and try to cache them all up-front since compiling them takes non-trivial time)
* computing all vertex/tex coords into an array and submitting them in one batch with glDrawArrays. You could order batches by texture if you reuse the same texture a lot (though this interferes with your Z order), or merge your sprite textures into one large texture atlas and adjust your texture coordinates (avoiding rebinding completely).
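The array-plus-atlas approach can be sketched in plain C: every sprite's quad goes into one vertex array and one texture-coordinate array, with texture coordinates taken from that sprite's sub-rectangle of a shared atlas, so the whole set can be drawn with a single bind and one glDrawArrays call while keeping the existing back-to-front order. The AtlasRect/BatchSprite types and fill_sprite_batch are hypothetical; the atlas is assumed to be a GL_TEXTURE_2D, so texel rectangles are normalized by the atlas size. The GL submission appears only as a comment so the sketch compiles without an OpenGL context.

```c
#include <stddef.h>

/* Hypothetical: a sprite's sub-rectangle inside the atlas, in texels. */
typedef struct { float x, y, w, h; } AtlasRect;

/* Hypothetical: one sprite to draw, already in back-to-front order. */
typedef struct {
    float x, y;      /* lower-left corner on screen */
    float w, h;      /* on-screen size */
    AtlasRect src;   /* where its image lives in the atlas */
} BatchSprite;

/* Writes 4 vertices and 4 tex coords (8 floats each) per sprite, in the
 * same corner order as the immediate-mode quad in drawFrame. `verts` and
 * `texcoords` must each hold 8 * count floats. Returns 4 * count, the
 * vertex count to pass to glDrawArrays. */
size_t fill_sprite_batch(const BatchSprite *sprites, size_t count,
                         float atlas_w, float atlas_h,
                         float *verts, float *texcoords)
{
    for (size_t i = 0; i < count; i++) {
        const BatchSprite *s = &sprites[i];
        float x0 = s->x, y0 = s->y, x1 = s->x + s->w, y1 = s->y + s->h;
        /* Normalize the atlas sub-rectangle for GL_TEXTURE_2D. */
        float u0 = s->src.x / atlas_w, v0 = s->src.y / atlas_h;
        float u1 = (s->src.x + s->src.w) / atlas_w;
        float v1 = (s->src.y + s->src.h) / atlas_h;

        float *v = verts + 8 * i, *t = texcoords + 8 * i;
        v[0] = x0; v[1] = y0; t[0] = u0; t[1] = v0;
        v[2] = x1; v[3] = y0; t[2] = u1; t[3] = v0;
        v[4] = x1; v[5] = y1; t[4] = u1; t[5] = v1;
        v[6] = x0; v[7] = y1; t[6] = u0; t[7] = v1;
    }
    return 4 * count;
}

/* Submission once per frame, with GL_VERTEX_ARRAY and
 * GL_TEXTURE_COORD_ARRAY client states enabled:
 *   glBindTexture(GL_TEXTURE_2D, atlasTexture);
 *   glVertexPointer(2, GL_FLOAT, 0, verts);
 *   glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
 *   glDrawArrays(GL_QUADS, 0, (GLsizei)vertexCount);
 */
```

Because the atlas removes the need to rebind, the sprites can stay in whatever order the manual depth sort produced.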

If you can't get away from immediate mode, definitely use the CGL macros (CGLMacro.h).
Member
Posts: 142
Joined: 2002.11
Post: #7
arekkusu Wrote:How big (in on-screen pixels) are the sprites? What else is drawn per frame (background)? What does Shark say is the bottleneck?

The sprites average 128x128 pixels.
About 120 tiles are drawn in the background, but that path is already well optimized: the tiles sit on a grid and share a single texture.

I've switched the sprites to display lists whenever the requested size matches the image dimensions (99% of the time), falling back to the longer, dynamic path otherwise. New profiler statistics:
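That display-list scheme can be sketched as a small cache keyed by quad size, filled once at load time (compiling lists takes non-trivial time, as noted earlier in the thread) and consulted at draw time, with the immediate-mode path as the fallback for sizes that were never compiled. The ListCache type, its fixed capacity, and both functions are hypothetical, and the GL calls appear only in a comment so the sketch compiles without a GL context. It assumes all sprites of a given size use the same texture target; if rectangle and GL_TEXTURE_2D textures share a size, the key would also need the target, since their texture coordinates differ.

```c
#include <stddef.h>

#define LIST_CACHE_CAPACITY 64   /* hypothetical limit on distinct sizes */

/* Maps a quad size (in pixels) to the display list compiled for it. */
typedef struct {
    float w[LIST_CACHE_CAPACITY];
    float h[LIST_CACHE_CAPACITY];
    unsigned int list[LIST_CACHE_CAPACITY];
    size_t count;
} ListCache;

/* Returns the list cached for a w x h quad, or 0 if there is none.
 * 0 works as a "missing" marker because it is never a valid list name. */
unsigned int list_cache_find(const ListCache *cache, float w, float h)
{
    for (size_t i = 0; i < cache->count; i++)
        if (cache->w[i] == w && cache->h[i] == h)
            return cache->list[i];
    return 0;
}

/* Records a compiled list. Returns 1 on success, 0 if the cache is full
 * or the list name is 0 (glGenLists returns 0 on failure). */
int list_cache_insert(ListCache *cache, float w, float h, unsigned int list)
{
    if (list == 0 || cache->count == LIST_CACHE_CAPACITY)
        return 0;
    cache->w[cache->count] = w;
    cache->h[cache->count] = h;
    cache->list[cache->count] = list;
    cache->count++;
    return 1;
}

/* Load time, once per sprite size:
 *   GLuint list = glGenLists(1);
 *   glNewList(list, GL_COMPILE);
 *   ... the glBegin(GL_QUADS) ... glEnd() block from drawFrame ...
 *   glEndList();
 *   list_cache_insert(&cache, w, h, list);
 * Draw time:
 *   GLuint list = list_cache_find(&cache, w, h);
 *   if (list) glCallList(list); else draw the quad in immediate mode.
 */
```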

CGLFlushDrawable: 98.89% of GL time, 2% of total application time. All other GL functions have dropped to almost nothing, so it looks like I should focus on optimizing game logic now. Time to leave OpenGL Profiler and see what Shark says.

As for Obj-C overhead, I've found it to be pretty insignificant, even in large loops that send the same message over and over (not that I do that in my game; it's just an example).
Sage
Posts: 1,234
Joined: 2002.10
Post: #8
Those stats look much better. If you're using blending, you'll probably be fillrate limited on old machines first.
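To put the fillrate remark in numbers using figures from this thread (up to 80 sprites averaging 128x128 pixels), the sprites alone cover about 1.3 million pixels per frame. The 60 frames per second used below is an assumption, not a figure from the thread; the count includes pixels where sprites overlap one another but not the roughly 120 background tiles, and with blending each of those pixels also costs a framebuffer read.

```c
/* Pixels written per second by `count` sprites of w x h pixels at `fps`
 * frames per second. Every sprite pixel is counted, including overlaps
 * between sprites; background tiles are not included. */
unsigned long sprite_fill_per_second(unsigned long count, unsigned long w,
                                     unsigned long h, unsigned long fps)
{
    return count * w * h * fps;
}
```

With the thread's numbers, sprite_fill_per_second(80, 128, 128, 60) is 80 × 16,384 × 60 = 78,643,200, roughly 79 million sprite pixels per second before the background is drawn.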

For 100 sprites this Obj-C overhead is negligible, but I wouldn't want to call a half-dozen methods per particle in a 10,000 particle system...
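For a system that size, one common way to keep per-particle cost low in an Objective-C program is to hold particle state in a plain C array of structs and update it in one C function per frame, so any message-send cost is paid once per system rather than once per particle. The Particle struct and update_particles below are a hypothetical sketch of that layout, not code from this thread.

```c
#include <stddef.h>

/* Hypothetical particle state kept in a plain C array rather than in
 * one Objective-C object per particle. */
typedef struct {
    float x, y;     /* position */
    float vx, vy;   /* velocity, units per second */
    float life;     /* seconds remaining; <= 0 means dead */
} Particle;

/* Advances every particle by dt seconds and removes dead ones by moving
 * the last particle into their slot. Returns the new particle count. */
size_t update_particles(Particle *p, size_t count, float dt, float gravity)
{
    size_t i = 0;
    while (i < count) {
        p[i].life -= dt;
        if (p[i].life <= 0.0f) {
            /* Slot i now holds the not-yet-updated last particle, so
             * process slot i again instead of advancing. */
            p[i] = p[--count];
            continue;
        }
        p[i].vy += gravity * dt;
        p[i].x  += p[i].vx * dt;
        p[i].y  += p[i].vy * dt;
        i++;
    }
    return count;
}
```

The swap-with-last removal keeps the array dense without shifting elements, at the cost of not preserving particle order, which usually doesn't matter for additive or unsorted particles.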
Member
Posts: 142
Joined: 2002.11
Post: #9
arekkusu Wrote:Those stats look much better. If you're using blending, you'll probably be fillrate limited on old machines first.

For 100 sprites this Obj-C overhead is negligible, but I wouldn't want to call a half-dozen methods per particle in a 10,000 particle system...

True. Actually, right now Shark measures my Objective-C overhead at about 8%, so it's not negligible. I can cut that in half with a little optimizing, but optimizing beyond that won't be worth it.