Optimization

Member
Posts: 509
Joined: 2002.05
Post: #1
I was wondering: if I have all my textures mipmapped, and some are much larger than they should be, will this hurt my performance much, or will it just take up a lot of VRAM? There are a bunch of textures I probably shouldn't mipmap, and some whose resolution I could lower, but if it won't even help I will just leave them as they are. Also, is there a way to speed up my transparencies? When I render my trees it slows the game down SO MUCH (and they are only 6 triangles each, using vertex arrays)!
Member
Posts: 177
Joined: 2002.08
Post: #2
They won't hurt performance until you have so many textures they don't all fit in VRAM at once. Then the system will have to swap them in and out during render.
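
To get a rough idea of how close you are to that limit, you can add up the texture sizes by hand. This is only a sketch: it assumes uncompressed texels and a full mipmap chain down to 1x1, and `texture_bytes` is a made-up helper, not a GL call. The driver's real allocation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by one texture, optionally with a full mipmap chain.
   Assumes uncompressed texels of bytes_per_texel bytes (4 for RGBA8). */
size_t texture_bytes(unsigned width, unsigned height,
                     unsigned bytes_per_texel, int mipmapped)
{
    assert(width > 0 && height > 0);
    size_t total = 0;
    for (;;) {
        total += (size_t)width * height * bytes_per_texel;
        if (!mipmapped || (width == 1 && height == 1))
            break;
        /* Each mip level halves both dimensions, clamped at 1. */
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

For a 1024x1024 RGBA texture this gives 4,194,304 bytes without mipmaps and 5,592,404 with them, so the mipmap chain itself costs about a third extra. Halving the resolution of an oversized texture saves far more than dropping its mipmaps.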

You should use Shark or the OpenGL Profiler on the trees; it's a bad idea to just guess where the main load is (especially since I don't know anything about your code). Of course, if you're trying to draw a couple thousand trees, those 6 triangles do add up :)
Sage
Posts: 1,232
Joined: 2002.10
Post: #3
If you're drawing trees as intersecting billboards using alpha-masked textures, your speed hit is probably due to the blending. The GPU needs to read each pixel from the frame buffer before blending/writing a texel of the billboard. You can lessen the hit somewhat (depending on your textures) by enabling alpha test, so the read is skipped for fully transparent texels (or texels with an alpha lower than whatever you set the alpha test limit to.)

Fillrate eaten by blending is still a pretty big limitation even on modern (e.g. Radeon 9600) cards. I can only reliably get about 21x896x600x60 pixels/second out of my PowerBook. So for some applications you will need to redesign things to reduce the amount of overdraw.
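
To put that number in perspective, here is the arithmetic as a small sketch. It reads the figure as 21 blended full-screen layers at 896x600, 60 times a second, which is an assumption about how the measurement was taken; both function names are hypothetical.

```c
#include <assert.h>

/* Pixels per second touched when `layers` full-screen blended layers
   are drawn at width x height, fps times a second. */
unsigned long long blended_pixels_per_second(unsigned layers, unsigned width,
                                             unsigned height, unsigned fps)
{
    return (unsigned long long)layers * width * height * fps;
}

/* How many full-screen layers of overdraw fit in a measured fill rate. */
unsigned layers_for_budget(unsigned long long pixels_per_second,
                           unsigned width, unsigned height, unsigned fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return (unsigned)(pixels_per_second /
                      ((unsigned long long)width * height * fps));
}
```

The first function gives 677,376,000 blended pixels per second for the figure above. Fed back into the second one at 1024x768 and 60 fps, the same budget covers only 14 full-screen layers, which a forest of overlapping billboards can use up quickly.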
Member
Posts: 509
Joined: 2002.05
Post: #4
arekkusu Wrote:If you're drawing trees as intersecting billboards using alpha-masked textures, your speed hit is probably due to the blending. The GPU needs to read each pixel from the frame buffer before blending/writing a texel of the billboard. You can lessen the hit somewhat (depending on your textures) by enabling alpha test, so the read is skipped for fully transparent texels (or texels with an alpha lower than whatever you set the alpha test limit to.)

Here is my code for the trees


Code:
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.05);
glAlphaFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glColor4f(1.0, 1.0, 1.0, 1.0);
glDisable(GL_LIGHTING);

for (i = 0; i < treeCount; i++)
{
    [lowTree1 render:0 x:trees[i].x y:trees[i].y z:trees[i].z xr:0.0 yr:trees[i].spin zr:0.0 sender:sender];
}

glEnable(GL_LIGHTING);
glDisable(GL_ALPHA_TEST);


So I think I am using the alpha test correctly. Also, I have a glColor3f(1,1,1) in my code before the trees; will that slow things down? (I think I had to put it there because I used a different color before that and it was screwing things up.)

GL Profiler is being weird: I can get EVERYTHING to work except Show Stats, which is what I need the most.
Member
Posts: 509
Joined: 2002.05
Post: #5
OK, OSC gave me some good tips, like drawing from front to back, and then getting rid of trees that won't be drawn. I will try that and report back today.
Sage
Posts: 1,232
Joined: 2002.10
Post: #6
Your second glAlphaFunc is going to produce GL_INVALID_ENUM; you meant to type glBlendFunc.

Sorting by draw order will save you some blending if foreground trees come out mostly opaque. But you pay some CPU for sorting tree submission by Z.
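
A minimal sketch of that sort, assuming a `Tree` struct with the x/y/z/spin fields from the posted loop plus a hypothetical `dist2` field, and a camera position passed in as plain floats. None of these names come from the actual project.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    float x, y, z, spin;  /* fields as used in the posted trees[] loop */
    float dist2;          /* squared distance to the camera (added for sorting) */
} Tree;

/* qsort comparator: smaller squared distance sorts first. */
static int nearer_first(const void *a, const void *b)
{
    float da = ((const Tree *)a)->dist2;
    float db = ((const Tree *)b)->dist2;
    return (da > db) - (da < db);
}

/* Reorder trees so the ones closest to the camera are drawn first. */
void sort_trees_front_to_back(Tree *trees, size_t count,
                              float cam_x, float cam_y, float cam_z)
{
    assert(trees != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        float dx = trees[i].x - cam_x;
        float dy = trees[i].y - cam_y;
        float dz = trees[i].z - cam_z;
        /* Squared distance is enough for ordering; no sqrt needed. */
        trees[i].dist2 = dx * dx + dy * dy + dz * dz;
    }
    if (count > 1)
        qsort(trees, count, sizeof trees[0], nearer_first);
}
```

For a few hundred trees the sort itself is cheap; the CPU cost only starts to matter as the count grows, and you can skip the re-sort on frames where the camera has barely moved.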

How big is treeCount? Ten? A billion?

How are trees submitted to GL inside render:? You say you are using vertex arrays; are you using VAR? How many vertices per array submission? Submitting six triangles at a time is no good...
Member
Posts: 509
Joined: 2002.05
Post: #7
OK, I will fix that blend function thing asap.

The tree count right now is anywhere from 30 to 200, but I would like that number to increase without dropping FPS.

Wow... I just realized that's my problem: sending 6 triangles at a time. That will be REALLY easy to fix though. What's VAR?
Moderator
Posts: 608
Joined: 2002.04
Post: #8
Jake Wrote:OK, I will fix that blend function thing asap.

The tree count right now is anywhere from 30 to 200, but I would like that number to increase without dropping FPS.

Wow... I just realized that's my problem: sending 6 triangles at a time. That will be REALLY easy to fix though. What's VAR?
VAR is Apple's sorry attempt at making vertex arrays faster. It is really hard to use and really frustrating to get working...
Sage
Posts: 1,232
Joined: 2002.10
Post: #9
First off, it sounds like you still need to figure out where your bottleneck is. CPU? Vertex submission? Transform? Fillrate? Use Shark, GLProfiler, and play with your code increasing the number of polygons / size of the polygons until you have a better feeling for what is "slow".

If you see that it is fillrate, do what OSC said about changing draw order to reduce blending cost.

If you see that it is vertex submission/transform, first optimize your regular vertex arrays. You can ask the GPU for a hint about its maximum element size: just query GL_MAX_ELEMENTS_VERTICES. It's usually something like 150,000. I'm getting OK results submitting around 32k vertices at a time.
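
The batch bookkeeping can be sketched as below. The limit itself would come from `glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &n)`; here it is just a parameter so the arithmetic stands alone, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of draw calls needed to submit total_vertices when each call
   carries at most max_per_batch vertices.
   For GL_TRIANGLES, pick max_per_batch as a multiple of 3 so that no
   triangle is split across two batches. */
size_t batch_count(size_t total_vertices, size_t max_per_batch)
{
    assert(max_per_batch > 0);
    return (total_vertices + max_per_batch - 1) / max_per_batch;
}

/* Vertex count of batch number `index` (0-based); the last may be short. */
size_t batch_size(size_t total_vertices, size_t max_per_batch, size_t index)
{
    assert(index < batch_count(total_vertices, max_per_batch));
    size_t start = index * max_per_batch;
    size_t left = total_vertices - start;
    return left < max_per_batch ? left : max_per_batch;
}
```

With 99,999 vertices and batches of 32,766 (the largest multiple of 3 under 32k), that works out to four submissions, the last one carrying 1,701 vertices.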

Once vertex arrays work, you'll still be wasting some time during vertex submission, so then look at VAR. VAR is Vertex Array Range, it is a simple extension to regular vertex arrays that maps your array into AGP space so the GPU can DMA copy the data instead of the CPU pushing it all. See Apple's sample code. There was also a thread on this board where I showed exactly how to set up double buffered VAR, but Carlos seems to have nuked it in the big forum shuffle.

Also, you might want to test your code on some different machines. The bottleneck will be different on different GPUs. Plus! You'll discover all sorts of bugs! Because! The ATI/nvidia drivers! Don't! Work! The! Same! >:(
Member
Posts: 509
Joined: 2002.05
Post: #10
All right, I fixed the problem of calling drawElements too many times (I had 1 call per tree before) by merging everything into 1 big drawElements. I was using GL Profiler, and it's pretty cool. I am going to do some more optimization first (like drawing trees in order from front to back). I read about VAR in the NeHe tutorials (well, the PC equivalent). Do most video cards support it? If it's only the new ones, it is probably a useless optimization.
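
For anyone following along, one way to merge per-tree draws is to repeat the tree's index list once per tree, offset into a single shared vertex array. This is only a sketch under that assumption, not the code from this project, and `build_batched_indices` is a hypothetical name. It also assumes each tree's vertices have already been moved and rotated into world space on the CPU, since a single draw call cannot carry a per-tree transform.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `out` with tree_count copies of mesh_indices, each copy offset so
   that it points at that tree's own block of verts_per_tree vertices.
   `out` must have room for indices_per_tree * tree_count entries. */
void build_batched_indices(const unsigned short *mesh_indices,
                           size_t indices_per_tree,
                           size_t verts_per_tree,
                           size_t tree_count,
                           unsigned short *out)
{
    /* 16-bit indices can only address 65536 vertices in total. */
    assert(verts_per_tree * tree_count <= 65536);
    for (size_t t = 0; t < tree_count; t++) {
        for (size_t i = 0; i < indices_per_tree; i++) {
            out[t * indices_per_tree + i] =
                (unsigned short)(mesh_indices[i] + t * verts_per_tree);
        }
    }
}
```

The result can then go out in a single `glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, out)` call, where `count` is `indices_per_tree * tree_count`.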

I can't wait to get into driver problems, as if my own Project Builder problems aren't enough :(
Sage
Posts: 1,232
Joined: 2002.10
Post: #11
VAR is supported on all Quartz Extreme-capable GPUs, so no Rage128 or software renderer support.

BTW, I'm putting together a better at-a-glance reference page, trying to mirror e.g. delphi3d, but I need another trip to the lab to test a few more cards and dump some more implementation limits...
Luminary
Posts: 5,143
Joined: 2002.04
Post: #12
Interestingly, despite it not being on that list, I've had success using VAR on the Rage 128, and a decent performance improvement from it (better than CVA).

I wonder what's stopping it being officially supported?
Member
Posts: 116
Joined: 2002.04
Post: #13
jabber Wrote:VAR is Apple's sorry attempt at making vertex arrays faster. It is really hard to use and really frustrating to try to use it...

VAR certainly isn't an Apple-specific thing. It's used on the PC as well.

VAR is pretty ugly, but it's one of the fastest ways to get things drawn until the new ARB replacement for VAR comes out (the name of which escapes me at the moment).

Wade
Moderator
Posts: 608
Joined: 2002.04
Post: #14
wadesworld Wrote:VAR certainly isn't an Apple-specific thing. It's used on the PC as well.
That may be, but PC users have VBOs, which are (arguably) easier and faster, so they don't have to use VAR.
Mars_999
Unregistered
 
Post: #15
jabber Wrote:That may be, but PC users have VBOs which are (arguably) easier and faster so they don't have to use VAR.

Ditto, and it's a GL 1.5 requirement. Whoo hoo! Bring on 1.5, Apple.