Optimization
I was wondering: if all my textures are mipmapped, and some are much larger than they need to be, will that hurt my performance much, or will it just take up a lot of VRAM? There are a bunch of textures I probably shouldn't mipmap, and some whose resolution I could lower, but if it won't even help I will just leave them the same. Also, is there a way to speed up my transparencies? When I render my trees it slows the game down SO MUCH (and they are only 6 triangles each, using vertex arrays)!
They won't hurt performance until you have so many textures that they don't all fit in VRAM at once. Then the system will have to swap them in and out during rendering.
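To put rough numbers on the VRAM side of the question, here is a small sketch that adds up a texture's mipmap chain. It assumes uncompressed textures and a full chain down to 1x1, and the function name is made up for illustration; drivers may pad or store textures differently. A full mipmap chain costs about a third more than the base level alone, while halving the resolution cuts the base level to a quarter.

```c
#include <stddef.h>

/* Ballpark VRAM estimate for one uncompressed texture.
   bytes_per_texel is e.g. 4 for RGBA8. If mipmapped is non-zero, every
   level of the chain down to 1x1 is added to the total. */
size_t texture_bytes(size_t width, size_t height, size_t bytes_per_texel,
                     int mipmapped)
{
    size_t total = 0;
    for (;;) {
        total += width * height * bytes_per_texel;
        if (!mipmapped || (width == 1 && height == 1))
            break;
        /* each level halves both dimensions, clamped at 1 */
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For example, texture_bytes(256, 256, 4, 1) gives 349,524 bytes, against 262,144 for the same texture without mipmaps.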
You should use Shark or the OpenGL Profiler on the trees; it's a bad idea to just guess where the main load is (especially since I don't know anything about your code). Of course, if you're trying to draw a couple thousand trees, those 6 triangles do add up.
If you're drawing trees as intersecting billboards with alpha-masked textures, your speed hit is probably due to the blending. The GPU needs to read each pixel from the frame buffer before blending in and writing a texel of the billboard. You can lessen the hit somewhat (depending on your textures) by enabling the alpha test, so the read is skipped for fully transparent texels (or texels whose alpha fails whatever alpha test threshold you set).
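A rough CPU model of the alpha test described above, assuming glAlphaFunc(GL_GREATER, ref) as in the code posted later in the thread. The helper names are hypothetical. It counts how many texels of a billboard would still reach the blending stage, which in this model is the part that costs a framebuffer read.

```c
#include <stddef.h>

/* Model of the fixed-function alpha test with glAlphaFunc(GL_GREATER, ref):
   a fragment survives only if its alpha is strictly greater than ref.
   Fragments that fail are discarded before blending. */
int alpha_test_greater(float alpha, float ref)
{
    return alpha > ref;
}

/* Count the texels of a billboard's alpha channel that pass the test and
   therefore still need a read-modify-write of the framebuffer. */
size_t texels_reaching_blend(const float *alpha, size_t count, float ref)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i)
        if (alpha_test_greater(alpha[i], ref))
            ++n;
    return n;
}
```

A tree texture that is mostly empty space around the leaves saves most of its blending work this way; a texture full of soft, half-transparent texels saves little.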
Fillrate eaten by blending is still a pretty big limitation even on modern cards (e.g. the Radeon 9600). I can only reliably get about 21 × 896 × 600 × 60 pixels/second (roughly 677 million, i.e. about 21 full-screen layers at 896×600 and 60 fps) out of my PowerBook. So for some applications you will need to redesign things to reduce the amount of overdraw.
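The fill-rate figure above as plain arithmetic: a sketch that converts between pixels per second and full-screen layers per frame. The function names are illustrative only.

```c
/* Pixels per second needed to draw `layers` full-screen layers
   at the given resolution and frame rate. */
double pixels_per_second(double layers, double width, double height, double fps)
{
    return layers * width * height * fps;
}

/* How many full-screen layers a given fill rate allows per frame. */
double layers_per_frame(double fill_rate, double width, double height, double fps)
{
    return fill_rate / (width * height * fps);
}
```

pixels_per_second(21, 896, 600, 60) is 677,376,000. At 1024×768 and 60 fps the same budget only covers about 14 layers, so overdraw from overlapping tree billboards eats into it quickly.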
arekkusu Wrote:If you're drawing trees as intersecting billboards using alpha-masked textures, your speed hit is probably due to the blending. The GPU needs to read each pixel from the frame buffer before blending/writing a texel of the billboard. You can lessen the hit somewhat (depending on your textures) by enabling alpha test, so the read is skipped for fully transparent texels (or texels with an alpha lower than whatever you set the alpha test limit to.)
Here is my code for the trees:
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.05);
glAlphaFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glColor4f(1.0, 1.0, 1.0, 1.0);
glDisable(GL_LIGHTING);
for (i = 0; i < treeCount; i++)
{
    [lowTree1 render: 0 x: trees[i].x y: trees[i].y z: trees[i].z xr: 0.0 yr: trees[i].spin zr: 0.0 sender:sender];
}
glEnable(GL_LIGHTING);
glDisable(GL_ALPHA_TEST);
So I think I am using the alpha test correctly. Also, I have a glColor3f(1,1,1) in my code before the trees; will that slow things down? (I think I had to put it there because I used a different color before that and it was screwing things up.)
GL Profiler is being weird: I can get EVERYTHING to work except Show Stats, which is what I need the most.
OK, OSC gave me some good tips, like drawing from front to back, and then getting rid of trees that won't be seen. I will try that and report back today.
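One possible way to get rid of trees that won't be seen before submitting anything is a crude ground-plane visibility test. This is a sketch under assumptions, not OSC's actual method: the Tree struct only mirrors the fields used in the render loop, the camera looks along (dir_x, dir_z), and max_dist is an arbitrary draw distance. A proper frustum test would also check the left and right planes.

```c
#include <stddef.h>

/* Hypothetical tree record mirroring trees[i].x/.y/.z/.spin above. */
typedef struct { float x, y, z, spin; } Tree;

/* Copy the trees that are in front of the camera and within max_dist
   into out; returns how many were kept. "In front" means the vector from
   the camera to the tree has a positive dot product with the view
   direction in the ground (x/z) plane. */
size_t cull_trees(const Tree *in, size_t count, Tree *out,
                  float cam_x, float cam_z, float dir_x, float dir_z,
                  float max_dist)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        float vx = in[i].x - cam_x;
        float vz = in[i].z - cam_z;
        int in_front = vx * dir_x + vz * dir_z > 0.0f;
        int in_range = vx * vx + vz * vz <= max_dist * max_dist;
        if (in_front && in_range)
            out[kept++] = in[i];
    }
    return kept;
}
```

Comparing squared distances avoids a square root per tree, which matters little at 200 trees but is free to do.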
Your second glAlphaFunc call is going to generate GL_INVALID_ENUM; you meant to type glBlendFunc.
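For reference, the intended call is glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA), and blending also has to be switched on with glEnable(GL_BLEND); the snippet above doesn't show whether that happens elsewhere. Here is what that blend function computes for one colour channel with the default additive blend equation, as a CPU sketch with a made-up function name:

```c
/* glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) per colour channel:
   result = src * src_alpha + dst * (1 - src_alpha).
   dst has to be read back from the framebuffer first, which is the
   read-modify-write cost of blending discussed earlier in the thread. */
float blend_src_alpha(float src, float dst, float src_alpha)
{
    return src * src_alpha + dst * (1.0f - src_alpha);
}
```

With src_alpha = 1 the result is just src, so fully opaque texels gain nothing from blending while still paying for the read.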
Sorting by draw order will save you some blending if foreground trees come out mostly opaque. But you pay some CPU time to sort the trees by Z before submitting them.
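A minimal front-to-back sort, assuming the same kind of hypothetical Tree struct and a known camera position. Front to back pays off when depth testing is on and the near trees write depth for their opaque texels, so fragments hidden behind them are rejected before blending. Soft, semi-transparent edges only blend correctly back to front, so flip the comparator if that matters more than speed.

```c
#include <stdlib.h>

typedef struct { float x, y, z, spin; } Tree;

/* qsort's comparator takes no context argument, so the camera position
   is kept in file-scope variables for the duration of the sort. */
static float g_cam_x, g_cam_y, g_cam_z;

static float dist2(const Tree *t)
{
    float dx = t->x - g_cam_x, dy = t->y - g_cam_y, dz = t->z - g_cam_z;
    return dx * dx + dy * dy + dz * dz;   /* squared distance is enough */
}

static int nearer_first(const void *a, const void *b)
{
    float da = dist2((const Tree *)a), db = dist2((const Tree *)b);
    return (da > db) - (da < db);
}

/* Sort trees nearest-first relative to the camera. */
void sort_front_to_back(Tree *trees, size_t count,
                        float cam_x, float cam_y, float cam_z)
{
    g_cam_x = cam_x;
    g_cam_y = cam_y;
    g_cam_z = cam_z;
    qsort(trees, count, sizeof *trees, nearer_first);
}
```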
How big is treeCount? Ten? A billion?
How are trees submitted to GL inside render:? You say you are using vertex arrays; are you using VAR? How many vertices per array submission? Submitting six triangles at a time is no good...
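A sketch of the fix for submitting six triangles at a time: copy every tree's mesh into one vertex array and one index array so the whole forest can go out in a single glDrawElements call. The template mesh, the function name, and the translation-only placement are assumptions for illustration; per-tree spin would be applied to each template vertex before adding the offset, and texture coordinates would be copied the same way. With GL_UNSIGNED_SHORT indices the total vertex count has to stay at or below 65,536.

```c
#include <stddef.h>

typedef struct { float x, y, z, spin; } Tree;

/* Copy a per-tree template mesh (xyz triples plus triangle indices) once
   per tree into shared arrays, offset to each tree's position. The caller
   sizes out_xyz for tree_count * tmpl_verts * 3 floats and out_idx for
   tree_count * tmpl_idx_count indices, then draws everything with e.g.
     glVertexPointer(3, GL_FLOAT, 0, out_xyz);
     glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_SHORT, out_idx);
   Returns n, the number of indices written. */
size_t batch_trees(const Tree *trees, size_t tree_count,
                   const float *tmpl_xyz, size_t tmpl_verts,
                   const unsigned short *tmpl_idx, size_t tmpl_idx_count,
                   float *out_xyz, unsigned short *out_idx)
{
    size_t n = 0;
    for (size_t t = 0; t < tree_count; ++t) {
        size_t base = t * tmpl_verts;   /* index of this tree's first vertex */
        for (size_t v = 0; v < tmpl_verts; ++v) {
            out_xyz[3 * (base + v) + 0] = tmpl_xyz[3 * v + 0] + trees[t].x;
            out_xyz[3 * (base + v) + 1] = tmpl_xyz[3 * v + 1] + trees[t].y;
            out_xyz[3 * (base + v) + 2] = tmpl_xyz[3 * v + 2] + trees[t].z;
        }
        /* indices must point at this tree's copy, so shift them by base */
        for (size_t i = 0; i < tmpl_idx_count; ++i)
            out_idx[n++] = (unsigned short)(base + tmpl_idx[i]);
    }
    return n;
}
```

The arrays only need rebuilding when trees move or the visible set changes, not every frame.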
OK, I will fix that blend function thing asap.
The tree count right now is anywhere from 30 to 200, but I would like that number to increase without dropping FPS.
Wow... I just realized that's my problem, sending 6 triangles at a time. That will be REALLY easy to fix, though. What's VAR?
Jake Wrote:OK, I will fix that blend function thing asap.
VAR is Apple's sorry attempt at making vertex arrays faster. It is really hard to use and really frustrating to work with...
First off, it sounds like you still need to figure out where your bottleneck is. CPU? Vertex submission? Transform? Fillrate? Use Shark and GLProfiler, and experiment with your code, increasing the number of polygons or their size, until you have a better feel for what is "slow".
If you see that it is fillrate, do what OSC said about changing draw order to reduce blending cost.
If you see that it is vertex submission/transform, first optimize your regular vertex arrays. You can ask the driver for a hint about its recommended maximum batch size by querying GL_MAX_ELEMENTS_VERTICES. It's usually something like 150,000. I'm getting OK results submitting around 32k vertices at a time.
Once vertex arrays work, you'll still be wasting some time during vertex submission, so then look at VAR. VAR is Vertex Array Range, a simple extension to regular vertex arrays that maps your array into AGP space so the GPU can DMA the data instead of the CPU pushing it all. See Apple's sample code. There was also a thread on this board where I showed exactly how to set up double-buffered VAR, but Carlos seems to have nuked it in the big forum shuffle.
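If one big array grows past the recommended limit, it can be split into runs of whole trees. A sketch of the arithmetic with hypothetical helper names; max_verts stands for whatever glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, ...) reports, or a fixed budget such as the 32k mentioned above.

```c
#include <stddef.h>

/* Whole trees that fit in one submission of at most max_verts vertices.
   Assumes verts_per_tree <= max_verts. */
size_t trees_per_batch(size_t verts_per_tree, size_t max_verts)
{
    return max_verts / verts_per_tree;
}

/* Number of draw calls needed to submit tree_count trees. */
size_t batches_needed(size_t tree_count, size_t verts_per_tree, size_t max_verts)
{
    size_t per = trees_per_batch(verts_per_tree, max_verts);
    return (tree_count + per - 1) / per;   /* round up */
}
```

Batch b then covers trees b * per through (b + 1) * per - 1 (the last batch may be shorter), which maps naturally onto one glDrawRangeElements call per batch. At 12 vertices per tree and a 32k budget, 200 trees still fit in a single call.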
Also, you might want to test your code on some different machines. The bottleneck will be different on different GPUs. Plus! You'll discover all sorts of bugs! Because! The ATI/nvidia drivers! Don't! Work! The! Same! >:(
Alright, I fixed the problem of calling too many drawElements (I had one per tree before) by merging them into one big drawElements. I was using GL Profiler; it's pretty cool. I am going to do some more optimization first (like drawing trees in order from front to back). I read about VAR in the NeHe tutorials (well, the PC equivalent). Do most video cards support it? Because if it's only the new ones, it is probably a useless optimization.
I can't wait to get into driver problems, as if my own Project Builder problems aren't enough.
VAR is supported on all Quartz Extreme-capable GPUs, so there is no Rage 128 or software renderer support.
BTW, I'm putting together a better at-a-glance reference page along the lines of delphi3d, but I need another trip to the lab to test a few more cards and dump some more implementation limits...
Interestingly, despite it not being on that list, I've had success using VAR on the Rage 128, and a decent performance improvement from it (better than CVA).
I wonder what's stopping it being officially supported?
jabber Wrote:VAR is Apple's sorry attempt at making vertex arrays faster. It is really hard to use and really frustrating to try to use it...
VAR certainly isn't an Apple-specific thing. It's used on the PC as well.
VAR is pretty ugly, but it's one of the fastest ways to get things drawn until the new ARB replacement for VAR comes out (the name of which escapes me at the moment).
Wade
wadesworld Wrote:VAR certainly isn't an Apple-specific thing. It's used on the PC as well.
That may be, but PC users have VBOs, which are (arguably) easier and faster, so they don't have to use VAR.