Regarding GL profiling - Printable Version
+- iDevGames Forums (http://www.idevgames.com/forums)
Regarding GL profiling - TomorrowPlusX - Jun 25, 2009 10:08 AM
I'm having a severe performance problem, and am hoping you guys can give me some pointers.
I've implemented foliage, yet again. For the first time I've implemented a system which can generate TONs of foliage very, VERY quickly. The generation of foliage hauls *ss. But the rendering performance is terrible.
Scenes which ran at 30fps without foliage become a slideshow with it! Gah.
Now, I noticed that when the foliage quads were very small, my performance remained at 30, but as soon as the quads got bigger it slowed down considerably.
Now, here's a screenshot so you can see it's really not pushing that many pixels.
(also, please pardon the poor quality texture, since I'm still experimenting)
So, I assume this means that fill rate is the problem.
To accommodate this, I've done a few things:
1) I'm using a high alpha test level, so I can render the quads order-independent. I'm using a trick from this article (http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter01.html) to fade foliage out at distance.
2) I've simplified the shadow mapping algorithms used by foliage to be faster and lighter, since high quality isn't really important there.
3) I billboard on the GPU
4) I'm using interleaved arrays, but not VBOs, since individual foliage patches come and go pretty quickly, but I don't think that the vertex throughput is the problem (though it may be).
I don't know what else to do. I can post code for you to pore over, since I have to assume my GLSL is too expensive.
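To put a number on the vertex-throughput question in point 4, here is a minimal sketch of what an interleaved foliage vertex might look like and how many bytes a patch would submit per draw. The `FoliageVertex` layout and all function names are assumptions made for illustration; the actual vertex format used here has not been posted.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical interleaved layout for one corner of a billboarded quad. */
typedef struct {
    float position[3];  /* anchor point of the quad */
    float texcoord[2];  /* coordinates into the foliage atlas */
    float corner[2];    /* which corner this is, for GPU billboarding */
} FoliageVertex;

/* Byte stride between consecutive vertices: the value that would be passed
   as the stride argument of glVertexPointer / glTexCoordPointer. */
size_t foliage_vertex_stride(void)
{
    return sizeof(FoliageVertex);
}

/* Byte offset of the texture coordinates inside one vertex. */
size_t foliage_texcoord_offset(void)
{
    return offsetof(FoliageVertex, texcoord);
}

/* Bytes handed to the driver for a patch drawn as GL_QUADS
   (four vertices per quad, no index buffer). */
size_t foliage_patch_bytes(size_t quad_count)
{
    /* Guard against size_t overflow for absurd quad counts. */
    assert(quad_count <= (size_t)-1 / (4 * sizeof(FoliageVertex)));
    return quad_count * 4 * sizeof(FoliageVertex);
}
```

With this assumed layout a vertex is 28 bytes, so a thousand quads come to about 112 KB per draw. Without VBOs that data is re-sent every frame, which is worth keeping in mind even if fill turns out to be the real cost.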
Regarding GL profiling - Fenris - Jun 25, 2009 01:23 PM
What does the GL profiler say? I'm seeing on the order of 10^3 quads here, and not a lot more than perhaps 10^6 pixels in total - say a megapixel of fillrate for the foliage. That shouldn't be a problem unless you were fill-bound already, in which case you shouldn't drop half of your frames... My gut feeling is saying that you're somehow falling back to software, or incurring state changes that stall your pipeline. Still, increasing foliage size does imply fill-rate as your culprit. Then again, that drastic falloff in framerate can't well be caused by such a small amount of extra fill, and so on, back and forth.
My point here is that your pipe is behaving oddly, and you're not likely to crack the nut without asking the driver for some answers.
Regarding GL profiling - macnib - Jun 25, 2009 02:43 PM
Wow! That looks awesome! Love seeing those screenshots.
If you turn GLSL off when rendering the grass, is it much faster? Ya, I know the fixed pipeline really is programmed, but it's still the fast baseline case.
Also, I've noticed that sending uniform values to a GLSL script is a real slowdown. I figure it has to travel from CPU to card (for each plant). I try to hardcode that in the GLSL as much as possible, and that sped things up for me. And I try to order what I draw such that it minimizes any material setting and GLSL program loading. So, lump all the grass together and set the material once at the beginning. I vaguely recall I used to set values into the texture matrix because all the scripts could access that... but I think that method is deprecated in newer versions of OpenGL.
Mipmapping might help because there are so many small ones drawn in the background. Probably doing that...
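The "lump all the grass together" advice can be sketched as a sort of the draw list by shader program and then by material, so that each is set once per run of items and not once per item. The `DrawItem` struct and both function names are hypothetical; this is a CPU-side illustration only and makes no GL calls.

```c
#include <assert.h>
#include <stdlib.h>  /* qsort, size_t */

/* Hypothetical draw-list entry: which program and material an item needs. */
typedef struct {
    unsigned program;   /* stand-in for a GLSL program id */
    unsigned material;  /* stand-in for a texture/material id */
} DrawItem;

/* Order by program first, then by material within the same program. */
static int compare_draw_items(const void *a, const void *b)
{
    const DrawItem *x = (const DrawItem *)a;
    const DrawItem *y = (const DrawItem *)b;
    if (x->program != y->program)
        return x->program < y->program ? -1 : 1;
    if (x->material != y->material)
        return x->material < y->material ? -1 : 1;
    return 0;
}

void sort_draw_items(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof *items, compare_draw_items);
}

/* Number of program switches plus material switches needed to draw the
   list in its current order. The first item counts as one of each. */
size_t count_state_switches(const DrawItem *items, size_t count)
{
    size_t switches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].program != items[i - 1].program)
            ++switches;
        if (i == 0 || items[i].material != items[i - 1].material)
            ++switches;
    }
    return switches;
}
```

Comparing `count_state_switches` before and after `sort_draw_items` gives a rough idea of how much state churn ordering alone would remove. Whether that churn is actually what costs time here is a separate question for the profiler.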
Regarding GL profiling - TomorrowPlusX - Jun 25, 2009 03:03 PM
Fenris Wrote: What does the GL profiler say? I'm seeing on the order of 10^3 quads here, and not a lot more than perhaps 10^6 pixels in total - say a megapixel of fillrate for the foliage. That shouldn't be a problem unless you were fill-bound already, in which case you shouldn't drop half of your frames... My gut feeling is saying that you're somehow falling back to software, or incurring state changes that stall your pipeline. Still, increasing foliage size does imply fill-rate as your culprit. Then again, that drastic falloff in framerate can't well be caused by such a small amount of extra fill, and so on, back and forth.
I just ran it under GL profiler with "Break on SW Fallback" and "Break on Error" checked, but no dice. Apparently, it's all running in HW, and without complaint.
I'm really boggled -- I've seen much more foliage being rendered before without this kind of hit.
I'm going to give a stab at better batching of the foliage, and to use VBOs just in case. But I'm reasonably confident this is a fillrate problem, I just don't understand why the fill is so expensive.
@macnib -- I can't drop GLSL. I have a pretty thorough pipeline for fog, shadowing, etc.
It's pretty clearly a matter of fillrate...
With large textures:
And with small:
I think I will have to put a lot of love into the GLSL to try to speed it up.
Regarding GL profiling - Bachus - Jun 25, 2009 03:26 PM
Off the top of my head, some sanity checks:
1) Make sure the invisible grass at distance isn't being sent to the card. Just a sanity check to make sure you aren't still sending the verts and having the shader run on 0.0 alpha pixels.
2) There seem to be a couple thousand grass quads in view. I'd try combining multiple grass clumps into a single, larger quad.
3) I'd have to guess that fillrate is the issue, so I'd just try fiddling with anything that can possibly affect that. Try without alpha-test/blending turned on. Turn off depth-writes to save a tiny bit of fillrate? Smaller texture map / make sure you've got mip-maps. Just to track down where the bottleneck might be.
Regarding GL profiling - aBabyRabbit - Jun 25, 2009 03:36 PM
Would you care to produce those screenshots in wireframe mode, so we can get a better idea of the sizes and complexity of the grass and terrain? (And of course some numbers in terms of vertices, etc. would help.)
[edited] If you think it's the GLSL performance, then replace the grass shader with one that does nothing except draw the quad, and see how it changes...
We'd love to see your code :-)
Regarding GL profiling - Frogblast - Jun 25, 2009 04:31 PM
I generally agree with the rest of the advice:
a) Do the standard 'change the framebuffer dimensions' trick to verify a fillrate issue
b) Try disabling alpha test (without bothering to sort)
c) Try disabling blending (without bothering to sort)
d) If it is a fragment shader, post the GLSL itself?
The wireframe test will also be very interesting. How many pixels are being processed, only to be discarded?
Also, I notice that shadows are being cast on the grass. Can you describe what you're drawing to handle that? What if you remove the grass from the shadow function, and only render it into the final color buffer (ie, if there is a fill limit, is it happening during shadow map creation?).
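One way to read the result of the framebuffer-dimensions trick in (a) is to compare how much the frame time grew against how much the pixel count grew. The function below is a hypothetical heuristic, not a standard metric. It also ignores costs that do not scale with the window, such as shadow map rendering at a fixed resolution.

```c
#include <assert.h>

/* Ratio of frame-time growth to pixel-count growth between a small and a
   full-size framebuffer.
   Near 1.0: frame time scales with pixel count (consistent with fill-bound).
   Near 0.0: frame time barely changes (the bottleneck is elsewhere). */
double fill_scaling_ratio(double full_ms, int full_w, int full_h,
                          double small_ms, int small_w, int small_h)
{
    double full_pixels = (double)full_w * (double)full_h;
    double small_pixels = (double)small_w * (double)small_h;

    assert(small_ms > 0.0);
    assert(small_pixels > 0.0 && full_pixels > small_pixels);

    double time_growth = full_ms / small_ms - 1.0;
    double pixel_growth = full_pixels / small_pixels - 1.0;
    return time_growth / pixel_growth;
}
```

For example, going from 640x360 to 1280x720 quadruples the pixel count. If the frame time also quadruples the ratio is 1.0, and if it stays flat the ratio is 0.0. Values in between suggest fill is only part of the story.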
Regarding GL profiling - TomorrowPlusX - Jun 25, 2009 04:39 PM
Turning off all blending, I get the following rather surreal image:
But, you'll notice I'm getting better FPS. Which is interesting.
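Bachus Wrote: Off the top of my head, some sanity checks:
1) Make sure the invisible grass at distance isn't being sent to the card. Just a sanity check to make sure you aren't still sending the verts and having the shader run on 0.0 alpha pixels.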
I did -- I'm only drawing those patches which are within the visibility radius -- it was one of the first sanity checks I did.
Quote:2) There seem to be a couple thousand grass quads in view. I'd try combining multiple grass clumps into a single, larger quad.
I will consider this, but I'm not certain how I could pull it off -- seems like it would be very hard to do in a manner which doesn't look wonky. Plus, as far as I can tell this isn't limited by # vertices, but rather fill.
Quote:3) I'd have to guess that fillrate is the issue, so I'd just try fiddling with anything that can possibly affect that. Try without alpha-test/blending turned on. Turn off depth-writes to save a tiny bit of fillrate? Smaller texture map / make sure you've got mip-maps. Just to track down where the bottleneck might be.
I can't do this without depth writes, since I don't want the overhead of depth sorting. Also, the texture's pretty small -- it's a small texture atlas, 1024x128 (4 * 256x128 textures). I do have mipmaps too.
This is wigging me out! Is there any way to use Instruments' OpenGL tool to get an idea of what's going on? I can't figure out how to use Instruments effectively, and as such I spend all my optimizing time in Shark.
The good news: The code that generates points for the foliage is very fast. I can produce a comical density in real time and it's awesome! It's also being used to place boulders and other related greebling.
EDIT: I'll post GLSL presently -- I've got to walk the dogs.
Regarding GL profiling - TomorrowPlusX - Jun 25, 2009 05:05 PM
The problem seems to stem from my having used a very wide texture atlas of 1024x128 -- when I switched to a single square texture at 128x128, I get the expected performance.
I assume this has to do with the mipmaps pyramid being balanced along width & height. So, I have learned something today!
( now, I just need to come up with a non-awful texture atlas! )
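To see how the two mip pyramids differ, the sketch below walks a full mip chain using the usual halve-and-clamp-to-1 rule. The function names are made up for illustration. It does not explain the speedup by itself; it only shows what the driver has to work with in each case.

```c
#include <assert.h>

/* Number of levels in a full mip chain, from the base level down to 1x1.
   Each dimension halves per level and is clamped at 1. */
int mip_level_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int levels = 1;
    while (width > 1 || height > 1) {
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
        ++levels;
    }
    return levels;
}

/* Dimensions of one mip level for a given base size. */
void mip_level_size(int width, int height, int level, int *out_w, int *out_h)
{
    assert(width > 0 && height > 0);
    assert(level >= 0 && level < 31);
    int w = width >> level;
    int h = height >> level;
    *out_w = w > 0 ? w : 1;
    *out_h = h > 0 ? h : 1;
}
```

The 1024x128 atlas has 11 levels against 8 for the 128x128 texture. Its height bottoms out at 1 texel by level 7 (8x1), where each of the four 256x128 sub-images is only 2x1 texels. From the next level down they start to blend into each other. That is at least a reason to keep atlas tiles closer to square, whatever the cause of the slowdown turns out to be.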
Regarding GL profiling - TomorrowPlusX - Jun 26, 2009 07:11 AM
Well, while the egregious problem is solved, I'm still having performance problems.
I've just run the app under GL Profiler's "Statistics" mode, and here are the results:
This doesn't tell you a lot but the hotspots as shown in the app are:
CGLFlushDrawable, which takes 3.27% of app time,
glDrawArrays, which takes 17.88%,
and glFlush, which takes 19.74%.
The only place I'm using glFlush is in FBOs, and I'm going to see if they still work without it.
Also, Shark tells me I'm spending 32% of CPU time drawing foliage, most of which is in glDrawArrays
Regarding GL profiling - Fenris - Jun 26, 2009 09:39 AM
Right, let's drop lower into your GL for a bit. Hit the GL profiler, and from the Views menu, hit the Driver Monitor. A new app will launch, without a window. From the menu bar, select Monitors->Driver monitors and select whichever GPU you're running off. (Cmd-1 should be the right one).
Now, you get a graph window that tells you, well, nothing. Press the Parameters button in the lower left corner. From here, you can drag and drop stuff you want to keep an eye on into the empty space below the graph.
I suggest you monitor the following:
There is a ton of parameters to monitor, but these are the ones I drop in when stuff is acting weird. Run your app and see if any values look insane. (Also, this is a system-wide monitor, so kill all other apps before running).
Regarding GL profiling - TomorrowPlusX - Jun 26, 2009 12:02 PM
When I get home from work I'll give all this a stab. Can I post the results here for interpretation?
Regarding GL profiling - Fenris - Jun 26, 2009 12:30 PM
Please do, I'd be intrigued to see what you find.
Regarding GL profiling - Fenris - Jun 26, 2009 12:51 PM
I just now noticed that your Statistics profile shows one thing as pretty borked:
Quote:glActiveTexture; 705,468; 55670; 0.08;0.12;0.06
Judging by the #calls value to CGLFlushDrawable (635) you've rendered roughly 600 frames. During those frames, you've performed 700,000 calls to glActiveTexture, changed the alpha function state 271,000 times and bound buffers over a million times. (I'm not sure if this is as strange as it sounds, perhaps Frogblast or Arekkusu could chime in - but it sure looks funny to me.)
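The per-frame arithmetic behind those numbers is simple enough to sketch. The helper name is hypothetical, and it follows the post in taking the CGLFlushDrawable call count as the number of frames.

```c
#include <assert.h>

/* Average calls per frame, using the number of CGLFlushDrawable calls
   as the frame count. */
double calls_per_frame(long total_calls, long frames)
{
    assert(frames > 0);
    return (double)total_calls / (double)frames;
}
```

With 705,468 glActiveTexture calls over 635 frames, that comes to roughly 1,100 calls per frame. The rounded 271,000 alpha function changes work out to somewhere around 430 per frame.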
Now, these calls usually don't take a lot of time themselves, which is why they don't top the profile; but AFAIK they stall the pipeline since they muck up GL's state machine. (I'd defer to anyone who disagrees on this one, I'm not 100% on it). If I were you, I'd take a long hard look at whence you call glActiveTexture, glAlphaFunc and glBindBuffer.
If you can't spot anything fishy, run the GL Profiler with "Include Backtraces" checked, open the Trace panel, and start the trace by clicking the Play/Pause button. Trace for a couple of seconds. Now, you can examine the call flow of those seconds. If you see any of these functions crop up too darn often, you have the call stack for each call to examine, to see who's knocking the door.
Still, I'd be intrigued to see what your code is doing, so take these hints for a spin and let us know what you come up with. Cheers!
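If the trace does show the same state being set over and over, one common remedy is a small shadow-state wrapper that forwards a call only when the value actually changes. The sketch below is an assumption about how that might look for the active texture unit. `ActiveTextureCache` and its functions are hypothetical names, and the `forward` callback stands in for the real glActiveTexture so the example runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>  /* NULL, size_t */

/* Stand-in for the real GL entry point, so this compiles without GL. */
typedef void (*SetActiveTextureFn)(unsigned unit);

typedef struct {
    SetActiveTextureFn forward;  /* invoked only when the unit changes */
    unsigned current_unit;       /* last unit that was forwarded */
    int has_value;               /* 0 until the first call goes through */
    unsigned long requested;     /* calls made by the application */
    unsigned long forwarded;     /* calls that reached the driver */
} ActiveTextureCache;

void active_texture_cache_init(ActiveTextureCache *cache,
                               SetActiveTextureFn forward)
{
    assert(cache != NULL);
    cache->forward = forward;
    cache->current_unit = 0;
    cache->has_value = 0;
    cache->requested = 0;
    cache->forwarded = 0;
}

/* Request a texture unit; redundant requests are counted but dropped. */
void active_texture_cache_set(ActiveTextureCache *cache, unsigned unit)
{
    assert(cache != NULL);
    ++cache->requested;
    if (cache->has_value && cache->current_unit == unit)
        return;  /* already active: nothing to send */
    cache->current_unit = unit;
    cache->has_value = 1;
    ++cache->forwarded;
    if (cache->forward != NULL)
        cache->forward(unit);
}
```

Comparing `requested` against `forwarded` after a few frames shows how many of the calls were redundant. The cache goes stale if any code sets the unit directly, so every call site has to go through it. The driver may also filter redundant state already, in which case this mainly saves call overhead.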
Regarding GL profiling - TomorrowPlusX - Jun 26, 2009 02:39 PM
OK, first, the text export, which seems voluminous and cryptic.
And a screenshot of GL Driver Monitor
Any bells ring for you here?