Quick question: will glCopyTexSubImage2D work with RECT textures?

Sage
Posts: 1,199
Joined: 2004.10
Post: #16
akb825 Wrote:I would imagine that if you simply have a box for your frustum culling, which you extrude for the shadow, it won't outweigh the benefits of using z-pass. It wouldn't be that much more math, AFAIK. It would also be the equivalent of drawing your model 2 less times. With enough models, it can really add up. (in my case, the simple scene I have would likely be 1/3 faster)

Criminy. It just occurred to me, the only reason z-fail is really necessary is for when the camera lies inside the shadow extrusion. All you have to do is use z-fail for objects whose shadow bounds ( the AABB I already calculate ) enclose the camera. All the others can be drawn z-pass.

Obviously, the bounds enclose a larger area than the actual volume, so you'll be using z-fail liberally, but it strikes me as being worthwhile. Now, to dig up my old z-pass code... since I've used z-fail for a couple years now.
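A rough sketch of that selection, in plain C. The `Vec3` and `AABB` types and both function names are made up for illustration and are not from any particular engine; only the box-containment test itself is standard.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 min, max; } AABB;

typedef enum { STENCIL_Z_PASS, STENCIL_Z_FAIL } StencilMethod;

/* True if point p lies inside (or on the surface of) the box. */
static bool aabb_contains(const AABB *box, Vec3 p)
{
    return p.x >= box->min.x && p.x <= box->max.x &&
           p.y >= box->min.y && p.y <= box->max.y &&
           p.z >= box->min.z && p.z <= box->max.z;
}

/* Conservative choice: fall back to z-fail whenever the camera is inside
   the shadow bounds, and use the cheaper z-pass otherwise. */
static StencilMethod choose_stencil_method(const AABB *shadow_bounds, Vec3 camera)
{
    return aabb_contains(shadow_bounds, camera) ? STENCIL_Z_FAIL
                                                : STENCIL_Z_PASS;
}
```

Since the box is larger than the real volume, this errs toward z-fail. It would probably also be wise to pad the bounds by the near-plane distance, because z-pass breaks when the near plane clips the volume, not only when the eye point is strictly inside it.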

Yow!
Luminary
Posts: 5,143
Joined: 2002.04
Post: #17
akb825 Wrote:For example, for my conditional blur for soft shadows, it would have been too easy to use an if statement rather than multiplying the kernel by 0. If I did that, then BOOM: no hardware shaders on anything but the X1600. (on the Mac)

Not (necessarily) true -- the GLSL compiler will optimize most if statements into a form that can execute on non-SM3 hardware. It will also unroll loops with a fixed number of iterations such that they work on non-SM3 hardware, if that fits within the instruction limit of the card.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #18
OneSadCookie Wrote:Not (necessarily) true -- the GLSL compiler will optimize most if statements into a form that can execute on non-SM3 hardware. It will also unroll loops with a fixed number of iterations such that they work on non-SM3 hardware, if that fits within the instruction limit of the card.
Hm, I admit I don't know much about the compiler, but I wonder if this is something it could handle. If I were to do it in a high level language, I would do something like

Code:
//get the depth of the next piece
if (abs(thisDepth - currentDepth) < threshold)
{
   //take the blurred texture lookup
   //multiply by the kernel and add to the current sum of colors
   //add the kernel to the sum
}

The way I do it now is to use SLT to test the depth, then multiply the kernel by the result (0 or 1). At the end, I use the summed kernel to normalize the resulting color.
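For anyone following along, here is a CPU-side sketch of that branch-free accumulation in plain C. It is only an illustration of the idea, not the actual shader: the function names, the single-channel colors, and the `fallback` parameter are all assumptions.

```c
#include <stddef.h>

/* Stand-in for the SLT instruction: 1.0 if a < b, else 0.0. */
static float slt(float a, float b) { return (float)(a < b); }

/* Local absolute value, to avoid depending on the math library. */
static float absf(float v) { return v < 0.0f ? -v : v; }

/*
 * Depth-aware weighted average of n single-channel samples.
 * A sample whose depth differs from center_depth by threshold or more
 * has its kernel weight multiplied by 0 instead of being skipped with an if.
 * Returns fallback when every sample was rejected.
 */
static float conditional_blur(const float *colors, const float *depths,
                              const float *kernel, size_t n,
                              float center_depth, float threshold,
                              float fallback)
{
    float color_sum = 0.0f;
    float kernel_sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float w = kernel[i] * slt(absf(depths[i] - center_depth), threshold);
        color_sum += colors[i] * w;   /* rejected samples contribute 0 */
        kernel_sum += w;              /* only accepted weights are summed */
    }
    /* Normalize by the weights that actually survived the depth test. */
    return kernel_sum > 0.0f ? color_sum / kernel_sum : fallback;
}
```

The final guard against dividing by zero is a branch here; in a shader one would presumably make sure the center sample always passes, so the sum can never be zero.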
Luminary
Posts: 5,143
Joined: 2002.04
Post: #19
It should be the case that anything you can write with an assembly language shader by computing both paths through the if, then multiplying by 1 or 0 to "make the decision" is optimizable by the GLSL compiler. Of course, certain constructs might be too hard for it to analyze...

I'd be very surprised if things of this form were not optimized:

Code:
if (condition)
{
    // don't write to any variables not declared in this scope; except
    x = something;
}
else
{
    // don't write to any variables not declared in this scope; except
    x = something_else;
}
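
To make the equivalence concrete, here is a small plain-C sketch of the two forms. The function names are invented for the example, and this only illustrates the kind of rewrite being described; it is not what any particular GLSL compiler actually emits.

```c
/* Branching form: what one would naturally write. */
static float select_branch(int condition, float something, float something_else)
{
    float x;
    if (condition) {
        x = something;
    } else {
        x = something_else;
    }
    return x;
}

/* Branch-free form: compute both values, then blend with a 0/1 mask. */
static float select_mask(int condition, float something, float something_else)
{
    float mask = (float)(condition != 0);   /* exactly 1.0 or 0.0 */
    return mask * something + (1.0f - mask) * something_else;
}
```

One caveat: the two forms agree only for finite inputs, since multiplying an infinity or NaN by 0 does not give 0.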
Sage
Posts: 1,199
Joined: 2004.10
Post: #20
Not to hopelessly derail this back on topic ... but I got my readback of the color and depth buffers working this morning. A few things to discuss...

Now, I did some profiling with Shark and noticed something interesting. I tried both glCopyTexImage2D and glCopyTexSubImage2D ( the latter set up with coords to readback the whole buffer into the texture ). When I profiled with the former ( glCopyTexImage2D ) I saw 7.2% of CPU time in the driver doing some sort of internal query on mipmap levels.

When I used glCopyTexSubImage2D, CPU use was ~0.2%. Works exactly the same, albeit obviously a trifle faster.

So my question is, why? I'm not complaining... I'm just curious why the former method would have such overhead, particularly since I was specifying mipmap level 0, so I can't see why there'd be any extra work by the GL driver.

OK, question two. I was trawling the mac-opengl mailing list and saw a note saying that GL_FLOAT was the fastest 'type' to specify to glTexImage2D when creating a GL_DEPTH_COMPONENT texture. I tried it -- it works, I can see the depth buffer correctly in GL Profiler -- but I'm curious if that's cromulent. Will that result in any subtleties or peculiarities in my shader? I would assume the values will still be 0 to 1 when read in my shader code.

Finally, part three, which isn't a question but an observation. I read akb825's thread about getting the depth buffer and seeing only white. I saw this too, this morning, and I tried what he suggested about bringing the near plane out a little ( I had mine at 0.15, brought it out to 1.0 ). Now, I noticed that when I took a screenshot of the depth buffer ( from GL Profiler's texture view ) and examined its histogram in Photoshop, it -- even though it appeared completely white -- actually had data in it, albeit packed in the top almost-white range. When I adjusted levels, bringing up black, I saw a correct-looking image for the scene. So, what I'm getting at, akb825, is that you might actually not need to bring out your near plane; all you're doing by moving out the near plane is increasing the precision a bit for the depth buffer.
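
A quick way to see why the values bunch up near white is to compute the stored depth directly. The sketch below assumes a standard perspective projection ( glFrustum / gluPerspective style ) and the default depth range of 0 to 1; the function name is made up, and the far plane of 1000 used in the note afterwards is just an assumed figure, since I didn't state mine above.

```c
/*
 * Window-space depth in [0, 1] for a point at eye-space distance z in
 * front of the camera, assuming a standard perspective projection and
 * the default glDepthRange(0, 1).
 *
 *   depth = far * (z - near) / (z * (far - near))
 */
static double window_depth(double z, double near_plane, double far_plane)
{
    return far_plane * (z - near_plane) / (z * (far_plane - near_plane));
}
```

With an assumed far plane of 1000, a point 10 units away is stored at roughly 0.985 with the near plane at 0.15, and at roughly 0.901 with the near plane at 1.0. So the data is all there in both cases; moving the near plane out just spreads it over more of the range.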

EDIT:
Ooh, one more thing. Regarding soft shadows, I just found this: http://graphics.stanford.edu/papers/allfreq/

My mind is blown.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #21
The way my conditional shadow blur shader worked, I wanted to have a bit more precision. Otherwise, it would "halo" when the objects were closer than I would have liked.

And thanks for the link. I'll check it out.

Edit: after looking at their presentation, they ended up taking the equivalent of a cube map for the object they were shadowing with. Not the best solution IMO.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #22
As a word of warning: checking the framebuffer status may destroy performance. On my PowerMac with a Radeon X800, I increased my framerate of soft shadows by around 20% by removing the checks. However, on my MacBook Pro, it made no difference.
Sage
Posts: 1,199
Joined: 2004.10
Post: #23
akb825 Wrote:Edit: after looking at their presentation, they ended up taking the equivalent of a cube map for the object they were shadowing with. Not the best solution IMO.

I don't see what's so bad about that. It produces a correct penumbra and supports ( as far as I could tell ) an arbitrary number of lights, and was real-time.

Seems pretty awesome, to me.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #24
The *TexImage* functions (re)create the texture; the *TexSubImage* functions update an existing one. That's why they're so much faster.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #25
TomorrowPlusX Wrote:I don't see what's so bad about that. It produces a correct penumbra and supports ( as far as I could tell ) an arbitrary number of lights, and was real-time.

Seems pretty awesome, to me.
But you get to the question: just how many rendering passes would it require in order to shadow an entire scene?

For TexImage vs. TexSubImage, I noticed that my one call for TexImage does eat a lot of time compared to TexSubImage. Unfortunately, it causes my entire computer to freeze (for both my PowerMac and MacBook Pro) when used with the depth texture. Fortunately, I only need the depth texture once, so it shouldn't be that huge of a deal.
Sage
Posts: 1,199
Joined: 2004.10
Post: #26
OneSadCookie Wrote:The *TexImage* functions (re)create the texture, the *TexSubImage* update an existing one. That's why they're so much faster.

Aha! You learn something new every day.

Quote:For TexImage vs. TexSubImage, I noticed that my one call for TexImage does eat a lot of time compared to TexSubImage. Unfortunately, it causes my entire computer to freeze (for both my PowerMac and MacBook Pro) when used with the depth texture. Fortunately, I only need the depth texture once, so it shouldn't be that huge of a deal.

I was able to get the depth texture using CopyTexSubImage, no problem.

Quote:But you get to the question: just how many rendering passes would it require in order to shadow an entire scene?

Valid question. I haven't read the whole PDF.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #27
The CopyTexSubImage depth texture freeze must be limited to ATI graphics cards. I really need to send that bug report to Apple: so far I have 3 things to report.
Sage
Posts: 1,232
Joined: 2002.10
Post: #28
akb825 Wrote:The CopyTexSubImage depth texture freeze must be limited to ATI graphics cards. I really need to send that bug report to Apple: so far I have 3 things to report.

Yes, please report these bugs. A well-behaved app should never freeze the system.
Moderator
Posts: 1,140
Joined: 2005.07
Post: #29
Damn, I tried to reproduce the freeze by grabbing the depth texture in the same way in one of the example programs (in this case the GLSLShowcase), and it worked without a hitch. I then tried to see if I could fix my program, and was able to lengthen the time until it froze by disabling GL_VERTEX_PROGRAM_ARB before grabbing the texture, but it did still freeze. It doesn't make it any easier to send a bug report to Apple if I can't find a way to reproduce it.

Fortunately, in the time before it froze, I noticed there was no framerate difference between glCopyTexImage and glCopyTexSubImage. Likely because I'm only using it once.