How to take advantage of HW T&L?

Member
Posts: 23
Joined: 2002.08
Post: #1
Honestly, I'm a little slow at picking all of this back up again, so please be patient.

When I stopped doing OpenGL programming a number of years ago, hardware T&L was just being introduced on a few of the brand-new, high-end graphics cards, so I never really played with it much. Now I'm about to write a mesh object which I'll add skeletal support to later and am wondering how to (or whether I should, for that matter) take advantage of hardware T&L.

I know that just using OpenGL's translation and rotation functions will automatically do it in hardware on cards that support HW T&L, but is that actually going to give me a speed improvement when rendering a skeletally-deformed mesh?! Won't having to pull the rendering out of a vertex array and do a lot of transforms and rotates on just a few triangles at a time kill performance? Is it better to just regenerate a mesh every frame by deforming the original using software matrices (which can then be optimized for AltiVec)?

I'll probably have more questions later.
Member
Posts: 177
Joined: 2002.08
Post: #2
There are extensions that allow skeletal animation to be performed by the card (it can also be done with a vertex shader), but I don't know the details.

I doubt your entire scene uses skeletal animation, so you'll still get benefits for the environment.

I'm doing skeletal animation with feedback mode, so I theoretically get transformation acceleration in both stages (mesh generation and final render).

And when it comes down to it, there's no way to turn hardware T&L off, so it's academic.
Zoldar256
Unregistered
 
Post: #3
GL_ARB_vertex_blend will let you do skeletal animation in hardware. This works by giving each vertex a weight for each transformation matrix.
For example, you set GL_MODELVIEW to your initial transform and GL_MODELVIEW1_ARB to your destination transform. Weight the vertices to GL_MODELVIEW by 1.0-t, where t goes from 0 to 1 and increases with time, and weight them to GL_MODELVIEW1_ARB by t. This way, as t increases, the vertices are blended from the first transformation to the second.

Now if you had a single mesh you wanted animated, you could have GL_MODELVIEW be the transformation for the thigh and GL_MODELVIEW1_ARB be the transformation for the calf. Weight the calf vertices more towards GL_MODELVIEW1_ARB and the thigh vertices more towards GL_MODELVIEW and have the vertices around the knees weighted somewhere evenly between the two. And you can deform your skeleton pretty much however you want. Smooth curves like a snake, or sudden ones, just depending on how the weighting is done.
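
The thigh/calf weighting described above can be sketched on the CPU. This is only an illustration of the arithmetic a two-matrix blend performs, not the extension's API: the names `Mat4`, `Vec3`, `transform_point`, `blend2` and `mat4_translation` are hypothetical, and the matrices are assumed to be column-major, as OpenGL stores them.

```c
#include <assert.h>

/* Column-major 4x4 matrix (the layout OpenGL uses) and a 3D point.
   Both types are hypothetical, for illustration only. */
typedef struct { float m[16]; } Mat4;
typedef struct { float x, y, z; } Vec3;

/* Transform a point (implicit w = 1) by a column-major 4x4 matrix. */
Vec3 transform_point(const Mat4 *a, Vec3 v)
{
    Vec3 r;
    r.x = a->m[0] * v.x + a->m[4] * v.y + a->m[8]  * v.z + a->m[12];
    r.y = a->m[1] * v.x + a->m[5] * v.y + a->m[9]  * v.z + a->m[13];
    r.z = a->m[2] * v.x + a->m[6] * v.y + a->m[10] * v.z + a->m[14];
    return r;
}

/* Blend one vertex between two transforms.
   w1 is the weight toward the second matrix; the first gets 1 - w1. */
Vec3 blend2(const Mat4 *m0, const Mat4 *m1, float w1, Vec3 v)
{
    assert(w1 >= 0.0f && w1 <= 1.0f);
    Vec3 a = transform_point(m0, v);
    Vec3 b = transform_point(m1, v);
    Vec3 r;
    r.x = (1.0f - w1) * a.x + w1 * b.x;
    r.y = (1.0f - w1) * a.y + w1 * b.y;
    r.z = (1.0f - w1) * a.z + w1 * b.z;
    return r;
}

/* Hypothetical helper: a pure translation matrix. */
Mat4 mat4_translation(float tx, float ty, float tz)
{
    Mat4 r = {{ 1, 0, 0, 0,   0, 1, 0, 0,   0, 0, 1, 0,   tx, ty, tz, 1 }};
    return r;
}
```

With this sketch, a thigh vertex would use `w1` near 0, a calf vertex `w1` near 1, and a knee vertex something around 0.5, so the knee lands between where the two bones alone would put it.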

I bet most cards support this in hardware, though it might all be done in software by the driver before it goes to the card. Even so, I bet it would be no slower than any other custom software implementation.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #4
The only problem with GL_ARB_vertex_blend is that you have to keep changing the matrices all the time, which probably is slower than doing the work yourself.

GL_ARB_matrix_palette was designed to fix this problem, but Apple hasn't seen fit to grace us with this extension.

GL_ARB_vertex_program is the ultimate way to do skinning. Unfortunately, it means you'll miss out on hardware T&L on GeForce < 3's and Radeon < 8500's.
henryj
Unregistered
 
Post: #5
There is a little bit of confusion about what hardware T&L means.

First, if your card and driver support a hardware transformation pipeline you will get the benefit from it. In fact you have no choice.

Secondly, hardware transform doesn't mean glRotate will go faster. It means that all the operations to transform your vertex data from world space to screen space will happen on the card.
There is a lot that goes on between your glVertex call (or passing in your vertex buffer) and the pixel appearing on the screen. Hardware T&L moves more of this onto the video hardware.

In the past 3D cards were really hardware rasterisers.
Zoldar256
Unregistered
 
Post: #6
If you use ARB_vertex_program to do skinning, you can't apply any other vertex program effects like toon shading. Well, unless you had a vertex program that did both skinning and the other desired vertex program effect.

At least that's my understanding.

I suppose in that case you'd have to fall back to vertex blending. So would you then need two possible render paths like:
1. Vertex program for skinning if no other vertex program is needed.
2. Vertex blending for skinning then the vertex program if there is a needed vertex program effect.

Or is there another method of getting skinning plus an arbitrary vertex program effect all in one?

OneSadCookie:
Thanks for pointing out matrix_palette, but I don't quite understand it. Let's say you have a mesh moving through the world that you wish to skin, so the mesh's center is moving along x, let's say. Wouldn't you still need to create new modelview matrices for the palette each frame anyways? The matrix palette just seems to provide a means to extend the hardcoded 32 or so matrices for vertex_blend, and perhaps allow reuse of matrices and better organization.

Or does the matrix specified by the palette index associated with a vertex get concatenated onto the current modelview matrix (with the weights taken into consideration)?
But I mighta missed something in my quick read of the spec.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #7
ARB_vertex_program bypasses ARB_vertex_blend, so you'd have to write one VP to do both effects anyway.

GL2_vertex_program should fix this problem by allowing you to build vertex programs out of smaller independent parts.

ARB_vertex_blend only exposes the number of vertex units the card has, generally around 4. The 32 is the max the extension will ever support on any hardware. What ARB_matrix_palette lets you do is switch the matrices among the blend units per-vertex, which means you can use DrawElements to submit your geometry, which you can't do with just plain ARB_vertex_blend.

You will still have to change all the matrices every frame. No way around that.
Zoldar256
Unregistered
 
Post: #8
Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.
Werd yo.
Damn... I shoulda known vertex program would bypass vertex blend, since it says right in the docs that it replaces the per-vertex operations part of the pipeline.

Ah well. I never liked skinning anyways. I only like robots. And robots with skin just look kinda funky. ;-)
Member
Posts: 304
Joined: 2002.04
Post: #9
Quote:Originally posted by Zoldar256
GL_ARB_vertex_blend will let you do skeletal animation in hardware. This works by giving each vertex a weight for each transformation matrix.
For example, you set GL_MODELVIEW to your initial transform and GL_MODELVIEW1_ARB to your destination transform. Weight the vertices to GL_MODELVIEW by 1.0-t, where t goes from 0 to 1 and increases with time, and weight them to GL_MODELVIEW1_ARB by t. This way, as t increases, the vertices are blended from the first transformation to the second.

Does anyone have any code examples?

How do you handle normals then? If your verts are changing, don't you have to recalc your normals? Is the card doing that also?
Luminary
Posts: 5,143
Joined: 2002.04
Post: #10
ARB_vertex_blend just gives you several different modelview matrices, so normals get handled in the "normal" way.
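
For anyone doing the blend in software instead, here is a rough sketch of treating normals the same way as positions: rotate the normal by each matrix and take the weighted sum. It assumes every bone matrix is a rigid rotation plus translation, so the upper-left 3x3 can be used directly rather than its inverse transpose. `Mat4`, `Vec3`, `transform_direction` and `blend_normal` are hypothetical names, not part of any extension.

```c
#include <assert.h>

/* Column-major 4x4 matrix and a 3D vector (hypothetical types). */
typedef struct { float m[16]; } Mat4;
typedef struct { float x, y, z; } Vec3;

/* Rotate a direction by the upper-left 3x3 of a column-major matrix.
   The translation column is ignored, since normals are directions.
   Assumption: the matrix is rigid (no scale or shear). */
Vec3 transform_direction(const Mat4 *a, Vec3 n)
{
    Vec3 r;
    r.x = a->m[0] * n.x + a->m[4] * n.y + a->m[8]  * n.z;
    r.y = a->m[1] * n.x + a->m[5] * n.y + a->m[9]  * n.z;
    r.z = a->m[2] * n.x + a->m[6] * n.y + a->m[10] * n.z;
    return r;
}

/* Weighted sum of the normal as rotated by each of `count` matrices.
   The result is NOT renormalized here. */
Vec3 blend_normal(const Mat4 *mats, const float *weights, int count, Vec3 n)
{
    assert(count > 0);
    Vec3 sum = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < count; ++i) {
        Vec3 r = transform_direction(&mats[i], n);
        sum.x += weights[i] * r.x;
        sum.y += weights[i] * r.y;
        sum.z += weights[i] * r.z;
    }
    return sum;
}
```

A blended normal is generally shorter than unit length (two unit normals 90 degrees apart average to a length of about 0.707), so it needs renormalizing before lighting, either in your own code or by enabling GL_NORMALIZE.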

Virtually every code sample using ARB_vertex_blend will also use ARB_matrix_palette, so I'd guess Apple sample code is your best bet...
Member
Posts: 23
Joined: 2002.08
Post: #11
Quote:Originally posted by henryj
First, if your card and driver support a hardware transformation pipeline you will get the benefit from it. In fact you have no choice.
Why would anybody want to disable HW T&L?

Quote:Secondly, hardware transform doesn't mean glRotate will go faster. It means that all the operations to transform your vertex data from world space to screen space will happen on the card.
Okay, let me make sure I'm gettin' this right, 'cause I'm one of those who has been misinformed or who has misread information and thought that HW T&L moved all the matrix math onto the card. So, HW T&L moves only the world-space to screen-space transforms over to the card?

Quote:There is a lot that goes on between your glVertex call (or passing in your vertex buffer) before the pixel appears on the screen. Hardware T&L moves more of this onto the video hardware.

In the past 3D cards were really hardware rasterisers.
Of course.

So, the general drift that I'm getting here is that I should probably stick to my own matrix math, as HW T&L will only transform the data after the translations & rotations into world-space have been completed, and using workarounds such as GL_ARB_vertex_blend will be slower 'cause I won't be able to use my vertex arrays (i.e. glDrawElements). If I wanted to try to use only OpenGL functions in the hopes that everything can be done on the card at some point, I should use GL_ARB_matrix_palette (if and when Apple supports it). Correct?

I need some clarification: GL_ARB_vertex_blend allows you to set the weighting between two modelviews and switch between them as you render vertices (individually, I'm guessing... GL_ARB_matrix_palette is supposed to provide a workaround for that). So, all of the vertex information and normal information goes through the regular OpenGL transform & rotation stuff, but what does that get you if HW T&L only does the world-space to screen-space transform?
Zoldar256
Unregistered
 
Post: #12
Well, your card can handle all of the OpenGL transformations: local to world to screen to viewport. The driver might do the matrix stack and matrix multiplication in software, but I see no reason any driver would. The OpenGL matrix stack has a fixed, implementation-specific size, probably so hardware designers can happily create hardware implementations of the whole transformation brew with no worries.

You might be able to cache invariant transforms in sw, but I'm not even sure you'd gain anything from that.

You can use vertex arrays with vertex_blend too. Even if you aren't using vertex blend and are deforming the mesh in your own implementation, I'd still use vertex arrays.
Member
Posts: 177
Joined: 2002.08
Post: #13
Quote: So, HW T&L moves only the world-space to screen-space transforms over to the card?

It moves all matrix operations to the card. However, I don't know if it moves "semi-matrix" operations like generating the rotation matrix to use in glRotate to the card as well.

Quote: So, the general drift that I'm getting here is that I should probably stick to my own matrix math as HW T&L will only transform the data after the translations & rotations into world-space have been completed

It's more that the matrix pipeline is one-way (with the possible exception of feedback mode). You can't use the card as a generic matrix math accelerator, so if you want to do anything that doesn't involve submitting vertices to GL you'll have to do it yourself.

What I got from OSC's explanation is that ARB_vertex_blend allows you to specify 2 modelview matrices and interpolate between them for each vertex (so the vertices have varying transformations, all of which are "between" the two modelviews), while ARB_matrix_palette allows you to specify a much larger number of matrices and interpolate between *any* 2 of them for each vertex.
Zoldar256
Unregistered
 
Post: #14
Annoying comment:

Each vertex is transformed by a variable number of transformations, up to an implementation-specific maximum (though no more than 32 for vertex blend?). The final transformed vertex is then just a weighted sum of all these transformed vertices. So with 2 matrices it's interpolation, with more than 2 it's... Um... Something...

I think this must be useful for blending animations, though handling the transformations with vertex_blend is probably a pain; matrix palette would probably be much nicer in this case.
Luminary
Posts: 5,143
Joined: 2002.04
Post: #15
Hardware T&L moves the modelview & projection multiplications onto the card.

Note that this does not make glRotate faster, in fact it's probably slower. However, it does make the processing of each vertex much faster (only relevant if you're in DrawElements, DrawArrays, CallList or some variant).

ARB_vertex_blend allows you to specify n different modelview matrices, and for each vertex, n weights. The transformed vertex is then the weighted sum of the n individual transformed vertices.

This isn't very useful for skinning, however, because for skinning you generally have more than n bones. That means that for different parts of the model, you have to change the matrices, which limits your ability to use DrawElements (it's not impossible, but you have some sorting of your vertices to do, and you can't just call DrawElements once for the whole lot).

ARB_matrix_palette circumvents this by giving you > n modelview matrices, and allowing you to choose n of them per vertex to be used with ARB_vertex_blend. This means that you can just do a single DrawElements call for the whole model.
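
The palette idea maps directly onto a software skinning loop, which may help make it concrete. The sketch below is a CPU illustration only, not the extension's API: each vertex carries its own bone indices and weights, so the whole mesh is processed in one pass without switching matrices between groups of vertices. `MAX_INFLUENCES`, `SkinnedVertex`, `skin_mesh` and the other names are hypothetical, and four influences per vertex is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFLUENCES 4  /* assumed number of blend units per vertex */

/* Column-major 4x4 matrix and a 3D point (hypothetical types). */
typedef struct { float m[16]; } Mat4;
typedef struct { float x, y, z; } Vec3;

/* A vertex plus the palette entries that influence it. */
typedef struct {
    Vec3          position;
    unsigned char bone[MAX_INFLUENCES];    /* indices into the palette */
    float         weight[MAX_INFLUENCES];  /* should sum to 1 */
} SkinnedVertex;

/* Transform a point (implicit w = 1) by a column-major 4x4 matrix. */
Vec3 transform_point(const Mat4 *a, Vec3 v)
{
    Vec3 r;
    r.x = a->m[0] * v.x + a->m[4] * v.y + a->m[8]  * v.z + a->m[12];
    r.y = a->m[1] * v.x + a->m[5] * v.y + a->m[9]  * v.z + a->m[13];
    r.z = a->m[2] * v.x + a->m[6] * v.y + a->m[10] * v.z + a->m[14];
    return r;
}

/* Skin `count` vertices: each output is the weighted sum of the input
   position transformed by the palette matrices that vertex selects. */
void skin_mesh(const Mat4 *palette, size_t palette_size,
               const SkinnedVertex *in, Vec3 *out, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        Vec3 sum = { 0.0f, 0.0f, 0.0f };
        for (int k = 0; k < MAX_INFLUENCES; ++k) {
            float w = in[i].weight[k];
            if (w == 0.0f)
                continue;  /* unused influence slot */
            assert(in[i].bone[k] < palette_size);
            Vec3 p = transform_point(&palette[in[i].bone[k]], in[i].position);
            sum.x += w * p.x;
            sum.y += w * p.y;
            sum.z += w * p.z;
        }
        out[i] = sum;
    }
}

/* Hypothetical helper: a pure translation matrix. */
Mat4 mat4_translation(float tx, float ty, float tz)
{
    Mat4 r = {{ 1, 0, 0, 0,   0, 1, 0, 0,   0, 0, 1, 0,   tx, ty, tz, 1 }};
    return r;
}
```

The output array can then be handed to an ordinary vertex array and drawn with a single DrawElements call, leaving the modelview and projection transforms to the card.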

On the PC side, ARB_vertex_blend and ARB_matrix_palette are at least accelerated by the GeForce2 series, and I think by the Radeon series. Apple doesn't yet support ARB_matrix_palette, though.

ARB_vertex_program gives you the functionality of both ARB_vertex_blend and ARB_matrix_palette. ARB_vertex_program isn't hardware accelerated below GeForce3/Radeon8500, though.

Therefore, I recommend two paths: an ARB_vertex_program path for newer hardware, and a software path for older hardware (which will still let you take advantage of hardware T&L).