Fast face normals

Member
Posts: 49
Joined: 2002.05
Post: #1
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.

I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
Quote this message in a reply
Member
Posts: 345
Joined: 2002.04
Post: #2
This is just a guess but would it be quicker if you transformed the normals the same way you transform the faces?
Quote this message in a reply
rangaroek
Unregistered
 
Post: #3
Quote:Originally posted by Ian Kerr
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.

I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?

Hi, you need to update your normals when your geometry changes (character animation) or when you use scaling transforms. So if both cases are not true for you don't recalculate your normals. For character animation you can precalculate normals for the keyframes of an animation. Then you can interpolate the normals between the keyframes. (it's not exact but fast).

When you want to use scaling transforms use only uniform scaling (sx=sy=sz)
and enable GL_RESCALE_NORMALS so GL recalculates the normals for you.

By the way, when you need to transform your normals to world space (face normals for collision detection etc), you do it by multiplying them with the inverse transpose of the matrix which you use to transform your vertices.

The correct way to calc normals for a vertex is:

1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result

hopes it helps
rangaroek
Quote this message in a reply
Member
Posts: 304
Joined: 2002.04
Post: #4
Quote:Originally posted by rangaroek

The correct way to calc normals for a vertex is:

1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result

hopes it helps
rangaroek

I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors - but it doesnt seem necessary to me.

Hope I just saved you one divide per normal!

Codemattic
Quote this message in a reply
rangaroek
Unregistered
 
Post: #5
Quote:Originally posted by codemattic


I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors - but it doesnt seem necessary to me.

Hope I just saved you one divide per normal!

Codemattic

You are completely right, step three is obsolete ... and I didn't find it in my code ... it was a little mistake of mine ... Smile

bye, rangaroek
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #6
Shouldn't your normalize the normals before you sum them?

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Luminary
Posts: 5,143
Joined: 2002.04
Post: #7
Should you? The un-normalized normal for a face is going to have length roughly proportional to the area of that face.

So if all your faces are roughly similar in size, you probably won't notice the difference, and if some faces are much larger than others, you probably wanted to give them more weight than the smaller faces anyway.

That sound reasonable? (All off the top of my head...)
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #8
Hmm, that seems reasonable actually.

Btw, tip for the original poster: If you're using CW and only targeting 604s and later, you can use __frsqrte(x), which estimates 1/sqrt(x) in like 4 cycles (as opposed to several hundred for the real thing).

__fsqrt(1.000000) = 0.984375 = 1/sqrt(1.031998)
__fsqrt(0.250000) = 1.968750 = 1/sqrt(0.258000)
__fsqrt(4.000000) = 0.492188 = 1/sqrt(4.127992)
__fsqrt(2.000000) = 0.695312 = 1/sqrt(2.068426)
__fsqrt(0.500000) = 1.390625 = 1/sqrt(0.517106)

There's probably some iterative process that can produce more accurate values if you need them.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Member
Posts: 49
Joined: 2002.05
Post: #9
What header file is __fsqrt in?
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #10
gack, I typed the printf wrong Smile

It's __frsqrte(), and it's not in any header file. It's an intrinsic function that compiles to a single instruction. This only works in CodeWarrior.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Luminary
Posts: 5,143
Joined: 2002.04
Post: #11
Or PB/GCC in Jaguar:

#include <ppc_intrinsics.h>

note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #12
Quote:Originally posted by OneSadCookie
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.

It's __frsqrte, and there isn't any __fsqrtes.

You're probably thinking of __fsqrt and __fsqrts. I'm not sure which chips these are implemanted on. If they are on any, they're probably SLOW (though not as slow as sqrt()).

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Luminary
Posts: 5,143
Joined: 2002.04
Post: #13
Ooops, sorry, made the same mistake as you Rasp

The functions in ppc_intrinsics are called __frsqrte and __frsqrtes, but both compile to a single frsqrte instruction. Guess that means it doesn't really matter if you use __frsqrte for floats.
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #14
__frsqrte() wasn't accurate enough, I decided to make an improved version:

float inv = __frsqrte(f);
float better = 0.5*(__fres(f*inv) + inv);

This is derived from the fact that 1-k*x^2=0 when x=1/sqrt(k)

sample results:
Code:
input      frsqete    check      improved   check
1.000000   0.984375   1.031998   1.000000   1.000000
0.250000   1.968750   0.258000   2.000000   0.250000
4.000000   0.492188   4.127992   0.500000   4.000000
2.000000   0.695312   2.068426   0.707153   1.999737
0.500000   1.390625   0.517106   1.414307   0.499934
3.141593   0.558594   3.204851   0.564209   3.141376
2.718282   0.609375   2.692965   0.606445   2.719047
0.318310   1.765625   0.320777   1.772461   0.318307
0.367879   1.648438   0.368006   1.648682   0.367897
314.1592   0.056641   311.7051   0.056419   314.1546
271.8281   0.061523   264.1914   0.060654   271.8226
the check columns are 1/x^2 of the previous column

Here's an ASM version of it:

Code:
static const float gOneHalf = 0.5;

double asm fastSqrtRecip(register double x) {
    frsqrte     fp2,x; // inv = 1/sqrt(x)
    lfs         fp3,gOneHalf(RTOC);
    fmuls       x,x,fp2; // f*inv
    fres        x,x; // 1/(f*inv)
    fadds       x,x,fp2; // 1/(f*inv)+inv
    fmuls       x,x,fp3; // .5*(1/(f*inv)+inv)
    blr;
}

My guess is that this takes around 20 cycles on a G3 or G4 (two cycle throughput for regular instructions, load is free if cached, fres takes 10 (yes, 2 cycle throughput for frsqrte)). Properly pipelined into the normal calculations that could be dramatically reduced.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Member
Posts: 145
Joined: 2002.06
Post: #15
Here's the complete derivation for those that might be interested:

x = 1/sqrt(k)
x*sqrt(k) = 1
kx^2 = 1
kx^2 - 1 = 0


if we have a guess (g) for a solution of that, we can use a technique who's name I can't remember to refine this solution.

f(x) = kx^2 - 1
f'(x) = 2kx

better guess = g - f(g)/f'(g)
= g - (kg^2 - 1)/(2kg)
= g - (g/2 - 1/2kg)
= g - g/2 + 1/2kg
= g/2 + 1/2kg
= (g + 1/kg)/2

UPDATE! I have been informed that the refinement method I refered to is known is the Newton-Rhapson Method.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
Quote this message in a reply
Post Reply 

Possibly Related Threads...
Thread: Author Replies: Views: Last Post
  Calculate face direction from bvh or 3d skeleton data harisz 3 2,552 May 29, 2013 10:50 AM
Last Post: OneSadCookie
  glDrawElements and Face indices Ashford 8 10,843 Nov 11, 2009 03:03 PM
Last Post: Ashford
  Simple ray-face intersect optimization NYGhost 8 5,819 Aug 17, 2007 12:01 PM
Last Post: NYGhost
  Indexed Face Sets (meshes) wyrmmage 4 3,902 Dec 15, 2006 11:18 AM
Last Post: wyrmmage
  Face Problems when Z-Near down!!! leodeus 5 4,670 Oct 31, 2005 12:14 PM
Last Post: OneSadCookie