Fast face normals
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.
I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
This is just a guess but would it be quicker if you transformed the normals the same way you transform the faces?
Quote:Originally posted by Ian Kerr
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.
I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
Hi, you need to update your normals when your geometry changes (character animation) or when you use scaling transforms. So if both cases are not true for you don't recalculate your normals. For character animation you can precalculate normals for the keyframes of an animation. Then you can interpolate the normals between the keyframes. (it's not exact but fast).
When you want to use scaling transforms use only uniform scaling (sx=sy=sz)
and enable GL_RESCALE_NORMALS so GL recalculates the normals for you.
By the way, when you need to transform your normals to world space (face normals for collision detection etc), you do it by multiplying them with the inverse transpose of the matrix which you use to transform your vertices.
The correct way to calc normals for a vertex is:
1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result
hopes it helps
rangaroek
Quote:Originally posted by rangaroek
The correct way to calc normals for a vertex is:
1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result
hopes it helps
rangaroek
I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors  but it doesnt seem necessary to me.
Hope I just saved you one divide per normal!
Codemattic
Quote:Originally posted by codemattic
I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors  but it doesnt seem necessary to me.
Hope I just saved you one divide per normal!
Codemattic
You are completely right, step three is obsolete ... and I didn't find it in my code ... it was a little mistake of mine ...
bye, rangaroek
Shouldn't your normalize the normals before you sum them?
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
Should you? The unnormalized normal for a face is going to have length roughly proportional to the area of that face.
So if all your faces are roughly similar in size, you probably won't notice the difference, and if some faces are much larger than others, you probably wanted to give them more weight than the smaller faces anyway.
That sound reasonable? (All off the top of my head...)
So if all your faces are roughly similar in size, you probably won't notice the difference, and if some faces are much larger than others, you probably wanted to give them more weight than the smaller faces anyway.
That sound reasonable? (All off the top of my head...)
Hmm, that seems reasonable actually.
Btw, tip for the original poster: If you're using CW and only targeting 604s and later, you can use __frsqrte(x), which estimates 1/sqrt(x) in like 4 cycles (as opposed to several hundred for the real thing).
__fsqrt(1.000000) = 0.984375 = 1/sqrt(1.031998)
__fsqrt(0.250000) = 1.968750 = 1/sqrt(0.258000)
__fsqrt(4.000000) = 0.492188 = 1/sqrt(4.127992)
__fsqrt(2.000000) = 0.695312 = 1/sqrt(2.068426)
__fsqrt(0.500000) = 1.390625 = 1/sqrt(0.517106)
There's probably some iterative process that can produce more accurate values if you need them.
Btw, tip for the original poster: If you're using CW and only targeting 604s and later, you can use __frsqrte(x), which estimates 1/sqrt(x) in like 4 cycles (as opposed to several hundred for the real thing).
__fsqrt(1.000000) = 0.984375 = 1/sqrt(1.031998)
__fsqrt(0.250000) = 1.968750 = 1/sqrt(0.258000)
__fsqrt(4.000000) = 0.492188 = 1/sqrt(4.127992)
__fsqrt(2.000000) = 0.695312 = 1/sqrt(2.068426)
__fsqrt(0.500000) = 1.390625 = 1/sqrt(0.517106)
There's probably some iterative process that can produce more accurate values if you need them.
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
What header file is __fsqrt in?
gack, I typed the printf wrong
It's __frsqrte(), and it's not in any header file. It's an intrinsic function that compiles to a single instruction. This only works in CodeWarrior.
It's __frsqrte(), and it's not in any header file. It's an intrinsic function that compiles to a single instruction. This only works in CodeWarrior.
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
Or PB/GCC in Jaguar:
#include <ppc_intrinsics.h>
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
#include <ppc_intrinsics.h>
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
Quote:Originally posted by OneSadCookie
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
It's __frsqrte, and there isn't any __fsqrtes.
You're probably thinking of __fsqrt and __fsqrts. I'm not sure which chips these are implemanted on. If they are on any, they're probably SLOW (though not as slow as sqrt()).
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
Ooops, sorry, made the same mistake as you
The functions in ppc_intrinsics are called __frsqrte and __frsqrtes, but both compile to a single frsqrte instruction. Guess that means it doesn't really matter if you use __frsqrte for floats.
The functions in ppc_intrinsics are called __frsqrte and __frsqrtes, but both compile to a single frsqrte instruction. Guess that means it doesn't really matter if you use __frsqrte for floats.
__frsqrte() wasn't accurate enough, I decided to make an improved version:
float inv = __frsqrte(f);
float better = 0.5*(__fres(f*inv) + inv);
This is derived from the fact that 1k*x^2=0 when x=1/sqrt(k)
sample results:
the check columns are 1/x^2 of the previous column
Here's an ASM version of it:
My guess is that this takes around 20 cycles on a G3 or G4 (two cycle throughput for regular instructions, load is free if cached, fres takes 10 (yes, 2 cycle throughput for frsqrte)). Properly pipelined into the normal calculations that could be dramatically reduced.
float inv = __frsqrte(f);
float better = 0.5*(__fres(f*inv) + inv);
This is derived from the fact that 1k*x^2=0 when x=1/sqrt(k)
sample results:
Code:
input frsqete check improved check
1.000000 0.984375 1.031998 1.000000 1.000000
0.250000 1.968750 0.258000 2.000000 0.250000
4.000000 0.492188 4.127992 0.500000 4.000000
2.000000 0.695312 2.068426 0.707153 1.999737
0.500000 1.390625 0.517106 1.414307 0.499934
3.141593 0.558594 3.204851 0.564209 3.141376
2.718282 0.609375 2.692965 0.606445 2.719047
0.318310 1.765625 0.320777 1.772461 0.318307
0.367879 1.648438 0.368006 1.648682 0.367897
314.1592 0.056641 311.7051 0.056419 314.1546
271.8281 0.061523 264.1914 0.060654 271.8226
Here's an ASM version of it:
Code:
static const float gOneHalf = 0.5;
double asm fastSqrtRecip(register double x) {
frsqrte fp2,x; // inv = 1/sqrt(x)
lfs fp3,gOneHalf(RTOC);
fmuls x,x,fp2; // f*inv
fres x,x; // 1/(f*inv)
fadds x,x,fp2; // 1/(f*inv)+inv
fmuls x,x,fp3; // .5*(1/(f*inv)+inv)
blr;
}
My guess is that this takes around 20 cycles on a G3 or G4 (two cycle throughput for regular instructions, load is free if cached, fres takes 10 (yes, 2 cycle throughput for frsqrte)). Properly pipelined into the normal calculations that could be dramatically reduced.
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
Here's the complete derivation for those that might be interested:
x = 1/sqrt(k)
x*sqrt(k) = 1
kx^2 = 1
kx^2  1 = 0
if we have a guess (g) for a solution of that, we can use a technique who's name I can't remember to refine this solution.
f(x) = kx^2  1
f'(x) = 2kx
better guess = g  f(g)/f'(g)
= g  (kg^2  1)/(2kg)
= g  (g/2  1/2kg)
= g  g/2 + 1/2kg
= g/2 + 1/2kg
= (g + 1/kg)/2
UPDATE! I have been informed that the refinement method I refered to is known is the NewtonRhapson Method.
x = 1/sqrt(k)
x*sqrt(k) = 1
kx^2 = 1
kx^2  1 = 0
if we have a guess (g) for a solution of that, we can use a technique who's name I can't remember to refine this solution.
f(x) = kx^2  1
f'(x) = 2kx
better guess = g  f(g)/f'(g)
= g  (kg^2  1)/(2kg)
= g  (g/2  1/2kg)
= g  g/2 + 1/2kg
= g/2 + 1/2kg
= (g + 1/kg)/2
UPDATE! I have been informed that the refinement method I refered to is known is the NewtonRhapson Method.
"He who breaks a thing to find out what it is, has left the path of wisdom."
 Gandalf the GrayHat
Bring Alistair Cooke's America to DVD!
Possibly Related Threads...
Thread:  Author  Replies:  Views:  Last Post  
Calculate face direction from bvh or 3d skeleton data  harisz  3  5,178 
May 29, 2013 10:50 AM Last Post: OneSadCookie 

glDrawElements and Face indices  Ashford  8  16,767 
Nov 11, 2009 03:03 PM Last Post: Ashford 

Simple rayface intersect optimization  NYGhost  8  8,248 
Aug 17, 2007 12:01 PM Last Post: NYGhost 

Indexed Face Sets (meshes)  wyrmmage  4  5,140 
Dec 15, 2006 11:18 AM Last Post: wyrmmage 

Face Problems when ZNear down!!!  leodeus  5  6,031 
Oct 31, 2005 12:14 PM Last Post: OneSadCookie 