Float vs Double
An unsigned wide is:
struct UnsignedWide
{
UInt32 hi;
UInt32 lo;
}
... But the real reason is that I want to express my times in fractions of a second, so that I can measure all speeds in units/sec. I guess you can do time work with an integer-valued number of microseconds; it's just that I don't.
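For reference, here is a minimal sketch of turning two such readings into fractional seconds. It assumes the two halves form a 64-bit microsecond count (the thread implies this but does not say so outright); the `UInt32` typedef and the helper names `wide_to_u64` and `elapsed_seconds` are stand-ins made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t UInt32;            /* stand-in for the platform typedef */

struct UnsignedWide {
    UInt32 hi;                      /* upper 32 bits */
    UInt32 lo;                      /* lower 32 bits */
};

/* Join the two halves into one 64-bit count (assumed to be microseconds). */
static uint64_t wide_to_u64(struct UnsignedWide w)
{
    return ((uint64_t)w.hi << 32) | (uint64_t)w.lo;
}

/* Elapsed seconds between two readings. The subtraction is done on the
   integers first, so the double only has to hold a small difference. */
static double elapsed_seconds(struct UnsignedWide start, struct UnsignedWide now)
{
    uint64_t a = wide_to_u64(start);
    uint64_t b = wide_to_u64(now);
    assert(b >= a);                 /* assumes the clock never runs backwards */
    return (double)(b - a) / 1.0e6;
}
```

Subtracting before converting keeps the integer exactness for as long as possible, which is the same idea as shifting the time origin.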
Most of the time these precision problems can be improved by normalizing. Pick a point that should be the origin and adjust all the other data to be centered/displaced from that origin.
For Najdorf the likely candidate would be the ball's position, with all the other data shifted around it. Scene data would need to be in doubles, but for rendering, the visible portion of the field can be converted to floats. So stuff that is close to the ball will have the best accuracy, while the stuff that is far away and most likely not seen is not as accurate.
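A rough sketch of that rebasing step, assuming the scene is stored as double-precision points and the renderer wants floats. The `Vec3d`/`Vec3f` types and the function names are invented for the example and are not taken from either project.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3d;   /* scene-side storage */
typedef struct { float  x, y, z; } Vec3f;   /* render-side storage */

/* Subtract the origin in double precision, then narrow to float. */
static Vec3f rebase_point(Vec3d p, Vec3d origin)
{
    Vec3f out;
    out.x = (float)(p.x - origin.x);
    out.y = (float)(p.y - origin.y);
    out.z = (float)(p.z - origin.z);
    return out;
}

/* Rebase a whole array of scene points around the chosen origin. */
static void rebase_points(const Vec3d *in, Vec3f *out, size_t count, Vec3d origin)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i)
        out[i] = rebase_point(in[i], origin);
}
```

The order matters: near 10,000,000 a float can only step in whole units, so narrowing first and subtracting afterwards would throw away the fractional part of the offset.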
For Fenris the easiest would be to get the time at the start of the program and subtract that from every time that you get from then on, possibly resetting the origin time periodically just to keep the accuracy up.
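The time version of the same trick might look like the sketch below. It assumes timestamps arrive as doubles in seconds; all three function names are made up for illustration. The "naive" variant is only there to show what goes wrong when absolute times are narrowed to float before subtracting.

```c
#include <assert.h>

/* Seconds since program start; start_time is captured once at launch. */
static double seconds_since_start(double now, double start_time)
{
    assert(now >= start_time);
    return now - start_time;
}

/* Frame delta computed in double and only then narrowed to float. */
static float frame_dt(double this_frame, double last_frame)
{
    return (float)(this_frame - last_frame);
}

/* Counter-example: the absolute times are narrowed to float first. */
static float frame_dt_naive(double this_frame, double last_frame)
{
    volatile float a = (float)last_frame;   /* volatile forces real float storage */
    volatile float b = (float)this_frame;
    return b - a;
}
```

With a timestamp around 1,000,000,000 seconds, adjacent floats are 64 seconds apart, so the naive version sees a 1/60 s frame as no time at all.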
Another option is not to use floats. Use 32-bit integers. Signed, they range from (about) -2,000,000,000 to 2,000,000,000 and they are accurate down to the 1, so you have about 10 decimal digits of precision. Sure, no fractions, but you can get around that by just assuming that the last 3 digits are the decimal part, i.e. -2,000,000.000 to 2,000,000.000 with an accuracy of .001. Conversion from int to float/double when you need it may be slower, but over the entire run of the program, is this really the bottleneck? Profiling will tell.
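A small sketch of that "last 3 digits are the decimal" scheme, for anyone who wants to try it. The `Fixed3` type and the `fixed3_*` helpers are names invented here; overflow handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Fixed3;             /* stored value = real value * 1000 */
#define FIXED3_SCALE 1000

/* Convert a double to fixed point, rounding to the nearest thousandth. */
static Fixed3 fixed3_from_double(double v)
{
    double scaled = v * FIXED3_SCALE;
    assert(scaled > -2147483000.0 && scaled < 2147483000.0);  /* must fit in 32 bits */
    return (Fixed3)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert back when a float/double is actually needed. */
static double fixed3_to_double(Fixed3 f)
{
    return (double)f / FIXED3_SCALE;
}

/* Addition is plain integer addition (overflow is the caller's concern). */
static Fixed3 fixed3_add(Fixed3 a, Fixed3 b)
{
    return a + b;
}

/* Multiplication needs a 64-bit intermediate and one rescale;
   the result is truncated toward zero. */
static Fixed3 fixed3_mul(Fixed3 a, Fixed3 b)
{
    return (Fixed3)(((int64_t)a * (int64_t)b) / FIXED3_SCALE);
}
```

Addition and subtraction cost nothing extra; multiplication and division are where the rescaling shows up, so that is the part worth profiling.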
One benefit with ints is that all numbers are precisely represented. Unlike floats and doubles which can't represent, for instance, the number 3 without a bit of error.
Quote: One benefit with ints is that all numbers are precisely represented. Unlike floats and doubles which can't represent, for instance, the number 3 without a bit of error.
Interesting ideas, all of them (I do shift the time now that I've identified the problem), but isn't one of the benefits of doubles that they can represent all integers without error?
No, it's a fundamental issue with the way floats are represented. Obviously, the more bits of precision you have, the less the error...
Personally, I use one of the double-returning timing routines (GetCurrentEventTime or +[NSDate timeIntervalSinceReferenceDate]), and only ever use the "dt" (this frame time - last frame time) in calculations. Since that's sub-1.0, I pass it as a float to avoid lots of conversions <-> double.
Just a note, 'cause this is a touchy subject at work for me. When writing statistical packages or anything involving intense math, doubles are the ONLY way. Since most of my work involves inputting data, crunching, then outputting, accuracy > speed.
I would say games nowadays could start using doubles (even if it seems slightly wasteful), cause the overhead is really quite minor compared to MANY things.
All that being said, just be wise. If you don't need maximum precision, don't use doubles. I would even venture to say, most games get by just fine on floats.
The nature of the number format doesn't allow certain numbers to be exact. Ignore the exponent and sign bits of a float or double and just look at the bits for the actual number (sign and exponent are integer-like, so nothing fancy there). So let's say in the case of 3; or rather just the number .3, since 3, 30, 300 and .03 are just .3 times a power of 10; how will it be encoded in the float/double? (Note there is a slight difference between how a float and a double are encoded, but in general the idea is the same.)
For integers each bit is a power of 2
Bit Value
1 2
2 4
3 8
...
N 2^N
To get the total value of the int you add them up. Similarly with floats but the values are represented differently.
Bit Value
1 1/2
2 1/4
3 1/8
...
N 1/2^N
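The fractional weights in the table can be applied mechanically. The sketch below pulls out the first few binary digits of a fraction such as .3; the function name `fraction_bits` is made up for the example, and it works on the value the double actually stores, which is already the nearest representable neighbour of .3.

```c
#include <assert.h>
#include <stddef.h>

/* Write the first `count` binary fraction digits of x (0 <= x < 1) into
   `out` as '0'/'1' characters, followed by a terminating NUL.
   Digit k (counting from 1) carries the weight 1/2^k.
   `out` must have room for count + 1 characters. */
static void fraction_bits(double x, size_t count, char *out)
{
    assert(x >= 0.0 && x < 1.0);
    for (size_t i = 0; i < count; ++i) {
        x *= 2.0;                   /* move the next digit in front of the point */
        if (x >= 1.0) {
            out[i] = '1';
            x -= 1.0;
        } else {
            out[i] = '0';
        }
    }
    out[count] = '\0';
}
```

For .3 the digits start 0100110011 and the 0011 group keeps repeating, so no finite number of bits ever lands on .3 exactly.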
So (first number is bit value) .3 = 0*1/2 + 1*1/4 + 0*1/8 + 0*1/16 + 1*1/32 + 1*1/64 ... In the end the value is very, very near to .3 but may end up being actually .299999[some numbers] or .300000[some numbers]. This can also add to some grief: you may have a threshold or precalculated number that is very accurate, and you compare it with the result of a nasty calculation that is less accurate, either because of the earlier point about precision or because other calculations keep propagating the errors of numbers that can't be exactly represented. The two numbers may then be a bit off. This is where you may end up adding some fuzzy logic to say, eehh, close enough.
E.G.:
/* This will fail at times.*/
if (doubleValue == 0) {...}
/* This will be a bit better.*/
if (fabs(doubleValue) < SOME_SMALL_DOUBLE_CLOSE_TO_ZERO) {...}
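Spelled out as a helper, the "close enough" test could look something like this. The names `nearly_equal` and `nearly_zero` are invented here, and the right tolerance depends entirely on the scale of your numbers, so treat the absolute tolerance as an assumption to tune.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a and b differ by no more than `tolerance`. */
static bool nearly_equal(double a, double b, double tolerance)
{
    assert(tolerance >= 0.0);
    double diff = a - b;
    if (diff < 0.0)
        diff = -diff;               /* absolute value without needing <math.h> */
    return diff <= tolerance;
}

/* The zero check from the post, expressed with the same helper. */
static bool nearly_zero(double v, double tolerance)
{
    return nearly_equal(v, 0.0, tolerance);
}
```

A fixed absolute tolerance works when the values are all of similar size; for values that span many orders of magnitude a tolerance scaled by the operands is usually the safer choice.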
Also when converting to an integer, you need to be aware that 2.9999 will translate to 2 while 3.00001 will translate to 3. You may need to do some rounding to get the right 'integer' value from the float/double.
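A sketch of the difference between the plain cast and a rounding conversion. Both function names are made up; the add-half trick is a simplification that rounds halfway cases away from zero and can be off by one for inputs a hair below a .5 boundary. The C99 library function `lround` from `<math.h>` is the standard alternative.

```c
#include <assert.h>

/* A plain cast simply drops the fraction (truncates toward zero). */
static long truncate_to_long(double v)
{
    return (long)v;
}

/* Round to the nearest whole number by adding half before truncating. */
static long round_to_long(double v)
{
    assert(v > -2.0e9 && v < 2.0e9);    /* keep the result inside a 32-bit long */
    return (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
}
```

So a computed value of 2.9999 that "should" have been 3 comes out as 2 from the cast but as 3 from the rounding version.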
Weee, fun with floating point types.
...but according to the IEEE standard, doubles can represent all integers up to 2^53 exactly?
If that's right then I wasn't aware of it. So my choice of 3 was invalid. Apologies.
Zekaric Wrote: If that's right then I wasn't aware of it. So my choice of 3 was invalid. Apologies.
Not to beat a dead horse, but I haven't been on in a while, and need to address the question of ints/floats/doubles, etc.
All integers (up to a certain value that differs for float and double) can be EXACTLY represented by a float. So, the number 3 can be EXACTLY represented. Additionally, all integer arithmetic that stays within those limits will also work exactly (provided there would be no remainder from divides). The problems that call for a check like:
Code:
if (fabs(val - 2.0) < SOMETHING_SMALL)
//etc...
arise when doing math where the results are not exact integers.
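The exact-integer claim is easy to test with a round trip. This is only a sketch; the helper names are invented, and the limits it probes (2^24 for a 32-bit float, 2^53 for a 64-bit double) assume IEEE 754 formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when n survives a round trip through float unchanged. */
static bool exact_in_float(int32_t n)
{
    volatile float f = (float)n;        /* volatile forces real float storage */
    return (int64_t)f == (int64_t)n;
}

/* True when n survives a round trip through double unchanged. */
static bool exact_in_double(int64_t n)
{
    /* stay well inside the int64 range so the cast back is defined */
    assert(n > -(INT64_C(1) << 62) && n < (INT64_C(1) << 62));
    volatile double d = (double)n;
    return (int64_t)d == n;
}
```

Under those assumptions 3 and 16,777,216 (2^24) round-trip through a float but 16,777,217 does not, and the same thing happens for doubles one past 2^53.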
Actually, it is not true that integers up to a certain value are exactly represented by floats. They can be, but usually aren't. You can represent integers exactly in a denormalised float, but typically not in a normalised one. Usually, your floats will be normalised, so they are off.
This little annoyance only matters in comparisons; all you have to do is cast everything to float (e.g. write 2.0f, not 2.0) and then compare. That should give the expected results.
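To illustrate the literal-suffix point with a value that is not exactly representable: a float holding 0.1f compares equal to the float literal but not to the double literal, because the float is widened to double before the comparison and the two roundings of 0.1 differ. The function names are made up for this sketch, and it assumes IEEE 754 float and double.

```c
#include <assert.h>
#include <stdbool.h>

/* Both sides are float, so the comparison happens in float. */
static bool equals_as_float(float value, float expected)
{
    return value == expected;
}

/* The float is widened to double before comparing. */
static bool equals_as_double(float value, double expected)
{
    return (double)value == expected;
}

/* Small self-check of the behaviour described above. */
static void literal_suffix_demo(void)
{
    float tenth = 0.1f;
    assert(equals_as_float(tenth, 0.1f));   /* same rounding on both sides */
    assert(!equals_as_double(tenth, 0.1));  /* 0.1 as double is a different value */
    assert(equals_as_double(2.0f, 2.0));    /* 2 is exact in both formats */
}
```

Note that the mismatch only shows up for values like 0.1; a whole number such as 2 compares equal either way.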