Sunday, August 22, 2010

32bit Floats - Integer Accuracy and Interesting Consequences

There are some big misconceptions I've seen regarding the accuracy of floats.
One professor of mine even went as far as to say that the computer might not store "2" as "2.0", but rather as "2.000019", if using floats.
This is not true.

32-bit floats can actually represent quite a large range of integer values 100% accurately.

The exact range of integers a 32-bit float can represent accurately, with no gaps, is -16777216 to 16777216 (that is, ±2^24).
This means it is roughly equivalent to a 25-bit signed integer, which has the range -16777216 to 16777215.

If you only care about positive values, then the float is equivalent to a 24-bit unsigned integer, which has the range 0 to 16777215.

I commonly see these values messed up, with people saying a float can only represent a 24-bit signed integer (which would give the range -8388608 to 8388607, and is wrong).
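
Before getting to the exhaustive search, here is a quick way to see the boundary for yourself. A minimal sketch, assuming nothing beyond ordinary IEEE-754 32-bit floats:

#include <stdio.h>

int main() {
    // A float stores 23 mantissa bits plus 1 implicit leading bit,
    // giving 24 significant bits, so every integer with magnitude
    // up to 2^24 = 16777216 fits exactly.
    float a = 16777216.0f; // 2^24: exactly representable
    float b = 16777217.0f; // 2^24 + 1: needs 25 bits, rounds back to 2^24
    printf("%.1f\n%.1f\n", a, b); // prints 16777216.0 twice
    return 0;
}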

Below is a test case I made to prove the floating-point range by exhaustive search.


#include <stdio.h>
#include <tchar.h>

// Finds the largest (pos == true) or smallest (pos == false) integer a
// 32-bit float can represent exactly, by brute force.
int testFloatRange(bool pos) {
    volatile int i = 0;
    volatile float f = 0.0f;
    volatile double d = 0.0;
    for (;;) {
        // Round-trip i through a float; when any of the three views of
        // the same count disagree, we have stepped past the exact range.
        volatile double t = (double)(float)i;
        if ((double)f != d || (int)f != i || t != d) break;
        if (pos) { f++; d++; i++; }
        else     { f--; d--; i--; }
    }
    printf("%f != %d\n", f, i);
    return pos ? (i - 1) : (i + 1); // step back to the last value that matched
}

int _tmain(int argc, _TCHAR* argv[]) {
    int p = testFloatRange(true);
    int n = testFloatRange(false);
    printf("Positive Range = 0 to %d\n", p);
    printf("Negative Range = %d to 0\n", n);
    printf("Full Range = %d to %d\n", n, p);
    return 0;
}
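
For the curious, on an IEEE-754 machine this should print the following (the first two lines come from the printf inside testFloatRange, showing the first values that fail the round-trip):

16777216.000000 != 16777217
-16777216.000000 != -16777217
Positive Range = 0 to 16777216
Negative Range = -16777216 to 0
Full Range = -16777216 to 16777216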



Now it's pretty interesting what happens once a 32-bit float reaches its limit of 16777216.
If you try to increase the float by 1 when it has this value, the float will actually stay the same: 16777217 is not representable, so the result rounds right back down to 16777216. This means that if you try to increment a float by 1 in a loop, you will never get past 16777216! It will just get stuck in an infinite loop.

Here is some proof of that:

int _tmain(int argc, _TCHAR* argv[]) {
    volatile float f = 0xffffff - 100; // start just below 16777215
    // The condition looks like it should stop at 0xffffff + 100,
    // but f gets stuck at 16777216 and the loop never ends.
    for (; f < 0xffffff + 100; f++) {
        printf("Value = %f, Binary Representation (0x%x)\n", f, (int&)f);
    }
    return 0;
}


The program's output is:

...
Value = 16777214.000000, Binary Representation (0x4b7ffffe)
Value = 16777215.000000, Binary Representation (0x4b7fffff)
Value = 16777216.000000, Binary Representation (0x4b800000)
Value = 16777216.000000, Binary Representation (0x4b800000)
... Keeps repeating the last line infinitely...



Admittedly, I didn't know about this infinite-looping behavior until I made the test case. It is something you should definitely watch out for.
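
If you ever need a loop that counts that high, one safe pattern (just a sketch of the idea) is to drive the loop with an integer counter and derive the float from it:

#include <stdio.h>

int main() {
    // The float derived from i loses accuracy past 2^24, but the loop
    // condition tests the int, so it can never get stuck.
    for (int i = 16777210; i < 16777220; i++) {
        float f = (float)i; // exact up to 16777216, approximate afterwards
        printf("i = %d, f = %.1f\n", i, f);
    }
    return 0;
}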

Oh and btw, you might be wondering why I was using "volatile float" in the above test cases instead of just "float". The "volatile" keyword is useful when we need to compare floats at their exact 32-bit precision. I'll probably explain why in another article :)
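
In the meantime, here is a quick sketch of the kind of surprise I mean. It is hypothetical: whether you can actually reproduce it depends on your compiler generating x87 code and on its floating-point settings.

#include <stdio.h>

int main() {
    float x = 1.0f, y = 3.0f;

    // An x87 build may keep this quotient in an 80-bit register,
    // carrying more precision than a real 32-bit float.
    float fast = x / y;

    // volatile forces a store to memory and a reload, truncating
    // the value to genuine 32-bit precision.
    volatile float exact = x / y;

    // With strict IEEE settings these compare equal; with x87 extended
    // precision and lax compiler flags, the comparison can fail.
    printf("%s\n", (fast == exact) ? "equal" : "not equal");
    return 0;
}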

7 comments:

  1. Wow! Nice research about the floating-point range.
    I've always wanted to see it explained plainly and simply.
    If you google it, you get a lot of advanced floating-point theory, but nothing as plain and clear as what you did here.

    I did know about the infinite loop case.
    The reasoning behind it is how meaningful the change is relative to the value of the variable. Once a floating-point value becomes too large, adding one unit won't change it much.

    Think of it as money. If you're a billionaire, whether you have $999,999,998 or $999,999,999 in your bank account won't affect you at all. Maybe the bank stole a dollar from you, but you won't care.

    But if you're very poor, having $2 in your pocket is much better than having $1.

    That's why when you compute 16777216 + 1, it gives you 16777216 again. There's just not enough accuracy, but that doesn't matter! (relatively speaking)
    Moreover, I'm pretty sure that if you do 16777210 + 0.1, it will give you 16777210 again, even though it's smaller than 16777216; this happens because there's not enough precision to represent 16777210.1. For a float there is no in-between number; it's either 16777210 or 16777211. (See the sketch after the comments.)

    In order to keep increasing 16777216, you'll need to increase it by larger amounts.

    The main characteristic of floating-point numbers is their ability to shift precision towards big numbers or small numbers. If you have small numbers like 0.123877 and 0.23877, adding them gives an accurate result; but if you then add them to 16777210, all the precision is shifted toward the big number and the small ones are lost.

    As for your teacher saying 2 would be represented as 2.000019: that's really shameful! Unless we're dealing with 2-bit floats or something!
    I want to believe he just said that to make his students aware of the problems of floating point, exaggerating it, and not because he's ignorant (I want to believe.........)

    It is true, though, that due to their base-2 nature, no matter how much precision you use, numbers like "0.1" can't be accurately represented (unless you use base-10 floating point); "0.1" will be stored as 0.1000000014901... Wikipedia has a nice article about that.

    Cheers
    Dark Sylinc

  2. Hey Dark Sylinc, nice to hear from you again.

    "If you google it, you get a lot of advanced floating point theory, but nothing plain and clear as you did here."

    Yeah, I remember having googled it before and never getting a simple answer, as well as seeing a lot of people giving bad information.

    "As for your teacher saying 2 being represented as 2.000019 is really shameful! Unless we're dealing with 2 bit floats or something!
    I want to believe he just said that to make his students aware of the problems of floating point, exaggerating it. Not because he's ignorant (I want to believe.........)"

    Sadly, I'm pretty sure that he honestly didn't know. This was my Programming II professor, so I don't blame him too much, since he wasn't teaching an advanced programming class.
    Sadly, the situation with some of my advanced programming professors hasn't been much better; roughly 1/3 of the professors I've had actually had good programming knowledge.

    A problem I see with a lot of professors is that they just learn enough to teach the class, but don't expand their knowledge to fully understand the subject.
    I suppose this is one of the traits that separates a good professor from a mediocre one.

  3. "One professor of mine even went as far as to say, the computer might not store "2" as "2.0", but rather "2.000019" if using floats.
    This is not true."
    Really, is this not true? What you fail to recognize is that you are talking about IEEE-754 floats, and another implementation could store a float any way it specifies.

    "The "volatile" keyword is useful to use when we need to compare floats with their exact precision. I'll probably explain why in another article :)"
    Please do explain this comment. Volatile just says to the compiler not to cache the value as it maybe change by some unknown means.

  4. "Really this is not true? What you fail to recognize is that you are talking about IEEE754 floats and another implementation could store a float any way it specifies."

    Although it's true I'm talking about IEEE-754 floats, every architecture I've ever seen follows the same layout for 32-bit floats.
    I've dealt with architectures that don't follow the IEEE-754 standard, but the layout is still a 1-bit sign, an 8-bit exponent, and a 23-bit mantissa.

    It would be foolish to deviate from this pattern, and therefore, unless you're dealing with a ridiculous architecture, this post will apply to 32-bit floats.

    "Please do explain this comment. Volatile just says to the compiler not to cache the value as it maybe change by some unknown means."

    When the compiler generates code using the x87 FPU, it will sometimes do operations with the full 80-bit precision the FPU has, even though you're only using 32-bit floats.
    If you specify volatile, the compiler is forced to write the value back to memory and read it again, which truncates it to 32 bits.
    I will make a blog post about this in the future, with some examples to show what I mean.

  5. Very good article. Like others, I just wanted to see this explained simple.

    I'm still trying to grasp how the CPU manages to store exact values between -16777216 and 16777216 in a 32-bit float (instead of the expected -8388608 to 8388608).

  6. Update: I got it. These numbers are stored in normalized form. This means that the first bit is assumed to be 1 and isn't stored anywhere. This is explained pretty well on Wikipedia:
    http://en.wikipedia.org/wiki/IEEE_754-1985

  7. Yup, that's the reason. Sorry for the late reply; glad you were able to figure it out :D

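
For reference, here is the small sketch mentioned in comment #1, demonstrating how precision shifts toward big numbers. The printed values assume standard IEEE-754 round-to-nearest floats:

#include <stdio.h>

int main() {
    float big = 16777216.0f;                // 2^24
    printf("%.1f\n", big + 1.0f);           // 16777216.0 - the +1 is lost
    printf("%.1f\n", big + 2.0f);           // 16777218.0 - spacing is 2 up here

    printf("%.1f\n", 16777210.0f + 0.1f);   // 16777210.0 - the claim from comment #1

    float small = 0.123877f + 0.23877f;     // accurate on its own (~0.362647)
    printf("%.1f\n", 16777210.0f + small);  // 16777210.0 - the fraction is discarded
    return 0;
}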