One crucial thing to notice for power-of-two integers, is that in their binary representation, they only have 1 bit set.
0001 = 1
0010 = 2
0100 = 4
1000 = 8
... and so on...
So one approach we can do to check if an integer is a power of two, is to just loop through the bits, and check if only 1 bit is set.
bool isPow2(uint n) {
const uint len = sizeof(uint)*8; // 32 for 4-byte integers
uint count = 0;
for(uint i = 0; i < len; i++) {
count += n & 1;
n >>= 1;
}
return count == 1;
}
Now the above approach is pretty obvious and simple; but its not that nice considering we have to loop for every bit (32 times for 4-byte integers).
There's a popular bitwise trick for determining if an integer is a power of 2, and it looks like this:
bool isPow2(uint n) {
return (n & (n-1)) == 0;
}
Now this is a lot nicer than our loop version. Instead of looping 32 times, we do just a few bitwise calculations.
I was playing around with bitwise operations earlier today, and I discovered another bitwise method for checking powers of 2.
bool isPow2(uint n) {
return (n & -n) == n;
}
I was really excited because I found this one out on my own, and I thought I had discovered it. However I did a google search on it, and I found out that this trick is already known :(
The nice thing about this second bitwise version, compared to the more popular version above it, is that its a lot simpler to remember IMO. Speed-wise, they're both around the same, I suppose it depends on your target architecture on which one is actually fastest, but in practical purposes it probably doesn't matter which one you use.
Note:
Its important to mention that both of the bitwise methods mentioned above, will incorrectly treat zero as a power of two.
To fix this behavior you can modify the functions like so:
bool isPow2(uint n) {
return n && ((n & (n-1)) == 0);
}
and...
bool isPow2(uint n) {
return n && ((n & -n) == n);
}
Sadly this adds an extra conditional to our fast bitwise methods, however it still ends up being a lot nicer (and faster) than our initial loop version.
SSE4.2 (nehalem+ (aka core i3/5/7)) and SSE4a (amd 10h) processors support 'popcnt' operation, i.e. population count. Its result is the number of set bits in bit string. This operation might also be supported by other processors, as there is cpuid bit just for that.
ReplyDeleteHaven't checked it myself with documentation, but if you need more than one variable to be exactly a power of 2, this might prove faster. For one shot it would probably be as fast as the final versions you presented, due to branch 'if (hasPopCnt) {...} else {...}'.
Yeh popcnt is probably like ~2 times faster since you can just do something like:
ReplyDeletebool isPow2(uint n) {
return _mm_popcnt_u32(n) == 1;
}
The main benefit is that you're only doing 1 comparison whereas the versions I showed have to do 2 comparisons (they need to check if the value is non-zero to return the correct result when n==0).
Of-course the main drawback is that popcnt is only available on modern machines, and not a lot of people have SSE4.2 or SSE4a currently.
And yes there's some cpuid bit to check for support of this instruction individually, but i wonder what processor would support this instruction and not support SSE4.2 or SSE4a (i guess maybe some cpu aimed for netbooks).
Note that the second method depends on the internal representation of negative integers. In some rare cases (when negative integers are not represented as 2's complement) this method won't work.
ReplyDelete