Friday, December 24, 2010

Append integer to string (c++)

In this blog article we'll be using operator overloading with strings to let us append integers to the end of them (something pretty useful which std::string doesn't do for us already! :/)

We will be using some code described in a previous article to do the actual conversion from int to string.

We will be overloading the "<<" operator so that we can append to a string, and we will use the "+" operator so that we return a copy of the result of appending a string.

So it will work as follows:

int main() {
string str0("hello");
string str1("hello");

cout << str0 << endl; // prints 'hello'
str0 << 123;
cout << str0 << endl; // prints 'hello123'

cout << str1 << endl; // prints 'hello'
str1 + 123; // does not modify str1 (effectively does nothing)
cout << str1 << endl; // prints 'hello'
cout << str1 + 123 << endl; // prints 'hello123'
return 0;
}


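Note: We're overloading '<<' instead of '+='. Strictly speaking '+=' can be overloaded as a global (non-member) operator; only '=', '()', '[]' and '->' are required to be member functions. The catch is that std::string already accepts 's += 123' by converting the int to a single char, so reusing '+=' for this would be confusing. See this link for more details on that.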
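To make the '+=' pitfall concrete, here is a minimal sketch of what std::string already does when you hand an int to '+='. This is standard library behavior rather than code from this article, and the helper name appendWithPlusEquals is made up for illustration:

```cpp
#include <string>

// Appends `value` using std::string's built-in operator+=(char).
// The int is implicitly converted to char, so 123 is appended as the
// single character with code 123 ('{' in ASCII), not as the text "123".
std::string appendWithPlusEquals(std::string s, int value) {
    s += value; // compiles, but probably not what you meant
    return s;
}
```

So calling appendWithPlusEquals("hello", 123) gives "hello{" on an ASCII system, which is exactly the kind of surprise the '<<' overload avoids.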

So how do we make the above code work?
We need to add the following above the 'main()' function:

template<typename T>
string toString(T t) {
stringstream s;
s << t;
return s.str();
}

string& operator<<(string& s, int i) {
return s.append(toString(i));
}

string operator+(string s, int i) {
return s.append(toString(i));
}


Using the above overloads, the code in the main() function compiles and prints what we wanted it to.

Notice that the '<<' operator overload takes a reference to a string as one of its parameters; this is done so that the actual string is modified. The '+' operator overload takes a copy of the string and then does the appending, so the original string is not modified.

We can extend the above code to work for floats, doubles, and any datatype that our toString() function accepts. One way we do that is to use templates on our operator overloads.

Here's the new code:

template<typename T>
string& operator<<(string& s, T i) {
return s.append(toString(i));
}

template<typename T>
string operator+(string s, T i) {
return s.append(toString(i));
}


Although the above code works and is cool, there is a downside to doing this.
It won't allow us to do stuff like: string("s") + "hello", anymore. If we try to do that the compiler will generate an ambiguity error because it doesn't know which overload to choose. The +(string, const char*) overload is already defined by the standard library and our template operator overload above also handles this case; so the compiler doesn't know which to use and generates an error.

Our solution then is to not use templates and just manually overload the operators for the specific types on the + operator, but to use templates for the << operator since the c++ standard doesn't overload the <<(string, char) operator.
This is the code I currently use in my projects:

template<typename T>
string& operator<<(string& s, T i) {
return s.append(toString(i));
}
string operator+(string s, int i) {
return s.append(toString(i));
}
string operator+(string s, float i) {
return s.append(toString(i));
}
string operator+(string s, double i) {
return s.append(toString(i));
}
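As an alternative sketch, assuming a C++11 compiler is available: the ambiguity can also be avoided by keeping the '+' template but restricting it to arithmetic types with std::enable_if, so that string + "hello" and string + 'c' still resolve to the standard overloads. This is not the code listed above, just one possible variation; toString() is repeated so the block is self-contained.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

template<typename T>
std::string toString(T t) {
    std::stringstream s;
    s << t;
    return s.str();
}

// Only participates in overload resolution when T is an arithmetic type
// other than char (int, float, double, ...). Pointers such as const char*
// and plain char are left to the standard operator+ overloads.
template<typename T>
typename std::enable_if<std::is_arithmetic<T>::value &&
                        !std::is_same<T, char>::value, std::string>::type
operator+(std::string s, T value) {
    return s.append(toString(value));
}
```

With this in place, string("n=") + 5 uses the template, while string("s") + "hello" behaves exactly as it did before.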


So there you have it, you can now do stuff like:

double myDouble = 123.79;
string s("My double is ");
s << myDouble << ". My int is " << 12 << ".";
cout << s << endl; // prints 'My double is 123.79. My int is 12.'

// this prints the same thing as above but uses
// printf() and C-style strings (null-terminated char arrays)
printf("%s", s.c_str());

Returning Arrays by Reference in C++

Last article we talked about returning arrays, but we only did it by value (so that a copy of the data is returned). This time we'll look at returning arrays using references.

This feature is something that I rarely see used in practice, I think partially because the syntax is so bizarre and confusing to those that have never seen it. I first saw it in some code by shuffle2 and thought it was crazy; but now I understand why the syntax is the way it is.

Here's how to do it:

int testArr[5] = { 1000, 1, 2, 3, 4 };

int (&retArr())[5] {
return testArr;
}

int main() {
int (&arr)[5] = retArr();
cout << arr[0] << endl; // prints 1000
return 0;
}


The 'retArr()' is the function which is returning a reference to an int array with 5 elements. If you wanted any parameters for the function 'retArr()' you can place them in the '()' like you normally would.

The seemingly awkward syntax becomes less awkward when you think of the syntax for declaring references to an array.
'int (&arr)[5] = ...' is how you would declare a normal reference to an array, so studying that syntax and then looking at the function prototype 'int (&retArr())[5]' should help you understand it.
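If the raw syntax is still too hard on the eyes, a typedef can hide it. This is a small sketch of the same function written that way; the alias name IntArray5 is made up for this example:

```cpp
int testArr[5] = { 1000, 1, 2, 3, 4 };

// Name the array type once, then use it like any other type.
typedef int IntArray5[5];

// Same function as: int (&retArr())[5]
IntArray5& retArr() {
    return testArr;
}
```

The caller can then write 'IntArray5& arr = retArr();' instead of 'int (&arr)[5] = retArr();', and both forms refer to the same array.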

Also why is returning a reference to an array useful?
Well it's useful for various reasons, but I'll give an interesting example.

Assume that you have a memory buffer that was dynamically allocated (for whatever reason, maybe a custom allocation routine to guarantee alignment), and you want that buffer to be treated as a fixed-size array; you can do that by returning a reference to an array.

Like so:

struct myStruct {
int* myPtr;
myStruct() {
myPtr = new int[5];
myPtr[0] = 1000;
myPtr[1] = 1;
myPtr[2] = 2;
myPtr[3] = 3;
myPtr[4] = 4;
}
~myStruct() {
delete[] myPtr;
}
int (&getArray())[5] {
return (int(&)[5])*myPtr; // cast to a reference
}
};

int main() {
myStruct testStruct;
cout << testStruct.getArray()[0] << endl; // prints 1000
cout << sizeof(testStruct.getArray()) << endl; // prints 20 (when int is 4 bytes)
return 0;
}


The benefit of returning an array by reference instead of just an int pointer in the above code is that the former explicitly tells the programmer reading the function signature that the buffer being returned holds 5 ints; returning an int pointer instead will not tell us how many elements are in the array.
Also sizeof(testStruct.getArray()) returns 5*sizeof(int), whereas if we were returning an int pointer, it would just return sizeof(int*), which most-likely isn't what we wanted.

Sunday, December 19, 2010

Returning arrays in C++ (including multi-dimensional arrays)

Last article we were talking about passing arrays as arguments and it got a bit long, so this time I'm going to try to keep it shorter.

First I'll say that like when we passed an array by value in the last article, there is no built in easy way to return arrays by value, so we have to use workarounds.

The first thing I'll show is some WRONG CODE that beginners might try to use:

int* getArray() {
int my_arr[10] = { 0,1,2,3,4,5,6,7,8,9 };
return my_arr;
}

int main() {
int* arr = getArray();
arr[1] = 4;
for(int i = 0; i < 10; i++) {
cout << arr[i] << endl;
}
return 0;
}


Can you see the error?
Well the problem is that in the function getArray() the array 'my_arr' is being allocated on the stack, and then you are returning a pointer to it. By the time you're back in the function main(), 'my_arr' no longer exists, so using the pointer as if it's valid causes undefined behavior. So don't do this.

Now let's look at some correct ways to do this.
One way is to use dynamic memory allocation to allocate the array on the heap and then pass a pointer to it, and then consider that memory as an array.

It looks like this (also note that I'm going to use a multi-dimensional array so people know how to do this):


const int n = 5;
const int m = 5;

int* getArray() {
int* arr = new int[n*m];
for(int i = 0; i < n*m; i++) arr[i] = i;
return arr;
}

int main() {
int* ptr = getArray();
int (&arr)[n][m] = (int(&)[n][m])*ptr;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << arr[i][j] << endl;
}}
delete[] ptr;
return 0;
}


In the function getArray() we allocate a chunk of memory the size of n*m*sizeof(int) bytes. Then we initialize it to the numbers 0...n*m-1, then we return it as an int*.
Back in the function main() we get the ptr and then consider it as a multi-dimensional array by using references. Here you can see how to cast a pointer to an array (for single-dimensional arrays just ignore the extra [m] part).
Lastly we need to remember to delete[] the ptr, because we allocated this memory on the heap so it won't automatically delete it for us.


Since the above method has us needing to delete[] our memory ourselves, it is slightly inconvenient. This last approach doesn't have that problem; we use a struct like we did when we passed arrays as arguments in the last article.

const int n = 5;
const int m = 5;

struct arrayStruct {
int arr[n][m];
};

arrayStruct getArray() {
arrayStruct t;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
t.arr[i][j] = i*m + j;
}}
return t;
}

int main() {
arrayStruct a = getArray();
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << a.arr[i][j] << endl;
}}
return 0;
}


When we use a struct to encapsulate the array, c++ creates a default copy constructor that will copy our data created in getArray() to our 'a' struct in main().

If we didn't want to access the array by using 'a.arr[i][j]', we could again use references like so:

int main() {
arrayStruct a = getArray();
int (&arr)[n][m] = a.arr;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << arr[i][j] << endl;
}}
return 0;
}
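If you can assume a C++11 compiler, std::array is essentially this same struct-wrapping trick provided by the standard library. Here is a rough sketch of the example above rewritten with it; the names Grid and makeGrid are made up for illustration:

```cpp
#include <array>
#include <cstddef>

const std::size_t n = 5;
const std::size_t m = 5;

// n rows of m ints; copyable and returnable by value like arrayStruct.
typedef std::array<std::array<int, m>, n> Grid;

Grid makeGrid() {
    Grid g;
    for (std::size_t i = 0; i < n; i++) {
        for (std::size_t j = 0; j < m; j++) {
            g[i][j] = static_cast<int>(i * m + j);
        }
    }
    return g;
}
```

The result is still indexed as g[i][j], and there is nothing to delete[] afterwards.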


We have now seen a couple ways to do this, there are more ways to accomplish this task but this blog post would get huge if I continue listing the various ways.

Saturday, December 4, 2010

Passing Arrays as Arguments in C++

In C++ and other programming languages there is the concept of Arrays.
One of the things you might want to do with arrays is pass them to another function so that that function can read the data in the array.

The simplest thing you might think of to accomplish this task might be:

void foo(char* a) {
cout << (int)a[0] << endl; // prints 0 (without the cast it would print the invisible NUL character)
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


And this does work. What it's doing here is passing a pointer to the first element in 'my_array' to the function 'foo', and 'foo' uses it as a normal char*.


The problem with this approach is if you want to specify that you want an array of a certain size as the parameter to the function. Since we're treating the array as a pointer, there is no hint to the programmer of the size needed for the array.

There is a feature that originated in C which lets us specify a size parameter, and it looks like this:

void foo(char a[128]) {
cout << (int)a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


Now this method is pretty evil in C++ (although in C I guess it's more justifiable since it doesn't support the better way which I will explain later).

Now in the function 'foo' you are hinting to the programmer that you want an array with 128 elements, but guess what happens if you give it an array of 10 elements?
Nothing happens, it's perfectly fine accepting that.

What happens if you pass it char* instead of an array of char?
Nothing happens, it's perfectly fine accepting that.

Also notice what this function prints out. It prints out '4' for sizeof(a)!?

Any experienced coder will know that sizeof(some_array) will be the size of one element times the number of elements. So why is it giving us '4' instead of '128'?

Well it turns out that this feature is equivalent to the first method we used, that is it is as if we're passing a "char*", not an array by reference (as we may have thought).

So once we realize that, it all makes sense, sizeof(char*) on a 32bit system is '4', so that's how we got that number.

For these reasons this approach is what I would call 'evil'. You would expect it to behave a certain way, but it doesn't.

There is a special form of the above C-feature that looks like this:

void foo(char a[]) {
cout << (int)a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


This time in the function 'foo' we didn't specify a size of the array.
I don't think there's any reason to use this form as opposed to 'char* a' since they both behave the exact same way. And this time you're not even hinting to the programmer how big you want the array, so it's kind-of pointless to have this notation.
'char a[]' might look cool though compared to just using 'char* a', so maybe that's why someone might want to use it. Although I would just recommend using 'char* a' since it's more commonly seen (and therefore easier to read imo).
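You don't have to take my word for it that all three forms are the same thing. Assuming a C++11 compiler, the following sketch asks the compiler to confirm it at compile time (the function names are made up, and the functions are only declared, never called):

```cpp
#include <type_traits>

void takesPointer(char* a);
void takesSizedArray(char a[128]);
void takesUnsizedArray(char a[]);

// All three declarations have the very same function type: the array
// parameter is adjusted to `char*` and the 128 is simply discarded.
static_assert(std::is_same<decltype(takesPointer),
                           decltype(takesSizedArray)>::value,
              "char a[128] is really char* a");
static_assert(std::is_same<decltype(takesPointer),
                           decltype(takesUnsizedArray)>::value,
              "char a[] is really char* a");
```

If the size in 'char a[128]' were part of the type, the first static_assert would fail to compile.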


Now that we learned all these bad ways to pass arrays in C++, let's look at a good way.
Passing an array by reference! (Before we were just using different forms of passing a char*, but this time we will pass the array by reference and the compiler will understand that it is an array).

It looks like this:

void foo(char (&a)[128]) {
cout << (int)a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 128
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


Aha! Finally the sizeof(a) is printing out 128 (what we expected it to).
Now 'a' is behaving like an array instead of 'char*' and that is what we wanted.

Now guess what happens if we try to pass an array of 10 elements to function 'foo'?
We get a compiler error! The compiler knows that an array of 10 elements is not an array of 128 elements, so it gives us an error.

Now guess what happens if we try to pass a char* to the function 'foo'?
We get a compiler error! The compiler knows that a char* is not the same as an array of 128 elements, so it gives us an error.

The compiler errors are useful in order to prevent bugs by people who mistakenly are passing arrays of incorrect size to the function.
In another article I will explain ways to circumvent such compiler errors when you 'know' for sure that the pointer/array you're passing is suitable for the function 'foo' but may not have the same type (such as an array of 1000 elements, whereas the function foo will only accept an array of 128 elements).
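A related case is a function that should accept arrays of any size while still knowing that size. One option is to make the size a template parameter so the compiler deduces it. This is only a sketch and not the casting workaround mentioned above; the function names are made up for illustration:

```cpp
#include <cstddef>

// N is deduced from the argument, so any array size is accepted
// and sizeof(a) is still the size of the whole array.
template<std::size_t N>
std::size_t arrayBytes(char (&a)[N]) {
    return sizeof(a);
}

// Returns the number of elements without ever touching the data.
template<std::size_t N>
std::size_t arrayLength(char (&)[N]) {
    return N;
}
```

Passing a char[128] gives 128 and passing a char[10] gives 10, while passing a plain char* is still a compiler error because there is no size to deduce.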



We have learned how to pass arrays using pointers and references (which means that any data of the array 'a' that was modified in function 'foo' modifies the data in 'my_array'). Now we will learn how to pass an array by value (which means that a copy of the array data will be transferred via the stack to function 'foo', so that modifications to 'a' will not affect 'my_array').

Now here's the funny part about this, you can't do it! At least there's no fancy parameter declaration that lets you do this.

There are various workarounds for this problem, and one of them is to create a struct which will act as the array, and then pass the array as that struct on the stack.

For example:

struct temp_struct {
char a[128];
};
C_ASSERT(sizeof(temp_struct)==sizeof(char)*128);

void foo(temp_struct t) {
char (&a)[128] = t.a; // reference to the array t.a
cout << (int)a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 128
}

int main() {
char my_array[128] = {0};
foo((temp_struct&)my_array);
return 0;
}


Notice what we did here. We created a struct 'temp_struct' which holds only a single array of the size we're passing. (I will explain what the C_ASSERT does in a bit).
Then we made foo() take as a parameter the 'temp_struct' that we defined earlier.

In the first line of foo's body, we create a reference to the array inside the temp_struct. Then we use this array reference like normal.

Back in the function main(), we need to typecast my_array to a reference of type temp_struct. Basically what we're saying to the compiler is that this array should be treated as if it were a temp_struct, without doing any conversion of the data.
Now the last thing is, since the compiler thinks my_array is a temp_struct, it will copy over the array on the stack (like it would do for any other struct that was passed by value). So with that we have completed our goal.

C_ASSERT is a compile-time assert, and is useful for things like making sure structs you've declared are the size you expected them to be.
The C_ASSERT in the above example is added just to make sure the compiler is generating the struct the same exact size as the array we're dealing with.
If the compiler didn't make the struct the same size, then we would get a compiler error.
I think that a compiler will never end up breaking that C_ASSERT. But I don't know the full c++ standard well enough to guarantee that will never happen, so that's why I add the check in the first place.
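C_ASSERT comes from the Windows headers, so here is a sketch of the same idea using only standard C++11, with static_assert doing the size check. It also copies the array into the struct explicitly instead of casting; the helper names wrapArray and modifyCopy are made up for this example:

```cpp
#include <cstring>

struct temp_struct {
    char a[128];
};
// Standard C++11 replacement for the Windows-specific C_ASSERT macro.
static_assert(sizeof(temp_struct) == sizeof(char) * 128,
              "temp_struct must be exactly as large as the array");

// Receives a copy, so writing to t.a cannot touch the caller's array.
char modifyCopy(temp_struct t) {
    t.a[0] = 'X';
    return t.a[0];
}

// Copies the array into the struct explicitly instead of casting.
temp_struct wrapArray(const char (&src)[128]) {
    temp_struct t;
    std::memcpy(t.a, src, sizeof(t.a));
    return t;
}
```

Calling modifyCopy(wrapArray(my_array)) changes only the copy, so my_array[0] is still 0 afterwards. The explicit copy costs one extra memcpy compared to the cast, but it doesn't rely on the compiler laying out the struct a particular way.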

Anyways hopefully this article was informative, and by now you should know how to pass an array by treating it as pointer, pass an array by reference, and pass an array by value.

Thursday, December 2, 2010

Simple Tic Tac Toe Game (c++)

I've seen a bunch of random people learning programming starting out with Tic Tac Toe games. It is an interesting challenge for beginners and I think it's probably a great first-challenge to try once you think you're getting the hang of the language basics.

I myself had never made a Tic Tac Toe game, but after seeing so many beginners having trouble with it, and seeing their code filled with unnecessary "bloat", I decided to try making a lean C++ console based tic tac toe game.

The goal was to make a functional Tic Tac Toe game without the hundreds of lines other people's code usually takes.

In my head I was thinking it might even be possible to do in as few as 30 lines, but that was a bit too optimistic for me in practice.
Although if I didn't handle bad-cases, and didn't format the output nicely, I could probably do that; but the quality of a program should not be sacrificed for "less lines" of code, so I decided to handle all the bad-cases and such I'm aware of (which in turn increased the amount of code the program needed).

Anyways here's the full code:

#include <conio.h>
#include <iostream>
using namespace std;

void ticTacToe() {
char w = 0, b[9] = { '1','2','3','4','5','6','7','8','9' };
char player[][9] = { "Player O", "Player X" };
unsigned int slot = 0, turn = 1, moves = 0;
for(;;) {
system("cls");
cout << "Tic Tac Toe!" << endl << endl;
cout << " " << b[0] << "|" << b[1] << "|" << b[2] << endl << " -+-+-" << endl;
cout << " " << b[3] << "|" << b[4] << "|" << b[5] << endl << " -+-+-" << endl;
cout << " " << b[6] << "|" << b[7] << "|" << b[8] << endl << endl;
if (w || (++moves > 9)) {
if (w) cout << player[w=='X'] << " is the winner!!!" << endl << endl << endl;
else cout << "No Winner!!!" << endl << endl << endl;
cin.clear(); cin.ignore(~0u>>1, '\n'); _getch();
return;
}
cout << player[turn^=1] << " Choose a Slot... ";
cin >> slot;
if (slot < 1 || slot > 9 || b[slot-1] > '9') {
cout << "Please Choose A Valid Slot!!!" << endl;
cin.clear(); cin.ignore(~0u>>1, '\n'); _getch();
turn^=1; moves--;
continue;
}
b[slot-1] = turn ? 'X' : 'O';
((((b[0]==b[1]&&b[0]==b[2]&&(w=b[0])) || (b[3]==b[4]&&b[3]==b[5]&&(w=b[3]))
|| (b[6]==b[7]&&b[6]==b[8]&&(w=b[6])))||((b[0]==b[3]&&b[0]==b[6]&&(w=b[0]))
|| (b[1]==b[4]&&b[1]==b[7]&&(w=b[1])) || (b[2]==b[5]&&b[2]==b[8]&&(w=b[2])))
||((b[0]==b[4]&&b[0]==b[8]&&(w=b[0])) || (b[2]==b[4]&&b[2]==b[6]&&(w=b[2])))));
}
}

int main() {
for(;;) ticTacToe();
return 0;
}


The whole program ended up being 40 lines of code, which since it handles bad-cases, probably isn't that bad.

I would like to point out how few If statements or Switch Statements are needed for a tic tac toe game as seen above.
I usually see tic tac toe code examples filled with If/Switch statements that simply aren't necessary and make the code a lot bigger.

If you dislike the code above then I somewhat agree that the code could be prettier. Since the goal was to keep the amount of code to a minimum, it limited me in my code cleanliness.

If you notice, in my coding style there are times where I group more than one statement on the same line of code.
This is my personal preference when it comes to short statements that go hand-in-hand with each other. There are some programmers that don't like this style of mine, and I respect that, but I like code that is structured pretty, and grouping similar short statements allows me to accomplish nicer looking code IMO.

I kind-of lied with the title of this blog post though. Although this program is 'simple' in terms of 'little code', it isn't very 'simple' in terms of the ability for someone that isn't experienced with c++ to understand.
I used some tricks which newer c++ programmers may have trouble with; and if you do wish to understand or ask about part of the code above don't hesitate to leave a comment.
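One of those tricks is the big expression at the end of the loop: it relies on short-circuit evaluation, so 'w' is only assigned when one of the eight lines actually matches. For comparison, here is a sketch of the same check written in a more conventional table-driven style. It is not part of the 40-line program, and the names kLines and findWinner are made up for this example:

```cpp
// The 8 winning lines as index triples into the 9-slot board.
const int kLines[8][3] = {
    {0,1,2}, {3,4,5}, {6,7,8},   // rows
    {0,3,6}, {1,4,7}, {2,5,8},   // columns
    {0,4,8}, {2,4,6}             // diagonals
};

// Returns 'X' or 'O' if that player owns a full line, or 0 if nobody does.
// Unclaimed slots hold their own digit ('1'..'9'), so they never match
// each other and cannot produce a false winner.
char findWinner(const char (&b)[9]) {
    for (int i = 0; i < 8; i++) {
        const int* line = kLines[i];
        if (b[line[0]] == b[line[1]] && b[line[0]] == b[line[2]]) {
            return b[line[0]];
        }
    }
    return 0;
}
```

It takes more lines than the one-expression version, but adding or changing a winning line only means editing the table.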

Wednesday, November 24, 2010

Virtual Key Code to String

If you ever used GetAsyncKeyState() or GetKeyState() or any other windows function using virtual keys, this post might be useful to you.

There are certain situations where you might want to print out a string representation of the virtual key you're dealing with, and from what I've researched, I don't think there's an API function call that will do this for us, so I made my own.

The following function will take as input a Virtual Key Code (e.g. VK_RETURN) and then return the string representation of the key (e.g. "VK_RETURN"). Of course it works for 'A' to 'Z' and digits as well (there are no VK_* macros for such virtual keys because they're the same as the ascii char representation).

Here's the function; it's a big one so I've hidden it by default.


I ended up needing this on my NES emulator in order to display mapped-keyboard settings. That's probably where other people will need it too.


In case you've never seen something like this and you're wondering how it works, it uses a powerful and very useful feature of the C/C++ preprocessor called 'stringification'.

Essentially you convert the arguments of a macro into their string representations.
Here's more info:
http://gcc.gnu.org/onlinedocs/cpp/Stringification.html
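Here is a tiny sketch of the technique, so you can see the shape of it without the full function. The MY_KEY_* codes are hypothetical stand-ins for the real VK_* macros, and keyToString and KEY_CASE are names made up for this example:

```cpp
#include <string>

// Hypothetical key codes standing in for the real VK_* macros.
#define MY_KEY_ENTER 0x0D
#define MY_KEY_SPACE 0x20

// #x turns the macro argument into a string literal ("MY_KEY_ENTER")
// without expanding it, while the plain x is expanded to its numeric
// value and used as the case label.
#define KEY_CASE(x) case x: return #x;

std::string keyToString(int key) {
    switch (key) {
        KEY_CASE(MY_KEY_ENTER)
        KEY_CASE(MY_KEY_SPACE)
        default: return "UNKNOWN";
    }
}
```

Each key then needs just one KEY_CASE line, which is why most of the work is copy-paste and regex replace.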

I took the winapi macros for the virtual keys and wrapped them in a macro that converts them to strings. Mostly was just copy-paste and regex replace work.
Anyways have fun :)

Monday, October 11, 2010

Easy generic toString() function in C++

In c++ you can make a simple toString() template function that converts primitives to a string.


#include <iostream>
#include <sstream>
#include <string>
using namespace std;

template<typename T>
string toString(T t) {
stringstream s;
s << t;
return s.str();
}


This will work for any type that defines the << operator for use with streams.

So with this simple one function we have an "int to string" function, a "float to string" function, a "double to string" function, etc...

Using it is of course simple:

int main() {
string s0 = toString(100);
string s1 = toString(100.1f);
string s2 = toString(100.12);
cout << s0 << " " << s1 << " " << s2 << endl;
return 0; // Prints out "100 100.1 100.12"
}
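Since toString() works for any type with a stream '<<' operator, it also works for your own types once you give them one. A small sketch, where Point is a hypothetical type used only for illustration (toString() is repeated so the block is self-contained):

```cpp
#include <ostream>
#include <sstream>
#include <string>

template<typename T>
std::string toString(T t) {
    std::stringstream s;
    s << t;
    return s.str();
}

// Hypothetical user-defined type, used only for this illustration.
struct Point {
    int x, y;
};

// Defining operator<< for streams is all toString() needs.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "(" << p.x << ", " << p.y << ")";
}
```

With that operator defined, toString() of a Point holding 3 and 4 returns "(3, 4)".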

Monday, September 27, 2010

Fun Shader Bugs

Yesterday I ran into a funny bug when implementing an Eagle2x pixel shader.

I think the picture speaks for itself xD




Funny thing is the bug wasn't actually caused by the shader itself, instead it was due to the way I applied the shader. On my emu I had 3 separate textures for Sprite Background, Sprite Foreground, and Tile Background. Then I would render all 3 of these textures to the output buffer. When I applied the shader it applied it to each individual layer (sprite/bg) instead of the picture as a whole. The sprite layers for instance were just a few sprites with a lot of transparency, and running the Eagle2x algorithm on this layer caused artifacts.

The solution then was to combine all 3 layers into 1 texture before applying the shader; using 1 complete texture fixed the cause of the artifacts. I ended up coming up with a pretty cool bitwise-based merge function which combines 3 bitmap arrays according to priority and transparency without having to use any conditional statements.

void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
s32 m0 = ((s32)(buffer[0][y][x])) >> 31;
s32 m1 = ((s32)(buffer[1][y][x])) >> 31;
s32 m2 = ((s32)(buffer[2][y][x])) >> 31;
m0 &= ~m1 & ~m2; m1 &= ~m2;
buffer[0][y][x] = buffer[0][y][x] & m0;
buffer[0][y][x] |= buffer[1][y][x] & m1;
buffer[0][y][x] |= buffer[2][y][x] & m2;
}}
}


Essentially the code creates masks by checking if the MSB is set (since the most significant byte is 0xff on non-transparent pixels, and 0x00 on transparent pixels, the most-significant bit is set when non-transparent; of-course this code won't work properly with partial-transparencies).
After it creates the masks, it uses some bitwise math to make sure the pixel with the highest priority is the one that is shown, and leaves the resulting pixel in the buffer[0] bitmap layer.
Note: The priority is (buffer[2] > buffer[1] > buffer[0]). So the pixel in buffer[2] is the top-most.
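To see the mask logic in isolation, here is a single-pixel sketch of the same merge. It builds the masks with unsigned arithmetic instead of a signed right shift (my choice for this sketch, so it doesn't depend on how the compiler shifts negative numbers); the names opaqueMask and mergePixel are made up:

```cpp
#include <cstdint>

// All-ones if the most significant bit of p is set, otherwise zero.
// Uses unsigned arithmetic instead of a signed right shift.
inline std::uint32_t opaqueMask(std::uint32_t p) {
    return 0u - (p >> 31);
}

// Returns the top-most non-transparent pixel of the three layers,
// with priority p2 > p1 > p0, or 0 if all three are transparent.
std::uint32_t mergePixel(std::uint32_t p0, std::uint32_t p1, std::uint32_t p2) {
    std::uint32_t m0 = opaqueMask(p0);
    std::uint32_t m1 = opaqueMask(p1);
    std::uint32_t m2 = opaqueMask(p2);
    m0 &= ~m1 & ~m2;  // layer 0 shows only if layers 1 and 2 are transparent
    m1 &= ~m2;        // layer 1 shows only if layer 2 is transparent
    return (p0 & m0) | (p1 & m1) | (p2 & m2);
}
```

At most one of the three masks survives for any pixel, so the final OR can never mix two layers together.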

I initially had thought this bitwise version would be faster than a version using conditionals, but I was wrong. The compiled code for this is pretty bloated. Instead the conditional version compiles to much better optimized code:


void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
if (buffer[2][y][x] >> 31) buffer[0][y][x] = buffer[2][y][x];
elif(buffer[1][y][x] >> 31) buffer[0][y][x] = buffer[1][y][x];
elif(buffer[0][y][x] >> 31) buffer[0][y][x] = buffer[0][y][x];
else buffer[0][y][x] = 0;
}}
}


So I guess the lesson learned here is that many times using conditionals is better than complex bitwise operations, even though the bitwise versions may seem more clever and quicker on first impression.

Note: "elif" in the above code is a macro for "else if"

#define elif else if


I like the elif macro because it keeps code more compact. Using "else if" almost always messes up text alignment due to having too many letters.

Sunday, September 26, 2010

cottonNES - More Progress...

So I have been working diligently on cottonNES, and have made some progress over my last update.

As far as the core goes, I've been able to pass some more cpu/ppu tests, but my emu still fails many of blargg's timing tests (those things are brutal!). I already switched my cpu core to a cycle-accurate design; this proves to be needed for a few games (Ms. Pacman, Bad Dudes, Baseball Stars 2, etc...). What's cool is my emu now runs Battletoads and Battletoads & Double Dragon; these 2 games are very timing sensitive and many NES emulators still fail to run them properly (some even crash...). That said, the games still have graphical problems, but at least they go in-game.

Something else I implemented was joypad support. Currently it's just hard-coded to match my ps3-controller, but in the future I'll do a fancy configuration screen. I ended up going with SDL for the joypad API because I just wanted something simple to use. Perhaps in the future I will rewrite the code with direct-input since I wasn't very satisfied with SDL.

The last big thing I did was support for pixel shaders! I'm a shader noob, but between yesterday and today I've learned enough of HLSL to make 2 shaders. One shader is a scanline shader, and the other is a Scale3x shader. The cool thing is this offloads the filtering onto the GPU so the CPU can do less work, overall the Scale3x filter was about 54% faster than my C++ based attempt (222fps vs 144fps in a scene). In the Super Mario Bros 2 pic above, I have the Scale3x shader on so you can see what it looks like.

Anyways, I still have a lot more work to do with my emulator before it's ready for a release; particularly PPU and APU need more work since they're the cause of most problems I'm having now. I hope I continue to be motivated so that cottonNES can become one of the better NES emulators out there; I feel it has the potential, but it just depends on if I continue working on it.

Thursday, September 16, 2010

NES emu progress

For those that don't know, I had started a NES emu project from scratch a while back, but then I got bored and stopped working on it in favor of some other projects.
This past week I have regained my interest in it and have made some progress.
After some hours of debugging, it now passes all normal opcode tests in Kevin Horton's nestest.nes rom.
Last week it failed most of those tests miserably (turns out I was just setting wrong flags in a few opcodes like BIT and PLP).

I currently have mappers 0 (no-mapper), 1, 2, 3, 4, and 7 implemented. I think mapper #1 has a few bugs and mapper #4 needs to have its IRQ timer code redone later on to be more accurate (but it's okay for now).
Mappers 2,3, and 7 are kind-of fun because they're so easy to implement. Each can be done in just a few lines of code.

For example, my Mapper #2 implementation is just:

class Mapper2: public Mapper {
public:
Mapper2() { Init(); }
virtual ~Mapper2() { Close(); }
void WritePROM(u16 addr, u8 value) {
clamp(value, ROM.romBanks, "Mapper2: PROM");
PROM_SLOT[0] = _PROM[value*2+0];
PROM_SLOT[1] = _PROM[value*2+1];
}
};


Something I've noticed when browsing through other NES emus' source code is that many of the emulators have horribly messy code. There is one popular emulator that uses OOP too much. When this happens it fucks up intellisense and makes browsing code a PITA. You right click on any method to go to the definition and intellisense finds ~50 possible matches and makes you choose manually which method is the one you're looking for. This always happens in projects that use too much OOP and is a very annoying problem.

Another popular emulator does the opposite; it doesn't use any OOP even though the code is now in c++ (I guess originally it was in C). Anyways the whole code is very C-like and abuses macros like crazy, which makes everything a mess. It's a common mistake to abuse macros; I did it when I started with c++, but any good coder knows that you should only use macros when there's no better way. Oh and the code uses 2-space tabbing which is ridiculous.

On the subject of macros, it's more understandable to see macro usage in C code compared to C++ code. This is because C is very limited compared to C++, and there is a lot more stuff you can't do so you use macros to simulate these features. In C++ however you have templates and references which can help you remove the need for many macros. This is a big reason why C++ is better than C.
Many C-programmers move on to C++ and still code C-like because they don't know proper C++, that could be the case with the nes-emu I looked at. Anyways I suppose I'll save a C/C++ comparison rant for another blog post.

So back to talking about my emulator, cottonNES...
My emulator is no-where near ready for a release. Although it boots hundreds of games now that the big cpu bugs are fixed, it still has problems in many games (most-likely to do with PPU and timing issues...).

I would like to continue discussing nes emulation further here, but sadly I have an exam this week and need to study D:

I suppose this post will be boring without screen-shots :D



Saturday, September 4, 2010

Using the volatile keyword to prevent float precision problems

Consider the following code:


int main() {
for(float i = 0; i < 100; i++) {
float f = i / 100.0f;
if (f != i / 100.0f) {
cout << i << ", ";
}
}
cout << endl;
return 0;
}


One should expect the above code to not print out anything, since the 2 operations are obviously equal.
Ironically, if you compile this code in msvc or gcc, you will most-likely get a bunch of numbers spammed to the console.

Why is this? Well it seems that the compilers take liberties with floating point calculations; even though you're dealing with single-precision 32bit floats, the compilers may use the full 80bit precision of the x87 FPU in calculations, or may even use double-floating point arithmetic for one of the operations, and use single-floating point for another one... mix and matching precisions, causing problems...
In the end, the different precision of the operations will cause the results of two seemingly equal operations to be unequal.

One solution is to use the volatile keyword.
The volatile keyword essentially makes the compiler always write values back to memory once an operation is performed, and always reload the value from memory when it is used again.
This allows us to force the compiler to round the values to 32-bit floats when it writes them to memory, so when a value is loaded again it is limited to the precision of a 32-bit float.

So now we can do:

int main() {
for(float i = 0; i < 100; i++) {
volatile float f1 = i / 100.0f;
volatile float f2 = i / 100.0f;
if (f1 != f2) {
cout << i << ", ";
}
}
cout << endl;
return 0;
}


This will fix the problems we were having before, and not print out anything.

There are many papers and discussions about ways to compare floating point values, and I won't go into more detail here.
This site lists a few methods, and the problems they could have.
So I suggest reading that if you're interested.
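
For readers who just want something to start from, here is a minimal sketch of one common approach: comparing with a tolerance instead of with '=='. The helper name 'nearlyEqual' and both default tolerances are assumptions picked for illustration (they are not from the linked site), and the right tolerances really depend on your data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: treats a and b as equal when they differ by at most
// relTol times the larger magnitude, or by at most absTol when both are near zero.
bool nearlyEqual(float a, float b, float relTol = 1e-5f, float absTol = 1e-8f) {
    const float diff  = std::fabs(a - b);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

This sidesteps the precision problem rather than fixing it: values that differ only by rounding noise compare equal, at the cost of no longer being an exact comparison.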

Sunday, August 22, 2010

32bit Floats - Integer Accuracy and Interesting Consequences

There are big misconceptions I've seen regarding the accuracy of floats.
One professor of mine even went as far as to say, the computer might not store "2" as "2.0", but rather "2.000019" if using floats.
This is not true.

32-bit floats can actually represent quite a large number of integer values 100% accurately.

The range over which a 32-bit float can represent every integer exactly is -16777216 to 16777216 (that is, -2^24 to 2^24).
This means that it is roughly equivalent to a 25-bit signed integer, which has the range -16777216 to 16777215.

If you only care about positive values, then the float is equivalent to a 24-bit unsigned integer, which has the range 0 to 16777215.

I commonly see these values messed up, with people saying a float can only represent a 24-bit signed integer (which would be -8388608 to 8388607, which is wrong).

Below I made a test case to prove the floating point range by exhaustive search.


int testFloatRange(bool pos) {
volatile int i = 0;
volatile float f = 0.0f;
volatile double d = 0.0;
for (;;) {
volatile double t = (double)(float)i;
if ((double)f != d || (int)f != i || t != d) break;
if (pos) { f++; d++; i++; }
else { f--; d--; i--; }
}
printf("%f != %d\n", f, i);
return pos ? (i-1) : (i+1);
}

int _tmain(int argc, _TCHAR* argv[]) {
int p = testFloatRange(1);
int n = testFloatRange(0);
printf("Positive Range = 0 to %d\n", p);
printf("Negative Range = %d to 0\n", n);
printf("Full Range = %d to %d\n", n, p);
}



Now it's pretty interesting what happens once a 32-bit float reaches its limit of 16777216.
If you try to increase the float by 1 when it has this value, the float will actually stay the same. This means if you try to increment a float by 1 in a loop, you will never get past 16777216! The loop will just run forever.

Here is some proof of that:

int _tmain(int argc, _TCHAR* argv[]) {
volatile float f = 0xffffff-100;
for( ; f < 0xffffff+100; f++) {
printf("Value = %f, Binary Representation (0x%x)\n", f, (int&)f);
}
}


The program's output is:

...
Value = 16777214.000000, Binary Representation (0x4b7ffffe)
Value = 16777215.000000, Binary Representation (0x4b7fffff)
Value = 16777216.000000, Binary Representation (0x4b800000)
Value = 16777216.000000, Binary Representation (0x4b800000)
... Keeps repeating the last line infinitely...



Admittedly, I didn't know about this infinite-looping behavior until I made the test case. This is something you should definitely watch out for.
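
As a small sketch of how you could detect the problem directly, the helper below checks whether adding 1 to a float is lost to rounding. The name 'incrementIsLost' is made up for this example; it uses 'volatile' so that the sum is stored as a 32-bit float before the comparison.

```cpp
// Returns true when f + 1 rounds back to f, i.e. the increment is lost.
// 'volatile' forces the sum to be stored as a 32-bit float before comparing.
bool incrementIsLost(float f) {
    volatile float next = f;
    next = next + 1.0f;
    return next == f;
}
```

With this, incrementIsLost(16777215.0f) is false but incrementIsLost(16777216.0f) is true. If a loop really needs to count that high, counting with an integer and converting to float inside the loop body is one way around it.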

Oh and btw, you might be wondering why I was using "volatile float" in the above test-cases, instead of just "float". The "volatile" keyword is useful to use when we need to compare floats with their exact precision. I'll probably explain why in another article :)

Friday, August 20, 2010

Checking if a Float is a Power of 2

In the last article I showed bitwise ways to check if integers are powers of 2; for completeness, I'll show bitwise ways to check if floats are powers of 2.

For floating point numbers, remember that they're represented in the form |S*1|E*8|M*23|.
S = Sign Bit (1-bit)
E = Exponent (8-bits)
M = Mantissa (23-bits)

Check out the IEEE-754 standard for more info:
http://en.wikipedia.org/wiki/IEEE_754-1985

The exponent is used to compute a power of 2, which is then multiplied by the mantissa, which has an implicit hidden '1.' in front of it.

So when we're checking if a float is a power of two, all we have to do is check that the mantissa is 0, and that the exponent is neither zero nor all ones. (When the exponent is zero, the value is zero if the mantissa is also zero, or a denormal if the mantissa is non-zero; when the exponent is 255 and the mantissa is zero, the value is infinity. We don't want to consider any of these as powers of two.)

The code ends up looking like this:

typedef unsigned __int32 u32; // msvc-specific 32-bit unsigned type

bool isPow2(float f) {
u32& i = (u32&)f;
u32 e = (i>>23) & 0xff;
u32 m = i & 0x7fffff;
return !m && e && e != 0xff; // e == 0xff with m == 0 is +/-infinity
}


This, however, will also count negative-exponent powers of 2 (such as 2^-1 = 0.5). If this is not desirable, we can modify the code to only count non-negative exponents (2^0 = 1, 2^1 = 2, ...).

The modified code looks like this:

bool isPow2(float f) {
u32& i = (u32&)f;
u32 e = (i>>23) & 0xff;
u32 m = i & 0x7fffff;
return !m && e >= 127 && e != 0xff; // the exponent bias is 127; 0xff is infinity
}


One last thing: both of these versions will also consider negative powers of two (such as -4, -2, -1) to be powers of two.
If this is also undesirable, then we just need to check the sign bit of the float to determine if it's negative, and if it is, return false.

This last function returns true if the float is a positive power of two with a non-negative exponent (1, 2, 4, 8, 16...).

bool isPow2(float f) {
u32& i = (u32&)f;
u32 s = i>>31;
u32 e = (i>>23) & 0xff;
u32 m = i & 0x7fffff;
return !s && !m && e >= 127 && e != 0xff; // 0xff is infinity
}
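
The versions above rely on the msvc-specific '__int32' type and on reading a float through a reference to an integer. As a hedged alternative, here is a sketch of the last variant using std::memcpy and std::uint32_t instead. The name 'isPositivePow2' is made up for this example, and it assumes 'float' is a 32-bit IEEE-754 single (the static_assert checks this).

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE-754 single-precision value.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch expects a 32-bit IEEE-754 float");

// True for 1, 2, 4, 8, ... (positive sign, zero mantissa, exponent 2^0 or higher).
bool isPositivePow2(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern without a cast
    const std::uint32_t sign     = bits >> 31;
    const std::uint32_t exponent = (bits >> 23) & 0xff;
    const std::uint32_t mantissa = bits & 0x7fffff;
    // Stored exponents 127..254 mean 2^0..2^127; 255 is reserved for infinity/NaN.
    return sign == 0 && mantissa == 0 && exponent >= 127 && exponent != 0xff;
}
```

The bit tests are the same as before; only the way the bits are obtained changes.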

Checking if an Integer is a Power of 2

There are a few ways to check if an integer is a power of 2; some better than others.

One crucial thing to notice for power-of-two integers, is that in their binary representation, they only have 1 bit set.

0001 = 1
0010 = 2
0100 = 4
1000 = 8
... and so on...


So one approach we can do to check if an integer is a power of two, is to just loop through the bits, and check if only 1 bit is set.


typedef unsigned int uint; // 'uint' is not a standard c++ type

bool isPow2(uint n) {
const uint len = sizeof(uint)*8; // 32 for 4-byte integers
uint count = 0;
for(uint i = 0; i < len; i++) {
count += n & 1;
n >>= 1;
}
return count == 1;
}


Now the above approach is pretty obvious and simple, but it's not that nice considering we have to loop over every bit (32 times for 4-byte integers).


There's a popular bitwise trick for determining if an integer is a power of 2, and it looks like this:

bool isPow2(uint n) {
return (n & (n-1)) == 0;
}


Now this is a lot nicer than our loop version. Instead of looping 32 times, we do just a few bitwise calculations.

I was playing around with bitwise operations earlier today, and I discovered another bitwise method for checking powers of 2.

bool isPow2(uint n) {
return (n & -n) == n;
}


I was really excited because I found this one out on my own, and I thought I had discovered it. However I did a google search on it, and I found out that this trick is already known :(

The nice thing about this second bitwise version, compared to the more popular version above it, is that it's a lot simpler to remember IMO. Speed-wise, they're both around the same; which one is actually fastest probably depends on your target architecture, but for practical purposes it probably doesn't matter which one you use.

Note:
It's important to mention that both of the bitwise methods mentioned above will incorrectly treat zero as a power of two.
To fix this behavior you can modify the functions like so:

bool isPow2(uint n) {
return n && ((n & (n-1)) == 0);
}

and...

bool isPow2(uint n) {
return n && ((n & -n) == n);
}


Sadly this adds an extra conditional to our fast bitwise methods, however it still ends up being a lot nicer (and faster) than our initial loop version.
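
If you're wondering why the two bitwise tricks work at all, the sketch below splits them into their building blocks. The helper names 'clearLowestSetBit' and 'lowestSetBit' are made up for this example, and I use plain 'unsigned int' instead of 'uint' to keep it self-contained.

```cpp
// n & (n-1) clears the lowest set bit: subtracting 1 turns that bit into 0
// and every bit below it into 1, so the AND wipes all of them out.
unsigned int clearLowestSetBit(unsigned int n) { return n & (n - 1); }

// n & -n keeps only the lowest set bit: (0 - n) is the same as ~n + 1,
// which agrees with n only at that one bit.
unsigned int lowestSetBit(unsigned int n) { return n & (0u - n); }

// A power of two has exactly one bit set, so clearing it leaves nothing
// (first trick), and isolating it gives the number back (second trick).
bool isPow2(unsigned int n) {
    return n != 0 && clearLowestSetBit(n) == 0;
}
```

For example, 12 is 1100 in binary, so clearLowestSetBit(12) gives 1000 (8) and lowestSetBit(12) gives 0100 (4); neither check passes, so 12 is not a power of two.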

Saturday, July 31, 2010

Small Comparison Optimization = Big Gains

This is a small and pretty well known trick, but it's worth mentioning since a lot of high-level coders don't seem to realize it, and it could result in some big speedups.

Let's take a look at some random code snippet:

int main() {
const double len = 10000;
const double fract = 1.0/3.0;
for(double i = 1; i < len; i++) {
for(double j = 1; ; j++) {
if (j/i < fract) {
cout << j << " / " << i << endl;
}
else break;
}}
}


What this code is doing is printing out all fractions that are less than 1/3, for all denominators less than 10,000.

Notice anything that we can optimize?

Well if you take a look at the if-statement, notice we are dividing j by i, and then doing a comparison. But as we should know by now, division is a slow operation.

On Core 2 Duos, double floating-point division has a 6~32 clock cycle latency (exact latency depends on the operand values), whereas multiplication is only 5 cycles, and add/sub are 3 cycles.

So it would be nice if we can omit that division from the comparison... oh wait we can!

Using simple mathematics we can do:

j/i < fract =>
i * (j/i) < fract * i =>
j < fract * i


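Now we have an equivalent comparison (multiplying both sides by i keeps the direction of the '<' because i is always positive here), and we were able to remove the division completely!
So in the worst case, where the division was taking ~32 clock cycles, the multiplication is ~6 times faster.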

Now let's see this code in action.
Here is the same problem as above, except we've now increased 'len' to 100,000, and instead of printing out the results, we just count the fractions (this way we're not limited by console-write speed):

int main() {
const double len = 100000;
const double fract = 1.0/3.0;
int ret = 0;
for(double i = 1; i < len; i++) {
for(double j = 1; ; j++) {
if (j < fract * i) {
ret++;
}
else break;
}}
cout << ret << endl;
}



When I ran the above code using "if (j/i < fract)", it took ~19 seconds to finish on my PC.
When I ran the optimized version using "if (j < fract*i)", it took ~2 seconds to finish executing! That's an even bigger speedup than predicted.

I hope this shows why optimization is important.
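
Before trusting a rewrite like this, it's worth sanity-checking that both comparisons really select the same fractions. Here is a small sketch that counts with each version so the totals can be compared; the function names are made up for this example, and it uses int loop counters instead of doubles to keep it short.

```cpp
// Counts the fractions j/i < 1/3 for denominators 1 <= i < len, using division.
int countWithDivision(int len) {
    const double fract = 1.0 / 3.0;
    int count = 0;
    for (int i = 1; i < len; i++) {
        for (int j = 1; static_cast<double>(j) / i < fract; j++) count++;
    }
    return count;
}

// Same count using the rewritten comparison j < fract * i.
// The rewrite is only valid because i is always positive here.
int countWithMultiplication(int len) {
    const double fract = 1.0 / 3.0;
    int count = 0;
    for (int i = 1; i < len; i++) {
        for (int j = 1; j < fract * i; j++) count++;
    }
    return count;
}
```

Calling both with the same small 'len' and comparing the results is a cheap check to run before benchmarking the big version.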

One last thing to note:
When using integers, you might not always want to do this trick.
With integers, j/i > x is not necessarily the same as j > x*i, since the division truncates the result (integers don't hold the fractional part).
For example:

int main() {
int i = 21;
int j = 2;

if (i/j > 10) cout << "First one" << endl;
if (i > 10*j) cout << "Second one" << endl;
}


The above example only prints out "Second one" since we're using integers (if i and j were doubles, it would print out "First one" and "Second one").

This isn't really a negative, since when using integers you probably want to be using the second-method anyways, so you don't have to deal with converting to floats; in that case, you not only would save time by not doing a division, but you would also save time by not doing an int-to-float conversion.
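
If you do need the exact truncating behavior of the integer division but still want to avoid the division itself, there is a comparison that matches it. This is a sketch under the assumptions that i is non-negative, j is positive, and (x + 1) * j doesn't overflow; the function name is made up for this example.

```cpp
// For i >= 0 and j > 0, (i / j > x) is true exactly when i >= (x + 1) * j:
// the truncated quotient can only exceed x once it has reached x + 1.
bool quotientGreaterThan(int i, int j, int x) {
    return i >= (x + 1) * j;
}
```

With the values from the example above, quotientGreaterThan(21, 2, 10) is false, just like (21/2 > 10).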

Wednesday, July 21, 2010

Recursion is not always good!

In my university's computer science curriculum, they seem to put a big emphasis on using recursion to solve things, as opposed to alternative methods (such as loops).

It is indeed true that recursion can simplify the high-level code, and many times it can look beautiful. There is of course a price to pay with recursion, and that is the extra overhead recursive calls have (therefore they're usually slower than alternative iterative/loop versions).

I will stress that learning to think recursively is indeed a good skill to have, and any good programmer should be able to do so. So in that sense, it is good that computer science curricula seem to be stressing recursion; however, once you have learned to think recursively, it is again beneficial to start thinking non-recursively when algorithms are simple enough to do so.

Let's consider one of the most common examples of recursion used, the factorial function! (I have used this function many times in my previous articles to demonstrate things, so shouldn't be a surprise :D)


template<typename T>
T factorial(T n) {
return n <= 1 ? 1 : n * factorial(n-1);
}

int main() {
int n = factorial(5);
cout << n << endl; // Prints "120"
return 0;
}


This does end up looking pretty clean, as we can do the whole factorial function in one line.

Now let's take a look at the iterative/loop version:

template<typename T>
T factorial(T n) {
T ret = 1;
for( ; n > 1; n--) ret *= n; // stops for n <= 1, like the recursive version
return ret;
}

int main() {
int n = factorial(5);
cout << n << endl; // Prints "120"
return 0;
}


Now this iterative version is not as 'beautiful' as the recursive version, because as you can see it's 3 lines instead of 1.

However the generated code for this version ends up being nicer than the recursive version.

When you make recursive calls, the arguments that are passed to the recursive function need to be pushed on the stack, and then these arguments eventually need to be popped back out as well. Furthermore, not only do the arguments need to be pushed on the stack, but also the return address needs to be pushed. Sometimes the recursion can be nested so-deep, that you end up running out of stack, and get a stack-overflow crash. This extra overhead is the downside of recursion.

The iterative/loop version doesn't have such problems. No function calls are made so you don't have the function call overhead. Instead the loop versions usually end up compiling with a simple short-jump.

Another thing is that the loop version is inline-able, whereas the recursive version likely will not be inlined, and even if it does get inlined, it probably will only inline the first call to the function.

Potentially the compiler might be able to turn a recursive call into an iterative/loop version (optimizing compilers can do this for some tail-recursive functions), but don't count on this. At least, I haven't seen it happen...
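
For reference, here is a sketch of what a tail-recursive factorial could look like. The names 'factorialAcc' and 'factorialTail' are made up for this example. The recursive call is the very last thing the function does, which is the shape a compiler needs in order to replace the call with a jump; whether your compiler actually does so at your optimization level is something you'd have to check in the generated code.

```cpp
// Tail-recursive helper: the running product is carried in 'acc', so nothing
// is left to compute after the recursive call returns.
template<typename T>
T factorialAcc(T n, T acc) {
    return n <= 1 ? acc : factorialAcc(n - 1, acc * n);
}

// Convenience wrapper that starts the accumulator at 1.
template<typename T>
T factorialTail(T n) {
    return factorialAcc(n, T(1));
}
```

Compare this with the one-line recursive version above, where the multiplication happens after the recursive call returns, so every call has to keep its stack frame alive.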


Now, there are some cases where an algorithm will end up being naturally recursive, and it ends up being difficult or impractical to code an iterative version. In these cases recursion is the better way to go (the algorithm can even end up being faster using recursion instead of iteration). So in the end, it really depends on the situation whether to go with recursion or iteration/loops; the more experience you get, the more you figure out when to use recursion vs iteration.


Notes:
If you noticed, I was using template-functions for the factorial function. The reason I made it a template, is because the factorial function is something you may want to re-use with different data-types (int, unsigned int, long long int, float, double, some custom BigInteger class, etc...).
Using templates, we only have to code one function that can be reused by these different data-types.

Monday, July 19, 2010

Max/Min Values for Signed and Unsigned Ints

There are a variety of ways you can assign the max/min values to a signed/unsigned integer.

A very professional looking way is using the numeric_limits class like so:

#include <limits>

int main() {
int x = numeric_limits<int>::max();
int y = numeric_limits<int>::min();
return 0;
}


But there are many other ways you can do this; and they all take a lot less typing than using numeric_limits, and you don't have to include any extra headers...

Let's first take a look at max/min integer values for different variables (in hexadecimal representation because it's easier to memorize):

// u8, u16, u32... mean unsigned int of 8, 16, and 32 bits respectively
// s8, s16, s32... mean signed int of 8, 16, and 32 bits respectively
----------------------------------
| type | max        | min        |
----------------------------------
| u8   | 0xff       | 0x0        |
| u16  | 0xffff     | 0x0        |
| u32  | 0xffffffff | 0x0        |
----------------------------------
| s8   | 0x7f       | 0x80       |
| s16  | 0x7fff     | 0x8000     |
| s32  | 0x7fffffff | 0x80000000 |
----------------------------------


The table is pretty easy to memorize. Essentially, for unsigned ints, the max value is always a bunch of f's in hexadecimal, and the min-value is always 0.
For signed ints, the max value is always 0x7f, followed by a bunch of f's.
And lastly the min value for signed ints is always 0x80, followed by a bunch of 0's.


One thing to notice is the bitwise representation for "-1" always has all-bits set.
That means that for a 32bit integer, "-1" is "0xffffffff", which is the max value for an unsigned integer.

This means that for unsigned integers, we can declare int max/min like this:

int main() {
unsigned int x = -1u; // Unsigned int max
unsigned int y = 0; // Unsigned int min
return 0;
}


In c++, you can put the suffix "u" at the end of an int literal to mean it's unsigned. So what that code is doing is negating the unsigned value 1, which wraps around to 0xffffffff, and that is what gets assigned to the variable 'x'.

Here's the thing: I've noticed that some compilers end up generating warnings when you negate an unsigned value like this. Since negative one is really the same as NOT 0, we can instead write '~0u'. This does the same thing, and is generally better since it won't generate the compiler warnings.

So it's nicer to do:

int main() {
unsigned int x = ~0u; // Unsigned int max
unsigned int y = 0; // Unsigned int min
return 0;
}


Now that we know this trick for unsigned ints, let's think of signed integers.
The max value for 32bit signed integers is 0x7fffffff.
If we notice though, 0x7fffffff is really (0xffffffff >> 1).
That is, signed int max, is equal to unsigned int max shifted right by one.
But since we already learned the trick on how to represent unsigned int max easily, we can use that to our advantage:

int main() {
int x = ~0u>>1; // Signed int max
return 0;
}


The last thing to notice for signed ints, is that the minimum value for 32bit signed ints in hex is 0x80000000, but it just so happens that that number is signed int max + 1, that is 0x7fffffff + 1.

So we can end up doing this:

int main() {
int x = ~0u>>1; // Signed int max
int y =(~0u>>1)+1; // Signed int min
return 0;
}


That about does it for integers. For floats it's not as simple, so I didn't talk about them here. Maybe I'll end up making another blog post about the different floating point representations...

Notes:
When doing the '-1' or '~0' tricks where you shift the value, be sure to do the shift as an unsigned operation! That is, either do ~0u>>1 or (unsigned int)~0>>1; DO NOT do ~0>>1. On typical compilers that performs an arithmetic shift instead of a logical shift, and makes the result -1 instead of 0x7fffffff.

And in-case you're wondering, these tricks are not 'slower' than typing in the numbers manually. C++ compilers do something called constant-propagation and constant-folding, which converts stuff like (-1u>>1)+1 to a constant at compile-time, and therefore there's no extra overhead compared to typing 0x80000000....

Also, this will not apply to most people, but another thing to note is that some very-old/weird hardware can use one's complement integers (instead of two's complement) or some other funky stuff; when dealing with weird hardware its best to use the numeric_limits class instead of bitwise tricks, because numeric_limits assigns its values based on the specific platform. That-said, I don't even know any hardware that uses one's complement for signed integers, so these tricks are almost always safe :p
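
If you'd rather have the compiler confirm these tricks on your platform than take my word for it, here is a sketch that compares them against numeric_limits. The function names are made up for this example, and the static_asserts need a compiler with c++0x support. For the signed minimum I use '-max - 1' here instead of '(~0u>>1)+1', so that no out-of-range unsigned value has to be converted to int; it still assumes two's complement.

```cpp
#include <limits>

unsigned int unsignedMaxTrick() { return ~0u; }

int signedMaxTrick() { return static_cast<int>(~0u >> 1); }

// Assumes two's complement, where min is one below the negated max.
int signedMinTrick() { return -signedMaxTrick() - 1; }

// Checked at compile time, so there is no runtime cost.
static_assert(~0u == std::numeric_limits<unsigned int>::max(),
              "~0u should be unsigned int max");
static_assert(static_cast<int>(~0u >> 1) == std::numeric_limits<int>::max(),
              "~0u>>1 should be signed int max");
```

If either static_assert fails to compile on some platform, that's a sign you should be using numeric_limits there instead of the bitwise tricks.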

Thursday, July 15, 2010

C++0x reusing same keywords and symbols for entirely different meanings

One thing especially bad about c++0x is that it seems to give different meanings to already known keywords and symbols.

For example, the array subscript operator "[]", already can be used to define arrays and index them, but now in c++0x they also denote lambda expressions.

But an even more ludicrous example is the keyword "auto".

"auto" in c++0x can be used to infer the type of an identifier, but it just so happens that "auto" already has a meaning in c++.

auto is used to denote "automatic storage", which essentially means stack storage for variables that weren't declared globally.
So you can do:
auto int i = 0;


The reason you don't ever see code explicitly using auto is that 'auto' is on by default, so it's rarely written out.

This means, however, that in a c++0x function you might expect to be able to do this:
auto auto i = 0;


I tested this with GCC 4.5.0, and got a compiler error :/
In fact, GCC 4.5.0 with c++0x extensions enabled doesn't even compile "auto int i = 0;"...

This turns out not to be a GCC bug: c++0x removes the old storage-class meaning of "auto", so code that used it explicitly no longer compiles. Still, this shows the problems that reusing keywords and symbols can cause.

Also, needless to say, it makes things a lot more confusing for beginners to understand the different meanings of the words...

Wednesday, July 14, 2010

String Literal with Array Subscript Trick

This is a cool trick I just found out recently.

If you have a string literal such as "abcdef", you can interpret that as an array and use the '[]' operator to get the correct character from the string.

For example:


int main() {
for (int i = 9; i >= 0; i--) {
cout << "abcdefghij"[i];
}
cout << endl;
}


That will print out "jihgfedcba".


Now knowing this trick (as well as some other minor changes), we can omit a few lines from the toHexString() function in my previous article.


string toHexString(u32 value) {
char str9[9] = {'0'}; // '0' char followed by null bytes
for (int i = 7, pos = 0; i >= 0; i--) {
char dig = (value>>(i*4))&0xf;
if (!dig && !pos) continue;
str9[pos++] = "0123456789abcdef"[dig];
}
return string(str9);
}

Multi-Dimensional Array Optimization

When using multidimensional arrays in c++ (or most languages for that matter), it is better to use powers of 2 for all dimensions (except for the first one which doesn't matter).

For example, take the following multi-dimensional array:
int a[3][3][3];

It will be faster to reference the elements in the following array instead:
int a[3][4][4];

The reason is that the compiler can use shifts when calculating the address to read from, as opposed to using multiplication operations. On x86 processors, this shifting by powers of 2 could even be implicit as part of the memory addressing modes which allow a shift parameter in their encoding.
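
To make that concrete, here is a sketch of the offset arithmetic for a[i][j][k] in both layouts. The function names are made up for this example; a real compiler generates this arithmetic for you, the point is only to show where the multiplications turn into shifts.

```cpp
#include <cstddef>

// Element offset of a[i][j][k] inside int a[3][3][3]:
// needs two multiplications by 3.
std::size_t offset3x3x3(std::size_t i, std::size_t j, std::size_t k) {
    return (i * 3 + j) * 3 + k;
}

// Element offset of a[i][j][k] inside int a[3][4][4]:
// multiplying by 4 is just a shift left by 2.
std::size_t offset3x4x4(std::size_t i, std::size_t j, std::size_t k) {
    return (((i << 2) + j) << 2) + k;
}
```

Notice that the size of the first dimension (3) never appears in either formula, which is why it doesn't matter whether it's a power of 2.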

So if you need speed, it could be beneficial to use power-of-2 dimensions even when you don't actually reference the extra memory. This optimization may require benchmarking, though, if you increase the size of the array with too much unused data, because it may slow down the code through less efficient cache usage due to the extra unused memory. (In the case where all the memory will be used, it's a win-win situation.)

In some cases you don't even need to increase the array dimensions to benefit from this knowledge. If you have an array "int arr[4][7];" you know it will be more efficient to swap the dimensions and just access it the other way: "int arr[7][4]".

Reading Hexadecimal Numbers and Understanding Binary Numbers

This post is aimed towards beginners trying to understand hexadecimal numbers, and why programmers often use them instead of base-10 decimal.

Computers internally deal with logic circuits, and we have the concept of a 'bit', which represents either '1' or '0', 'on' or 'off', etc...

You can put 8 bits together and form a byte. These 8 bits can then be interpreted as a binary string to represent a number.
The first bit, called the least-significant bit, represents 2^0 (it contributes 0 or 1).
The second bit represents 2^1 (0 or 2).
The third bit represents 2^2 (0 or 4), etc...

So in one byte, you have 8-bits that represent this:

b7,  b6,  b5,  b4,  b3,  b2,  b1,  b0
2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0


All these bit-values are then added together to give you the value of the number.

For example, '4' is represented as "00000100", notice that b2 is set (2^2)
Likewise, '5' is "00000101", notice that b2 and b0 are set (2^2 + 2^0 == 4 + 1 == 5)
And so on...
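
If you want to play around with this, here is a small sketch that converts in both directions. The function names are made up for this example; std::bitset does the value-to-binary direction for us, and the other direction adds up the bit-values by hand.

```cpp
#include <bitset>
#include <string>

// Returns the 8-digit binary form of a byte, most significant bit (b7) first.
std::string toBinaryString(unsigned char value) {
    return std::bitset<8>(value).to_string();
}

// Rebuilds the number from its binary digits, reading left to right:
// doubling the running value moves the earlier digits up one bit position.
unsigned int fromBinaryString(const std::string& bits) {
    unsigned int value = 0;
    for (std::string::size_type i = 0; i < bits.size(); i++) {
        value = value * 2 + (bits[i] == '1' ? 1u : 0u);
    }
    return value;
}
```

So toBinaryString(5) gives "00000101", and fromBinaryString("00000101") gives 5 back.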


Now because internally numbers are represented in this format, you should understand that binary numbers are pretty important to programmers, especially low-level programmers.

But notice that it takes 8 digits to represent 1 byte, and a byte only holds 1 char in c++.
Usually we deal with ints, which are not just 1 byte but 4 bytes (usually).

That means it will take 8*4 = 32 digits to represent an integer in binary!
You can't expect a programmer to be writing 32-digit numbers full of 1's and 0's. There has to be a better way!

That's where hexadecimal comes in.
It just so happens that 1-digit in hexadecimal, represents exactly 4-digits in binary. This is very useful to us, because that means if we memorize the hexadecimal digits, then we can easily refer to 4-binary digits with just 1 hexadecimal digit.

The 16 digits in hexadecimal are:

Hex | Binary
--------------
0 | 0000
1 | 0001
2 | 0010
3 | 0011
4 | 0100
5 | 0101
6 | 0110
7 | 0111
8 | 1000
9 | 1001
a | 1010
b | 1011
c | 1100
d | 1101
e | 1110
f | 1111


The key to using hexadecimal fluently, is memorizing this hex-to-binary table.
I'll give you some hints to remember some of the numbers:
0 is 0 in hex and in binary.
1 is 1 in hex and in binary.
1, 2, 4, and 8 only have 1-bit set (because they're powers of 2)
f is the last hex number, and has all 4-bits set (all 1's)
'a' comes after 9, which in decimal should be '10', if you notice though, 'a' represents '1010' in binary, so think of two 10's.


Anyways, it takes some time to memorize the table, but once you do, you are then able to convert hex numbers to binary very easily.
For example:

          7      f
0x7f = | 0111 | 1111 |

          5      5
0x55 = | 0101 | 0101 |

                a      b      c      d      e      f      9      8
0xabcdef98 = | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 | 1001 | 1000 |



As you can see by the last example, we were able to represent a 32-bit number with just 8 digits. Instead of typing "10101011110011011110111110011000" we just had to type "0xabcdef98".
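
The table lookup we just did by hand is easy to sketch in code as well. The function name 'hexToBinary' is made up for this example, and it assumes the input contains only the characters 0-9, a-f or A-F, without the "0x" prefix.

```cpp
#include <string>

// Expands each hex digit into its 4 binary digits, separated by spaces.
// Assumes 'hex' contains only 0-9, a-f or A-F (no "0x" prefix, no validation).
std::string hexToBinary(const std::string& hex) {
    static const char* const table[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
    };
    std::string out;
    for (std::string::size_type i = 0; i < hex.size(); i++) {
        const char c = hex[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else                           digit = c - 'A' + 10; // assumed to be A-F
        if (i != 0) out += ' ';
        out += table[digit];
    }
    return out;
}
```

For example, hexToBinary("7f") gives "0111 1111", matching the first example above.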

Oh and just so you know, c++ uses the prefix "0x" to refer to hexadecimal numbers.
Sometimes people use "$" to refer to hex, or sometimes they just use "x" as the prefix.

Also I should note that the reason you can't use normal base-10 decimal numbers to refer to binary easily is that a single decimal digit does not correspond to a whole number of binary digits (10 is not a power of 2). Three binary digits make one octal digit, and four binary digits make one hex digit. Decimal is in-between those, so you can't use decimal to refer to binary numbers nicely.

Anyways I hope this article helps those who were having a bit of trouble understanding hex.

Monday, July 12, 2010

Random String Function Implementations

There are a bunch of string functions out there that do almost everything you could want to do with strings.

There are so many, however, that sometimes it's just simpler to implement your own method instead of finding the appropriate function to use.

What if you wanted to find the size of a char-array string including its NULL byte?
Well, you can easily and elegantly do this:

int strSize(const char* str) {
int len = 0;
while(str[len++]);
return len;
}


Another easy thing to implement is a toLower() function which converts uppercase letters to lowercase:

void strToLower(char* str) {
for( ; *str; str++) {
char& c = *str;
if (c >= 'A' && c <= 'Z') c = c - 'A' + 'a';
}
}


A matching toUpper() function is obviously just as easy ;p
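
For completeness, a sketch of that matching function could look like this. Like strToLower(), it assumes an ASCII-style character set where the letters are contiguous.

```cpp
// Converts lowercase letters to uppercase in place; other characters are untouched.
// Assumes an ASCII-style character set where 'a'..'z' are contiguous.
void strToUpper(char* str) {
    for ( ; *str; str++) {
        char& c = *str;
        if (c >= 'a' && c <= 'z') c = c - 'a' + 'A';
    }
}
```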


Lastly, here's a fun function I made that converts a 'u32' (unsigned 32-bit integer) to a hex string:


string toHexString(u32 value) {
char str9[9];
u32 pos = 0;
for (int i = 0; i < 8; i++) {
char dig = (value>>((7-i)*4))&0xf;
if (!dig && !pos) continue;
if ((dig) < 0xa) str9[pos] = '0' + dig;
else str9[pos] = 'a' +(dig-0xa);
pos++;
}
if (!pos) { str9[0] = '0'; pos=1; }
str9[pos] = '\0';
return string(str9);
}


That one is definitely more complex than the others, but it was fun to code.
All hex digits are 4-bits, and a u32 value only needs a char array of at-most size 9 (8 digits + 1 null byte).

Anyways, in real coding projects it's probably best to use already defined functions that perform the string operation you want; but implementing your own can sometimes be fun, and possibly impress your professors on homework assignments.

Nested Functions in C++

Although c++ does not natively support nested functions, you can do a trick to support them using structs/classes.

In my other blog post I talked about how lambdas can be used as nested functions, and I also showed a way to do recursion with lambdas. If you have access to c++0x, then the nicer way to do nested functions is using lambdas, but if you don't, then you can use the nested-class trick.

Although it may sound odd, you can declare a struct/class in a function. And then this struct/class can have a static method which you can call.

Here's one version of it:

int main() {
struct factClass {
static int factorial(int x) {
return (x <= 1) ? 1 : x * factorial(x-1);
}
};

cout << factClass::factorial(5) << endl; // Prints '120'
return 0;
}


Okay real quickly I want to point out that in c++, structs and classes are the same thing; the only difference is the default access between structs and classes.

With struct all the method/variables are 'public' by default, and with classes they're 'private' by default.
You can change the behavior by specifying "public:" or "protected:" or "private:" keywords for both structs and classes.
When I first started with c++ I didn't know this, so that's why I'm mentioning it :D


Okay now that I got that out of the way, as you can see the above function is pretty bloated.
We can use macros to hide some of the bloat of the class declaration like so:


#define nested(className) struct className { static

int main() {
nested(factClass) int factorial(int x) {
return (x <= 1) ? 1 : x * factorial(x-1);
}};

cout << factClass::factorial(5) << endl; // Prints '120'
return 0;
}


There's also another way to do this using functors.
Basically what we do is have a struct/class and then overload its "()" operator to make it look like a function call like so:


int main() {
struct {
int operator()(int x) {
return (x <= 1) ? 1 : x * this->operator()(x-1);
}
} factorial;

cout << factorial(5) << endl; // Prints '120'
return 0;
}



Now that's pretty cool.
We create an anonymous struct, and 'factorial' is an instance of that struct. Then it has the overloaded '()' operator which lets us call factorial(5) as if it was a function.

We can again use macros to hide some of the bloat, although the end result looks kind-of weird:


#define nested(xReturnType) struct { xReturnType operator()

int main() {
nested(int)(int x) {
return (x <= 1) ? 1 : x * this->operator()(x-1);
}} factorial;

cout << factorial(5) << endl; // Prints '120'
return 0;
}


The nice thing about using functors like in the last two examples is that you don't have to use a separate class name and function name (className::functionName()), instead you just use one name and call it with that name.

Anyways, there you go: we have shown a variety of ways you can do nested functions. I would stick to lambdas unless you're not using c++0x, in which case you'll have to use one of the nested-class tricks mentioned above ;p

c++0x autos, lambdas, and lambda recursion!

C++0x introduces some new features that will probably end up making C++0x code many-times more obfuscated and harder to read; however the features can be very useful and fun to code with as well.

The first interesting feature I'll talk about is the "auto" keyword. Essentially you can use the auto keyword to hold a type that will be automatically inferred.
For example:

int a = 1; // Type is explicitly int
auto b = 1; // Type of 'int', which is inferred by the int literal
auto c = a; // Type of 'int', which is inferred by the type of a
auto d = 1.4f; // Type of 'float', which is inferred by the float literal


The auto keyword will probably be abused by c++0x coders, making code harder to follow; I imagine people will start using this for a lot of their variable declarations, making it difficult to know the true types of variables when browsing code.

The auto keyword has some great uses, but I will not explicitly talk about them here because it'll make this post too long...


The second feature I'll talk about is lambdas.
When I first saw the c++0x lambdas I thought it looked very confusing, but after playing around with them I found out they're pretty fun.

I'll use Microsoft's definition of lambda expression:
"A lambda expression is a function or subroutine without a name that can be used wherever a delegate is valid. Lambda expressions can be functions or subroutines and can be single-line or multi-line. You can pass values from the current scope to a lambda expression."

Basically a lambda allows you to create anonymous functions, and they can be defined within other c++ functions, so it also allows you to do nested functions.

They look like this:

// foo holds a lambda (a closure object) that has nothing in its body,
// so it does nothing...
auto foo = [](){};

// [] tells which variables in the current scope will be available to the lambda
// () is the parameter list to be passed to the lambda function
// {} is the lambda's body

// You can now call foo like so:
foo();


Okay I know this is confusing, so I'll give a concrete example:


int main() {
auto foo = [](int x) { return x * x; };
cout << foo(2) << endl;
return 0;
}


That prints out '4' to console.
As you can see we have a parameter list of "int x", and then in the body of the lambda we are returning x*x.
Notice how you don't have to explicitly mention a return-type! That is also an interesting feature of c++0x.

Now the '[]' part of the lambda is a bit tricky to understand. Basically it allows you to specify variables from the current scope that will be available to the lambda's body.

For example you can do this:


int main() {
int x = 2;
auto foo = [&x]() { return x * x; };
cout << foo() << endl;
return 0;
}


This will print '4'.
What it's saying is that the lambda's body has access to the 'x' from the main() function.
If you modify 'x' in the lambda's body, the 'x' in the main() function will also be modified because we're capturing the variable by reference.
If you want to capture x by value, you can just omit the ampersand, and the lambda will get its own copy of the variable.

If you want the lambda to have access to all local variables then you can do:

int main() {
int x = 2;
auto foo = [&]() { return x * x; };
cout << foo() << endl;
return 0;
}


Using '[&]' means that you're capturing all the local variables the lambda uses by reference.
You can also use '[=]', which means you're capturing them all by value.
You can even do something like '[&, x]', which means you're capturing everything by reference, except for 'x' which will be captured by value.
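To make the difference between the capture modes concrete, here is a small sketch. The function name captureDemo is made up for illustration; it captures the same local both ways, changes the local afterwards, and then calls both lambdas.

```cpp
#include <utility>

// Hypothetical helper, not from the original post.
// Returns {result of the by-value lambda, result of the by-reference lambda}.
std::pair<int, int> captureDemo() {
    int x = 2;
    auto byValue = [x]()  { return x * x; }; // copies x while it is still 2
    auto byRef   = [&x]() { return x * x; }; // refers to the live x
    x = 3;                                   // changed after both lambdas exist
    return std::make_pair(byValue(), byRef());
}
```

The by-value lambda still sees 2 and returns 4, while the by-reference lambda sees the updated 3 and returns 9.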

Technically I've been assigning an identifier to the lambda 'foo', so I've not been using lambdas as anonymous functions here.

The way you would use lambdas as anonymous functions (a function without a name or identifier), is when another method takes a function object as an argument.
The typical example seen all over the web is with 'for_each'.
The arguments for for_each() are found here, but basically are:
param1 = A starting iterator
param2 = An ending iterator
param3 = A function Object

So you can either pass an already defined function as param3, or you can use lambdas to define the function in the current scope.

This example is on wikipedia, but I'll reuse it here.
Suppose you have a vector full of ints, and you want to add up all the ints together.
With for_each, you can specify a lambda as param3 which adds each int to a 'total' variable:

std::vector<int> someList;
int total = 0;
std::for_each(someList.begin(), someList.end(), [&](int x) {
total += x;
});


I recommend reading the wikipedia link as it has more examples.


Finally the last thing I want to mention is lambda recursion.
When I originally tried using recursion with c++ lambdas, I got compiler errors. So I ended up concluding that recursive lambdas in c++ were not possible without using hacks.

I ended up coding a hack to support lambda recursion:


typedef int(*pFoo)(int, int);

int main() {

pFoo foo = [](int x, int _f) -> int {
int* _p = (int*)&_f;
void*& p = (void*&)*_p;
pFoo* a = (pFoo*)_p;
pFoo f = *a;
return (x<=1) ? 1 : x * f(x-1, _f);
};

int* _p = (int*)&foo;
int p = *_p;

cout << foo(5, p) << endl; // Prints '120'
return 0;
}


Yes I know it's super ugly, I had to do a lot of ugly casts to get GCC to compile without warnings or errors.

The basic idea here is that you define the lambda taking two arguments, one being the integer we will perform the factorial function on, and the other being a function pointer to the lambda itself. The lambda can then use that function pointer to call itself. The function pointer in this case is disguised as an 'int', and then cast back to the function pointer type. Note that this only works when an 'int' is as wide as a function pointer (as on 32-bit x86); on a typical 64-bit build the pointer gets truncated.

I have recently figured out however that you don't need to resort to hacks for lambda recursion!

Here is the proper way to do lambda recursion:


int main() {
function<int(int)> factorial = [&factorial](int n) -> int {
return n <= 1 ? 1 : n * factorial(n-1);
};
cout << factorial(5) << endl; // Prints '120'
return 0;
}


Notice how we give the lambda access to the 'factorial' variable.

I had tried this before using 'auto' instead of 'function<int(int)>' and it didn't work with GCC, so I had assumed recursive lambdas were not supported (I had also read posts by people saying they weren't supported... guess they were wrong :p).

So the trick is to not use 'auto' when declaring the lambda type, but instead use function<int(int)> (from the <functional> header) to explicitly show it's a function.

And when you think about it, it makes sense that you can't use 'auto' when using lambda recursion, since you're using the identifier before its type has fully been defined.
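As a side note, the idea behind the earlier hack (handing the lambda to itself as an argument) can be written without any casts on newer compilers. This is only a sketch and it swaps in a different technique: it assumes a C++14 compiler with generic lambdas, which goes beyond c++0x, and the name factorialViaSelf is made up for illustration.

```cpp
// Hypothetical wrapper, assuming a C++14 (or newer) compiler.
int factorialViaSelf(int n) {
    // 'self' is the lambda itself, handed back in as an argument,
    // so the lambda never has to name a variable whose type is still unknown.
    auto fact = [](const auto& self, int k) -> int {
        return k <= 1 ? 1 : k * self(self, k - 1);
    };
    return fact(fact, n);
}
```

This keeps 'auto' for the lambda and avoids the function<int(int)> wrapper, at the cost of the slightly odd fact(fact, n) call.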

Anyways, with the introduction of lambdas, c++0x has nice support for nested functions, recursive nested functions, and anonymous functions.

So get yourself a c++0x compiler and try it out!

Sunday, July 11, 2010

Using c++ references for nicer code

It is common to see castings and conversions between different types of primitives in c++ code.
Many times however you don't want to do any conversion of the value, but instead just want to reference the binary representation of the variable as another type.

The common ugly way to do this is something like this:

// Program Entry Point
int main() {

// Make x hold the binary representation of a
// positive infinity float
int x = 0x7f800000;

// Treat x as if it was holding a float instead of an int,
// and then give f the same value
float f = *(float*)&x;

// Print float f (positive infinity) to the console
cout << f << endl;
return 0;
}


This does work, however as you can see the second line of code is pretty ugly.
The reason you can't just do "float f = (float)x;", is because that would mean you're casting the int value to a float; so the value is actually converted to a floating point value.

But that's not what we wanted. We instead want the 'int x' to behave as if it were a float the entire time; that is why we first take the address of 'x' (&x), then treat it as a float pointer ((float*)&x), then finally dereference it back (*(float*)&x).

With c++ however, you can use references to simplify this whole procedure.

// Program Entry Point
int main() {

// Make x hold the binary representation of a
// positive infinity float
int x = 0x7f800000;

// Treat x as if it was holding a float instead of an int,
// and then give f the same value
float f = (float&)x;

// Print float f (positive infinity) to the console
cout << f << endl;
return 0;
}


This trick was taught to me by Jake Stine from pcsx2, and it's something not a lot of coders know apparently, because I commonly see people using the ugly way to do this instead.
Most-likely because you can't do this in normal C (just another reason why C sucks compared to C++ xD).
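One caveat: both casts above read an int object through a float type, which the C++ aliasing rules don't strictly allow, so an aggressive optimizer is permitted to do surprising things with them. A more conservative sketch uses memcpy to copy the bits instead. The helper name bitsToFloat is made up for illustration, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float by copying bytes.
// Assumes float is a 32-bit IEEE 754 type (checked for size only).
float bitsToFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide for this sketch");
    float f;
    std::memcpy(&f, &bits, sizeof f); // copies the representation, no conversion
    return f;
}
```

With this, bitsToFloat(0x7f800000) gives positive infinity just like the examples above, and compilers typically turn the memcpy into a plain register move.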

Fast Multiplication and Division

Another well known bitwise trick is that you can do multiplication and division by powers of two "2^n", with just simple shifts by "n".

For example, you can do:


// Normal Multiplication
void mul(int& x) {
x = x * 4;
}

// Optimized Multiplication
void mul(int& x) {
x = x << 2;
}


And for division:


// Normal Division
void div(unsigned int& x) {
x = x / 4;
}

// Optimized Division
void div(unsigned int& x) {
x = x >> 2;
}


Unlike the modulus optimization trick, the multiplication trick does work for both signed and unsigned variables, so the compiler is safe to optimize these cases for you.

In the case of the division trick there is a problem that may happen if you have a negative signed number. If you divide a negative integer and the result should have a remainder, then using shifts may give you the wrong result (for example, -1/2 should be 0, but -1>>1 is -1).
Compilers are able to optimize signed division by powers of two using shifts and interesting logic to account for the possible errors.
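As a rough sketch of the kind of logic involved, the usual approach is to add a small bias to negative values before shifting, so the result rounds toward zero like a real division. The helper name divBy4Shift is made up for illustration, and the code assumes a 32-bit int and a right shift that sign-extends negative values (true on mainstream compilers, but not guaranteed by older C++ standards).

```cpp
// Hypothetical helper: x / 4 for signed x using only shifts and adds.
// Assumes a 32-bit int and an arithmetic (sign-extending) right shift.
int divBy4Shift(int x) {
    int bias = (x >> 31) & 3; // 3 if x is negative, 0 otherwise
    return (x + bias) >> 2;   // the bias makes the shift round toward zero
}
```

For example, divBy4Shift(-1) gives 0 and divBy4Shift(-5) gives -1, matching -1/4 and -5/4.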

For the reason that compilers will make the necessary optimizations for you, it is generally better to leave the code as normal multiplications and divisions for readability and to prevent mistakes.
If you're dealing with low level assembly, you'd obviously want to be doing the shifts where possible instead of using mul and div.

Fast Modulus Operation and Using Unsigned Ints as an Optimization

This is another pretty well-known trick, but there are often a few misconceptions about it.

If you are using the modulus operator '%' with a power of 2 value '2^n', then you can do the same operation with a bitwise AND with 2^n-1.


// Normal Mod Operation
void mod(unsigned int& x) {
x = x % 4;
}

// Optimized Mod Operation
void mod(unsigned int& x) {
x = x & 3;
}


It is very important to note that this only works with positive/unsigned values.
If x was '-6' for example, the correct value returned should be '-2', but with the optimized mod it would return '2'.
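Here is a small sketch of that mismatch using signed ints. The function names are made up for illustration, and the masked result assumes a two's-complement machine.

```cpp
// Hypothetical helpers for comparing the two approaches on signed values.
int modByRemainder(int x) {
    return x % 4; // keeps the sign of x (C++11 and later truncate toward zero)
}

int modByMask(int x) {
    return x & 3; // just keeps the low two bits, so it is never negative
}
```

For x = 6 both return 2, but for x = -6 the first returns -2 and the second returns 2.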

The optimized mod trick however is very useful as perhaps one of the most common uses of Mod is to check if a number is odd or even:


// Normal isEven Operation
bool isEven(int x) {
return !(x % 2);
}

// Optimized isEven Operation
bool isEven(int x) {
return !(x & 1);
}


This works regardless of whether the input is positive or negative (on two's-complement machines, which is effectively all of them), and is much faster than actually performing the modulus operation.

The way the Modulus operator works in x86-32 is it has to do a Div operation, and get the remainder; Div is almost always a slow operation on any architecture.


I also wanted to point out something else.
The misconception some people have is that the compiler will already optimize something like "x = x % 2" into "x = x & 1". Well, the real answer is: not always.

As I already said, the optimization only works for positive/unsigned values, so if you declare x as a normal signed-int, then the compiler can't safely optimize the operation into an AND.
This is why I wanted to point out that using "Unsigned Int" can be an optimization because the compiler will be able to change such mod operations into bitwise ANDs.

If you leave the values as signed, the compiler does one of 2 things:
1) Will just compile it using a div and get the remainder.
2) Will compile code that checks the sign bit of value, does the optimized AND, and negates the value if the original value was negative.

That is, the compiler will do something like this:

// What the compiler might generate for
// "x = x % 2;" with signed values
if (x < 0) {
x = x & 1;
x = -x;
}
else x = x & 1;



Many times I have seen code where performance has increased dramatically from removing modulus operations in favor of bitwise ANDs or other logic.

For example, if you were to have a loop like this:

for(int i = 0; i < 100000; i++) {
someVar++;
someVar%=7;
// ... Do more stuff
}


You can make it much faster by doing this:

for(int i = 0; i < 100000; i++) {
someVar++;
if (someVar >= 7) someVar = 0;
// ... Do more stuff
}
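Note that the two loops only behave the same if someVar starts out in the range 0 to 6. A sketch of a single step of each version, under that assumption (the function names are made up for illustration):

```cpp
// Hypothetical helpers: one iteration of each loop body.
// Both assume the incoming value is already in the range 0..6.
int stepWithMod(int v) {
    v++;
    v %= 7;               // needs a division on most CPUs
    return v;
}

int stepWithCompare(int v) {
    v++;
    if (v >= 7) v = 0;    // a compare and a (usually well-predicted) branch
    return v;
}
```

Both wrap 6 back around to 0 and otherwise just add one.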

Saturday, July 10, 2010

Bitwise Set-Bit

Say you want to set a variable to '0' if an integer is 0, and '1' if an integer is non-zero.

You can do it this way:

int setBit(int x) {
return !!x;
}


If you want to instead emulate this behavior, you can do it some other ways.

If you know for sure the sign-bit is not set, then you can implement this behavior like this:

int setBit(int x) {
return ((uint)x + 0x7fffffff) >> 31; // Add as unsigned to avoid signed overflow
}


This works because if any bits were set, then it ends up carrying a '1' over to bit #31 (the sign bit).

If you also want to include the sign bit into this, you can implement the setBit() function like this:

int setBit(int x) {
return ((((uint)x >> 1) | (x&1)) + 0x7fffffff) >> 31;
}


Or you can use 64bit operations like this:

int setBit(int x) {
uint64 y = (uint64)(uint)x + 0xfffffffful;
return (int)(y >> 32); // The upper 32 bits hold the carry
}
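For reference, here is a sketch of the three variants written with the fixed-width types from <cstdint> instead of 'uint' and 'uint64', so they can be compiled as-is. The function names are made up for illustration, and the 64-bit version reads the carry with a shift rather than through a pointer.

```cpp
#include <cstdint>

// Variant 1: only valid when the sign bit of x is clear.
int setBitNoSign(std::int32_t x) {
    std::uint32_t u = static_cast<std::uint32_t>(x);
    return static_cast<int>((u + 0x7fffffffu) >> 31); // any set bit carries into bit 31
}

// Variant 2: folds the sign bit in first, so it works for all 32-bit values.
int setBitFull(std::int32_t x) {
    std::uint32_t u = static_cast<std::uint32_t>(x);
    std::uint32_t folded = (u >> 1) | (u & 1u);       // nonzero exactly when x is nonzero
    return static_cast<int>((folded + 0x7fffffffu) >> 31);
}

// Variant 3: let the carry land in bit 32 of a 64-bit sum.
int setBit64(std::int32_t x) {
    std::uint64_t y = static_cast<std::uint64_t>(static_cast<std::uint32_t>(x))
                    + 0xffffffffull;
    return static_cast<int>(y >> 32);
}
```

All three return 0 for an input of 0 and 1 otherwise, within the input range each one supports.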



Notes:
This trick isn't really useful when sticking to high level c++, but it can be useful when dealing with low level assembly.

With x86 if you don't wish to use the SETcc instruction, you can use a variation of the trick for full 32bit detection using the carry bit like so:

add eax, -1 // Add 0xffffffff to eax
rcl eax, 1 // Rotate with the carry bit so it's in the lsb
and eax, 1 // AND with the carry bit

The above is nice because SETcc only sets the lower byte to 1/0, while this will set the full 32bit register to 1/0. Also because SETcc only modifies the lower byte, it can be a speed penalty with the CPU's pipeline as opposed to modifying the full 32bit register.

Also note that I use 'int' and 'uint' in this article to refer to 32 bit signed and unsigned ints.
And I use 'uint64' to refer to an unsigned 64bit int.

Friday, July 9, 2010

Testing for NaN and Infinity values

From my work on pcsx2 I have had to deal extensively with the problem of NaN and Infinity values with floats.

The ps2's FPU and VU processors do not support NaN or Infinity values, so it is a pain to emulate them on a system that does support such values (x86-32/SSE processors).

There are a variety of ways to test for NaN and Infinity values, and I will list a few here.

The first is pretty well known.
If you compare a float for equality against itself, it should return True unless the float is a NaN.
So the typical approach is to do something like:

// Test for NaN
bool isNaN(float x) {
return x != x;
}


That will only test for NaN's. If you want to check for infinities you can do something like this:


#include <limits>
#include <math.h>

// Test for positive or negative infinity values
bool isInf(float x) {
return fabs(x) == numeric_limits<float>::infinity();
}



You can instead use bitwise logic to test for NaN's and Infinities.


// Test for NaN with bitwise logic
bool isNaN(float x) {
return ((int&)x & 0x7fffffff) >= 0x7f800001;
}

// Test for Inf with bitwise logic
bool isInf(float x) {
return ((int&)x & 0x7fffffff) == 0x7f800000;
}


You can even check for both NaN and Infinities with just one comparison:


// Test for NaN or Inf with bitwise logic
bool isNaNorInf(float x) {
return ((int&)x & 0x7fffffff) >= 0x7f800000;
}



Generally you probably don't want to use the bitwise version of these functions for the reason that the compiled code will end up having to switch from FPU to integer arithmetic, which will most likely end up being slower than sticking to the floating point comparisons.
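If you do want to experiment with the bitwise versions, here is a self-contained sketch that reads the bits with memcpy instead of a reference cast and compares the answers against std::isnan and std::isinf from <cmath>. The function names are made up for illustration, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fetch the raw bit pattern of a float.
// Assumes float is a 32-bit IEEE 754 type.
static std::uint32_t floatBits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide for this sketch");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// NaN: exponent all ones and a nonzero mantissa.
bool isNaNBits(float x) {
    return (floatBits(x) & 0x7fffffffu) >= 0x7f800001u;
}

// Infinity: exponent all ones and a zero mantissa (sign masked off).
bool isInfBits(float x) {
    return (floatBits(x) & 0x7fffffffu) == 0x7f800000u;
}

// Cross-check against the standard library classification functions.
bool bitwiseMatchesStd(float x) {
    return isNaNBits(x) == std::isnan(x) && isInfBits(x) == std::isinf(x);
}
```

In everyday code the standard std::isnan and std::isinf are the simpler choice; the bitwise forms are mostly interesting when you already have the value in an integer register.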