Sunday, September 29, 2013

The proper way to write multi-statement macros in C++

If you've been programming for a while, you've probably written some multi-statement macros, and you probably wrap them in {} brackets like this:

#define print_words(str1, str2) { \
    cout << str1 << " ";          \
    cout << str2;                 \
}

int main() {
    print_words("hello", "world");
    return 0;
}

This looks fine at first glance, and the program will print out "hello world" as expected... So whats the problem?

The problem is calling this macro in certain places of code will generate errors that calling a normal function wouldn't.

For example:

#define print_words(str1, str2) { \
    cout << str1 << " ";          \
    cout << str2;                 \
}

int main() {
    if (1) print_words("hello", "world");
    else   cout << "good bye";
    return 0;
}

If you try to compile this, you will get a compilation error.
GCC gives the following: "error: 'else' without a previous 'if'"

Why do we get this error? Its because the generated code with the macro expanded looks like this:

int main() {
    if (1) { cout << "hello" << " "; cout << "world"; };
    else   cout << "good bye";
    return 0;
}

In c++ if you have an if-statement that uses brackets, and then you terminate the bracket using a semi-colon, it will terminate the whole if-statement, such that if you use an "else" clause after it, the compiler won't have an "if" to match it up with. So code that looks like: "if (1) {}; else {}" will have a compilation error. The correct code should not have a semi-colon: "if (1) {} else {}"

So if we want to write a multi-statement macro correctly (so it doesn't give us these errors), we need to use a trick when writing the macro. The common way to solve this is instead of wrapping the macro in brackets, we wrap the macro in a "do {} while(0)" loop like this:

#define print_words(str1, str2) do { \
    cout << str1 << " ";             \
    cout << str2;                    \
} while(0)

int main() {
    if (1) print_words("hello", "world");
    else   cout << "good bye";
    return 0;
}

Now when the macro is expanded, the semi-colon terminator will terminate the do-while loop, instead of terminating the if-statement: "if (1) do { } while(0); else {}", and this works fine without compilation errors.

So now you know the secret to writing robust macros. Wrap them in do-while loops!
This trick is also a great way to distinguish veteran c/c++ coders from novices. Most people will not know why the do-while loop is necessary; whereas veterans have likely seen this trick used before.
I recommend to just always wrap your multi-statement macros in a do-while loop.

p.s. There are other ways to solve this problem. One of them is to use the comma-operator to group all the expressions as one big statement. But it is not applicable in all cases (since it doesn't let you declare new variables inside the macro), and I won't go over it in this article.

Tuesday, March 13, 2012

Table Bitpacking to avoid lookup tables and switches / Reversing Nibbles and Bytes

Problem: Write a c++ function to reverse a nibble (a nibble is 4 bits); you may assume the nibble 'n' is being passed as an unsigned byte and is guaranteed to already be within the valid 0~15 range (so you don't have to have AND it with 15 or have assertions).

A straight-forward solution for this problem would something this:

u8 rev_nibble(u8 n) {
u8 ret = 0;
if (n&1) ret |= 8;
if (n&2) ret |= 4;
if (n&4) ret |= 2;
if (n&8) ret |= 1;
return ret;
}


A cooler solution is:

u8 rev_nibble(u8 n) {
return ((n & 1)<<3) | ((n & 2)<<1) | ((n & 4)>>1) | ((n & 8)>>3);
}


Both the above solutions however consist of multiple instructions to accomplish such a simple thing, better solutions might use either a switch or a table:

u8 rev_nibble(u8 n) {
static const u8 ret[16] = { 0x0,0x8,0x4,0xc,0x2,0xa,0x6,0xe,
0x1,0x9,0x5,0xd,0x3,0xb,0x7,0xf };
return ret[n];
}



Now this last solution uses a trick I came up with (although I'm sure others have to), which I call "Table Bitpacking".

Instead of using a table to store the return values, notice that all return values only take up 4 bits (a nibble), and there's 16 of them, so in total you only need 64bits of data for the table. So we can just use a 64bit immediate value for the table, and shift out the result instead!

u8 rev_nibble(u8 n) {
return (0xf7b3d591e6a2c480ull >> (n*4)) & 15;
}


Table bitpacking is cool because it can be used for many different permutations of tables; as long as the bits_per_choice * number_of_choices <= available_word_bits;
On 64bit platforms using 64bit tables should be fine, on 32bit platforms you'd want to stick to 32bit tables since 64bit shifts can be expensive.

Whether or not table bitpacking ends up being faster than the normal lookup table depends on the instruction set and the specific circumstances. But IMO its worth knowing the trick for possible future optimizations.

Now using our above function, we can do another common problem:
Problem: Reverse a byte!

Now that we know how to reverse a nibble, reversing a byte should be simple if we use the already existing rev_nibble function:

u8 rev_byte(u8 n) {
return (rev_nibble(n & 15) << 4) | rev_nibble(n>>4);
}


Note: There are plenty of other ways to reverse a byte, perhaps the fastest is generally a 256 entry lookup-table for each byte permutation (now that table would be 256*256 bits, so too large to use the Table Bitpacking trick on current machines).

Edit:
Just thought I'd throw in one more cool way to reverse a nibble:

u8 rev_nibble(u8 n) {
return "aiemckgobjfndlhp"[n] - 'a';
}

Tuesday, October 4, 2011

Bitwise Set-Bit (#2)

In a past article, I've shown some ways to do this. This time I'll show you another way to do it that I just came up with.

The problem:
Return 1 if a 32bit integer is non-zero, else return 0. The return-type should be a 32bit integer.

The Typical Solution:

s32 setBit(s32 x) {
return x ? 1 : 0;
}


The Cooler Solution:

s32 setBit(s32 x) {
return !!x; // Forces it to a bool which is 1 or 0
}


The Bitwise Solution:

s32 setBit(s32 x) {
return (u32)(x | -x) >> 31;
}


The last one works because any non-zero integer will have its sign bit set either when negated or normally (if its already negative). The only number that won't have its sign bit set after ORing these results is 0. So we just logically-shift right by 31 to get a 1 or 0 value, which is what we wanted :D

Note 1: The bitwise trick in this case is not preferred, instead you should use the 'typical' or 'cool' solution, since the compiler will be able to optimize it better for the given architecture. On x86 there is a SETcc instruction which can be used which is likely faster than doing the bitwise trick since its less operations. This trick may still be useful to know though in-case you're doing asm or compiler work with some architecture that doesn't have a SETcc-like instruction (and you wish to avoid jumps). Although the method from my past article might be more helpful in this case.

Note 2: There is actually a number which has its sign bit set both normally and on negation, and that's the minimum-negative integer. With 32bit integers this number is -2147483648 (i.e. 0x80000000) (so with 32bit integers: 0x80000000 == -0x80000000).
Using this knowledge, we now learned another neat trick. You can detect the minimum negative integer by doing: bool isMinNeg = x == -x; // x can be an int type of any size

Wednesday, September 21, 2011

-1 or 1 bitwise trick

Been a while since my last update, and been a long time since my last bitwise trick.

The Problem:
If you have a 32bit signed integer, return -1 if the number is negative, and return 1 if the number is positive (or 0).

The Typical Solution:

s32 sign(s32 x) {
return x < 0 ? -1 : 1;
}


The Bitwise Solution:

s32 sign(s32 x) {
return (x>>31)|1;
}


How does it work?
Well (x>>31) does a signed shift with the sign bit, which will return either -1 if the sign bit was set (because -1 in binary is all 1's), or 0 if the sign bit wasn't set.
But we want to return 1 when its positive; so what we do is OR the result with 1. (-1 | 1) == -1, and (0 | 1) == 1.
So we get the result we wanted without needing a branch, and only using 2 very fast bitwise operations :D

Wednesday, June 1, 2011

Don't initialize that big static data! (c++)

I learned this recently, and so I assume not a lot of people realize this.
But lets take a look at some sample code:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10];
int main() {}


static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10] = {7};
int main() {}


Now whats the difference between both of those programs? Well the only difference is that the bottom one is initializing a[0] to 7. Both of the programs initialize the rest of the a[n] data to 0.
In the first one, all of 'a' gets initialized to 0 because all static/global data that is not explicitly initialized gets initialized to 0 (as per some rule in C).
In the second one, the first element of 'a' gets initialized to 7 because we explicitly told it that in the array initializer and the rest of the data is filled with 0's because when you have an array initializer that does not cover all of the elements, the rest is treated as 0's.

Now why do we care about this?
Well why don't you try compiling both of those programs and take a look at the executable file?
You might be surprised to find out that the first executable would be just a few kb, while the second would be over 10mb!

Why is this?
Well it turns out that compilers/linkers end up putting uninitialized static data in its own section called the bss section. This section consists of data that is initialized to all 0's. This means that the executable doesn't need to store all that data in itself, it just needs information of how big the variable is, and where to place all the 0's in memory; it then does this when the application starts up.

In contrast, the second example now explicitly initializes the data. Now the compiler/linker places that data in the data section, and it actually stores the full contents of the data in the executable. This means when you have an explicitly initialized 10mb chunk of data, your executable file will grow by 10mb!

The exception to the above rule is data that is initialized to 0. Like if we had "char a[_1mb*10] = {0};", now what will happen?
Well it turns out it can either be put in the bss section or the data section. Apparently some compiler/linkers might be stupid enough to put it into the data section meaning the executable will still grow by 10mb. Thankfully msvc isn't that stupid and it doesn't do that (the built executable is the same as if you didn't explicitly initialize 'a').


Anyways, the moral of the story is to keep the above information in mind when you are explicitly initializing data.
What would be a way to explicitly initialize a[0] to '7' at program startup w/o bloating the executable an extra 10mb?
Well this is one way:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10];
struct _init_a {
_init_a(char* c) { c[0] = 7; }
} __init_a(a);
int main() {}


Now __init_a's constructor will end up initializing a[0] to 7 at the application startup like we wanted. (Note: remember to beware of the static initializion order fiasco problem in c++ if you will have other static variable's constructors using 'a').

Thursday, May 26, 2011

Commenting Code

Meh...Most comments I've encountered in my life are written under the assumption that comments can make up for poorly written code.. – Stargazer712 Oct 14 '10 at 20:53

I saw the above comment on this page, and I highly agree with what Stargazer712 said.

So often I hear people talking about commenting code as a 'solution' to making crap code understandable. Using comments in this manner is not a solution, but instead a hacky work-around for the in-ability to produce quality code.

If code is well written and structured, then the amount of comments you need can be greatly reduced, as the code itself would be pretty self explanatory.

Furthermore comments can be misleading and do more harm than good. I've seen many cases where a comment says code does one thing, but its really doing another. Or I've seen bad coders change code, but leave in their old comments which are now totally irrelevant, just making the reader more confused.

I'm not saying comments are bad, but I am saying that they are no excuse to be writing crap code. Sometimes code gets very complex, and its not due to the programmer being bad, but instead because they are heavily optimizing something or they are implementing something that's very involved and tricky. In these cases commenting code is very helpful and justified.

Something I personally don't like to see is a lot of comments in the main .cpp files of a project, mainly because it gets in the way of code. Instead I like comments to be placed in the headers and next to the function prototypes away from the main implementation of the code so that when you are actually looking at the .cpp file, you can focus on the real code instead of seeing a wall of green-text getting in the way.
If they're 1 or 2-liner comments then its fine to put them in the .cpp file, but if you're going to write a paragraph then save it for the .h file because I don't want to see that when I'm trying to understand your code (your real code, not what you 'say' your code does!).


Another thing I see often that newbie coders love to do is write obvious comments. Stuff like "i = 1; // set i to 1", these comments are not at all helpful, and they only serve to bloat code.

Anyways I guess I can rant more, especially about bloated code, but I'll think I'll save that for a future post.

Wednesday, May 11, 2011

Checking if variables are of certainin type in C++

Okay this problem is something I often see people do incorrectly in c++.

I've seen people use the sizeof() operator to try and distinguish types such as:
template<typename T> void foo(T x) {
if (sizeof(x) == sizeof(string)) cout << "x is a string\n";
if (sizeof(x) == sizeof(char)) cout << "x is a char\n";
if (sizeof(x) == sizeof(int)) cout << "x is a int\n";
}


The above is obviously wrong because different types can have the same size. In many 32bit implementations sizeof(std::string) will be equal to sizeof(int).

Now how can we do this better?
Well there's a few ways to do this, one way can be to use template specialization for each of the above cases and make separate functions.

But I'll explain another way. Boost has a templated struct called is_same which we can use in this case. However if you're like me and don't like using a lot of dependencies or over-engineered libraries in your code, then we can write our own code to tell us if two types are the same or not.
It looks like this:
template<typename A, typename B>
struct _isSame { static const bool value = false; };
template<typename A>
struct _isSame<A, A> { static const bool value = true; };


We can now use the above code like this:
template<typename T> void foo(T x) {
if (_isSame<T, string>::value) cout << "x is a string\n";
if (_isSame<T, char>::value) cout << "x is a char\n";
if (_isSame<T, int>::value) cout << "x is a int\n";
}


Lastly, we can make a helper function that lets us know if two variables are of the same type.
template<typename A, typename B>
inline bool isSame(A a, B b) { return _isSame<A, B>::value; }


And here's an example using the above function:
int main() {
int x = 0;
int y = 0;
float z=0;

cout << "Is x's type the same as y's type? ";
cout << (isSame(x, y) ? "Yes" : "No") << endl;

cout << "Is x's type the same as z's type? ";
cout << (isSame(x, z) ? "Yes" : "No") << endl;
}

Thursday, April 14, 2011

c++ functions returning void

I'm busy with exams now but I wanted to do a quick update on my blog after realizing how long its been since my last update ><

Anyways this post is about something that isn't seen too-often in c++ so might not be known to some people; c++ void functions can return things, as long as the type is void.

That means all of these are valid to do in c++:

void foo1 {}
void foo2 { return; }
void foo3 { return (void)0; }
void foo4 { return (void)1.1; }
void foo5 { return (void)(cout << "hello world"); }
void foo6 { return foo5(); }

You might think these are useless, but in some cases you might find it aesthetically pleasing to use one of the above styles in your code.

For example, in an emulator you can have an opcode execute function for your interpreter that looks something like:

void opExecute {
unsigned char opcode = fetch();
switch(opcode) {
case 0x84: return ADD();
case 0x62: return SUB();
case 0x34: return BR();
}
}

In this case it looks aesthetically pleasing to be returning a void function.

Another reason it might be useful is because it allows you to do in 1 statement what would otherwise take 2 statements.
Consider this code:

void foo() {
if (cond) { cout << "hello world"; return; }
//... do some code
}

The above if-statement needs brackets around it because it's body has 2 statements in it. But if you cast to void you can combine both statements into one and not need the brackets like so:

void foo() {
if (cond) return (void)(cout << "hello world");
//... do some code
}

In terms of clarity and length, the first way was better; but sometimes there are situations where you want to keep things as 1 statement to prevent possible errors (like when you're creating macros), so knowing that you can do stuff like this might be useful.

Saturday, January 29, 2011

Porting cottonNES

Lately I've been working on porting cottonNES to arm based cell phones, using the Airplay SDK which lets me use C++ and compile native code on a variety of arm-based platforms (iOS phones, android phones, etc...).

My original goal was to get cottonNES working on my iphone and I already accomplished that, but it turned out that a cycle accurate interpreted NES emulator on an iphone 3G runs very slow xD (about 5fps w/o frameskipping).

This slowness just won't do, so I said, "hey now i finally have a reason to code a NES dynarec!" So i went to work studying the ARM instruction set and learned the instruction encodings to code an arm code-emitter (which lets me write arm code to memory buffers so that i can later execute the generated code).

Well it turns out that the iOS on the iphone won't let me execute any code that was dynamically generated. I was using the Airplay SDK and mmap() to allocate a buffer with execution privileges, then emit arm code to it, then execute that buffer; but this ends up crashing on my jailbroken iphone 3G with firmware 3.1.2.

So at that point I was disappointed, and pissed off at fucking apple with their shit iOS. But not all hope was lost since I could still just port my emulator to Android phones which DO let me execute dynamically generated code (and therefor allow the possibility of dynarecs/JIT compilers).

The sad thing is i do not have an android phone to test my code on, but there are emulators i can use to test my code on the PC; and hopefully it will work on the real android phones.

I already had someone try my test-app on a real android phone, which tests the ability to execute dynamically generated code. And they told me the app worked on their droid, so this is good news.

If i don't get bored with the project, this should be the first nintendo emulator with a real dynarec ever AFAIK (there's one emulator i've seen claim to use dynamic recompilation, but it was really doing what i call "function caching", and not a real dynarec).

Anyways, here's a pic of the interpreter version running on one of airplay's simulators:

Wednesday, January 12, 2011

Dynamically allocate aligned memory (aligned_malloc)

The other day on irc i noticed some dolphin-emu developers having a problem where they needed to guarantee 16-byte alignment on some dynamically allocated memory.

You might be thinking: What does that mean? Why is this needed?
Well if you've done any programming and dealt with pointers, you should know that the pointer holds a memory address location which is where the data its pointing to is stored.
That memory address is just a number, and for it to be aligned to a 16-byte boundary means that the memory address needs to be a multiple of 16 (think of all of the computer's memory being divided into 16-byte chunks, what we want is the beginning of one of those chunks).

The reason we want this is because there are some cpu instructions that require memory to be aligned to certain byte boundaries. Many SSE instructions require 16-byte alignment when reading from memory locations. SSE deals with xmm registers which hold 16 bytes of data each, so it's instructions like to work with addresses that are multiples of 16 (there are some SSE instructions that don't need alignment, but they're usually slower). If an SSE instruction that required alignment reads data that is un-aligned, your program will most-likely crash.

So this was exactly the problem the dolphin-emu devs were having. They were using SSE instructions which required alignment, but reading data that was not guaranteed to be aligned. So the end result is it produced random crashes.

So how do we solve this problem?

Well there is an msvc function called _aligned_malloc which you can use to allocate memory aligned to whatever byte boundary you want (as long as its a power of 2).
However _aligned_malloc is non-standard and other compilers like GCC won't have it (afaik).

So what we want is a more portable solution that will work on all modern c/c++ compilers. How do we do this? We code our own!


// Alignment must be power of 2 (1,2,4,8,16...)
void* aligned_malloc(size_t size, size_t alignment) {
uintptr_t r = (uintptr_t)malloc(size + --alignment + sizeof(uintptr_t));
uintptr_t t = r + sizeof(uintptr_t);
uintptr_t o =(t + alignment) & ~(uintptr_t)alignment;
if (!r) return NULL;
((uintptr_t*)o)[-1] = r;
return (void*)o;
}

void aligned_free(void* p) {
if (!p) return;
free((void*)(((uintptr_t*)p)[-1]));
}


Now my code above might scare you. But I'll try to explain what its doing.

The first thing to note is how to use it. Basically you allocate an aligned chunk of memory with the aligned_malloc() function, then when you're done you free it with the aligned_free() function.
So to allocate a 160 byte chunk of memory thats aligned to a 16-byte boundary, you can do this:

void* p = aligned_malloc(160, 16);
... do something ...
aligned_free(p);



Now we get to the part of "fucking aligned_malloc, how does it work!?"
The first thing to note is that when we allocate any chunk of memory and we want to align it to some byte boundary, there are N possible different positions that memory can be in terms of alignment (where N = alignment size)
For example, if we want 4-byte alignment, and we do "void* p = malloc(10);", then the last 2 binary-digits of p are either 00,01,10,or 11. And if the last 2 digits are 00, that means that the memory is already aligned, that means that 1/4 times the memory is aligned already for us. 3/4 times the memory is not-aligned to 4-byte boundary (01,10,11). From this relationship we see that out of N different possibilities, N-1 outcomes will not be aligned.

So to align data we just need to allocated "memory + N-1" amount of space, and offset the memory pointer to point to the first aligned chunk of memory.

There is a trick to doing this, and it looks like this:

// p = address of a random chunk of memory
// N = alignment we want; must be power of 2
aligned_offset = (p + (N-1)) & ~(N-1);


With the above code, aligned_offset will equal the nearest aligned boundary. Try and analyze the code yourself to see how it works (note that N has to be a power of 2).

Now that we know the above trick to get an aligned address from any arbitrary address, we now know the basis of aligned_malloc().

The problem is that we can't just offset the pointer and then return that aligned address; because that means when it comes time to free() or delete[] the data, the pointer pointing to the data will not be correct.
So to solve this problem we use another trick. We can allocate a bit more memory to store the starting address of real allocated memory. Then we place that address somewhere in the memory before the offset pointer we return.
So it looks like this:

| unused_data0 | start_address | aligned_data | unused_data1 |


The pointer we're returning points to the beginning of aligned_data. But before that we have the start_address which points to the beginning of unused_data0 (the start of the allocated buffer).

So with that info, you should be able to analyze aligned_malloc() line-by-line and understand what its doing. If you still don't understand leave a comment with what you're having trouble with.

The last thing to look at is aligned_free. All we're doing when it comes time to free the allocated buffer, is we're grabbing the real start_address of the buffer which was placed immediately before our aligned data, and then freeing the memory where start_address points to.

Again if there's any questions feel free to comment. Hopefully my explanation made sense.

Notes:
In the above functions we're using malloc to allocate the memory. We could instead use c++'s 'new[]' operator to allocate the chunk of memory and 'delete[]' to free it. The reason we didn't is because 'new[]' is c++ only, by using malloc the code works with both c and c++. If you don't care about c-compatibility then you can easily convert the function to use new[]/delete[].

Another thing to note is that the above aligned_malloc is storing the full address of the buffer's starting address. If you limit the alignment max to be 2^15 or less, then you can instead store the difference between the returned offset and the start of the buffer which would only take 2-bytes of extra-storage (instead of sizeof(uintptr_t), which is 4-bytes on 32bit os, and 8-bytes on 64-bit os)

Here is what the code looks like for the space-saving version:

// Alignment must be power of 2 (1,2,4,8,16...2^15)
void* aligned_malloc(size_t size, size_t alignment) {
assert(alignment <= 0x8000);
uintptr_t r = (uintptr_t)malloc(size + --alignment + 2);
uintptr_t o = (r + 2 + alignment) & ~(uintptr_t)alignment;
if (!r) return NULL;
((uint16_t*)o)[-1] = (uint16_t)(o-r);
return (void*)o;
}

void aligned_free(void* p) {
if (!p) return;
free((void*)((uintptr_t)p-((uint16_t*)p)[-1]));
}

Thursday, January 6, 2011

Using and understanding the static keyword C++, as well as some interesting tricks

There are just so many things I can talk about here, so its difficult to condense this into a small blog post. I would like to rant about how I hate c++'s header/.cpp translation unit model mess, but I won't in order to keep this shorter.

The first thing to mention is that when you compile a c++ project with multiple .cpp files, what essentially happens is all the .h files included in the .cpp files are expanded and merged with the .cpp file to form a translation unit which the compiler uses to generate an object file. The different translation units don't know about the other .cpp files, so that's why you need function and class prototypes and extern-variables (such as 'extern int;') which tell the compiler/linker that these functions/classes/variables exist in some .cpp file. (This crappy design is the worst thing about c++, and it was inherited from the c programming language, most-likely so that the compiler can save memory when compiling your source code. Newer more advanced languages like c# don't need header files because their compilers are smart enough to detect the different functions/classes/variables in separate files without prototypes).

So because of the above design, we need to use crappy header files in our large c++ projects. There are some tricks I've learned over the years to eliminate the need for many duplicate prototype/declaration/definition mess, although it is not commonly used by other c++ coders and they probably won't like the tricks because its not the common way to program. That said I will probably talk more in detail about this in another article.

Now why is the static keyword useful? Well c++ being the evil bastard that it is, has different meanings for the static keyword depending on how its used. So just talking about 'static' in general is not specific enough.

One use of static is in the file-scope. This tells the compiler that the variable/function is unique to the current translation unit (.cpp file) not visible to other .cpp files.

So if you have:

// source1.cpp
static int x = 10;
int main() { cout << x << endl; return 0; }
// source2.cpp
extern int x;
int foo() { x++; }


The above code will give you a linker error after compiling:
source2.obj : error LNK2001: unresolved external symbol "int x" (?x@@3HA)

If you remove the static keyword, then it will compile. The reason is as mentioned before, static will make the variable unique to the translation unit essentially hiding it from other units.

The properties of a static variables declared at global file scope (not in a function) are that they're initiated at the start of the program, and then they're deconstructed at the end of the program.
So if you were to have a line of code that said "static someClass myClass(10);" and it was in global file scope, then the constructor myClass will be called sometime before the main() function in your program, and its destructor gets called when exiting your program.

One misleading thing c++ does is it changes this behavior when you have nested static variables in function scope.

int foo() {
static someClass myClass(10);
return myClass.someFunct();
}


Now you might expect myClass's constructor to be called at the beginning of the program, but this is not the case. Instead the constructor is called the first time the foo() function is ever called! This is a tricky part of the standard that may cause confusion at first.

The interesting thing about this behavior is that although its tricky and seemingly evil, it also is useful. Specifically it allows us to know the time of initialization of the static nested variables. When static variables are initialized in global file scope, it is unknown which order they get initialized in (relative to other static variables, and this is one reason people think they're evil; because they can be initialized before data they depend on is initialized). Using nested static variables eliminates this problem because we know that the first time we're calling function foo(), the constructor of the static variable is called.

Now to take advantage of this we can declare static variables like this:

// Instead of doing this:
// static someClass myClass(10);

// We do this:
someClass& myClass() {
static someClass _myClass(10);
return _myClass;
}


Now we can use 'myClass()' and we know that the first time its called will be when the constructor is called.


Now moving on, we can have static functions as well.
Static functions behave like static variables at file scope because it creates a unique function for the .cpp file where its defined, and its hidden from the other .cpp files.
You might think it doesn't matter, and it doesn't for the most part, but imagine the following code:

// header.h
static int& getInt() {
static int i = 77;
return i;
}

// source1.cpp
#include "header.h"
void foo();

int main() {
foo();
cout << getInt() << endl;
return 0;
}

// source2.cpp
#include "header.h"
void foo() {
getInt() = 10;
}


What do you think the output of the program will be?
You might think the output should be 10, but its not, its 77.

The reason for this is that we declared the function getInt() as static, that means for both source1.cpp and source2.cpp, it created separate local versions of the getInt() function. So when getInt() is called in source2.cpp's foo(), it is a separate function than getInt() the one source1.cpp sees, so when source1.cpp prints out getInt() it prints its own version which was initialized to 77.

What happens if we remove the 'static' keyword from 'static int& getInt()', well we get linker errors:
1>source1.obj : error LNK2005: "int & __cdecl getInt(void)" (?getInt@@YAAAHXZ) already defined in source2.obj

This is because the function is trying to be defined in both .cpp files so the linker doesn't know which version to use.
The usual way to prevent these problems is to define the function in one .cpp file, and then have the function prototype in the header like so:

// header.h
int& getInt();

// source1.cpp
#include "header.h"
void foo();

int& getInt() {
static int i = 77;
return i;
}

int main() {
foo();
cout << getInt() << endl;
return 0;
}

// source2.cpp
#include "header.h"
void foo() {
getInt() = 10;
}


The program now prints what you expect it to print, "10".


Now that we got that out of the way, we can talk about the static keyword in classes and structs.

When you have a struct/class with a static member variable, you're required to declare it in the struct, but then define it outside the struct.
So you do:

// header.h
struct someStruct {
static int i; // declare
// static int i = 10; // c++ doesn't allow this
};

// source1.cpp
#include "header.h"

// This line must go in only 1 translation unit
// so it should only be put in one .cpp file
int someStruct::i = 10; // define i


There will now be only 1 instance of someStruct::i, and you can use it throughout your code such as 'someStruct::i = 49'.

If you noticed though c++ makes you split the definition of the variable from the declaration; and the definition is put into only 1 .cpp file.
Well there is workaround for this which lets you define it in the same place you declare it.

Before I mention the solution, I'll mention another problem with c++.
What if you wanted to do this:

struct someStruct {
static const int i = 24; // This compiles
static const char* message = "Hello world"; // This doesn't
void foo() { cout << message << " " << i << endl; }
}


The only thing c++ allows us to initialize within a class are "static const int" variables, anything else it requires you to split the initialization part into a .cpp file.

Now whats the solution?
static member functions with static local variables!

Here's cool way to fix the problem:

struct someStruct {
static const int i = 24;
static const char* message() {
static const char* _message = "Hello world";
return _message;
}
void foo() { cout << message() << " " << i << endl; }
}


The evil part about c++ is that static member functions with static local variables behave differently than static file-scope functions with static local variables.

This example will illustrate that:

// header.h
static int& getInt() {
static int i = 77;
return i;
}

struct someStruct {
static int& getInt() {
static int i = 77;
return i;
}
};

// source1.cpp
#include "header.h"
void foo();

int main() {
foo();
cout << getInt() << endl;
cout << someStruct::getInt() << endl;
return 0;
}

// source2.cpp
#include "header.h"
void foo() {
getInt() = 10;
someStruct::getInt() = 10;
}


What do you think the output of the program will be?
Well the first cout with the file-scope function we know will print '77', because this is the same example i gave earlier in this article. But strangely the second line with 'someStruct::getInt()' returns 10!

The reason for this is that static local variables in class member functions will only have 1 instance of themselves, even throughout different translation units!

This is a powerful thing to know and can be used for random tricks.

This last example i'm going to give will show you a trick so that you can declare and define static variables in a header file that are used in different .cpp files, but the variable will be the same throughout all the different translation units (so it doesn't make unique copies like normal static at file-scope does).

// header.h
struct someStruct {
static int& getInt() {
static int i = 77;
return i;
}
};

static int& i = someStruct::getInt();

// source1.cpp
#include "header.h"
void foo();

int main() {
foo();
cout << i << endl;
return 0;
}

// source2.cpp
#include "header.h"
void foo() {
i = 10;
}



The above code prints out 10.
Notice how we didn't have to declare any variable in the individual .cpp files and could declare and initialize the variables all in the header file.
This trick can be useful, so use it wisely.

Hope this article was useful for at least someone :p

Friday, December 24, 2010

Append integer to string (c++)

In this blog article we'll be using operator overloading with strings to let us append integers to the end of them (something pretty useful which std::string doesn't do for us already! :/)

We will be using some code described in a previous article to do the actual conversion from int to string.

We will be overloading the "<<" operator so that we can append to a string, and we will use the "+" operator so that we return a copy of the result of appending a string.

So it will work as follows:

int main() {
string str0("hello");
string str1("hello");

cout << str0 << endl; // prints 'hello'
str0 << 123;
cout << str0 << endl; // prints 'hello123'

cout << str1 << endl; // prints 'hello'
str1 + 123; // does not modify str1 (effectively does nothing)
cout << str1 << endl; // prints 'hello'
cout << str1 + 123 << endl; // prints 'hello123'
return 0;
}


Note: The reason we're overloading '<<' instead of '+=', is because '+=' is not an operator that allows for global operator overloading. See this link for more details on that.

So how do we make the above code work?
We need to add the following above the 'main()' function:

template<typename T>
string toString(T t) {
stringstream s;
s << t;
return s.str();
}

string& operator<<(string& s, int i) {
return s.append(toString(i));
}

string operator+(string s, int i) {
return s.append(toString(i));
}


Using the above overloads, the code in the main() function compiles and prints what we wanted it to.

Notice that the '<<' operator overload takes a reference to a string as one of its parameters, this is done so that the actual string is modified. The '+' operator overload takes a copy of the string and then does the appending, so the original string is not modified.

We can extend the above code to work for floats, doubles, and any datatype that our toString() function accepts. One way we do that is to use templates on our operator overloads.

Here's the new code:

template<typename T>
string& operator<<(string& s, T i) {
return s.append(toString(i));
}

template<typename T>
string operator+(string s, T i) {
return s.append(toString(i));
}


Although the above code works and is cool, there is a downside to doing this.
It won't allow us to do stuff like: string("s") + "hello", anymore. If we try to do that the compiler will generate a ambiguity error because it doesn't know which overload to choose. The +(string, char) overload is already defined by standard strings and our above template operator overload also handles this case; so the compiler doesn't know which to use and generates an error.

Our solution then is to not use templates and just manually overload the operators for the specific types on the + operator, but to use templates for the << operator since the c++ standard doesn't overload the <<(string, char) operator.
This is the code I currently use in my projects:

template<typename T>
string& operator<<(string& s, T i) {
return s.append(toString(i));
}
string operator+(string s, int i) {
return s.append(toString(i));
}
string operator+(string s, float i) {
return s.append(toString(i));
}
string operator+(string s, double i) {
return s.append(toString(i));
}


So there you have it, you can now do stuff like:

double myDouble = 123.79;
string s("My double is ");
s << myDouble << ". My int is " << 12 << ".";
cout << s << endl; // prints 'My double is 123.79. My int is 12.'

// this prints the same thing as above but uses
// printf() and C-style strings (null-terminated char arrays)
printf("%s", s.c_str());

Returning Arrays by Reference in C++

Last article we talked about returning arrays, but we only did it by value (so that a copy of the data is returned). This time we'll look at returning arrays using references.

This feature is something that I rarely see used in practice, I think partially because the syntax is so bizarre and confusing to those that have never seen it. I first saw it in some code by shuffle2 and thought it was crazy; but now I understand why the syntax is the way it is.

Here's how to do it:

int testArr[5] = { 1000, 1, 2, 3, 4 };

int (&retArr())[5] {
return testArr;
}

int main() {
int (&arr)[5] = retArr();
cout << arr[0] << endl; // prints 1000
return 0;
}


The 'retArr()' is the function which is returning a reference to an int array with 5 elements. If you wanted any parameters for the function 'retArr()' you can place them in the '()' like you normally would.

The seemingly awkward syntax becomes less awkward when you think of the syntax for declaring references to an array.
'int (&arr)[5] = ...' is how you would declare a normal reference to an array, so studying that syntax and then looking at the function prototype 'int (&retArr())[5]' should help you understand it.

Also why is returning a reference to an array useful?
Well its useful for various reasons, but I'll give an interesting example.

Assume that you have a memory buffer that was dynamically allocated (for w/e reason, maybe a custom allocation routine to guarantee alignment), then you want that buffer to be thought of as a fixed-size array; well you can do that by returning a reference to an array.

Like so:

struct myStruct {
int* myPtr;
myStruct() {
myPtr = new int[5];
myPtr[0] = 1000;
myPtr[1] = 1;
myPtr[2] = 2;
myPtr[3] = 3;
myPtr[4] = 4;
}
~myStruct() {
delete[] myPtr;
}
int (&getArray())[5] {
return (int(&)[5])*myPtr; // cast to a reference
}
};

int main() {
myStruct testStruct;
cout << testStruct.getArray()[0] << endl; // prints 1000
cout << sizeof(testStruct.getArray()) << endl; // prints 20
return 0;
}


The benefit of returning an array by reference instead of just an int pointer in the above code is that the former explicitly tells the programmer reading the function signature that the buffer being returned holds 5 ints, returning an int pointer instead will not tell us how much elements are in the array.
Also sizeof(testStruct.getArray()) returns 5*sizeof(int), whereas if we were returning an int pointer, it would just return sizeof(int*), which most-likely isn't what we wanted.

Sunday, December 19, 2010

Returning arrays in C++ (including multi-dimensional arrays)

Last article we were talking about passing arrays as arguments and it got a bit long, so this time I'm going to try to keep it shorter.

First I'll say that like when we passed an array by value in the last article, there is no built in easy way to return arrays by value, so we have to use workarounds.

The first thing I'll show is some WRONG CODE that beginners might try to use:

int* getArray() {
int my_arr[10] = { 0,1,2,3,4,5,6,7,8,9 };
return my_arr;
}

int main() {
int* arr = getArray();
arr[1] = 4;
for(int i = 0; i < 10; i++) {
cout << arr[i] << endl;
}
return 0;
}


Can you see the error?
Well the problem is that in the function getArray() the array 'my_arr' is being allocated on the stack, and then you are returning a pointer to it. By the time you're back in the function main() 'my_arr' is not guaranteed to be valid anymore so using it as if its valid will cause undefined behavior. So don't do this.

Now lets look at some correct ways to do this.
One way is to use dynamic memory allocation to allocate the array on the heap and then pass a pointer to it, and then consider that memory as an array.

It looks like this (also note that I'm going to use a multi-dimensional array so people know how to do this):


const int n = 5;
const int m = 5;

int* getArray() {
int* arr = new int[n*m];
for(int i = 0; i < n*m; i++) arr[i] = i;
return arr;
}

int main() {
int* ptr = getArray();
int (&arr)[n][m] = (int(&)[n][m])*ptr;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << arr[i][j] << endl;
}}
delete[] ptr;
return 0;
}


In the function getArray() we allocate a chunk of memory the size of n*m*sizeof(int) bytes. Then we initialize it to the numbers 0...n*m-1, then we return it as an int*.
Back in the function main() we get the ptr and then consider it as a multi-dimensional array by using references. Here you can see how to cast a pointer to an array (for single-dimensional arrays just ignore the extra [m] part).
Lastly we need to remember to delete[] the ptr, because we allocated this memory on the heap so it won't automatically delete it for us.


Since the above method has us needing to delete[] our memory ourselves, it is slightly inconvenient. This last approach doesn't have that problem, we use a struct like we did when we passed arrays as arguments in the last article.

const int n = 5;
const int m = 5;

struct arrayStruct {
int arr[n][m];
};

arrayStruct getArray() {
arrayStruct t;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
t.arr[i][j] = i*m + j;
}}
return t;
}

int main() {
arrayStruct a = getArray();
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << a.arr[i][j] << endl;
}}
return 0;
}


When we use a struct to encapsulate the array, c++ creates a default copy constructor that will copy our data created in getArray() to our 'a' struct in main().

If we didn't want to access the array by using 'a.arr[i][j]', we could again use references like so:

int main() {
arrayStruct a = getArray();
int (&arr)[n][m] = a.arr;
for(int i = 0; i < n; i++) {
for(int j = 0; j < m; j++) {
cout << arr[i][j] << endl;
}}
return 0;
}


We have now seen a couple ways to do this, there are more ways to accomplish this task but this blog post would get huge if I continue listing the various ways.

Saturday, December 4, 2010

Passing Arrays as Arguments in C++

In C++ and other programming languages there is the concept of Arrays.
One of the things you might want to do with arrays is pass them to another function so that that function can read the data in the array.

The simplest thing you might think of to accomplish this task might be:

void foo(char* a) {
cout << a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


And this does work. What its doing here is passing the pointer to first element in 'my_array' to the function 'foo', and 'foo' uses it as a normal char*.


The problem with this approach is if you want to specify that you want an array of a certain size as the parameter to the function. Since we're treating the array as a pointer, there is no hint to the programmer of the size needed for the array.

There is a feature that originated in C which lets us specify a size parameter, and it looks like this:

void foo(char a[128]) {
cout << a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


Now this method is pretty evil in C++ (although in C I guess its more justifiable since it doesn't support the better way which i will explain later).

Now in the function 'foo' you are hinting to the programmer that you want an array with 128 elements, but guess what happens if you give it an array of 10 elements?
Nothing happens, its perfectly fine accepting that.

What happens if you pass it char* instead of an array of char?
Nothing happens, its perfectly fine accepting that.

Also notice what this function prints out. It prints out '4' for sizeof(a)!?

Any experienced coder will know that sizeof(some_array) will be the size of one element times the number of elements. So why is it giving us '4' instead of '128'?

Well it turns out that this feature is equivalent to the first method we used, that is it is as if we're passing a "char*", not an array by reference (as we may have thought).

So once we realize that, it all makes sense, sizeof(char*) on a 32bit system is '4', so that's how we got that number.

For these reasons this approach is what I would call 'evil'. You would expect it to behave a certain way, but it doesn't.

There is a special form of the above C-feature that looks like this:

void foo(char a[]) {
cout << a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 4
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


This time in the function 'foo' we didn't specify a size of the array.
I don't think there's any reason to use this form as opposed to 'char* a' since they both behave the exact same way. And this time you're not even hinting to the programmer how big you want the array, so its kind-of pointless to have this notation.
'char a[]' might look cool though compared to just using 'char* a', so maybe that's why someone might want to use it. Although i would just recommend using 'char* a' since its more commonly seen (and therefor easier to read imo).


Now that we learned all these bad ways to pass arrays in C++, lets look at a good way.
Passing an array by reference! (Before we were just using different forms of passing a char*, but this time we will pass the array by reference and the compiler will understand that it is array).

It looks like this:

void foo(char (&a)[128]) {
cout << a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 128
}

int main() {
char my_array[128] = {0};
foo(my_array);
return 0;
}


Aha! Finally the sizeof(a) is printing out 128 (what we expected it to).
Now 'a' is behaving like an array instead of 'char*' and that is what we wanted.

Now guess what happens if we try to pass an array of 10 elements to function 'foo'?
We get a compiler error! The compiler knows that an array of 10 elements is not an array of 128 elements, so it gives us an error.

Now guess what happens if we try to pass a char* to the function 'foo'?
We get a compiler error! The compiler knows that a char* is not the same as an array of 128 elements, so it gives use an error.

The compiler errors are useful in order to prevent bugs by people who mistakenly are passing arrays of incorrect size to the function.
In another article I will explain ways to circumvent such compiler errors when you 'know' for sure that the pointer/array you're passing is suitable for the function 'foo' but may not have the same type (such as an array of 1000 elements, whereas the function foo will only accept an array of 128 elements).



We have learned how to pass arrays using pointers and references (which means that any data of the array 'a' that was modified in function 'foo' modifies the data in 'my_array'), now we will learn how to pass an array by value (which means that a copy of the array data will be transferred via the stack to function 'foo', so that modifications to 'a' will not effect 'my_array').

Now here's the funny part about this, you can't do it! At least there's no fancy parameter declaration that lets you do this.

There are various workarounds for this problem, and one of them is to create a struct which will act as the array, and then pass the array as that struct on the stack.

For example:

struct temp_struct {
char a[128];
};
C_ASSERT(sizeof(temp_struct)==sizeof(char)*128);

void foo(temp_struct t) {
char (&a)[128] = t.a; // reference to the array t.a
cout << a[0] << endl; // prints 0
cout << sizeof(a) << endl; // prints 128
}

int main() {
char my_array[128] = {0};
foo((temp_struct&)my_array);
return 0;
}


Notice what we did here. We created a struct 'temp_struct' which holds only a single array of the size we're passing. (I will explain what the C_ASSERT does in a bit).
Then we made foo() take as a parameter the 'temp_struct' that we defined earlier.

In the first line of foo's body, we create a reference to the first element inside the foo struct. Then we use this array reference like normally.

Back in the function main(), we need to typecase my_array as a reference of type temp_struct. Basically what we're saying to the compiler is that this array should be treated as if it were a temp_struct, without doing any conversion of the data.
Now the last thing is, since the compiler thinks my_array is a temp_struct, it will copy over the array on the stack (like it would do for any other struct that was passed by value). So with that we have completed our goal.

C_ASSERT is a compile-time assert, and is useful for things like making sure structs you've declared are the size you expected them to be.
The C_ASSERT in the above example is added just to make sure the compiler is generating the struct the same exact size as the array we're dealing with.
If the compiler didn't make the struct the same size, then we would get a compiler error.
I think that a compiler will never end up breaking that C_ASSERT. But I don't know the full c++ standard well enough to guarantee that will never happen, so that's why I add the check in the first place.

Anyways hopefully this article was informative, and by now you should know how to pass an array by treating it as pointer, pass an array by reference, and pass an array by value.