Wednesday, June 1, 2011

Don't initialize that big static data! (c++)

I learned this recently, so I assume not a lot of people realize it.
Let's take a look at some sample code:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10];
int main() {}

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10] = {7};
int main() {}

Now what's the difference between these two programs? Well, the only difference is that the bottom one is initializing a[0] to 7. Both programs initialize the rest of the a[n] data to 0.
In the first one, all of 'a' gets initialized to 0 because all static/global data that is not explicitly initialized gets initialized to 0 (as per some rule in C).
In the second one, the first element of 'a' gets initialized to 7 because we explicitly said so in the array initializer; the rest of the data is filled with 0's, because when an array initializer does not cover all of the elements, the remainder is treated as 0's.

Now why do we care about this?
Well, why don't you try compiling both of those programs and taking a look at the executable files?
You might be surprised to find that the first executable is just a few kb, while the second is over 10mb!

Why is this?
Well, it turns out that compilers/linkers put uninitialized static data in its own section, called the bss section. This section consists of data that is initialized to all 0's. This means the executable doesn't need to store all that data in itself; it only needs to record how big the variable is and where to place all the 0's in memory, and that zero-filling happens when the application starts up.

In contrast, the second example now explicitly initializes the data. Now the compiler/linker places that data in the data section, and it actually stores the full contents of the data in the executable. This means when you have an explicitly initialized 10mb chunk of data, your executable file will grow by 10mb!

The exception to the above rule is data that is explicitly initialized to 0. For example, if we had "char a[_1mb*10] = {0};", what happens then?
Well, it turns out it can be put in either the bss section or the data section. Apparently some compilers/linkers might be stupid enough to put it into the data section, meaning the executable will still grow by 10mb. Thankfully msvc isn't that stupid and doesn't do that (the built executable is the same as if you hadn't explicitly initialized 'a').

Anyways, the moral of the story is to keep the above information in mind when you are explicitly initializing data.
What would be a way to explicitly initialize a[0] to 7 at program startup w/o bloating the executable by an extra 10mb?
Well this is one way:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10];
struct _init_a {
    _init_a(char* c) { c[0] = 7; }
} __init_a(a);
int main() {}

Now __init_a's constructor will end up initializing a[0] to 7 at application startup, like we wanted. (Note: remember to beware of the static initialization order fiasco in C++ if other static variables' constructors will be using 'a'.)


  1. >> In the first one, all of 'a' gets initialized to 0 because all static/global data that is not explicitly initialized gets initialized to 0 (as per some rule in C).

It's not a C or C++ rule ... it's just convenient because of how modern virtual-memory-enabled operating systems work. On modern operating systems, the executable can simply "reserve" the block of memory needed for the .bss rather than charging a commit against the physical pool. Only when pages of the .bss are actually touched does memory get committed to physical existence. Committed memory is *always* cleared to zero automatically, since the memory may have belonged to another process previously. If the memory weren't forcibly cleared by the operating system, a malicious program could save the 'uninitialized' data and essentially gain access to data that doesn't belong to it: a massive security risk, on multi-user server machines in particular.

    Thus the C program can just rely on the .bss contents to be zero. If the application never actually uses the array, it doesn't even get allocated into memory. If the app uses only a small part of the memory, only that small part gets initialized and committed.

  2. I'm pretty sure the compiler can choose to initialize:

    char a[_1mb*10];

    ... as a debug value such as 0xcdcdcdcd if it wants to. That's not the default behavior (especially in release builds) since it would needlessly bloat executable size, memory consumption, and program load/startup time. The .bss virtual memory system is a lot more efficient for all of those.

  3. "It's not a C or C++ rule ..."
it is part of the standard to initialize global/static variables to 0.
although i have not read that part of the standard myself, everything i've read and the c++ experts i've talked to confirm it is part of the standard.

    which means regardless of how the OS allocates the memory, the c/c++ compiler must make it so that chunk of memory is initialized to 0 or else it won't be standards compliant.

    i suppose a compiler could have a debug option to initialize the values to something else to check for uninitialized globals, but that's not part of the standard.

  4. hmm, indeed. There seems to be a lot of conflicting sources on it (as with far too many C/C++ standards items, sigh). Witness one source I used before I responded:

    "When declaring a regular local variable, its value is by default undetermined. But you may want a variable to store a concrete value at the same moment that it is declared. In order to do that, you can initialize the variable."

    (I also found a corroborating source on StackOverflow last time; but can't locate it now).

    It's also quite possible that older C and C++ standards were "indeterminate" and that newer incarnations (C99, etc) have been updated accordingly to make it part of the standard. I seriously doubt the old circa-78 C standards called for zero-value initialization, since they could hardly afford to stick any data in the .bss that wasn't explicitly requested.

    Blah. And to think. Standards are usually designed in the hope of making things simpler. ;)

Well, that source technically isn't wrong: it says a "regular local variable", so i assume it's talking about a non-static local variable on the stack, whose initial value truly is indeterminate.
it is possible older C/C++ standards left it indeterminate, not sure. This rule has likely been there since c89/c90 though, since current C++ doesn't retain compatibility with c99.

the scariest part of the c/c++ standards is really that compilers don't always follow them.
also, the c++ standards committee cares way too much about speed instead of reliable code, so they leave too much stuff as undefined behavior, to the point where code does things that make no sense.
    with proposed changes for c++0x, it is possible that "hello world" prints out in the following code:
    "int main() { while(1) {} cout << "hello world"; }"

    check out this link:

  6. ... egad. >_<

    Speaking of which, many of the "simd/threading optimization tools" out there are exceedingly dangerous at best, and woefully inefficient *and* exceedingly dangerous at worst.

I was looking at a few of the newer ones, all kinda based on OpenMP and their ilk, and all hoping to "make threading easy!" .. The list of gotchas and caveats for each was mind-boggling, and the number of "ideal situations" where they actually work is usually so limited you can cover *all* of them in a few pages (and by covering them, I mean explicit code samples).

Agner Fog did a nice breakdown of the vectorizing and multithreading optimizers in one of his latest PDF updates. It's just insane how inconsistent they are. So all they do is shift your time from doing the little red tape needed to custom-define SIMD and threading parameters to doing a LOT of guesswork trying to fool the compiler into optimizing (or subsequently NOT optimizing, when it's not ok!) an otherwise simple simd/threading task.

  7. And on a thorough read of that link you posted, this sounds a lot like the sort of idle loop speedhacking that suits emulators well (most of the time), by trying to detect and remove programmer-imposed manual wait loops. ;)

Yeah, it's similar to the idle loop speedhacking, except on pcsx2 we consider it a speedhack, whereas this being in the standard means the hack is entitled to be always on :/
i can understand some of the motivation behind allowing such optimizations (an idle loop should be rare, so it's optimizing for the more common case where the loop actually terminates but the exact moment is unknown to the compiler).
however, i think the c++ committee often goes too far with its love for speed. they'd rather the language be "unpredictable and super fast" instead of "predictable and fast". imo it's horrible for a programming language to be designed with such principles.
however, the annoying thing is that the other languages that attempt to design themselves better than c++, such as c#, take the other extreme, where they limit the language's features too much in an attempt to prevent programmer mistakes. imo that is not the right way to design a language either.
instead, i want something very similar to c++, but without the stupid header/prototype/TU nonsense, and where code is defined to behave the way the programmer intuitively thinks it will.