Monday, September 27, 2010

Fun Shader Bugs

Yesterday I ran into a funny bug when implementing an Eagle2x pixel shader.

I think the picture speaks for itself xD

(Click the picture to see the magnified version and spot the bug :D)

Funny thing is the bug wasn't actually caused by the shader itself, instead it was due to the way I applied the shader. On my emu I had 3 separate textures for Sprite Background, Sprite Foreground, and Tile Background. Then I would render all 3 of these textures to the output buffer. When I applied the shader it applied it to each individual layer (sprite/bg) instead of the picture as a whole. The sprite layers for instance were just a few sprites with a lot of transparency, and running the Eagle2x algorithm on this layer caused artifacts.

The solution then was to combine all 3 layers into 1 texture before applying the shader; using 1-complete texture fixed the cause of the artifacts. I ended up coming up with some pretty cool bitwise-based merge function which combined 3 bitmaps arrays according to priority and transparency without having to use any conditional statements.

void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
s32 m0 = ((s32)(buffer[0][y][x])) >> 31;
s32 m1 = ((s32)(buffer[1][y][x])) >> 31;
s32 m2 = ((s32)(buffer[2][y][x])) >> 31;
m0 &= ~m1 & ~m2; m1 &= ~m2;
buffer[0][y][x] = buffer[0][y][x] & m0;
buffer[0][y][x] |= buffer[1][y][x] & m1;
buffer[0][y][x] |= buffer[2][y][x] & m2;

Essentially the code creates masks by checking if the MSB is set (since the most significant byte is 0xff on non-transparent pixels, and 0x00 on transparent pixels, the most-significant bit is set when non-transparent; of-course this code won't work properly with partial-transparencies).
After it creates the masks, it uses some bitwise math to make sure the pixel with the highest priority is the one that is shown, and leaves the resulting pixel in the buffer[0] bitmap layer.
Note: The priority is (buffer[2] > buffer[1] > buffer[0]). So the pixel in buffer[2] is the top-most.

I initially had thought this bitwise version would be faster than a version using conditionals, but I was wrong. The compiled code for this is pretty bloated. Instead the conditional version compiles to much better optimized code:

void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
if (buffer[2][y][x] >> 31) buffer[0][y][x] = buffer[2][y][x];
elif(buffer[1][y][x] >> 31) buffer[0][y][x] = buffer[1][y][x];
elif(buffer[0][y][x] >> 31) buffer[0][y][x] = buffer[0][y][x];
else buffer[0][y][x] = 0;

So I guess the lesson learned here is that many times using conditionals are better than complex bitwise operations. Even though the bitwise versions may seem more clever and quicker on first impression.

Note: "elif" in the above code is a macro for "else if"

#define elif else if

I like the elif macro because it keeps code more compact. Using "else if" almost always messes up text alignment and due to having too much letters.

Sunday, September 26, 2010

cottonNES - More Progress...

So I have been working diligently on cottonNES, and have made some progress over my last update.

As far as the core goes, I've been able to pass some more cpu/ppu tests, but my emu still fails many of blargg's timing tests (those things are brutal!). I already switched my cpu core to a cycle-accurate design, this proves to be needed for a few games (Ms. Pacman, Bad Dudes, Baseball Stars 2, etc...). Whats cool is my emu now runs Battletoads and Battletoads & Double Dragons, these 2 games are very timing sensitive and many NES emulators still fail to run them properly (some even crash...). That said, the games still have graphical problems, but at least they go in-game.

Something else I implemented was joypad support. Currently its just hard-coded to match my ps3-controller, but in the future I'll do a fancy configuration screen. I ended up going with SDL for the joypad API because I just wanted something simple to use. Perhaps in the future I will rewrite the code with direct-input since i wasn't very satisfied with SDL.

The last big thing I did was support for pixel shaders! I'm a shader noob, but between yesterday and today I've learned enough of HLSL to make 2 shaders. One shader is a scanline shader, and the other is a Scale3x shader. The cool thing is this offloads the filtering onto the GPU so the CPU can do less work, overall the Scale3x filter was about 54% faster than my C++ based attempt (222fps vs 144fps in a scene). In the Super Mario Bros 2 pic above, I have the Scale3x shader on so you can see what it looks like.

Anyways, I still have a lot more work to do with my emulator before its ready for a release; particularly PPU and APU need more work since they're the cause of most problems I'm having now. I hope I continue to be motivated so that cottonNES can become one of the better NES emulators out there; I feel it has the potential, but it just depends on if I continue working on it.

Thursday, September 16, 2010

NES emu progress

For those that don't know, I had started a NES emu project from scratch a while back, but then I got bored and stopped working on it in favor of some other projects.
This past week I have regained my interest in it and have made some progress.
After some hours of debugging, it now passes all normal opcode tests in Kevin Horton's nestest.nes rom.
Last week it failed most of those tests miserably (turns out I was just setting wrong flags in a few opcodes like BIT and PLP).

I currently have mappers 0(no-mapper),1,2,3,4,and 7 implemented. I think mapper #1 has a few bugs and mapper #4 needs to have its IRQ timer code redone later on to be more accurate (but its okay for now).
Mappers 2,3, and 7 are kind-of fun because they're so easy to implement. Each can be done in just a few lines of code.

For example, my Mapper #2 implementation is just:

class Mapper2: public Mapper {
Mapper2() { Init(); }
virtual ~Mapper2() { Close(); }
void WritePROM(u16 addr, u8 value) {
clamp(value, ROM.romBanks, "Mapper2: PROM");
PROM_SLOT[0] = _PROM[value*2+0];
PROM_SLOT[1] = _PROM[value*2+1];

Something I've noticed when browsing through other NES emu's source-code, is that many of the emulators have horribly messy code. There is one popular emulator that uses OOP too much. When this happens it fucks up intellisense and makes browsing code a PITA. You right click on any method to go to the definition and intellisense finds ~50 possible matches and makes you chose manually which method is the one you're looking for. This always happens in projects that use too much OOP and is a very annoying problem.

Another popular emulator does the opposite; it doesn't use any OOP even though the code is now in c++ (I guess originally it was in C). Anyways the whole code is very C-like and abuses macros like crazy, this makes everything a mess. Its a common mistake to abuse macros; I did it when I started with c++, but any good coder knows that you should only use macros when there's no better way. Oh and the code uses 2-space tabbing which is ridiculous.

On the subject of macros, its more understandable to see macro usage in C code compared to C++ code. This is because C is very limited compared to C++, and there is a lot more stuff you can't do so you use macros to simulate these features. In C++ however you have templates and references which can help you remove the need for many macros. This is a big reason why C++ is better than C.
Many C-programmers move on to C++ and still code C-like because they don't know proper C++, that could be the case with the nes-emu I looked at. Anyways I suppose I'll save a C/C++ comparison rant for another blog post.

So back to talking about my emulator, cottonNES...
My emulator is no-where near ready for a release. Although it boots hundreds of games now that the big cpu bugs are fixed, it still has problems in many games (most-likely to do with PPU and timing issues...).

I would like to continue discussing nes emulation further here, but sadly I have an exam this week and need to study D:

I suppose this post will be boring without screen-shots :D

Saturday, September 4, 2010

Using the volatile keyword to prevent float precision problems

Consider the following code:

int main() {
for(float i = 0; i < 100; i++) {
float f = i / 100.0f;
if (f != i / 100.0f) {
cout << i << ", ";
cout << endl;
return 0;

One should expect the above code to not print out anything, since the 2 operations are obviously equal.
Ironically, if you compile this code in msvc or gcc, you will most-likely get a bunch of numbers spammed to the console.

Why is this? Well it seems that the compilers take liberties with floating point calculations; even though you're dealing with single-precision 32bit floats, the compilers may use the full 80bit precision of the x87 FPU in calculations, or may even use double-floating point arithmetic for one of the operations, and use single-floating point for another one... mix and matching precisions, causing problems...
In the end, the different precision of the operations will cause the results of two seemingly equal operations to be unequal.

One solution is to use the volatile keyword.
The volatile keyword essentially makes the compiler always write-back values to memory once an operation is preformed, and it always reloads the value from memory when it will be used again.
This allows us to force the compiler to truncate the values to 32bit floats when it writes them to memory, and when it loads the value again it will be limited to the precision of a 32bit float.

So now we can do:

int main() {
for(float i = 0; i < 100; i++) {
volatile float f1 = i / 100.0f;
volatile float f2 = i / 100.0f;
if (f1 != f2) {
cout << i << ", ";
cout << endl;
return 0;

This will fix the problems we were having before, and not print out anything.

There are many papers and discussions about ways to compare floating point values, and I won't go into more detail here.
This site lists a few methods, and the problems they could have.
So I suggest reading that if you're interested.