I think the picture speaks for itself xD
Funny thing is the bug wasn't actually caused by the shader itself, instead it was due to the way I applied the shader. On my emu I had 3 separate textures for Sprite Background, Sprite Foreground, and Tile Background. Then I would render all 3 of these textures to the output buffer. When I applied the shader it applied it to each individual layer (sprite/bg) instead of the picture as a whole. The sprite layers for instance were just a few sprites with a lot of transparency, and running the Eagle2x algorithm on this layer caused artifacts.
The solution then was to combine all 3 layers into 1 texture before applying the shader; using 1-complete texture fixed the cause of the artifacts. I ended up coming up with some pretty cool bitwise-based merge function which combined 3 bitmaps arrays according to priority and transparency without having to use any conditional statements.
void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
s32 m0 = ((s32)(buffer[0][y][x])) >> 31;
s32 m1 = ((s32)(buffer[1][y][x])) >> 31;
s32 m2 = ((s32)(buffer[2][y][x])) >> 31;
m0 &= ~m1 & ~m2; m1 &= ~m2;
buffer[0][y][x] = buffer[0][y][x] & m0;
buffer[0][y][x] |= buffer[1][y][x] & m1;
buffer[0][y][x] |= buffer[2][y][x] & m2;
}}
}
Essentially the code creates masks by checking if the MSB is set (since the most significant byte is 0xff on non-transparent pixels, and 0x00 on transparent pixels, the most-significant bit is set when non-transparent; of-course this code won't work properly with partial-transparencies).
After it creates the masks, it uses some bitwise math to make sure the pixel with the highest priority is the one that is shown, and leaves the resulting pixel in the buffer[0] bitmap layer.
Note: The priority is (buffer[2] > buffer[1] > buffer[0]). So the pixel in buffer[2] is the top-most.
I initially had thought this bitwise version would be faster than a version using conditionals, but I was wrong. The compiled code for this is pretty bloated. Instead the conditional version compiles to much better optimized code:
void combineLayers() {
for(int y = 0; y < bHeight; y++) {
for(int x = 0; x < bWidth; x++) {
if (buffer[2][y][x] >> 31) buffer[0][y][x] = buffer[2][y][x];
elif(buffer[1][y][x] >> 31) buffer[0][y][x] = buffer[1][y][x];
elif(buffer[0][y][x] >> 31) buffer[0][y][x] = buffer[0][y][x];
else buffer[0][y][x] = 0;
}}
}
So I guess the lesson learned here is that many times using conditionals are better than complex bitwise operations. Even though the bitwise versions may seem more clever and quicker on first impression.
Note: "elif" in the above code is a macro for "else if"
#define elif else if
I like the elif macro because it keeps code more compact. Using "else if" almost always messes up text alignment and due to having too much letters.