Saturday, January 24, 2015

Compilers and costs of abstraction

While experimenting with embedded development recently, I wondered how much an abstraction comparable to a virtual function table would cost. So I quickly wrote a small test app to check:

#ifdef __AVR__
#include <avr/io.h>
void led_on() { PORTB |= 0x04; }
void led_off() { PORTB &= (unsigned char)~0x04; }
#else // __AVR__
#include <mcs51/8051.h>
void led_on() { P0 |= 0x04; }
void led_off() { P0 &= (unsigned char)~0x04; }
#endif // __AVR__

typedef void (*func_t)(void);
const func_t a = led_on;
const func_t b = led_off;

void main()
{
    while(1)
    {
        a();
        b();
    }
}

Compiling it with SDCC 3.4.0 for the 8051 target and checking the asm output, I got exactly what I expected: explicit reads from the code segment and indirect calls. Compared to simple port bit manipulation, this looks pretty overcomplicated:

_led_on:
        orl     _P0,#0x04
        ret
_led_off:
        anl     _P0,#0xFB
        ret
_main:
00102$:
        mov     dptr,#_a
        clr     a
        movc    a,@a+dptr
        mov     r0,a
        mov     a,#0x01
        movc    a,@a+dptr
        mov     dph,a
        mov     dpl,r0
        lcall   __sdcc_call_dptr
        mov     dptr,#_b
        clr     a
        movc    a,@a+dptr
        mov     r0,a
        mov     a,#0x01
        movc    a,@a+dptr
        mov     dph,a
        mov     dpl,r0
        lcall   __sdcc_call_dptr
        sjmp    00102$

Now it's GCC time. Compiling the source with "avr-gcc -mmcu=atmega328 -O2 -flto main.c" gives the following output:

00000080 <main>:
  80: 2a 9a       sbi 0x05, 2 ; 5
  82: 2a 98       cbi 0x05, 2 ; 5
  84: fd cf       rjmp .-6       ; 0x80 <main>

Uhm, applause to the GCC developers :) The compiler is smart enough to see that the function pointers are constant in this case, so it can call the functions directly. And since the functions are really tiny, it decides to inline them.

Conclusion? If you have a smart enough compiler, constant function pointer tables are just as efficient as calling the functions directly, with automatic inlining possible. And if you really need dynamic polymorphism, you don't need to rewrite anything: just use non-constant pointers.
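To illustrate that last point, here is a minimal sketch that keeps a constant table and a mutable table side by side. It is written so it builds on a desktop compiler: fake_port is a stand-in for PORTB / P0, and bind_op, call_const, call_dyn and port_value are hypothetical helper names that are not part of the test app above. Whether a given compiler actually resolves the constant table to direct calls depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hardware port register (assumption: on a real target
   this would be PORTB or P0 from the vendor header). */
static volatile uint8_t fake_port;

void led_on(void)  { fake_port |= 0x04; }
void led_off(void) { fake_port &= (uint8_t)~0x04; }

typedef void (*func_t)(void);

enum { OP_ON = 0, OP_OFF = 1, OP_COUNT = 2 };

/* Constant table: every slot is known at compile time, so an optimizing
   compiler may turn calls with a known slot into direct calls. */
static const func_t const_ops[OP_COUNT] = { led_on, led_off };

/* Mutable table: slots can be rebound at run time, so calls through it
   generally have to stay indirect. */
static func_t dyn_ops[OP_COUNT] = { led_on, led_off };

/* Rebind one slot of the mutable table (the "dynamic polymorphism" case). */
void bind_op(unsigned slot, func_t f)
{
    assert(slot < OP_COUNT);
    dyn_ops[slot] = f;
}

/* Call through the constant table. */
void call_const(unsigned slot)
{
    assert(slot < OP_COUNT);
    const_ops[slot]();
}

/* Call through the mutable table. */
void call_dyn(unsigned slot)
{
    assert(slot < OP_COUNT);
    dyn_ops[slot]();
}

/* Read back the simulated port so the effect can be observed. */
uint8_t port_value(void) { return fake_port; }
```

The calling code looks the same for both tables. The only difference is the const qualifier, which is what allows (but does not oblige) the compiler to see through the pointers.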

And, surely - GCC rocks :)

Sunday, January 4, 2009

SH light pre-pass renderer?

I really like the idea of the light pre-pass approach (credits to Wolfgang Engel here), but there is still one drawback: it's difficult to implement significantly different shading models.

Recently I was playing with SH-based (spherical harmonics) lighting, and suddenly an idea struck me :) It is theoretically possible to accumulate irradiance into SH in screen space using only the depth buffer, then render the scene with this information. Of course, implementing this in a straightforward way is just sick, as it would require at least 4x3 channels per pixel for linear SH alone, not to mention quadratic SH needing 9x3 channels per pixel. On the other hand, some optimizations are possible. A few that immediately come to mind:

1) Store color for the 0-th band only. This would drastically reduce the space needed, to 3+3 channels for linear SH or 3+8 for quadratic SH, and there is some hope that subjective quality won't be too bad, since the human eye is much more sensitive to intensity than to color.

2) If we forget about shadows for a while, irradiance is quite a low-frequency function, so it might be possible to store it at a lower resolution and still get nice per-pixel bump-mapped details. That way framebuffer memory requirements could be lowered by a factor of 4x-16x or even more. The savings can be used to increase the SH order or to store more color information. Another possibility is to store irradiance for different depths so that we can handle transparency in the same rendering path.

3) Considering shadows: this needs experimentation, but I still believe that 1 irradiance sample per 2x2 pixel block would be sufficient for good quality, or 1 per 4x4 block for average quality. Furthermore, shadow filtering becomes much cheaper since we are already working at a lower resolution.
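To make the channel counts and the "3+3" layout from item 1 more concrete, here is a rough CPU-side sketch in C. It is only one possible reading of the idea, not a tested renderer: the struct layout, the function names and the choice of Rec. 709 luma weights as the intensity measure are all assumptions, and light directions and normals are expected to be unit length. The basis constants and cosine-lobe weights are the standard ones for real SH bands 0 and 1.

```c
#include <assert.h>

/* Real SH basis constants for bands 0 and 1. */
#define SH_Y0 0.282095 /* 1 / (2 * sqrt(pi)) */
#define SH_Y1 0.488603 /* sqrt(3) / (2 * sqrt(pi)) */
/* Cosine-lobe weights that turn radiance SH into irradiance. */
#define SH_A0 3.141593 /* pi */
#define SH_A1 2.094395 /* 2 * pi / 3 */

typedef struct { double x, y, z; } vec3;
typedef struct { double r, g, b; } rgb;

/* Hypothetical "3+3" sample: RGB for band 0, intensity only for band 1. */
typedef struct {
    rgb    band0;
    double band1[3]; /* coefficient order: y, z, x */
} sh_sample;

/* Number of SH coefficients up to and including the given order. */
unsigned sh_coeff_count(unsigned order) { return (order + 1) * (order + 1); }

/* Channels per pixel with full RGB in every coefficient. */
unsigned sh_channels_full(unsigned order) { return 3 * sh_coeff_count(order); }

/* Channels per pixel with RGB in band 0 and intensity in higher bands. */
unsigned sh_channels_mono(unsigned order)
{
    return 3 + sh_coeff_count(order) - 1;
}

/* Accumulate one directional light; dir is unit length, pointing at the light. */
void sh_add_light(sh_sample *s, vec3 dir, rgb color)
{
    /* Rec. 709 luma weights, used here as the intensity measure (assumption). */
    double lum = 0.2126 * color.r + 0.7152 * color.g + 0.0722 * color.b;
    assert(s != 0);
    s->band0.r += SH_Y0 * color.r;
    s->band0.g += SH_Y0 * color.g;
    s->band0.b += SH_Y0 * color.b;
    s->band1[0] += SH_Y1 * dir.y * lum;
    s->band1[1] += SH_Y1 * dir.z * lum;
    s->band1[2] += SH_Y1 * dir.x * lum;
}

/* Evaluate irradiance for a unit normal; band 1 adds the same value to R, G, B. */
rgb sh_eval_irradiance(const sh_sample *s, vec3 n)
{
    rgb e;
    double mono;
    assert(s != 0);
    mono = SH_A1 * SH_Y1 *
           (s->band1[0] * n.y + s->band1[1] * n.z + s->band1[2] * n.x);
    e.r = SH_A0 * SH_Y0 * s->band0.r + mono;
    e.g = SH_A0 * SH_Y0 * s->band0.g + mono;
    e.b = SH_A0 * SH_Y0 * s->band0.b + mono;
    /* Linear SH can go negative on the side facing away from the light. */
    if (e.r < 0.0) e.r = 0.0;
    if (e.g < 0.0) e.g = 0.0;
    if (e.b < 0.0) e.b = 0.0;
    return e;
}
```

For a white unit light, a normal facing the light evaluates to about 0.75 instead of the exact 1.0, which is the usual price of truncating to linear SH. A real implementation would accumulate these samples on the GPU into a low-resolution render target, as in items 2 and 3.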

I'm considering trying to implement this in the near future, unless people at the gamedev.net forums tell me this is way too mad :)

P.S. One more thought: using a second, very low resolution screen-space ambient irradiance map (something like 8x6, maybe 2 or 4 layers), it might be possible to regenerate it in real time, achieving real-time global illumination :)

Wednesday, July 23, 2008

First post

Well, that's it. I'm working as a graphics programmer in a small company, and I'm currently starting a major rework of a hastily written graphics engine to improve its quality, features and speed. The purpose of this blog is to share the ideas and experimentation results I come up with. Another goal is to improve my written technical English, since I believe that the ability to clearly express ideas in different languages (my native one is Russian) should help with thinking in more ways and, as a neat side effect, widen job opportunities "just in case".