By Madram / Overlanders.
So, what is a calling convention?
Look at the CPC firmware routines (BC80, aka cas_in_char, and co). The inputs (which registers are read and the role of each), the outputs and the resulting flags are all described, in what we would now call an API. That’s not a calling convention!
In C, the functions are turned into routines, and the way they take parameters could be arbitrary. The compiler has to pick a generic way to pass parameters, especially since the caller of a routine might be isolated (in a separate and independent compilation unit) from the callee. Also, the number of parameters could be greater than the number of available registers. This generic way forms a calling convention.
Now, programmers are lazy, or maybe they have to leave early to watch the telescreen. What is often done is to pass all parameters via the stack: the caller pushes them and the callee reads them back via indexed addressing (IX+n). This lets the callee work with the parameters in any order, without having to juggle registers. On modern CPUs, dedicated instructions make this very fast (not surprising, since CPUs more or less evolved to match the C paradigm, including its flaws).
What if the caller and the callee are in the same file? Couldn’t they agree on a smarter protocol (non-violent communication)? Being in the same file is not enough. In C, all functions are visible (“exported”) by default, meaning they could be called from another compilation unit, so the compiler must keep the generic convention for them. To indicate that a function is “private”, you have to use the static keyword. See figure 1.
// test_static.c
#include <assert.h>
// We introduce some dummy functions for test purposes.
// This one puts the length of str (nt-string)
// both in result (dummy) and return value.
// Note that we typically expect int to be 16 bits here.
static int g(const char *str, int *result) {
    int i = 0;
    while (*str) ++i;
    *result = i;
    return i;
}
// Another dummy, expected to be inlined.
static const char* h(const char *str){
    return str+1;
}
// Random comment in the middle of the file.
static int k(const char *str) {
    if (*str == 'x')
        return 42;
    else
        return 0;
}
// Example function calling all the others.
// NB: Int here is used as bool!
static int f(const char *obj, int *result) {
    assert(result);
    if (!k(obj))
        return 0;
    else
        return (g(h(obj), result) == k(obj) + 1);
}
const char *input;
int result;
int main(void){
    f(input, &result);
    return (result == 4);
}
Figure 1a. Nonsensical program for illustration purposes.
section .text,"ax",@progbits
section .text,"ax",@progbits
public _main
_main:
ld iy, (_input)
ld a, (iy)
cp a, 120
jq nz, BB0_1
ld a, (iy + 1)
or a, a
jq nz, BB0_6
ld hl, 0
ld (_result), hl
ret
BB0_1:
ld hl, (_result)
ld de, 4
or a, a
sbc hl, de
jq z, BB0_2
ld a, 0
jq BB0_4
BB0_6:
BB0_7:
jq BB0_7
BB0_2:
ld a, 1
BB0_4:
and a, 1
ld l, a
ld h, 0
ret
section .text,"ax",@progbits
section .bss,"aw",@nobits
public _input
_input:
rb 2
section .bss,"aw",@nobits
public _result
_result:
rb 2
Figure 1b. Generated ASM in the northern hemisphere.
Pretty sweet: there is no function call at all! Since all the functions are local, they were inlined into the main block. Also, the assert was removed, since the compiler could figure out that it would never be triggered (static analysis, dead-branch elimination).
Very important conclusion, that’s the tl;dr of the article, I’m putting it in a box (rather, Toms will do that, he is Black Belt 3rd dan in WordPress and ShoulderPress) (NDtoms: holy shit! I have been downgraded!):
- Make your local functions (the helper ones you only use in one given file) `static`.
- Make your small shared functions inline-able (e.g. make them static as well, and put the body in the header). The code will be duplicated, yet N copies of an inlined function might still be shorter than the mess generated for a global call.
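As an illustration of the second point, here is a minimal sketch of what such a header could look like. The file name `str_utils.h` and the function names are hypothetical; the bodies simply mirror `h()` and `k()` from figure 1a. The use of C99 `static inline` is an assumption on top of the plain `static` advice above, and whether the compiler actually inlines the calls depends on the toolchain and optimisation level.

```c
/* str_utils.h -- hypothetical header shared by several .c files. */
#ifndef STR_UTILS_H
#define STR_UTILS_H

#include <assert.h>

/* Small shared helper, body in the header.
 * 'static' gives each including .c file its own private copy,
 * so the compiler never has to honour the generic calling
 * convention for it and is free to inline it at each call site. */
static inline const char *skip_first(const char *str) {
    assert(str);          /* same style of check as f() in figure 1a */
    return str + 1;
}

/* Tiny predicate mirroring k() from figure 1a:
 * returns 42 if the string starts with 'x', 0 otherwise. */
static inline int starts_with_x(const char *str) {
    assert(str);
    return (*str == 'x') ? 42 : 0;
}

#endif /* STR_UTILS_H */
```

Any file that includes this header can call `skip_first()` or `starts_with_x()` as if they were local helpers; nothing is exported to the linker.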
Let’s turn f into a “global” function to see the difference and make fun of the Nicaraguans. Simply remove the static keyword for this particular function. Now, we get:
section .text,"ax",@progbits
section .text,"ax",@progbits
public _f
_f:
call __frameset0
ld l, (ix + 6)
ld h, (ix + 7)
add hl, bc
or a, a
sbc hl, bc
jq nz, BB0_2
ld iy, L_.str
ld de, L_.str.1
ld bc, 29
ld hl, L___PRETTY_FUNCTION__.f
push hl
push bc
push de
push iy
call ___assert_fail
pop hl
pop hl
pop hl
pop hl
BB0_2:
push hl
ld l, (ix + 4)
ld h, (ix + 5)
ex (sp), hl
pop iy
ld de, 0
ld a, (iy)
cp a, 120
jq nz, BB0_7
ld a, (iy + 1)
or a, a
jq nz, BB0_4
ld (hl), e
inc hl
ld (hl), d
BB0_7:
ex de, hl
pop ix
ret
BB0_4:
BB0_5:
jq BB0_5
section .text,"ax",@progbits
section .text,"ax",@progbits
public _main
_main:
ld hl, _result
ld de, (_input)
push hl
push de
call _f
pop hl
pop hl
ld hl, (_result)
ld de, 4
or a, a
sbc hl, de
jq z, BB1_1
ld a, 0
jq BB1_3
BB1_1:
ld a, 1
BB1_3:
and a, 1
ld l, a
ld h, 0
ret
section .text,"ax",@progbits
section .rodata,"a",@progbits
private L_.str
L_.str:
db "result",000o
section .rodata,"a",@progbits
private L_.str.1
L_.str.1:
db "test_global.c",000o
section .rodata,"a",@progbits
private L___PRETTY_FUNCTION__.f
L___PRETTY_FUNCTION__.f:
db "int f(const char *, int *)",000o
section .bss,"aw",@nobits
public _input
_input:
rb 2
section .bss,"aw",@nobits
public _result
_result:
rb 2
This time:
- `_main` passes the `result` pointer and `input` (note the difference in how they are fetched) through the stack.
- `__frameset0` sets IX up at the current stack position.
- Once the routine returns, the pops after the call put SP back where it was.
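To make the offsets in the listing less magic, here is a rough C model of the stack at the moment `_f` starts reading its parameters. It is only a sketch: the `Stack` type and the helper names are invented for the illustration, and it assumes that `__frameset0` saves the caller's IX on the stack before pointing IX at it (the routine's body is not shown in this article).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a stack of 16-bit words growing downwards. */
enum { STACK_WORDS = 16 };

typedef struct {
    uint16_t mem[STACK_WORDS]; /* one slot per 16-bit word        */
    int sp;                    /* index of the word on top         */
} Stack;

static void stack_init(Stack *s) { s->sp = STACK_WORDS; }

static void push16(Stack *s, uint16_t v) {
    assert(s->sp > 0);
    s->mem[--s->sp] = v;
}

/* Reads the word at byte offset 'off' from the frame pointer,
 * like the pair ld l,(ix+off) / ld h,(ix+off+1). */
static uint16_t read_ix(const Stack *s, int ix, int off) {
    assert(off % 2 == 0 && ix + off / 2 < STACK_WORDS);
    return s->mem[ix + off / 2];
}

/* Replays the pushes done by _main and by the prologue of _f.
 * Returns the modelled value of IX (an index into mem). */
static int call_f(Stack *s, uint16_t obj, uint16_t result_ptr,
                  uint16_t ret_addr, uint16_t caller_ix) {
    push16(s, result_ptr); /* push hl   (second parameter)          */
    push16(s, obj);        /* push de   (first parameter)           */
    push16(s, ret_addr);   /* call _f   (return address)            */
    push16(s, caller_ix);  /* assumed:  __frameset0 saves IX        */
    return s->sp;          /* assumed:  IX now equals SP            */
}
```

Under these assumptions, IX+0 holds the saved IX, IX+2 the return address, IX+4 the first parameter (`obj`) and IX+6 the second one (`result`), which matches the `(ix + 4)` and `(ix + 6)` reads in `_f`.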
Homework for next time:
- How to force a simpler calling convention globally?
- How to force the use of HL (potentially with INC) rather than copying it in IY?
- What the heck with the infinite loop in BB0_5 and the rest of the code?
For the curious, the Z80 calling conventions (including passing by registers) are defined here.