Usually, too much emphasis is placed on producing fast or compact code, and not enough on keeping it clean. Yet it is probably more important for a beginner to learn to code cleanly before optimizing in one direction or another. Moreover, even with the best intentions in the world, code that starts out clean tends to become messy and disorganized as it grows, with a little help from laziness. Even an experienced programmer has to watch out for this.
If experienced coders have picked up some bad habits, it is undoubtedly because of the limitations of the first assemblers: no macros, difficulty navigating between a label’s declaration and its uses, overly rigid conditional assembly, comments taking up too much RAM, labels limited to 8 characters, etc.
Today, with Orgams (native-assembler reference) or Rasm (cross-assembler reference), we can do away with all this in order to produce readable, solid, configurable and well-structured code. The amateur code “à la Jacquie et Michel” is over: it’s time to make it all more professional!
In the first part of this article, we will look at four important rules for coding cleanly in Z80. In what follows, I will mainly use Orgams syntax.
Disclaimer! All the rules discussed here must be used thoughtfully (usage, efficiency) and not in a mechanical or extreme way, otherwise they would become counterproductive. No Dogmas!
Rule #1: Be Readable
A first simple rule is to be careful in the way you write your code, so that you can easily read and understand it again.
Well-presented code is often code that highlights its own structure. Here is an example of poorly presented code, whose structure is hard to make out:
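A minimal sketch of such a listing (the routine, its labels and its values are hypothetical, and it assumes an assembler that tolerates instructions in the first column): every line starts in the same column, so labels and instructions blur together.

```z80
fill_line
ld hl,#c000
ld b,80
fill_loop
ld (hl),a
inc hl
djnz fill_loop
ret
```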
To make it clearer, the code must be indented. Add one tab for each new level in order to set labels, directives, and instructions apart (unnecessary for Orgams users, since Orgams indents your code automatically):
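A minimal sketch of an indented routine (hypothetical labels and values): labels stay in the first column and instructions sit one level further in, so the entry point and the loop are visible at a glance.

```z80
fill_line
        ld hl,#c000     ; start address
        ld b,80         ; number of bytes to fill
fill_loop
        ld (hl),a
        inc hl
        djnz fill_loop
        ret
```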
You can also isolate routines by leaving blank lines between them or by inserting comment separators.
Another way to present your code well is to group several instructions together on the same line:
ld (hl),a : set 3,h
ld (hl),a : set 4,h
ld (hl),a : res 3,h
ld (hl),a
This makes sense because the code is repetitive and each line performs the same task. But a misuse of this technique would be to write:
bc26 ld bc,#800 : add hl,bc : jr nc,end_bc26 : ld bc,#c050 : add hl,bc
Here the line is overloaded and very difficult to read.
Rule #2: Avoid Hard Coding
A second rule is to never leave literal values (e.g. #C000, 255, etc.) in the body of your source. It’s preferable, as far as possible, to refer to them indirectly through symbol declarations.
It’s an opportunity to make a clear distinction between:
- Constants or symbols (declared with the ‘=’ directive in Orgams, and the ‘EQU’ directive in Rasm): their value is fixed once and for all when they are declared in the source, and stays the same throughout assembly,
- Variables (declared with the ‘=’ directive in Rasm): their value can be changed in the source, and therefore during assembly.
Orgams users can write, for example:
16 ** [
BYTE # ; # is a variable here, counting from 0 to 16-1
]
To do the same, Rasm users can write something more explicit:
variable1 = 0
REPEAT 16
DEFB variable1
variable1 = variable1 + 1
REND
There are many advantages to avoiding hard coding:
- Flexibility: only one line has to be modified to affect everything that refers to it, since all occurrences now depend on the same declaration,
- Cleanliness: by grouping all declarations at the beginning of your source, you have an overview of them and avoid possible conflicts,
- Understanding: it gives a clearer meaning to the values being manipulated (‘screenAddress’ makes more sense than #8000).
A first example of what to avoid, here to copy an image from a bank to the screen:
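A minimal sketch of such a hard-coded copy, as an illustration only: it assumes the usual CPC RAM configuration write to port #7Fxx, and the literal values are simply those of the declarations given further on.

```z80
        ld bc,#7fc4     ; RAM configuration #c4: extra bank visible at #4000-#7fff
        out (c),c
        ld hl,#4000     ; source: image stored in the bank
        ld de,#c000     ; destination: screen memory
        ld bc,#4000     ; length: 16 KB
        ldir
        ld bc,#7fc0     ; back to the default RAM configuration
        out (c),c
        jp #2000        ; run the main code
```

Every number here is anonymous: nothing says that #c4 is "the image bank", nor that the two #4000 values mean different things.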
The problem is that everything is local here: as your program grows, this code will be drowned in hundreds of other lines, and it will become difficult and risky to modify. So you need to move all this explicit data into global declarations in the source header, alongside all the others, to get a better overview of what your source does.
To reduce the risk of errors and make the code more flexible, it is far better to write:
bnk_image = #c4
org_image = #4000
des_image = #c000
lng_image = #4000
exe_main_code = #2000
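With these declarations in place, the copy itself might look like the sketch below. It is only an illustration: the port idiom #7f00+bank is an assumption about how the bank is selected, and bnk_default is a hypothetical extra symbol that would be declared next to the others.

```z80
bnk_default = #c0               ; hypothetical: default RAM configuration

        ld bc,#7f00+bnk_image   ; connect the image bank
        out (c),c
        ld hl,org_image
        ld de,des_image
        ld bc,lng_image
        ldir
        ld bc,#7f00+bnk_default ; back to the default RAM configuration
        out (c),c
        jp exe_main_code
```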
So, if you want to store your image in bank #c5, you will only have one line to modify, at the beginning of the source, which will be next to the other bank declarations (otherwise, you might be tempted to put it in #c6, forgetting that later in the code you already use it for music, for example!). In the same way, if you want to change the origin, the destination, or the length of the image, or the address where the code to be executed is located, everything is in the same place.
But there is a better solution: place all this data in the header of the image when you save it (except, of course, the destination address). This way, you avoid manipulating any data directly: your code becomes generic.
Second example of a common blunder: when reading a table, a special value is often appended to mark its end (for example, to make the text of a scroller loop back). It sometimes looks like:
text BYTE "Beb likes to eat chestnuts.",255
When you read ‘255’, you know you have to loop back. But this also requires a routine like this:
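A minimal sketch of such a routine, assuming HL points to the current character (the reload of HL is an illustrative guess at what the loop-back does):

```z80
        ld a,(hl)           ; read the current character
        cp 255              ; end marker, written a second time as a bare literal
        jr nz,no_loop_text
        ld hl,text          ; end reached: restart from the beginning
        ld a,(hl)
no_loop_text
        inc hl              ; next character
```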
The problem here is that the value ‘255’ is mentioned twice without being linked. It would therefore be preferable to do:
EOF = 255
text BYTE "Beb likes to eat chestnuts.",EOF
...
ld a,(hl)
cp EOF
jr nz,no_loop_text
Third example, provided by Golem. You run into the same problems when you are tempted to write the classic self-modifying code:
label ld a,0
...
      ld (label+1),a
The ‘+n’ (here ‘+1’) is still hard coding: if, later, you replace register A with IXL, for example, you will have to put ‘+2’, which could mean changing every line that refers to this label. To avoid this problem, you may prefer:
label = $+1
      ld a,0
...
      ld (label),a
The ‘+n’ now appears in only one place (easier to modify), and it is local, i.e. close to the address actually concerned (less risk of error). Some purists will go further and write (safer but less readable):
ld a,0
label = $-1
...
ld (label),a
Fourth example. Small hard-coded offsets for jumps, such as JP address+3, should be avoided as much as possible. A rare exception to the rule would be:
jr nc,$-3 ; Go to 'in a,(c)'
In this case, it’s very local and the code has little reason to change. But we will avoid any hard coding for event counters, the height of your various screen splits, etc. For example, the palette is sometimes clumsily initialized like this:
data_ini_GA
    BYTE 0,84,1,68,2,85,3,92,4,88,5,93,6,76,7,69
    BYTE 8,77,9,86,10,70,11,83,12,64,13,71,14,78,15,75
...
    ld hl,data_ini_GA
    ld b,16*2
    call ini_GA ; send B bytes, read from HL, to the Gate Array
A safer version would be:
data_ini_GA
    BYTE 0,84,1,68,2,85,3,92,4,88,5,93,6,76,7,69
    BYTE 8,77,9,86,10,70,11,83,12,64,13,71,14,78,15,75
data_ini_GA_
...
    ld hl,data_ini_GA
    ld b,data_ini_GA_-data_ini_GA
    call ini_GA ; send B bytes, read from HL, to the Gate Array
This way you can remove several pairs from the list and the routine will adapt automatically at assembly time. It’s dynamic, not static coding.
We could multiply the examples ad infinitum, but I think you get the point!
Rule #3: Macros are Your Friends
A macro is a small piece of code that is usually declared at the beginning of the source and that, at assembly time, is injected wherever it is invoked. It may or may not take parameters.
There are many advantages to using them:
- Legibility: it visually occupies only one line instead of a code section of several lines,
- Flexibility, again: it makes it possible to factorise identical pieces of code and therefore to modify all of them by modifying only the declaration,
- Flexibility, yet again: macros are configurable, and are therefore similar to functions that take arguments and produce specific code each time.
Take, for example, a simple routine addressing the CRTC:
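A minimal sketch of such a routine, written by hand for one register (the register number and value are hypothetical; the #bc00/#bd00 port pair is the one used by the macro shown below):

```z80
        ld bc,#bc01     ; select CRTC register 1
        out (c),c
        ld bc,#bd30     ; write 48 (#30) to the selected register
        out (c),c
```

Repeated for each register to set, these four lines quickly clutter the source.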
It’s preferable to declare, at the beginning of the source, a generic macro taking the register number and the value to be sent as parameters:
; *** Send m value on the n CRTC register ***
; - Modified: bc
MACRO set_CRTC n,m
    ld bc,#bc00+n
    out (c),c
    ld bc,#bd00+m
    out (c),c
ENDM
And you will be able to write in your code:
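For instance, assuming the parenthesized call syntax used for nops(32) later in this article, and with purely illustrative register values:

```z80
        set_CRTC(1,48)  ; CRTC register 1 = 48
        set_CRTC(2,50)  ; CRTC register 2 = 50
```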
During assembly, the assembler will inject your macro according to your parameters. But sometimes we are tempted to do otherwise: move the contents of the macro into a subroutine, to save RAM since the code will no longer be duplicated at each use. Instead of the macro, you then write:
Here, too, a macro is needed, and it’s better to do:
MACRO call_set_CRTC n,m
Technically, your macro will be replaced by ‘load’ and ‘call’, but if later you want to change the way you address the CRTC (example: use DE instead of HL), you will only have to change your macro in one place.
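A minimal sketch of both pieces, under stated assumptions: here set_CRTC is a hypothetical subroutine (no longer the macro above) that expects the register number in H and the value in L, and the body of call_set_CRTC is a guess consistent with the ‘load’ and ‘call’ just mentioned.

```z80
; *** Send L value on the H CRTC register (hypothetical subroutine) ***
; - Input: H = register number, L = value
; - Modified: b
set_CRTC
        ld b,#bc
        out (c),h       ; select the register (port #bcxx)
        inc b           ; B = #bd
        out (c),l       ; write the value (port #bdxx)
        ret

; *** Load the parameters and call the subroutine ***
; - Modified: b, hl
MACRO call_set_CRTC n,m
        ld hl,n*256+m   ; H = n, L = m
        call set_CRTC
ENDM
```

Switching to DE later would then mean editing only this macro and the subroutine, not every call site.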
Another possible use: you often need to wait for a specific number of NOPs or lines. DEFS can do the job, and when it comes to lines, small loops containing DEFS are also suitable. But elegant coders will lean towards something like:
; *** Wait n NOPs ***
; - Modified: b, z flag
MACRO nops n
IF n-1 AND &ffc
ld b,n-1 /4 ; n > 4
FILL n-1 MOD 4,0
FILL n,0 ; n ≤ 4
; *** Wait n lines (n*64 NOPs) ***
; - Modified: a, bc, z flag
MACRO lines n
ld bc,n*8 -1
Note that these macros modify registers and flags. With them declared, it is enough to write:
nops(32) ; wait 32 NOPs
lines(52) ; wait 52 lines
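To see why the expressions in the nops macro add up, here is what nops(32) would expand to, assuming the waiting loop is the usual ‘djnz $’ (an assumption, since the loop itself is not shown above) and using CPC timings, where ‘ld b,n’ takes 2 NOPs and ‘djnz’ takes 4 NOPs when it jumps and 3 when it falls through:

```z80
; Hypothetical expansion of nops(32)
        ld b,7          ; 2 NOPs             (7 = (32-1)/4)
        djnz $          ; 6*4 + 3 = 27 NOPs  (6 jumps taken, then exit)
        FILL 3,0        ; 3 NOPs             (3 = (32-1) MOD 4)
                        ; total: 2 + 27 + 3 = 32 NOPs
```

More generally, with k = (n-1)/4, the loop costs 2 + 4*(k-1) + 3 = 4*k + 1 NOPs, and the padding of (n-1) MOD 4 NOPs brings the total to exactly n.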
One last thing. If your favorite assembler does not recognize an instruction, avoid the sloppy approach of writing:
DEFB #ed,#71 ; out (c),0
DEFB #ed,#71 ; out (c),0
And when you need it, just type:
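A minimal sketch of the idea, assuming a hypothetical macro name out_c_0 and the same parenthesized call syntax as nops(32): the opcode bytes are written once, inside a macro, and the macro is then invoked by name.

```z80
; *** out (c),0 for assemblers that do not recognize it ***
MACRO out_c_0
        DEFB #ed,#71    ; opcode of out (c),0
ENDM

        out_c_0()       ; reads almost like the real instruction
```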
As with the previous rule, there would be an infinite number of examples to give, but the primary goal is to understand the general idea and apply it to your specific cases.
Rule #4: Comments are Welcome
It’s often said that a clean code must be an explicit code: by reading it, one is easily able to tell what it does and how it does it. Consequently, in high-level languages, comments are sometimes criticised because they make explicit what is only implicit in the code. And the best thing would be to rewrite a more explicit code, not to comment on an implicit one.
In assembler, our problem is quite different: our instructions only make sense once they have been grouped together in blocks, and it’s often useful to indicate in a comment what this or that block does, to avoid having to go through it in detail. Clean assembler code is therefore well-commented code.
Here is a vestige of a time that Batman Group fans didn’t know: before the arrival of Orgams and the X-MEM (or cross-dev), coders tended to delete their comments regularly, and ended up not writing any at all, to save memory. Sources could not exceed a certain length, and comments took up a lot of space.
But not all comments are welcome: the absence as well as the excess of comments is detrimental to the cleanliness of the code. Moreover, there are several types of comments, which can be divided into at least three categories, depending on where they are located: next to a set of instructions (very local), at the beginning of a subroutine (local), or at the beginning of a source (global).
First of all, very local comments, which explain what a particular line or small portion of code does. It’s necessary here to banish any comment overload, reassuring for beginners, but harmful in the long run. For example, you should avoid:
ld e,(hl) ; get LSB of sprite address
inc l ; next byte
ld d,(hl) ; get MSB of sprite address
inc l ; next byte
It’s immediately obvious that the comments are useless because they are redundant with an already explicit code: it’s obvious that ‘inc l’ allows you to move to the next byte, and that registers E and D are used to form 16-bit data. So prefer the sober:
ld e,(hl)
inc l
ld d,(hl) ; DE = sprite address
inc l
It’s clear and it makes rereading easier. A bad idea here would be to introduce a macro with an explicit name like ‘get_sprite_address()’ for so little.
Then, more global comments, which are at the top of the main subroutines. It’s a good idea to include some essential information: what the routine does, how long it takes, the parameters it asks for, the registers it modifies, various remarks, ways of improvement. For example:
; *** Collision detection ***
; Input: - HL : first sprite data (X1,Y1,dX1,dY1)
; - DE : second sprite data (X2,Y2,dX2,dY2)
; Output: - Carry = 1 if collision
; CPU: 2 rl + 32 us (max)
; TODO: - ...
Once again, your source will grow by a few lines, but you will save a lot of time in the medium term, the code will have more room to breathe, and you will be able to export your routines to other sources more easily.
Finally, particular care can be taken with the header of the main source: list the different versions, the TODOs (updated each time), known bugs, etc. Another original idea with Orgams, as Madram recommends, is to insert a small ‘Table of Contents’ to jump quickly to the different subroutines with a simple CTRL+ENTER on the desired entry (return with CTRL+RETURN). It could look like:
; *** Table of Contents ***
; - start_code
; - main_loop
; - exe_RVI
; - exe_player
; - ...
This makes it easy to navigate through the subroutines and keep an overview of the source. Here, comments are diverted from their original purpose.
… To be continued!
[Updated on July 11, 2021]
Thanks to Golem13 and Grim for their precious suggestions following the proofreading of a first version of this article.