Some thoughts on modern assemblers in the Amstrad world

by Krusty / Benediction.

In this article, we list some interesting features of modern z80 assemblers and discuss what would be worth adding to the mainstream assemblers for the Amstrad CPC.

It was so nice before the crash of mir station

Decades ago, maxam 1 and dams 2 were the most widely used assemblers (at least in the French demoscene). Both were very good but limited to the features of that period. Nowadays, there are plenty of modern choices for native programming (orgams3) or cross-development (basm4, fasmg5, glass6, pasmo 7, rasm8, vasm 9, sjasmplus10, SpectNetIde11, uz8012, winape13, wla-dx 14, zasm15, …). Today, I guess the mainstream ones are orgams for native coding and rasm for cross-development. As long as no directive is used, they are mostly compatible (beware of operator precedence); however, as soon as you use a single directive, you risk writing z80 code that is not portable between them. Of course, each assembler also has its own limitations.

Here are some examples of nice features that seem to be supported by different subsets of assemblers:

  • Fake instructions (e.g. rasm, sjasmplus) for ease of writing (e.g. push bc,de,hl ; ld hl,bc) that are straightforwardly converted to the appropriate list of real instructions. Look at the interesting list for sjasmplus.
  • Possibility to execute code from a high-level language that produces ASM code or binary data (e.g. lua for sjasmplus). Such an ability offers more possibilities than macros do, especially when generating data for a demo effect.
  • Complete handling of structures to create data with fixed fields and easily obtain relative position of fields (e.g. vasm).
  • One line notation for simple code repetition (e.g. orgams)
  • Support of any emulator or real CPC artifacts such as snapshots, AMSDOS files or disc images (e.g. rasm).

Some features, such as macros, are quite common among assemblers. However, their definition and use vary a lot from one assembler to another. This is probably the most problematic issue when switching assemblers (both in terms of file conversion and of memorizing commands). It is a bit sad, but we can live with that. Of course, most assemblers alias the simple directives, so the most commonly accepted notations are usually available (e.g. DB, DEFB, DEFM, BYTE), but the complex directives can have really different syntaxes. For example, here is a macro definition and its use in various assemblers:

    ; glass
    SET_CRTC: macro ?reg, ?val
        ld bc, 0xbc00 + ?reg
        out (c), c
        ld bc, 0xbd00 + ?val
        out (c), c
    endm
    SET_CRTC 4, 0
    
    ; fasmg
    macro SET_CRTC reg, val
        ld bc, 0xbc00 + reg
        out (c), c
        ld bc, 0xbd00 + val
        out (c), c
    END MACRO 
    SET_CRTC(4, 0)
    
    ; orgams
    macro SET_CRTC reg, val
        ld bc, 0xbc00 + reg
        out (c), c
        ld bc, 0xbd00 + val
        out (c), c
    endm
    SET_CRTC(4, 0)
    
    ; Pyradev
    SET_CRTC: macro #REG, #VAL
        ld bc, 0xbc00 + #REG
        out (c), c
        ld bc, 0xbd00 + #VAL
        out (c), c
    endm
    
    ; rasm
    macro SET_CRTC reg, val
        ld bc, 0xbc00 + {reg}
        out (c), c
        ld bc, 0xbd00 + {val}
        out (c), c
    mend
    SET_CRTC 4, 0
    
    ; SpectNetIde
    SET_CRTC:
    .macro(reg, val)
        ld bc, 0xbc00 + {{reg}}
        out (c), c
        ld bc, 0xbd00 + {{val}}
        out (c), c
    .endm
    SET_CRTC 4, 0
    
    ; vasm
    macro SET_CRTC reg, val
        ld bc, 0xbc00 + \reg
        out (c), c
        ld bc, 0xbd00 + \val
        out (c), c
    endm
    SET_CRTC 4, 0
    
    ; wla-dx, sjasmplus
    .macro SET_CRTC reg, val
        ld bc, 0xbc00 + reg
        out (c), c
        ld bc, 0xbd00 + val
        out (c), c
    .endm
    SET_CRTC 4, 0
    
    ; zasm (one of its syntax)
    .macro SET_CRTC &reg, &val
        ld bc, 0xbc00 + &reg
        out (c), c
        ld bc, 0xbd00 + &val
        out (c), c
    .endm
    SET_CRTC 4, 0

Despite their strengths and qualities, there is a big lack of innovation within these assemblers: today's features are no more original than those of assemblers written decades ago. Very few features are innovative and focus on demo-coding constraints. The mainstream assemblers are far behind what can be done for other machines, such as the KickAssembler16 cross-assembler for the c64 or Madass17 for the Atari XL.

Wake up !

Generalizing the top-notch features present in only a few assemblers (structure handling, for example) is somewhat of a necessity, but it is not revolutionary either. I dream of additional directives and functionalities that could ease at least demo-coding, while still being useful for any other kind of application programming. Here is a list of points that seem interesting to discuss (some of them are already implemented by existing assemblers but do not seem to be well known).

Nops counting

Demo coding often needs stable code for perfect synchronization with the monitor's electron beam. The main rule of thumb on CPC is: you have 64 NOPs to handle the current line and you have 312 lines (not all visible) in a frame. Back in the day, it was up to the coder to count the number of NOPs taken by the instructions of a routine. Is it really necessary in 2021 to remember the timing of each instruction? Or to refer to public tables? rasm provides some help with the ticker directive, which stores in a variable the number of NOPs needed to execute a bunch of instructions:

    ticker start, teffect
        ld a, (ix+0) : inc ix
        add (iy+0) : dec iy
        ld l, a
        ld a, (hl) : out (c), a
        inc b : inc c : out (c), c
        inc h : ld a, (hl) : out (c), a
    ticker stop, teffect
    defs 64-teffect

This is a really nice start, but it is not directly usable when an instruction has a varying duration:

    ticker start, teffect
        ld bc, 3
        ldir
    ticker stop, teffect
    defs 64-teffect

Here ldir corresponds to 3 ldi and 2 jumps. In such a case, the assembler should be clever enough to handle these corner cases and properly count the number of NOPs. I do not think it is mandatory to include a z80 emulator within the assembler, but doing so could ease the counting. It becomes even more complex when some information required by the contained instructions is not provided to the directive:

    ticker start, teffect
        ldir ; what is the number of ldi?
    ticker stop, teffect
    defs 64-teffect

In this example, it is necessary to provide the value of BC for a correct count. A syntax such as the following could help when it is not possible to get contextual information from previously assembled instructions:

    ticker start, label
        ticker set, BC=3
        ldir
    ticker stop, label

I do not know if it is really necessary to dig deeper in the complexity of such technique, but we could specify the execution based on available information during computation:

...
data_table
    db 4, 1, 6
...
    ticker  start, label
        ticker set, BC=memory(data_table+2)
        ldir
    ticker stop, label
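
As a rough illustration of what such context-aware counting involves, here is a minimal Python sketch (it is not how rasm works). The instruction tuples, the `count_nops` helper and the tiny timing table are hypothetical; the durations are the usual CPC values expressed in NOPs, assuming LDIR costs 6 NOPs per repeated iteration and 5 for the last one.

```python
# Hypothetical timing table (in NOPs) limited to the instructions used here.
FIXED_NOPS = {"ld bc": 3, "inc de": 2, "ldi": 5}


def ldir_nops(bc):
    """NOPs taken by LDIR when BC is known on entry."""
    iterations = bc if bc != 0 else 0x10000  # BC=0 loops 65536 times
    return 6 * (iterations - 1) + 5


def count_nops(instructions, bc=None):
    """Sum the NOPs of (mnemonic, operand) tuples, tracking BC when possible.

    `bc` plays the role of a 'ticker set, BC=...' hint.
    """
    total = 0
    for mnemonic, operand in instructions:
        if mnemonic == "ldir":
            if bc is None:
                raise ValueError("ldir needs a known BC value")
            total += ldir_nops(bc)
            bc = 0  # BC is always 0 after LDIR
            continue
        total += FIXED_NOPS[mnemonic]
        if mnemonic == "ld bc":
            bc = operand
        elif mnemonic == "ldi" and bc is not None:
            bc = (bc - 1) & 0xFFFF
    return total
```

With BC known to be 3, `ldir` accounts for 6 + 6 + 5 = 17 NOPs; without any information about BC the sketch refuses to guess.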

If you are interested in the duration of a single opcode, a simpler syntax could be based on a dedicated function:

    ld l, nops(inc de)

Although it is not documented at the time of writing, rasm provides a similar facility with getnop (the code is, however, embedded in a string).

From nop counting to automatic code stabilization

Even if the assembler were able to count the NOPs in complex cases, it would still be complicated to write a real algorithm with conditional execution that needs to remain stable. Indeed, it is totally anti-ergonomic to add such ticker directives everywhere, and I’m pretty sure any coder would keep counting NOPs manually to write stable but easy-to-read code.

Maybe the assembler has to provide some facilities to reduce manual NOP counting and compensation as much as possible, and to automate them. It means the assembler should be able to track all the possible execution paths of a conditional algorithm, count the number of NOPs of each sub-path, and infer the amount of NOPs to inject. I guess it is a really complicated task to solve, and the feature should be limited to code respecting some constraints (no cycle in the graph of possible execution paths).

For example, this piece of code extracted from a player of Arkos Tracker 2 18:

PLY_AKYst_RRB_IS_NoSoftwareNoHardware:           ;50 cycles

        ;For all the IS/NIS subcodes to spend the same amount of time
        ;ds PLY_AKYst_NOP_LongestInState - 50, 0
        ld d,32                                  ;Waits for 182 - 50 = 132 cycles
        dec d
        jr nz,$ - 1
        cp (hl)
        nop

        ;No software no hardware
        rra                                      ;Noise?
        jr c,PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise
        jr $ + 2                                 ;Waits for 8 cycles
        jr $ + 2
        cp (hl)
        jr PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise:
        ;There is a noise. Reads it
        ld de,PLY_AKYst_PsgRegister6
        ldi                                      ;Safe for B, C is not 0. Preserves A

        ;Opens the noise channel
        res PLY_AKYst_RRB_NoiseChannelBit, b
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End:

could be written this way to let the assembler automatically compute the amount of time and produce a stable version (the duration is stored in the constant provided as an argument):

        STABILIZE start, PLY_AKYst_RRB_IS_NoSoftwareNoHardware_duration
PLY_AKYst_RRB_IS_NoSoftwareNoHardware:          

        ;For all the IS/NIS subcodes to spend the same amount of time
        ;ds PLY_AKYst_NOP_LongestInState - 50, 0
        ld d,32                                  ;Waits for 182 - 50 = 132 cycles
        dec d
        jr nz,$ - 1
        cp (hl)
        ; XXX automatic wait will be put here

        ;No software no hardware
        rra                                      ;Noise?
        jr c,PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise
        ; XXX automatic wait will be put here
        cp (hl)
        jr PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise:
        ;There is a noise. Reads it
        ld de,PLY_AKYst_PsgRegister6
        ldi                                      ;Safe for B, C is not 0. Preserves A

        ;Opens the noise channel
        res PLY_AKYst_RRB_NoiseChannelBit, b
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End:
        STABILIZE stop

The constraint is to have all the information self-contained inside the directive. As with ticker, it should be possible to specify some register values.

It could also be possible to force a specific duration, as indicated in the comment of the original source. Here, instead of providing an undefined label to store the result, an expression (that can be evaluated directly) would be provided:

        STABILIZE start 50
PLY_AKYst_RRB_IS_NoSoftwareNoHardware:          

        STABILIZE here ; we can put the wait loop here

        ;No software no hardware
        rra                                      ;Noise?
        jr c,PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise
        ; automatic wait will be put here
        cp (hl)
        jr PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise:
        ;There is a noise. Reads it
        ld de,PLY_AKYst_PsgRegister6
        ldi                                      ;Safe for B, C is not 0. Preserves A

        ;Opens the noise channel
        res PLY_AKYst_RRB_NoiseChannelBit, b
PLY_AKYst_RRB_NIS_NoSoftwareNoHardware_ReadNoise_End:
        STABILIZE stop
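
To give an idea of the path analysis behind such a directive, here is a minimal Python sketch under strong assumptions: the code has already been split into blocks, the execution graph has no cycle, and each edge carries the NOPs spent when following it. The `stabilize` function and the graph encoding are hypothetical. The example graph models the noise test of the listings above, with durations assumed from the usual CPC timing tables (rra 1, jr c 2 or 3, cp (hl) 2, jr 3, ld de,nn 3, ldi 5, res 2).

```python
from collections import defaultdict, deque


def stabilize(edges):
    """edges: list of (src, dst, nops) describing an acyclic execution graph.

    Returns (arrival, padding): arrival[n] is the time at which node n is
    reached on the longest path, padding[i] is the number of NOPs to inject
    on edges[i] so that every path takes as long as the longest one.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, nops in edges:
        successors[src].append((dst, nops))
        indegree[dst] += 1
        nodes.update((src, dst))

    arrival = {node: 0 for node in nodes}
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:  # topological traversal
        node = queue.popleft()
        visited += 1
        for dst, nops in successors[node]:
            arrival[dst] = max(arrival[dst], arrival[node] + nops)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    if visited != len(nodes):
        raise ValueError("cycle in the execution graph: cannot stabilize")

    padding = [arrival[dst] - arrival[src] - nops for src, dst, nops in edges]
    return arrival, padding


edges = [
    ("entry", "quiet", 3),   # rra + jr c not taken
    ("entry", "noise", 4),   # rra + jr c taken
    ("quiet", "exit", 5),    # cp (hl) + jr
    ("noise", "exit", 10),   # ld de,nn + ldi + res
]
arrival, padding = stabilize(edges)
```

On this graph the sketch reports a total of 14 NOPs and 6 NOPs to inject on the branch without noise, which is what the two `jr $ + 2` of the original source provide.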

Easier self-modifications

Using self-modifying code is quite standard in demo-coding. For example, a standard rotozoom routine consists of tracing a line in a texture to retrieve pixels and draw them on screen. Whatever the rotation angle, the routine to call is exactly the same (there is always the same number of pixels on screen), but the line slope parameters are different. One can manage this difference by patching the instructions that handle it. Let us say these instructions could be inc/dec for d/e or nop; the patching code could look like:

OP_INC_E equ 0x1c
OP_DEC_E equ 0x1d
OP_NOP   equ 0x00

    ld a, OP_INC_E
    ...
    ld (roto_pixel1), a

If you do not want to maintain this instruction/value mapping, you can use this file. However, it explodes the number of labels to process, is not self-explanatory, and surely does not match your naming convention.

A better syntax would be to explicitly tell the assembler that we want the byte that represents a given opcode:

    ld a, opcode(inc e)
    ...
    ld (roto_pixel1), a 

Multi-byte opcodes where only one of the bytes is of interest could be handled like:

    ld a, byte1(opcode(inc ix)) ; get the byte that differs between inc ix and dec ix

Here I indicate byte1, because I am not sure that all assemblers are able to handle 4-byte numbers directly.

Maybe it could be interesting to know the size of an opcode:

    macro BUILD_STUFF opcode
        if len(opcode({opcode})) == 1 
        then
            print "Do my stuff for one byte opcode"
        else
            print "Do my stuff for multibyte opcode"
        endif
    endm
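
A minimal Python sketch of how an assembler could back such `opcode`, `byteN` and `len` functions: the helper names and the table are hypothetical, and the table is deliberately limited to the instructions discussed here (a real assembler would reuse its own encoder). Bytes are indexed from 0, so `byte_n(..., 1)` is the byte that differs between inc ix and dec ix.

```python
# Hypothetical encoding table, limited to the instructions discussed above.
ENCODINGS = {
    "nop": bytes([0x00]),
    "inc d": bytes([0x14]),
    "dec d": bytes([0x15]),
    "inc e": bytes([0x1C]),
    "dec e": bytes([0x1D]),
    "inc ix": bytes([0xDD, 0x23]),
    "dec ix": bytes([0xDD, 0x2B]),
}


def opcode(instruction):
    """Return the bytes of an instruction, ignoring case and extra spaces."""
    return ENCODINGS[" ".join(instruction.lower().split())]


def byte_n(encoding, n):
    """Return the n-th byte (0-based) of an encoded instruction."""
    return encoding[n]
```

For instance, `byte_n(opcode("inc ix"), 1)` gives 0x23 and `len(opcode("inc ix"))` gives 2, while `opcode("inc e")` is the single byte 0x1c used in the patching code.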

There is also something uncomfortable with the labelling of addresses where data has to be patched. There are currently two common ways of doing that:

; version 1
my_data equ $+1
    ld a, 0xff

; version 2
    ld a, 0xff
my_data equ $-1

Version 1 is probably easier to read, as we expect a label to be present before the instruction of interest, whereas version 2 is less error-prone: nothing has to change when the instruction becomes ld ixl, 0xff, which takes 3 bytes, while version 1 would have to be updated to $+2 to take this extra byte into account. However, neither of these notations is satisfactory. A more readable code would be:

; version 3
my_data 
    ld a, 0xff

with a little something when using the address to specify that we are interested in the data byte and not the instruction one, such as:

    ld (opcode_arg1(my_data)), a
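
Behind the scenes, such a function only needs to know where the operand sits inside the instruction assembled at the label. A minimal Python sketch, where the table, the instruction-form strings and the helper name are all hypothetical:

```python
# Hypothetical table: offset of the first immediate operand in the encoding.
OPERAND_OFFSET = {
    "ld a,n": 1,     # 3E nn
    "ld ixl,n": 2,   # DD 2E nn
    "ld hl,nn": 1,   # 21 lo hi
    "ld ix,nn": 2,   # DD 21 lo hi
}


def opcode_arg1(label_address, instruction_form):
    """Address of the first operand of the instruction located at the label."""
    return label_address + OPERAND_OFFSET[instruction_form]
```

With my_data at 0x4000, the patch address is 0x4001 for ld a, 0xff and 0x4002 for ld ixl, 0xff, without any change in the source that uses it.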

Code optimization and rewriting

The drawback of using macros and repetitions is that the generated code is often larger or slower than the one we would write ourselves. Let us say that we use the previously defined macro SET_CRTC several times:

    SET_CRTC 4, 0
    SET_CRTC 9, 0
    SET_CRTC 7, 255

the generated code would be:

    ld bc, 0xbc00 + 4
    out (c), c
    ld bc, 0xbd00 + 0
    out (c), c
    ld bc, 0xbc00 + 9
    out (c), c
    ld bc, 0xbd00 + 0
    out (c), c
    ld bc, 0xbc00 + 7
    out (c), c
    ld bc, 0xbd00 + 255
    out (c), c

whereas a version that is faster BUT modifies some registers would be:

    ld hl, 0x0409
    ld de, 0x07FF
    ld bc, 0xbc00
    out (c), h
    inc b
    out (c), c
    dec b 
    out (c), l
    inc b 
    out (c), c
    dec b 
    out (c), d
    inc b
    out (c), e  
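
The gain can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the cost table is an assumption based on the usual CPC timings (ld rr,nn 3 NOPs, out (c),r 4 NOPs, inc/dec b 1 NOP) and the instruction names are simplified placeholders.

```python
# Assumed (size in bytes, duration in NOPs) for the instruction families used.
COST = {
    "ld rr,nn": (3, 3),
    "out (c),r": (2, 4),
    "inc b": (1, 1),
    "dec b": (1, 1),
}


def total(sequence):
    """Return (bytes, NOPs) for a list of instruction families."""
    size = sum(COST[name][0] for name in sequence)
    nops = sum(COST[name][1] for name in sequence)
    return size, nops


# Three SET_CRTC expansions: six "ld bc,nn / out (c),c" pairs.
macro_version = ["ld rr,nn", "out (c),r"] * 6

# Hand-written version: three loads, then outs separated by inc b / dec b.
manual_version = (
    ["ld rr,nn"] * 3
    + ["out (c),r", "inc b", "out (c),r", "dec b"] * 2
    + ["out (c),r", "inc b", "out (c),r"]
)
```

Under these assumptions the macro version takes 30 bytes and 42 NOPs, against 26 bytes and 38 NOPs for the hand-written one.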

A possible directive could take the form:

    rewrite start
        rewrite strategy, SPEED    ; Focus rewriting strategy on execution speed
                                   ; Other values could be SIZE or COMPRESS
        rewrite set, AF=10         ; specify the current value of some registers
        rewrite modifiable, hl, de ; specify the registers that can be modified
        
        SET_CRTC 4, 0
        SET_CRTC 9, 0
        SET_CRTC 7, 255
    rewrite stop    

If such a directive is adopted, we could go even further by allowing users to implement their own rewriting strategy that best fits their use cases. For portability, the strategy definition should be expressed in a Domain Specific Language that could be understood by any assembler.

One can argue that it is up to the user to provide different macros or to write tests so as not to include unneeded code at the end of loops (you know, the incrementation/decrementation at the end of each repetition that is useless for the last one). Sure, but as soon as we trust the rewritten code, it is simpler to maintain a smaller set of macros, or to skip the end-of-loop tests, and keep the source simple.

MDL (https://github.com/santiontanon/mdlz80optimizer) seems to be an interesting project to provide hints on code that could be optimized.

Hygienic macros?

rasm and orgams have totally different ways of handling the expansion of their macro parameters. orgams is more hygienic and considers that everything that happens inside a macro must be kept secret (except the generated z80 opcodes, of course), while rasm (like most other assemblers) does not care about that. As a consequence, orgams macros cannot generate labels accessible from outside the macro, while rasm allows anything, including side effects on existing variables. Both approaches are problematic:

  • orgams does not allow generating labels that could be used elsewhere, and there is no workaround for this issue. For example, here is an extract of code assembled by my own assembler basm (rasm should be able to assemble it too) that cannot be simulated by orgams (this macro, taken from my future demo, is dedicated to the production of labels and not code):
  macro BUILD_JUMPBLOCK_LABEL, label

    switch {pageset}$
        case 0xc0
{label}_from_main
            break

        case 0xc2
{label}_from_extra
            break

        default
            fail "Impossible memory configuration"

        endswitch

  endm
  • rasm can generate the same label several times (which causes an assembling error), but it is possible to prefix inner macro labels with @ to make them accessible only from within the macro.

Maybe the expected behavior of a macro should be discussed with users to standardize some points:

  • must labels be exported outside of the macro (I tend to favor the rasm approach)?
  • must constants be exported outside of the macro (I would say it should be similar to the labels)?
  • do macros have access to information (i.e. the symbol table) other than what is present in their arguments? (I would say yes, but maybe someone can argue against that)
  • I’m pretty sure there are other questions to treat and discuss
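
To make the two behaviors concrete, here is a minimal Python sketch of a macro expander that exports ordinary labels but keeps @-prefixed labels private by renaming them at each expansion. The `{name}` parameter syntax follows the rasm examples above; the `__mN_` mangling scheme and the function itself are purely hypothetical and do not describe what rasm or orgams really do.

```python
import re


def expand_macro(body, args, expansion_id):
    """Substitute {name} parameters and make @labels unique per expansion."""
    text = body
    for name, value in args.items():
        text = text.replace("{" + name + "}", str(value))
    # '@loop' becomes '__m3_loop' for the expansion number 3.
    return re.sub(
        r"@(\w+)",
        lambda match: f"__m{expansion_id}_{match.group(1)}",
        text,
    )


BODY = "@loop\n    dec {reg}\n    jr nz, @loop\n{name}_done\n"
first = expand_macro(BODY, {"reg": "b", "name": "wait1"}, 1)
second = expand_macro(BODY, {"reg": "c", "name": "wait2"}, 2)
```

Here the two expansions can coexist: the loop labels never clash, while `wait1_done` and `wait2_done` stay visible to the rest of the program.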

Functions definition

Macros produce z80 code, whereas functions would generate values that can be used in any expression. Such a facility would ease data and code generation by reducing the need for external tools. Of course, it would also ease coding by reducing code duplication.

A low-cost implementation could be based on a hygienic macro-like syntax associated with some directives:

    ;;
    ; The function `name` takes 3 arguments arg1, arg2 and arg3,
    ; uses a local variable
    ; and returns a value (arg1 + arg2 if arg3 > 0, arg1 - arg2 otherwise)
    FUNC name, arg1, arg2, arg3

        IF arg3 > 0
                  local1 = arg1 + arg2
        ELSE
                  local1 = arg1 - arg2
        ENDIF
        
        return local1
        
    ENDFUNC
    
    ; Use the function name
    ld a, name(exp1, exp2, exp3)

A proof of concept has been successfully implemented in basm.

However, it is not really nice to read (KickAssembler relies on a nicer syntax19) and cannot rely on functionalities available on the host machine. Cross-development assemblers should rely on a standard high-level language, as sjasmplus does with lua, to improve the expressiveness of functions.

To be really useful, lots of utility functions should be embedded within the assembler (e.g. pixel manipulation, address computing, any chip flags, …). However, we have to take care to standardize them to increase source code compatibility (and the usefulness of functions).

Libraries handling

Often, code (any z80 code, macros or functions) written for one production is useful for another one. On my side, I am always reusing code related to debugging, wait loops or Gate Array color values. In a high level language, we can create libraries, use some functions from them and let the compiler/linker remove those that are not used.

In z80, it is not possible to follow the same path, as most assemblers are not accompanied by a linker (maybe vasm would allow that). The only possibility is to include the full content of a file. It means that all the z80 code of a library file is assembled even if it is never called.

Some assemblers partially solve the issue of unneeded code inclusion by providing the IFUSED directive:

   IFUSED my_function
my_function
    ...
    ret
    ENDIF

It works only (at least for rasm) if this code is assembled AFTER the potential uses of my_function, which is quite limiting (basm does not suffer from this issue).

Another issue when including more and more files is the higher probability of label clashes.

orgams partially solves this with the IMPORT directive that adds a supplementary indirection on the label values:

    import lib.o
    ...
    call lib.my_function

This approach is quite limiting for cross-assemblers, where paths can be more complex. In basm we have chosen this approach:

    include "here/there/lib.asm" namespace "lib"
    call lib.my_function

To the best of my knowledge, sadly, no assembler provides a complete solution to both issues. I guess an ideal solution would be a mix of these two worlds:

; file: lib.asm

    unit start, my_function
    ...
plenty
of
labels
    ret
    unit stop
    
    unit start, my_function2
    ...
    unit stop
    
; file: main.asm

    import lib
    ...
    call lib.my_function
    
    ifdef lib.plenty
        error "labels inside a unit are not exported"
    endif
    
    ifdef lib.my_function2
        error "my_function2 is not used and must not be accessible"
    endif

The import can be done anywhere we want to include the corresponding code, even before the use of its units (which may imply more passes for some assemblers). The labels inside a unit remain private and inaccessible (maybe an export directive could be used to manually export targeted labels).
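
Selecting the units to assemble then amounts to a reachability computation over the references between units. A minimal Python sketch, assuming the assembler has already collected which units each unit refers to (the data structure and the function name are hypothetical, and `lib.wait` is an invented helper unit added for the example):

```python
def used_units(units, roots):
    """Return the set of units to assemble.

    units: {unit name: set of unit names it references}
    roots: unit names referenced directly by the main source
    """
    keep = set()
    todo = list(roots)
    while todo:
        name = todo.pop()
        if name in keep or name not in units:
            continue
        keep.add(name)
        todo.extend(units[name])  # a kept unit may pull in other units
    return keep


library = {
    "lib.my_function": {"lib.wait"},
    "lib.my_function2": set(),
    "lib.wait": set(),
}
kept = used_units(library, {"lib.my_function"})
```

In this example `lib.my_function2` is never reached and would therefore not be assembled, whatever the position of the import in the source.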

It can also be interesting to have an import directive that lets the assembler choose where to inject the appropriate code.

Public libraries

By the way, there is a high probability that some libraries (in the sense of bunches of z80 units, macros and functions) are of interest to everyone. For example: wait macros, vsync macros, AMSDOS macros, pixel functions and so on. It would be great if some common subsets were integrated in each assembler, or if a repository of libraries (with the accompanying tooling to retrieve them and format them to be compatible with the targeted assembler) were accessible.

A dedicated directive (innerimport or innerinclude) would be used to import these files, which are not supposed to be present in your project source code. Think about rasm, for example: it allows compressing files and provides the z80 macros to uncompress them; however, you have to manually include the file containing this source code. If the next rasm version fixes both the C cruncher and the z80 decruncher, you have to manually update your decruncher file (and the probability of forgetting that is very high). Such a directive would eliminate this error, as the z80 decrunch code would be embedded in the rasm executable and provided on demand.

Basic file generation

The Amstrad CPC OS gives direct access to a Locomotive Basic interpreter. Programs are loaded thanks to its RUN command: Basic programs are loaded and started transparently, whereas binary programs are loaded and started after the screen is cleared and various ROM configurations are changed. This is a problem when you want your demo (which is a binary program) to start with a neat transition from Basic to its first effect. The current solution consists of creating a Basic program that embeds the binary program and automatically launches it. The best is to have several lines of the Basic program hidden, to keep only comments of interest (we can also imagine some comment art here).

However, this hybrid Basic program has to be created manually (either by file manipulation and modification, or by using the DEFB directive to include Basic tokens at the beginning of the file, without forgetting to change the header of the file). A modern assembler should be able to directly generate such a file. Two possibilities:

  1. Use a directive to tag a place where any Basic command can be written (the user has control over everything)
  2. Provide a macro that uses a template bootstrap code (similar to BasicUpstart20 in KickAssembler)

In the first case, syntax could look like:

    locomotive start, code ; Start the Basic part and provide a list of labels to import
    HIDE_LINES 20
10 ' Hello world
20 call {code}
    locomotive stop

code
    ld a, 7
    call 0xbb5a
    jp $

to generate a two-line program where the second line is not visible when doing a LIST from the Basic interpreter. The basm assembler has minimal support for that.

In the second case, it could be

    BOOTSTRAP_BASIC code
code
    ld a, 7
    call 0xbb5a
    jp $

that can be easily simulated with a macro:

  macro BOOTSTRAP_BASIC code
    locomotive start, code ; Start the basic part and provide a list of labels to import
    HIDE_LINES 10
20 call {{code}}
    locomotive stop
  endm
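
As an illustration of what the assembler would have to emit, here is a minimal Python sketch that builds the tokenized form of a line such as `20 CALL &8000`. It assumes the usual description of the Locomotive Basic storage format (a 16-bit line length that includes itself, a 16-bit line number, the tokens, and a zero terminator) and two token values (0x83 for CALL, 0x1C for a 16-bit hexadecimal constant) that should be checked against a token table before being relied upon. The AMSDOS header and the hiding of lines are left out, and all function names are hypothetical.

```python
import struct

TOKEN_CALL = 0x83    # assumed token value for CALL
TOKEN_HEX16 = 0x1C   # assumed prefix of a 16-bit hexadecimal constant


def basic_line(number, body):
    """Wrap tokenized content: length, line number, body, 0 terminator."""
    length = 2 + 2 + len(body) + 1  # the length field counts the whole line
    return struct.pack("<HH", length, number) + body + b"\x00"


def call_line(number, address):
    """Tokenized equivalent of '<number> CALL &<address>'."""
    body = bytes([TOKEN_CALL, 0x20, TOKEN_HEX16]) + struct.pack("<H", address)
    return basic_line(number, body)


def program(lines):
    """Concatenate the lines; a zero length marks the end of the program."""
    return b"".join(lines) + b"\x00\x00"


bootstrap = program([call_line(20, 0x8000)])
```

The address passed to `call_line` would be the value of the imported label (`code` in the examples above), known once the z80 part has been assembled.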

Linting

Most high-level programming languages come with linters that detect violations of programming standards, bad programming patterns, or potential errors in the program. Although they are often used independently of the compilation procedure, some compilers can be configured to emit the linter's warnings when compiling the program.

Why not provide such a feature in the asm world? Here, the linter could be used to detect bad programming patterns that should be replaced by more efficient or more correct ones proposed by the assembler.

The linting rules should be defined with the community, numbered, and enabled or disabled on demand through a dedicated directive or comment. Some easy examples could be:

  • xor a instead of ld a, 0
  • jr instead of jp
  • labels too short or too similar to another one
  • a sequence of instructions that could be replaced by another one
  • ambiguous code for permissive assemblers that could be treated in the wrong way (eg. label/macro without argument in rasm)
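
A minimal Python sketch of such a linter, with numbered rules that can be disabled on demand. The rule identifiers, the messages and the convention that labels start at column 0 are assumptions made for the example; only two purely textual rules are shown, since a rule such as "jr instead of jp" needs the addresses computed by the assembler.

```python
import re

# (rule id, pattern applied to the line without its comment, message)
RULES = [
    ("L001", re.compile(r"^\s+ld\s+a\s*,\s*0\s*$", re.IGNORECASE),
     "'ld a, 0' could be 'xor a' (shorter and faster, but it changes flags)"),
    ("L003", re.compile(r"^[A-Za-z_]\w?:?$"),
     "label is too short to be meaningful"),
]


def lint(source, disabled=()):
    """Return a list of (line number, rule id, message) findings."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split(";", 1)[0].rstrip()  # drop the comment, if any
        for rule_id, pattern, message in RULES:
            if rule_id not in disabled and pattern.match(code):
                findings.append((number, rule_id, message))
    return findings


SOURCE = "ab\n    ld a, 0 ; reset\n    xor a\n    ld a, 0x10\n"
```

Calling `lint(SOURCE)` flags the label `ab` and the `ld a, 0` line only; passing `disabled=("L001",)` silences the second finding, which is what a dedicated directive or comment would do.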

High level programming inclusion

sjasmplus allows using lua to generate z80 code: this is quite nice and could be extended to all other assemblers to let users do more complex things than what macros allow. However, this is not the purpose of this section ;) What interests us here is to embed a high-level language that has to be compiled to z80 and merged with z80 code. We want to reverse the current process of embedding asm in C code into a process of embedding C code in asm. This gives more importance to the asm code than to the high-level code (which can be in another language than C, as long as it translates pretty well to z80).

MENU_WIDTH=20
MENU_HEIGHT=40

    highlevel start c
    void show_menu() {
        char *text = MENU_CHARS; 
        for (int j=0;j<MENU_HEIGHT;j++) {
            for (int i=0;i<MENU_WIDTH;i++) {
                print_char(*text);
                ++text;
            }
            print_char('\n');
        }
    }
    highlevel stop

    ...
    
MENU_CHARS 
    db "********************"
    db "                   *"
    db "       MENU        *"
        
         ...

launch_game
    call show_menu

Although the previous example is short and simple, it is probably not straightforward to implement. We have to differentiate external variables that come from the z80 side from the variables that come from C. How are the parameters of print_char handled? Is it written in z80 or in C?

A bunch of well-written macros, combined with a permissive assembler that allows assembling at addresses that already contain code (good assemblers are supposed to forbid that), may reduce the need for such high-level language inclusion. Look at this interesting project21 for example.

Standardization of asm documentation

This point is not related to the assembler itself but remains quite important. Most languages provide tools to generate documentation from source code (think about doxygen, javadoc, or rustdoc). It would be nice to define some comment conventions to allow the generation of code documentation. Meta information extracted from the documentation could be useful to navigate in the sources (such as the orgams support for label navigation with the CTRL + Enter shortcut, but with more complex information provided in comments). We can observe that the Visual Studio Code extension z80-macroasm22 is able to extract the comments placed before label definitions and display them when hovering over code that uses them.

Debug facilities

Debugging z80 code is not that easy. Most assemblers provide assertions to make some checks at assembly time, but this is not enough. Unit tests (discussed in the next section) can also be an additional help, but once again they are not sufficient.

In addition to the mandatory binary, assemblers should be able to generate additional information for a monitor (either executed on the real machine or embedded in an emulator) that uses it to help with debugging. We can think for example of:

  • label names accessible when executing step by step the code
  • conditional breakpoints set up at assembly time. Only unconditional breakpoints are currently possible, by using DB 0xed, 0xfd in any assembler (but it takes 2 bytes and 2 NOPs) or by using a dedicated directive in cross-assemblers that only stores them in a dedicated chunk of the snapshot. orgams, for its part, injects a RST 6 when you want a breakpoint.
  • watchpoints that stop execution when the value of an expression changes; in their simplest form, we can see them as extended breakpoints that look at the content of memory or of a register. Although trivial to use in an emulated context, they could be more complicated to handle in orgams.
  • anything available in a monitor that could be provided by the assembler.
  • a real link between the generated binary and the original source code. On modern platforms, compiling a program with debug facilities includes lots of information in the binary to refer to the original source. Of course, we cannot do the same in our case. However, there is no reason (at least in a cross-development context) not to generate an extra file able to make the link between an address and the source file it corresponds to.

It means that such information has to be standardized to be usable in different contexts (a real monitor on the CPC, a snapshot with dedicated extra blocks properly handled by several emulators, a data file to load in an emulator while a program is running). Of course, one can argue that ace, winape or cpcec are able to load some of this information, but it is limited to labels, and breakpoint support is quite limited.
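
For the link between addresses and sources, the side file can be as simple as a sorted table queried by binary search. A minimal Python sketch, where the entry layout (address, file, line) and the function names are hypothetical and memory banking is ignored:

```python
import bisect


def build_map(entries):
    """entries: iterable of (address, file, line), one per assembled statement."""
    table = sorted(entries)
    addresses = [entry[0] for entry in table]
    return addresses, table


def locate(debug_map, address):
    """Return (file, line) of the statement covering the address, or None."""
    addresses, table = debug_map
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # the address is below the first assembled statement
    _, file, line = table[index]
    return file, line


debug_map = build_map([
    (0x4000, "main.asm", 10),
    (0x4003, "main.asm", 11),
    (0x8000, "lib.asm", 2),
])
```

A monitor or an emulator stopped at 0x4001 could then display line 10 of main.asm through `locate(debug_map, 0x4001)`.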

Unit testing

Most high-level languages offer unit test functionalities to check the validity of the implemented code. There is no reason not to offer such functionality for our asm code.

Of course, several z80 assemblers provide the ASSERT directive that could be used for unit-tests. But its usage is quite limited as it only evaluates expressions at assembling stage instead of the behavior of z80 code at execution stage!

Fortunately, it is possible to generate assertions executed at run time instead of at assembly time23, at the cost of more instructions to execute. Such a technique could be used to properly write unit tests, and the assembler could provide a bunch of macros to generate them, as well as tooling to generate unit test programs that execute the tests and print their success or failure.

A first step would be to add functionalities to improve assertions that target generated code, as can be done with KickAssembler 24. Such assertions do not aim to check the behavior of executed z80 code but to check the produced code or data. This is really useful to check that generated code or data (by macros, for example) respects some constraints, and to be aware of mistakes before it crashes on the real machine. For example:

codea_start
    ex de, hl
codea_end

codeb_start
    db 0xeb
codeb_end


    ASSERT_EQ memory(codea_start, codea_end-codea_start), memory(codeb_start, codeb_end-codeb_start)
    ; it can become complicated to handle if the blocks are in different pages
    
    ASSERT_EQ assemble( ex de, hl), assemble( db 0xeb)
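
The first assertion above can be sketched in Python as follows, assuming the assembler keeps a 64 KB image of what it has produced. The label addresses are made up for the example, and `memory` and `assert_eq` merely mimic the proposed directives; none of this reflects an existing implementation.

```python
# Toy 64 KB memory image standing in for the assembler output.
memory_image = bytearray(0x10000)

# Hypothetical label addresses, as the assembler would resolve them.
codea_start, codea_end = 0x4000, 0x4001
codeb_start, codeb_end = 0x5000, 0x5001

memory_image[codea_start] = 0xEB  # byte produced by `ex de, hl`
memory_image[codeb_start] = 0xEB  # byte produced by `db 0xeb`

def memory(start, length):
    """Return `length` bytes of the assembled image starting at `start`."""
    if start < 0 or length < 0 or start + length > len(memory_image):
        raise ValueError("range outside of the 64 KB address space")
    return bytes(memory_image[start:start + length])

def assert_eq(left, right, message="assertion failed"):
    """Raise an error showing both operands when they differ."""
    if left != right:
        raise AssertionError(f"{message}\n left: {left.hex()}\nright: {right.hex()}")

assert_eq(memory(codea_start, codea_end - codea_start),
          memory(codeb_start, codeb_end - codeb_start))
print("generated blocks are identical")
```

A flat image like this one ignores banking, which is exactly where the comment about blocks in different pages becomes relevant.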

Another step would be to generate test programs that run on the CPC and also check the content of registers:

    xor a 
    ASSERT_EQ 0, cpu(a)

    ...

    ld hl, 0xdeac
    inc l
    ld (0xbeef), hl
    ASSERT_EQ 0xdead, word(0xbeef), "INC L does not seem to work properly"

The final step would be to generate the test suites from their description:

    include "multiplication.asm"
    include "effect.asm"

    test_suit start, "My pretty test suite"
        test start "xor a resets a"
            xor a 
            ASSERT_EQ 0, cpu(a)
        test stop

        test start "effect init"
            call effect_init
            ASSERT_EQ binary_or(memory(0xc000, 0x4000)), 0
        test stop
    test_suit stop

    test_suit start, "My other test suit"
        test start "16 bits multiplication"
            ld hl, 0x01
            ld de, 0x20
            call multiply_hl_de
            ASSERT_EQ cpu(ix), 0
            ASSERT_EQ cpu(hl), 0x20
        test stop
    test_suit stop

This could be displayed as follows on the CPC (or properly integrated in the IDE for cross-development):

Running My pretty test suite
running 2 tests
test xor a resets a ... ok
test effect init ... ok

test result: ok. 2 passed: 0 failed.

Running My other test suit
running 1 test
test 16 bits multiplication ... FAILED

failures:

---- test 16 bits multiplication ----
assertion failed: cpu(hl) == 0x20
 left: 0
right: 0x20

failures:
  test 16 bits multiplication 

test result: FAILED. 0 passed: 1 failed.

Probably the test results would have to be written to text files on disc and then shown in a text editor. Alternatively, they could be printed on the fly on screen, with a pause when the screen is full.

Generating a CP/M program would allow such a feature to be run on any z80 machine supporting this OS. However, CP/M is not really used in the demoscene community. So the best choice is probably to generate platform-specific test programs and to specify clearly the memory paging in use and the memory areas dedicated to the test routines and variables, to ensure that the tested code does not interfere with them.

Of course, cross-dev IDEs would have to execute each test individually in an emulator, which also makes it possible to identify errors more quickly.
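
On the cross-development side, turning collected results into the report shown above takes little code. The Python sketch below assumes the results have already been obtained from the emulator, which is the hard part and is not covered here. The function name `format_report` and the shape of the `results` list are choices made for this example.

```python
def format_report(suite_name, results):
    """Build the report text for one test suite.

    `results` is a list of (test_name, failure) pairs where `failure` is
    None for a passing test or a short description for a failing one.
    """
    count = len(results)
    lines = [f"Running {suite_name}",
             f"running {count} test{'s' if count != 1 else ''}"]
    for name, failure in results:
        lines.append(f"test {name} ... {'ok' if failure is None else 'FAILED'}")
    failed = [(name, failure) for name, failure in results if failure is not None]
    if failed:
        lines += ["", "failures:"]
        for name, failure in failed:
            lines += ["", f"---- test {name} ----", failure]
    status = "FAILED" if failed else "ok"
    lines += ["", f"test result: {status}. "
                  f"{count - len(failed)} passed: {len(failed)} failed."]
    return "\n".join(lines)

print(format_report("My pretty test suite",
                    [("xor a resets a", None), ("effect init", None)]))
```

The same function could feed a text file written to disc or an IDE panel, since it only returns a string.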

SpectNetIde 28 and zasm 29 are the most advanced Z80 assemblers for this kind of feature. It is worth noting that c64 users can easily use such features 25,26,27. On our side, orgams seems to provide a similar functionality: its runtime is probably a good starting point to standardize such a feature.

It is interesting to note that the Winape assembler allows reading the memory produced by the previous assembling pass. We have reproduced that in basm too.

Ergonomics

Now that we have listed several big features of interest, there are still minor features that could greatly improve the ergonomics of the assemblers. Here is a random list of them (once again, some are implemented in a few assemblers, but they are still not considered standard):

  • Multiline comments are a great addition. They can even help to quickly disable a large stretch of consecutive code without using conditional directives, whose content is still parsed by the assembler.
  • Better syntax compatibility between assemblers is expected, either by explicitly importing files written for another assembler (e.g. decoding the binary format of dams or orgams, or translating the source for the others) or by accepting the syntax of the others.
  • FILL-like directives could accept an expression that depends on the index of the element being produced, to avoid writing loop-based directives 30.
  • Labels need to be unique, which is quite tedious for unimportant labels. It is often possible to alleviate this issue using proximity labels that start with a dot, but this does not solve the problem when the code has a lot of self-similarities and a label should be repeated several times. KickAssembler solves this issue with its multi-label ability 31. This is also an interesting thing to use (although a lot of care is needed when modifying code). orgams limits this issue by using hygienic imports.
  • Accept all of the $, &, 0x and # prefixes for hexadecimal numbers. & may be problematic in expressions that use binary operators, so it is probably better not to keep it.
  • Memory management should be easier; there are lots of interesting initiatives in different assemblers, such as range, section or bank, that should become widely available (rasm has good support for memory organization, and basm adds the ability to organize code in sections).
  • Obtain the size of an imported binary before compression. Something similar to the data import of KickAssembler could be used:
// Load the file into the variable ’data’
.var data = LoadBinary("myDataFile")

// Dump the data to the memory
myData: .fill data.getSize(), data.get(i)
  • Better integration between the cross-assembler and the real machine. basm can send the generated snapshot directly to the CPC using the M4, to verify in real time on real hardware whether the program works. Any other assembler could do that easily; other hardware communication protocols could be added too. However, there is still no possibility to do remote debugging (either by debugging in real time the program on the CPC from the PC, or by debugging it from a snapshot automatically transferred from the CPC to the PC).
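
To make the FILL-like idea from the list above more concrete, here is a short Python sketch of a directive that evaluates an expression for each element index, in the spirit of the KickAssembler `.fill` shown earlier. The `fill` function and the sine table are purely illustrative; no CPC assembler is claimed to expose this exact form.

```python
import math

def fill(count, expression):
    """Return `count` bytes where byte i is `expression(i)` truncated to 8 bits."""
    return bytes(expression(i) & 0xFF for i in range(count))

# A 256-entry sine table centred on 128, built without any explicit loop
# directive in the "source".
sine_table = fill(256, lambda i: round(127 * math.sin(2 * math.pi * i / 256)) + 128)

print(len(sine_table), sine_table[0], sine_table[64])
```

The point is that the loop lives inside the directive, so the source only states how many elements are wanted and how each one is computed from its index.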

Conclusion

I have tried to list what I find interesting for the future of assemblers in the context of z80 programming for the Amstrad CPC. Of course I have forgotten lots of things you can think of: please share these ideas. If we are lucky, a few of these ideas could be implemented in some assemblers.

Big kisses to Roudoudou and Madram for the various exchanges that helped to improve this article.


  1. https://www.cpcwiki.eu/index.php/MAXAM
  2. https://github.com/pseguy/dams
  3. http://orgams.wikidot.com/
  4. https://github.com/cpcsdk/rust.cpclib
  5. https://flatassembler.net/docs.php?article=fasmg
  6. https://www.grauw.nl/projects/glass/
  7. https://pasmo.speccy.org/
  8. https://github.com/EdouardBERGE/rasm
  9. http://sun.hasenbraten.de/vasm/
  10. https://github.com/z00m128/sjasmplus
  11. https://dotneteer.github.io/spectnetide/documents/main-features.html
  12. http://cngsoft.no-ip.org/uz80.htm
  13. http://www.winape.net/
  14. https://wla-dx.readthedocs.io/en/latest/
  15. https://k1.spdns.de/Develop/Projects/zasm/Documentation/
  16. http://theweb.dk/KickAssembler/Main.html#frontpage
  17. http://mads.atari8.info/
  18. https://www.julien-nevo.com/arkostracker/
  19. http://www.theweb.dk/KickAssembler/webhelp/content/cpt_FunctionsAndMacros.html
  20. http://www.theweb.dk/KickAssembler/webhelp/content/ch14s02.html
  21. https://github.com/jhlagado/struct-z80
  22. https://marketplace.visualstudio.com/items?itemName=mborik.z80-macroasm
  23. https://open.amstrad.info/2021/03/12/asserts-en-assembleur/
  24. http://theweb.dk/KickAssembler/webhelp/content/ch16s03.html
  25. https://64bites.com/64spec/
  26. https://bitbucket.org/Commocore/c64unit/src/master/
  27. https://github.com/barryw/sim6502
  28. https://dotneteer.github.io/spectnetide/documents/unit-testing-basics#article
  29. https://k1.spdns.de/Develop/Projects/zasm/Documentation/z39.htm
  30. http://www.theweb.dk/KickAssembler/webhelp/content/ch03s06.html
  31. http://www.theweb.dk/KickAssembler/webhelp/content/ch03s04.html