Sampling and digitracking

By Targhan / Arkos.

Hi all! Welcome to Demoni… Oh, it seems there’s been a change. This is a first! I am writing an article for Memory Full… Hicks asked me to talk about samples. Let’s have a broader view and also include Digitracking, that is, the art of playing samples on several channels.

Let’s start by dotting our i’s: don’t count on me to talk about theory and signal processing. These are rather unknown to me, and my results come most of the time from empirical experiments. Very often, subtlety is useless on CPC. One rule would be “brute force!”. Playing samples is not the thing our dear AY does best, first because of its poor 4 volume bits, second because of their logarithmic curve. Third, addressing the AY is a slow operation, due to the CPC architecture…

Samples

Let’s start at the beginning: terminology. What is a “Sample”? A “sample” is the atomic part of a sampled sound, that is to say in our context, a value. I will thus use the word “sound” to speak about the signal (the content of a WAV file for example), and “sample”, the sampled value. It can be 4, 8, 16, 24 or 32 bits depending on the hardware.

First, we have to feed the CPC with nice sounds. In most cases, they will come from external sources, such as sounds from a MODule or WAVs found on the internet.

Our PSG can only output 4 bits per channel, so we are limited by these 4 tiny bits. One advantage is that we can optimize the sound in memory: it is possible to encode two samples into one byte. Practically speaking, this is only useful for playing a raw sound. If you are digitracking, speed is critical, and working out which nibble of the byte must be selected and played is too slow. However, nothing prevents you from storing the samples in this “compressed” way, and unpacking them when your code is loaded (though I wouldn’t bother: a good compressor will probably give you the same result!).
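The two-samples-per-byte storage can be sketched offline in Python. This is only an illustration: the function names are made up for the example, and putting the first sample in the high nibble is an arbitrary choice, not something the article specifies.

```python
def pack_nibbles(samples):
    """Pack 4-bit samples (0..15) two per byte, first sample in the high nibble."""
    samples = list(samples)
    if len(samples) % 2:
        samples.append(8)  # assumption: pad an odd-length sound with the "middle" value
    packed = bytearray()
    for i in range(0, len(samples), 2):
        packed.append(((samples[i] & 0x0F) << 4) | (samples[i + 1] & 0x0F))
    return bytes(packed)


def unpack_nibbles(packed):
    """Expand each byte back into two 4-bit samples, one per byte."""
    out = bytearray()
    for byte in packed:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return bytes(out)
```

Unpacking once at load time doubles the size in memory, but the replay code can then read one sample per byte without any nibble selection.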

What sampling frequency? There again, my experience shows that a frequency between 8 and 12 kHz is enough on CPC. For the record (pun intended), I had tested 16 kHz sounds for the introduction music of Orion Prime: how disappointed I was when I figured out it didn’t sound better than at 8 kHz! The same goes for the “moooo” from the CPC Meuuhting, played at 44 kHz, more for the technical challenge than for the sound quality: dividing its frequency by 4 wouldn’t have changed anything. To sum up, it is useless to have a high sampling frequency. To give you an example, if you want to play a sound at 20 kHz, you must play 20000 samples per second. But if your code only replays at 10 kHz, to play the sound at its original frequency, you will have to read one sample out of two. Moral of the story: half of the samples are useless! Ideally, on CPC, have a replay frequency higher than the sampling frequency of the sounds, and all will be fine.

Signed or unsigned? Most modern formats use signed values. On CPC, this won’t be the case. With 4 bits at our disposal, 8 will be our “0”, our “middle”, with of course 0 being the lowest value, 15 the highest.

Should the sounds be processed to compensate for the AY logarithmic curve? I have experimented a lot to find out which conversion curve was the best: Zik had even created, thanks to an oscilloscope, an accurate look-up table to convert an 8-bit linear curve to a 4-bit logarithmic curve. Strikingly, the conclusion was that the quality didn’t improve. Don’t spend cycles using a conversion table: directly send the samples to the AY (“no subtlety!”)!

Then, we must not forget our target: a small speaker, or low-end speakers at best. Don’t expect to output a large spectrum of frequencies, especially in 4 bits. Low frequencies require power, which we don’t have. High frequencies may be kept, even boosted. There is no magical formula here: only your tests will tell which frequencies should be amplified or reduced, but a simple low-cut (at 100 Hz for example) may be very effective. Ideally, do this on every sound separately to hear how it sounds afterwards. More on this later.

Last but not least. Here is THE secret to good-sounding samples on CPC: compression. Even though compression and over-compression are a real problem in today’s music, it is a simple and effective solution to many sound problems on CPC. Indeed, try to convert a PC sound directly on CPC: the result will be mediocre at best. Why? Because the AY has no dynamic range at all: 16 poor values against at least 65536 on PC. The original samples will almost never reach the limits (0 or 65535) but will concentrate in “the middle”. Raw conversion to 4 bits will generate values going from 6 to 9 at best. Not enough to sound like anything relevant. The solution is simple: take any audio editor, normalize the sound, then increase the volume to 120, 150, even 200%! Yes, it will saturate, but don’t worry. Make some tests to determine the best level for each particular sound. Note that this process can be done in Basic! Purists will be happy.
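For those who prefer a script to an audio editor, the “normalize, then push the volume until it saturates” idea can be sketched as follows. This is an illustration only, assuming unsigned 8-bit input centred on 128; the function names and the `boost` parameter are invented for the example.

```python
def amplify(samples, gain):
    """Scale unsigned 8-bit samples around the 128 midpoint, clipping to 0..255."""
    out = bytearray()
    for s in samples:
        v = 128 + round((s - 128) * gain)
        out.append(max(0, min(255, v)))  # saturation is accepted on purpose
    return bytes(out)


def normalize_and_boost(samples, boost=1.5):
    """Normalize to full scale, then apply an extra boost (1.5 = 150%)."""
    peak = max(abs(s - 128) for s in samples) or 1  # avoid dividing by zero on silence
    return amplify(samples, boost * 127 / peak)


# A quiet sound that stays "in the middle" only yields 7, 8, 9 in 4 bits...
quiet = bytes([112, 128, 144])
before = [s // 16 for s in quiet]
# ...but covers the whole 0..15 range once normalized.
after = [s // 16 for s in normalize_and_boost(quiet, boost=1.0)]
```

The right `boost` value has to be found by ear for each sound, exactly as with an audio editor.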

Hand job

Must all this processing be done by hand, slowly and painfully? If you want to. But you know I’m an efficient person. A cross-platform tool will help you do all this, at the speed of light. As you save time, you can multiply the tests and find the best result. The software is a command-line tool called Sox. I won’t delve into how to use it, but here is an example I used to convert a percussion from Imperial Mahjong.

This creates a new WAV file with less bass and more treble (please refer to the software manual to know the default frequencies), then amplifies the whole like crazy:

sox.exe samples/Bongo.wav generated/Bongo.wav bass -40 treble 26 gain 10

This creates a raw sound file (without header), in 8 bits, 8 kHz, mono, unsigned:

sox.exe generated/Bongo.wav -b 8 -c 1 -r 8000 -e unsigned-integer -t raw generated/Bongo.raw

Only the 8-bit to 4-bit conversion is not done by Sox: you will have to do it yourself, for example via a C/Java/Python/whatever script. A simple integer division by 16 will be enough. You can also do it on CPC, which will allow you to keep 8-bit sounds till the end: this would make output to the Digiblaster extension possible if you want!
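As an example of such a script, here is a minimal Python sketch of the division by 16. It assumes the input is the headerless, unsigned 8-bit file produced by the Sox command above; `convert_file` and its arguments are hypothetical names.

```python
def to_4bit(raw8):
    """Convert unsigned 8-bit samples (0..255) to 4-bit values (0..15), one per byte."""
    return bytes(s // 16 for s in raw8)


def convert_file(src_path, dst_path):
    """Read a raw unsigned 8-bit file and write its 4-bit version."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_4bit(data))
```

A call such as `convert_file("generated/Bongo.raw", "generated/Bongo.4bit")` would then produce the data to include in the CPC program (the output file name is only an example).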

ST samples

One question from a fellow attendee: how does an Atari ST manage to play 8-bit sounds, whereas it has (almost) the same sound generator as us? They use a trick that can not be properly done on CPC. The output signal is the sum of the volumes of the three channels. Thus, by cleverly adjusting their volume, they can reach a higher accuracy and get rid of the logarithmic curve. For example, the difference between the 14 and 15 volume is large. But if you set a volume of 1 on the second channel, with 14 on the first, you get a little more than 14, yet much less than 15. Better accuracy!

The ST architecture also allows sending all the registers to the PSG at once. Some guys have generated a look-up table converting an 8-bit value to three 4-bit values, one to send to each channel. This table is built to get a near-linear 8-bit output.
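To get a feel for how such a table could be built, here is a rough Python sketch. It relies on a strong assumption: an idealized volume curve where each level is about 3 dB (a factor of the square root of 2) above the previous one, with level 0 silent. Real tables are made from measured levels, so the output of this sketch is illustrative only, and all names are invented.

```python
import bisect
from itertools import combinations_with_replacement


def level_amplitude(v):
    """Idealized amplitude of a 4-bit volume level (assumption, not measured)."""
    return 0.0 if v == 0 else 2.0 ** ((v - 15) / 2)


def build_table():
    """For each 8-bit value, pick the three volumes whose summed amplitude is closest."""
    combos = sorted(
        (sum(level_amplitude(v) for v in triple), triple)
        for triple in combinations_with_replacement(range(16), 3)
    )
    sums = [s for s, _ in combos]
    top = sums[-1]  # the loudest combination: 15 on all three channels
    table = []
    for value in range(256):
        target = value / 255 * top
        i = bisect.bisect_left(sums, target)
        candidates = combos[max(0, i - 1):i + 1]  # neighbours just below and above
        table.append(min(candidates, key=lambda c: abs(c[0] - target))[1])
    return table
```

With this model, `level_amplitude(14) + level_amplitude(1)` is about 0.715, between level 14 (about 0.707) and level 15 (1.0), which is the effect described above.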

On CPC, modifying the volume of the three channels is slow as hell and does not give any good result (Crown has already experimented with this (with two channels) in his ProTracker: use the “mode 2” in the software, and try to hear the difference…!). Could it work on a CPC? Let’s say the channel 1 is the main volume, and channel 2 is supposed to add some accuracy by setting a small volume. A non-negligible lag between the sending of the two volumes will happen: there will be a “stair” effect whereas only one “real” value was supposed to be heard. Some microseconds later, the channel 1 receives a new value. However, the old channel 2 value will still be added to the output! This makes no sense, sound-wise. This technique can not work properly on CPC.

One other advantage on ST: being mono, plugging a speaker will still work, whereas on a CPC, the signal will be split into two outputs. At best, one will sound right, the other will not have the accuracy bonus. ST wins.

A bit of Z80!

A lot of articles show how to send a sample to the PSG, so I won’t explain this in detail. To sum up, you just have to send a volume to one channel, as fast as possible. Ideally, the channel is selected once, at the beginning of the code, then won’t be modified anymore.

; Selecting the channel (done only once)
ld bc,#f400 + channel ; "channel" is the PSG volume register: 8, 9 or 10
out (c),c
ld bc,#f6c0
out (c),c
ld bc,#f600
out (c),c

; Sending a value
ld bc,#f400 + value
out (c),c
ld bc,#f680
out (c),c
ld bc,#f600
out (c),c

One simple optimization consists in using the “out (c),0” instruction at the end of the last step. Example:

ld bc,#f400 + value
out (c),c
ld bc,#f680
out (c),c
out (c),0

As Hicks discovered by dissecting Imperial Mahjong in the latest Another World, an optimization consists in setting the bit 7 in every sample: the PSG ignores it when the value is sent to the #f4 port (only the first 5 bits are used), and only the bits 6 and 7 are read by the port #f6: a register load is saved! This trick was given to me by Grim during an old yet memorable IRC chat in 2005. I guess he invented it.

ld bc,#f400 + value
out (c),c
ld b,#f6
out (c),c
out (c),0
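Setting bit 7 is best done once, when the sounds are prepared. A small Python sketch of that step, with a helper that mimics an 8-bit Z80 addition to show what happens when two such bytes are added (the names are made up for the example):

```python
def set_bit7(samples4):
    """Pre-set bit 7 in every 4-bit sample so the same byte also works on port #f6."""
    return bytes(0x80 | (s & 0x0F) for s in samples4)


def add8(a, b):
    """8-bit addition as on the Z80: returns (result, carry)."""
    total = a + b
    return total & 0xFF, total > 0xFF
```

For instance, `add8(0x88, 0x88)` gives `(0x10, True)`: the two bit 7s cancel out in the result and end up in the carry.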

Watch out during the mix: adding two values with the bit 7 set will set the Carry. This may have some side-effect on your code (or not). Finally, when selecting a register, you can avoid having to store the #c0 value: an “out (c),b” gives the same result, since the bits 6 and 7 of #f6 are both set! Thus:

ld bc,#f400 + channel
out (c),c
ld b,#f6
out (c),b
out (c),0

To mix or not to mix?

Playing a sound is one thing, digitracking is another. Must we play the samples on the three channels, or mix them into one? The first option is technically possible, but very expensive in terms of CPU: switching from one channel to another is slow and, from my tests, this loss of speed causes a drop in replay frequency that the use of the full dynamic range of the volume (4 bits) per channel does not repay. By far, the most efficient technique consists in using one channel (the second, to be in “the center” of the stereo) and mixing 2 or 3 channels into it. What about 4? I tried. It is possible, but it gets really ugly (Antoine also did: it was even uglier). The lack of registers forces us to juggle with them a lot, the replay frequency lowers, the result sounds like crap.

The mixing itself is very simple: in theory, simply add the values of the 2 or 3 samples and everything is fine. In practice, watch out for overflow! Not as simple as it seems (I have a patented trick for this, but won’t tell about it here: search for yourself!).
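To make the overflow problem concrete, here is a naive Python sketch that mixes unsigned 4-bit samples around the “middle” value 8 and clamps the result. To be clear, this is not the undisclosed trick mentioned above, just the most obvious approach; the names and the look-up table layout are assumptions.

```python
CENTER = 8  # the "0" of our unsigned 4-bit samples


def mix(*samples):
    """Sum the samples around the center and clamp the result to 0..15."""
    total = CENTER + sum(s - CENTER for s in samples)
    return max(0, min(15, total))


# Possible precomputed table for two channels, indexed by a * 16 + b,
# so that no clamping is needed during the replay.
MIX2 = bytes(mix(a, b) for a in range(16) for b in range(16))
```

Clamping distorts loud passages, so lowering the level of each sound before mixing is another option to try by ear.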

Playing notes

Playing a sound is nice, but how to vary the frequency to play a C, D, E or any other note? The technique is simple: instead of reading each sample with a step of 1, it must be done with a step of 1.1, 1.25, etc. according to the note. Note that playing the samples with a step of 2 will play the sound one octave higher, and playing each sample twice (a step of 0.5), one octave lower.

Two questions. First: how to know this “step”? It can be calculated, but I won’t show how here. First because there is a lot of information about it on the net, second because I have never done it: I always work by ear, which is faster if you are a bit of a musician. Only 11 steps have to be found after all (because there are 12 notes per octave)! As for the second question: how to move by 1.2 or 1.8 bytes? How to do that in Z80 assembler?
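For readers who would rather calculate than work by ear, the usual equal-temperament formula gives the step as 2 to the power of (semitones / 12). A tiny Python sketch, where `note_step` is a name chosen for the example:

```python
def note_step(semitones):
    """Reading step for a note `semitones` above (or below, if negative) the base note."""
    return 2.0 ** (semitones / 12)


# The 11 steps between the base note (step 1) and the next octave (step 2).
steps = [note_step(n) for n in range(1, 12)]
```

A fifth (7 semitones) gives a step of roughly 1.498, and 12 semitones give exactly 2, the octave.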

The arrival of the fixed point number

We have to use what is called “fixed point arithmetic”. In a 16-bit register such as HL, H will be the integer part, L the fractional part. To advance by 0.5, add #0080 (#80 being half of #0100):

ld hl,#0000 ; HL is an offset to the sound. 0 indicates we are at its beginning
ld de,#0080
add hl,de

Now, HL is #0080. H, the integer part, indicates we are still on the first sample (because it equals 0). Let’s make another iteration:

add hl,de

HL is now #0100. H = 1, so it points to the second sample! Victory is ours. You just read the sound at half the speed of the original. We just have to vary DE to change to another note. This technique is simple and gives good results. Prodatron uses it in his Digitracker, but it can be optimized even further. It also allows portamento effects by increasing/decreasing DE.
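The same mechanism can be simulated in a few lines of Python to check step values before writing any Z80. This is a sketch under the conventions of the example above (H is the sample index, L the fraction); the function names are invented.

```python
def to_fixed(step):
    """Convert a step such as 0.5 or 1.25 to an 8.8 fixed-point value for DE."""
    return round(step * 256) & 0xFFFF


def resample(sound, step_fixed, count):
    """Read `count` samples, advancing HL by DE after each one."""
    hl = 0
    out = []
    for _ in range(count):
        out.append(sound[hl >> 8])       # H: integer part, the sample index
        hl = (hl + step_fixed) & 0xFFFF  # add hl,de
    return out
```

With `sound = [10, 20, 30, 40]`, a step of #0080 reads each sample twice (`[10, 10, 20, 20, ...]`), and a step of #0200 skips every other sample. Since H alone is the index here, this toy version can only address 256 samples.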

Step table

Another technique, which I use in Orion Prime, allows a significant gain of speed, allowing me to replay at 18.3 kHz (world record!). Later on, I realized Crown (him again!) used it too. He had really thought his ProTracker through when he made it. This technique consists in precalculating a step table indicating, for each note, how fast to move within the sound.

For example, for the base note, it will look like this:
1, 1, 1, 1, 1, 1...
For the higher octave:
2, 2, 2, 2, 2, 2...
For the lower octave:
1, 0, 1, 0, 1, 0...

Generating the table is analogous to the fixed-point technique. To optimize further, each subtable will be 256 bytes: the same pointer can be used for the three channels, without even having to manage the looping if the increment is done on 8 bits.

Two drawbacks to this technique. First, this step table takes a bit of memory: by limiting it to three octaves only (which is enough most of the time for Digitrack music), and by allocating 256 bytes per note, it weighs 9 KB (36 notes × 256 bytes). You can optimize it by storing only the subtables of the notes used by the song. Second drawback: the step is locked to what is read in the tables. No portamento effect is possible!
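Such a table is easy to generate offline. The Python sketch below derives each 256-byte subtable from an 8.8 fixed-point accumulator; which three octaves to cover and the equal-temperament steps are assumptions of the example, not necessarily what Orion Prime does.

```python
def step_subtable(step_fixed, size=256):
    """Integer advance to apply after each played sample, for one note."""
    table = bytearray()
    acc = 0
    for _ in range(size):
        nxt = acc + step_fixed
        table.append((nxt >> 8) - (acc >> 8))  # how many whole samples we crossed
        acc = nxt
    return bytes(table)


def build_step_table(lowest_semitone=-12, note_count=36):
    """Concatenate one subtable per note: 36 notes x 256 bytes = 9216 bytes."""
    table = bytearray()
    for n in range(lowest_semitone, lowest_semitone + note_count):
        table += step_subtable(round(256 * 2 ** (n / 12)))
    return bytes(table)
```

For the lower octave this produces 0, 1, 0, 1... rather than 1, 0, 1, 0...: the phase differs by one entry but the speed is the same. Also, with an 8-bit fraction the accumulated fraction is back to zero after exactly 256 entries, so each subtable wraps around seamlessly.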

Finally, I have recently found an even faster technique, without effect limitation, which I hope to use in a production one day. The aim is not to play higher frequencies, but rather use the saved cycles to do effects DURING the replay.

Looping

How to manage the looping of sounds? On modern architectures, we have enough power to test, after each sample is played, whether the end of the sound is reached. On CPC, we certainly can not! As a fall-back, we check this “when we can”. Ideally, try to do that at the end of each frame. You will need to spend some time reading the data of the music at one time or another, so this will also be the moment to check the looping. This is not extremely accurate, but it is enough in most cases.

SID sample

Let’s quickly talk about the SID sample used in Imperial Mahjong. I told you earlier in this article that playing samples on 3 channels is too cumbersome, due to the number of cycles it requires on CPC. Yet, that’s exactly what I do in Imperial Mahjong! It works because playing SID samples is a bit different. The wave generated by the PSG is accurate, only the one the SID sample generates is not. The overall quality is still acceptable even if the SID sample has a low replay frequency.

One difficulty with SID samples concerns the very small samples, which have to loop perfectly. The “checking whenever I can” technique explained above does not work anymore. So how to test the end of 3 sounds on 3 channels? Doing 16-bit subtractions, besides modifying our registers, is too slow. I thus used a trick I love, and which you can use every time you have a looping table. It will however cost some memory, and requires the stack to be diverted. Let’s pretend I want to play the following samples, in a loop: 0, 5, 10, 15.

I encode the whole like this:

TableStart:
        dw 0
        dw $ + 2      ; Points on the two next bytes
        dw 5 * 256
        dw $ + 2
        dw 10 * 256
        dw $ + 2
        dw 15 * 256
        dw TableStart ; Let's loop!

You have to encode each value as 16 bits, followed by another 16-bit value pointing to where the next value is. It can be two bytes further (“normal” case), or at the looping point. All we need to do is read the table this way:

TablePt: ld sp,TableStart
         pop af           ; Gets the sample in A
         pop hl           ; New value where to point to
         ld sp,hl         ; SP points to our new value

The looping is automatically managed! As I use 8-bit values, my data is multiplied by 256, else POP AF would put it in F, which wouldn’t be useful. However, a clever coder could use F! We can imagine setting the bit 0 (carry), or bit 6 (Z). Thus, just after the “ld sp,hl”, a JR C/Z could be done (“pop hl” and “ld sp,hl” do not modify F, so this is safe).
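Writing such a table by hand quickly gets tedious, so it can be generated by a script. Here is a Python sketch that builds the bytes for a given load address and simulates the POP-based reading; the `loop_index` parameter and the function names are additions made for this example.

```python
def build_stack_table(samples, base_address, loop_index=0):
    """Encode each sample as 'dw sample * 256' followed by 'dw address of next entry'."""
    table = bytearray()
    for i, s in enumerate(samples):
        is_last = i == len(samples) - 1
        next_entry = loop_index if is_last else i + 1
        table += bytes([0, s & 0xFF])  # low byte goes to F, high byte to A
        table += (base_address + next_entry * 4).to_bytes(2, "little")
    return bytes(table)


def play(table, base_address, count):
    """Simulate 'pop af / pop hl / ld sp,hl' for `count` samples."""
    sp = base_address
    out = []
    for _ in range(count):
        offset = sp - base_address
        out.append(table[offset + 1])                                 # pop af: A
        sp = int.from_bytes(table[offset + 2:offset + 4], "little")   # pop hl, ld sp,hl
    return out
```

For the samples 0, 5, 10, 15 loaded at #4000, `play` returns 0, 5, 10, 15, 0, 5... for as long as you ask. Each sample costs 4 bytes.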

The main drawback of this technique is that it is memory-consuming. However, it was a life-saver in the SID sample context! As an anecdote, the SID sounds of the Imperial Mahjong introduction weigh 60k! They are generated just before the music starts, from a base wave sound.

Wrapping up

I think I told you all you needed to play beautiful samples, in an optimized way, on CPC. Don’t forget that, in the audio domain just like in any other domain, strange ideas and empirical tests give excellent results! I hope this article was an interesting read. See you soon!