By Grim.
A hash function computes a number from an arbitrary block of data. This number is called hash value, hash code, digest, or simply hash and is typically used either as a fingerprint (of the data) or an index into a fixed-size table called a hash table.
As a fingerprint (similar to a checksum), the longer the data, the longer the hash should be to reduce collision rate (eg. hashing tens of Kb to a 8-bit hash won’t be very reliable).
As an index, there usually are relatively few items to hash and their size is small (a few hundred bytes at most), so a 8 or 16-bit hash is often enough.
A possible use-case of a hash function could be eg. in a text-parser of an adventure game (eg. SRAM). Such game contains a dictionary of known words (strings) which is used to look up player’s text-input (eg. GO, NORTH, OPEN, DOOR, etc) and act on it. Storing many words in plain ASCII in this dictionary can quickly take a lot of memory and each word having different length will slow down the lookup.
Using a hash table of (hashed) known words instead will save memory (because hashes will be shorter than most words in plain ASCII) and looking up hashed player’s inputs will boil down to a quick 8 or 16-bit comparaison (longer hashes would be overkill in this case).
Hash functions are often confused with Checksums. They are all related but each one has its own uses and is designed differently.
Other posts in the series:
- Checksum functions
- Cipher: AES-128
FNV-1
- Authors: Glenn Fowler, Landon Curt Noll, Phong Vo
- Published: 1991
- References:
The basis of the FNV hash algorithm was taken from an idea sent as reviewer comments to the IEEE POSIX P1003.2 committee by Glenn Fowler and Phong Vo back in 1991. In a subsequent ballot round: Landon Curt Noll improved on their algorithm. Some people tried this hash and found that it worked rather well. In an EMail message to Landon, they named it the “Fowler/Noll/Vo” or FNV hash. FNV hashes are designed to be fast while maintaining a low collision rate. The FNV speed allows one to quickly hash lots of data while maintaining a reasonable collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, hostnames, filenames, text, IP addresses, etc.
Landon Curt Noll
Algorithm
hash = FNV_basis FOREACH byte IN data { hash = hash * FNV_prime hash = hash XOR byte }
32-bit Implementation in Z80 Assembly
;; FNV-1 32-bit hash
;;
;; FNV_Prime = &01000193 = %0001 0000 0000 0000 0001 1001 0011
;; FNV_Basis = &811C9DC5
;;
;; The 32-bit MUL (Hash * FNV_Prime) is hard-coded with a few shift-and-adds:
;; H = H + (H << 1) + (H << 4) + (H << 7) + (H << 8) + (H << 24)
;;
;; Size : 106 bytes
;; Speed: 7.5k/s
;;
;; Notes:
;; This implementation has a clumsy register usage.
;;
;; Input:
;; IX = data
;; BC = length
;;
;; Output:
;; HLIY = 32-bit hash
;; DE,BC,IX,AF,AF' are modified
fnv1_32:
exx
; HLIY (r) = offset_basis
ld hl,&811C
ld iy,&9DC5
exx
_fnv1_32_loop
exx
; DEBC (h) = HLIY (r) = current hash value
ld d,h
ld e,l
ld b,iyh
ld c,iyl
; r += (h<<24)
ld a,c
add a,d
ld h,a
; h = h<<1
sla c
rl b
rl e
rl d
; r += (h<<1)
add iy,bc
adc hl,de
; h = h<<2
sla c
rl b
ex de,hl
adc hl,hl
; h = h<<3
sla c
rl b
adc hl,hl
; h = h<<4
sla c
rl b
adc hl,hl
ex de,hl
; r += (h<<4)
add iy,bc
adc hl,de
; h = h<<5
sla c
rl b
ex de,hl
adc hl,hl
; h = h<<6
sla c
rl b
adc hl,hl
; h = h<<7
sla c
rl b
adc hl,hl
ex de,hl
; r += (h<<7)
add iy,bc
adc hl,de
; h = h<<8
sla c
rl b
rl e
rl d
; r += (h<<8)
add iy,bc
adc hl,de
; r ^= (data)
ld a,iyl
xor (ix+0)
inc ix
ld iyl,a
exx
; loop
dec bc
ld a,b
or c
jr nz,_fnv1_32_loop
exx
ret
;; Unit tests
;; "Semilanceata" => &1E12175C
;; "Longueteau" => &7F7CC956
;; "Severin" => &9A0DA2E9
;; "Damoiseau" => &0A5D56CF
;; "foobar" => &31F0B262
;; "chongo was here" => &98A0BF6C
Pearson Hash
- Author: Peter K. Pearson
- Published: June 1990
- References: Pearson hashing (Wikipedia)
Pearson hashing is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires basically one XOR operation and a table lookup. The LUT is 256-byte long, containing a permutation of the values 0 through 255.
It can be easily extended to any hash size of k
*8-bit, with k
an integer greater than 1, by calculating k
Pearson hashes of the input, each with an unique initial seed value.
Algorithm
8-bit
The hash function requires a 256 bytes lookup table.
hash = 0
FOREACH byte IN data
hash = LUT[hash XOR byte]
16-bit
A 16-bit Pearson hash function could be done exactly like the 8-bit version but using 16-bit numbers everywhere (including the lookup table, which would grow to the unacceptable size of 64K!). A better alternative, for a 8-bit platform, is to compute two 8-bit Pearson hashes, both initialized with a different value.
hash1 = 0
hash2 = 1
FOREACH byte IN data
hash1 = LUT[hash1 XOR byte]
hash2 = LUT[hash2 XOR byte]
hash = hash2*256 + hash1
Implementation in Z80 Assembly
To simplify the routines and speed things up, the lookup table must be aligned on a 256 bytes address boundary (ie. &xx00).
8-Bit hash function
;; Pearson hash 8-bit
;;
;; Size : 16 bytes + 256 bytes lookup table
;; Speed: 88.7k/s
;;
;; Input:
;; HL = data
;; BC = length
;;
;; Output:
;; A = 8-bit hash
;; HL,DE,BC and Flags are modified
pearson8:
; Adjust 16-bit length for 2x8-bit loops
inc b
dec bc
inc c
; LUT pointer
ld d,_pearson_lut / 256
; Initialize hash to zero
xor a
_pearson8_loop
xor (hl)
inc hl
ld e,a
ld a,(de)
dec c
jr nz,_pearson8_loop
djnz _pearson8_loop
ret
16-Bit hash
;; Pearson hash 16-bit
;;
;; Size : 26 bytes + 256 bytes lookup table
;; Speed: 57.4k/s
;;
;; Input:
;; HL = data
;; BC = length
;;
;; Output:
;; DE = 16-bit hash
;; HL,BC,AF are modified
pearson16:
; Adjust 16-bit length for 2x8-bit loops
inc b
dec bc
inc c
; LUT pointer
ld d,_pearson_lut / 256
; Initialize both hashes
xor a
ex af,af'
ld a,51
_pearson16_loop
xor (hl) ; First 8-bit hash
ld e,a
ld a,(de)
ex af,af'
xor (hl) ; Second 8-bit hash
ld e,a
ld a,(de)
inc hl
dec c
jr nz,_pearson16_loop
djnz _pearson16_loop
; Return both hash values in DE
ld d,a
ex af,af'
ld e,a
ret
Lookup table generator
The permutation lookup table for the Pearson hash function must contain all values from 0 to 255 in random order. So we could easily generate it using something as simple as this:
ld hl,_pearson_lut AND &FF00
_pearson_lut_generator
inc l
ld (hl),l
jr nz,_pearson_lut_generator
But it is far too simple (ie. not scrambled at all) to usually give satisfactory results.
We can improve this a bit:
ld hl,_pearson_lut AND &FF00
xor a
_pearson_lut_generator
add a,51 ; any odd number here will work
ld (hl),a
inc l
jr nz,_pearson_lut_generator
That’s better, but might still not be satisfactory enough. So we could ultimately use an 8-bit LFSR based PRNG (ie. with a 256-bytes period) to generate the lookup table.
Pre-generated lookup table
If you can afford its storage space in your program and do not want to bother with a generator, here is a pre-generated lookup table you can use “as-is“:
ALIGN 256
_pearson_lut
DB 98,6,85,150,36,23,112,164,135,207,169,5,26,64,165,219
DB 61, 20,68,89,130,63,52,102,24,229,132,245,80,216,195,115
DB 90,168,156,203,177,120,2,190,188,7,100,185,174,243,162,10
DB 237,18,253,225,8,208,172,244,255,126,101,79,145,235,228,121
DB 123,251,67,250,161,0,107,97,241,111,181,82,249,33,69,55
DB 59,153,29,9,213,167,84,93,30,46,94,75,151,114,73,222
DB 197,96,210,45,16,227,248,202,51,152,252,125,81,206,215,186
DB 39,158,178,187,131,136,1,49,50,17,141,91,47,129,60,99
DB 154,35,86,171,105,34,38,200,147,58,77,118,173,246,76,254
DB 133,232,196,144,198,124,53,4,108,74,223,234,134,230,157,139
DB 189,205,199,128,176,19,211,236,127,192,231,70,233,88,146,44
DB 183,201,22,83,13,214,116,109,159,32,95,226,140,220,57,12
DB 221,31,209,182,143,92,149,184,148,62,113,65,37,27,106,166
DB 3,14,204,72,21,41,56,66,28,193,40,217,25,54,179,117
DB 238,87,240,155,180,170,242,212,191,163,78,218,137,194,175,110
DB 43,119,224,71,122,142,42,160,104,48,247,103,15,11,138,239
Changelog
- 2020-11-07 – Recycled by MrEarwig
One thought on “Hash algorithms”
Comments are closed.