12  Semi Numericals

We now turn our attention away from numericals into semi-numerical applications. Well over 5 decades ago, ambitious engineers attempted to leverage the existing analog telephone network for digital transmission. With an acoustic coupler at each end, a dialup network could be established. The key to the success of this was ensuring the integrity of the data exchange; e.g. what you type at a terminal at one end had to be received at the computer exactly and vice versa. The solution that has evolved many generations started with parity checking at the individual character level, cyclic redundancy check at a block level and message digests of various flavors.

The problem of verifying the integrity of message content has not gone away.

A projectlet to compute and verify parity bits for characters will be our starting point. Further explorations with a 16 bit CRC computation and finally a message digest using the MD5 algorithm will be reviewed. For each technique, the payload will be corrupted and then ensured that the method does indeed provide different outputs.

The choice of the approach for any application is of course needs careful consideration of many factors. However all these approaches continue to be actively used and researched.

12.1 Learning Objectives

  • Parity computations with low level bit manipulation

  • Data Corruption simulation

  • CRC computation

  • Library exploration for message digests

12.2 Parity and Bit level manipulations

In a serial communications, computing a parity bit (Ref: https://en.wikipedia.org/wiki/Parity_bit) and transmitting it along with the data for comparing at the receiving point has been a standard solution from the early days. Instead of transmitting the 8 bits for each byte, the parity bit makes the payload 9 bits. Support for this is included in most serial port driver chips even now. In other words parity is handled mostly at the digital electronics level. Concepts will be explored in this projectlet for pedagogic purposes.

The basic specifications related to parity may be listed as:

~/bin/codemd ../../toolkit/adalib/src/parity.ads -x Basic -l
0005 |    type ParityType is (Odd, Even);
0006 | 
0007 |    type ParityBitType is (Low, High);
0008 | 
0009 |    type BitCountType is new Natural range 0 .. Unsigned_8'Machine_Size;

Though odd parity may be most common, other choice is even parity. Given a byte the parity bit computation of course depends on which is chosen:

~/bin/codemd ../../toolkit/adalib/src/parity.adb -x Parity -l
0019 |    odd_mask : constant := 2#0000_0001#;
0020 |    function ParityBit
0021 |      (byte : Unsigned_8; spec : ParityType := Odd) return ParityBitType
0022 |    is
0023 |       bc : constant Unsigned_8 := Unsigned_8 (BitCount (byte));
0024 |    begin
0025 |       if odd_mask = (bc and odd_mask) then
0026 |          case spec is
0027 |             when Odd =>
0028 |                return Low;
0029 |             when Even =>
0030 |                return High;
0031 |          end case;
0032 |       else
0033 |          case spec is
0034 |             when Odd =>
0035 |                return High;
0036 |             when Even =>
0037 |                return Low;
0038 |          end case;
0039 |       end if;
0040 |    end ParityBit;

As shown above, bitwise and, or etc are supported on Unsigned types; as are bit shift operations as shown below:

~/bin/codemd ../../toolkit/adalib/src/parity.adb -x Count -l
0004 |    function BitCount (byte : Unsigned_8) return BitCountType is
0005 |       result : Natural    := 0;
0006 |       mask   : Unsigned_8 := 1;
0007 |    begin
0008 |       for b in 1 .. 8 loop
0009 |          if mask = (mask and byte) then
0010 |             result := result + 1;
0011 |          end if;
0012 |          mask := Shift_Left (mask, 1);
0013 |       end loop;
0014 |       return BitCountType (result);
0015 |    end BitCount;

Even if basic parity computations are not implemented in software these days, the core features are useful in for example explicitly corrupting data blocks - for testing purposes:

~/bin/codemd ../../toolkit/examples/integ/src/integ.adb -x Corrupt -l
0018 |    function Corrupt (b : Unsigned_8) return Unsigned_8 is
0019 |       mask : Unsigned_8 := 0;
0020 |       rv   : Float ;
0021 |    begin
0022 |       loop
0023 |          rv := GNAT.Random_Numbers.Random (Gen);
0024 |          mask := Shift_Left (2#1#, Natural (rv * Float (mask'Size)));
0025 |          if mask /= 0
0026 |          then
0027 |             exit ;
0028 |          end if ;
0029 |       end loop ;
0030 |       return mask xor b;
0031 |    end Corrupt;

12.2.1 Sample Usage

What then is the result of corrupting 1 bit. In theory, the parity bit should then flip. With odd parity:

~/bin/integ p S r I n I
Argument : S 53
BitCount  4 Required Parity Bit HIGH
Corrupted to 13 Required Parity Bit LOW
Argument : r 72
BitCount  4 Required Parity Bit HIGH
Corrupted to 7a Required Parity Bit LOW
Argument : I 49
BitCount  3 Required Parity Bit LOW
Corrupted to 59 Required Parity Bit HIGH
Argument : n 6e
BitCount  5 Required Parity Bit LOW
Corrupted to 4e Required Parity Bit HIGH
Argument : I 49
BitCount  3 Required Parity Bit LOW
Corrupted to 59 Required Parity Bit HIGH

and with even parity:

~/bin/integ P I n I r S
Argument : I 49
BitCount  3 Required Parity Bit HIGH
Corrupted to 4d Required Parity Bit LOW
Argument : n 6e
BitCount  5 Required Parity Bit HIGH
Corrupted to 2e Required Parity Bit LOW
Argument : I 49
BitCount  3 Required Parity Bit HIGH
Corrupted to 69 Required Parity Bit LOW
Argument : r 72
BitCount  4 Required Parity Bit LOW
Corrupted to 70 Required Parity Bit HIGH
Argument : S 53
BitCount  4 Required Parity Bit LOW
Corrupted to d3 Required Parity Bit HIGH

12.3 CRC

Let us see how the CRC fares with 1 bit corruption:

~/bin/integ c "Hello World. This is a long sentence."
CRC is f861
With 1 bit corruption Corrupting char  8 (o) to '/'
CRC is f811
With 1 bit corruption Corrupting char  27 (g) to 'ç'
CRC is 3ac8
With 1 bit corruption Corrupting char  37 (.) to '/'
CRC is 38a0
With 1 bit corruption Corrupting char  35 (c) to '#'
CRC is 2c60
With 1 bit corruption Corrupting char  15 (h) to 'l'
CRC is 0b53
With 1 bit corruption Corrupting char  27 (g) to '''
CRC is 3934
With 1 bit corruption Corrupting char  2 (e) to 'm'
CRC is d660
With 1 bit corruption Corrupting char  17 (s) to 'ó'
CRC is 30fe

Except in rare cases - where the identical character is getting corrupted in an identical way, the resulting CRC should be different.

Of course with a 16 bit CRC, there are only 65535 (non 0) unique values and thus theoretically clashes in the final CRC cannot be ruled out. For practical applications, CRC can serve as a means to detect data corruptions.

12.4 Digest

Even stronger signatures with MD5 digests:

~/bin/integ h "Hello World. This is a long sentence."
Digest : db19e87d923af6dd2cfeb9a6fe71cbb4
With 1 bit corruption Corrupting char  8 (o) to 'm'
Digest : f4eebad140d0d093cd35539fb8e806c3
With 1 bit corruption Corrupting char  8 (o) to 'g'
Digest : fd9d44eaf8bdf909facb16eb3189a01a
With 1 bit corruption Corrupting char  27 (g) to 'o'
Digest : 352ebc3e6734fdffd6e1d81c4c67a72e
With 1 bit corruption Corrupting char  15 (h) to '`'
Digest : 9e5cf1bc4a89c74d23608c4412ef2cd3
With 1 bit corruption Corrupting char  9 (r) to '2'
Digest : a2bb1aa176dc5267fb31b232d8884d18
With 1 bit corruption Corrupting char  23 ( ) to NUL
Digest : c5941698c3d952f644f705b73a79b53e
With 1 bit corruption Corrupting char  24 (l) to 'n'
Digest : 8b7aaf36f403b07e7809b122edd65c53
With 1 bit corruption Corrupting char  25 (o) to '/'
Digest : dc2edd74d6950ef1a06a1fc5a329ca31

For CRC as well as the digests, all the results should result in a change.

12.5 Stretch

  • Experiment with corruption of file contents and the use of CRC and/or digests to detect such corruptions. It is almost standard practice now for file servers to share the digital signatures to verify the downloaded versions are not currupt.

  • How sensitive are the CRC and hash signatures to the order of characters in the input? Even if a corruption candidate and the corrupted value are identical, if the position within the target string is different, does the signature always change?

  • The CRC used above is only 16 bits. Perhaps a 32 bit will provide even more security.

  • Different popular tools like zip, use their specific versions of the CRC. Comparision of their performances might be highly educational.

  • Widely used systems like git have chosen to adopt a higher order digital signature discipline over the MD5 algorithm. An analysis of file patterns that may result in MD5 clashes will be interesting.

12.6 Implementation

Repository: toolkit
Directory: examples/integ