by Neal R. Wagner
Copyright © 2002 by Neal R. Wagner. All rights reserved.
NOTE: This site is obsolete. See book draft (in PDF):
Codes that correct errors are essential to modern civilization and are used in devices from modems to planetary satellites. The theory is mature, difficult, and mathematically oriented, with tens of thousands of scholarly papers and books, but this section will describe only a simple and elegant code, discovered in 1949.
Richard Hamming found a beautiful binary code that will correct any single error and will detect any double error (two separate errors). The Hamming code has been used for computer RAM, and is a good choice for randomly occurring errors. (If errors come in bursts, there are other good codes.) Unlike most other error-correcting codes, this one is simple to understand.
The code uses extra redundant bits to check for errors, and performs the checks with special check equations. A parity check equation of a sequence of bits just adds the bits of the sequence and insists that the sum be even (for even parity) or odd (for odd parity). This section uses even parity. Alternatively, one says that the sum is taken modulo 2 (divide by 2 and take the remainder), or one says that the sum is taken over the integers mod 2, Z2.
A simple parity check will detect if there has been an error in one bit position, since even parity will change to odd parity. (Any odd number of errors will show up as if there were just 1 error, and any even number of errors will look the same as no error.)
One has to force even parity by adding an extra parity bit and setting it either to 1 or to 0 to make the overall parity come out even. It is important to realize that the extra parity check bit participates in the check and is itself checked for errors, along with the other bits.
The Hamming code uses parity checks over a portion of the positions in a block. Suppose there are bits in consecutive positions from 1 to n-1. The positions whose position number is a power of 2 are used as check bits, whose value must be determined from the data bits. Thus the check bits are in positions 1, 2, 4, 8, 16, ..., up to the largest power of 2 that is less than or equal to the largest bit position. The remaining positions are reserved for data bits.
Each check bit has a corresponding check equation that covers a portion of all the bits, but always includes the check bit itself. Consider the binary representation of the position numbers: 1 = 12, 2 = 102, 3 = 112, 4 = 1002, 5 = 1012, 6 = 1102, and so forth. If the position number has a 1 as its rightmost bit, then the check equation for check bit 1 covers those positions. If the position number has a 1 as its next-to-rightmost bit, then the check equation for check bit 2 covers those positions. If the position number has a 1 as its third-from-rightmost bit, then the check equation for check bit 4 covers those positions. Continue in this way through all check bits. The table below summarizes this.
Here is a table showing the parity checks for the first 17 positions of the Hamming code. (Check bits are in positions 1, 2, 4, 8, and 16, in red italic in the table below.)
The table below assumes one starts with data bits 1101101 (in black below). The check equations above are used to determine values for check bits in positions 1, 2, 4, and 8, to yield the word 11101010101 below, with check bits in red italic here and below.
Intuitively, the check equations allow one to ``zero-in'' on the position of a single error. For example, suppose a single bit is transmitted in error. If the first check equation fails, then the error must be in an odd position, and otherwise it must be in an even position. In other words, if the first check fails, the position number of the bit in error must have its rightmost bit (in binary) equal to 1; otherwise it is zero. Similarly the second check gives the next-to-rightmost bit of the position in error, and so forth.
The table below gives the result of a single error in position 11 (changed from a 1 to a 0). Three of the four parity checks fail, as shown below. Adding the position number of each failing check gives the position number of the error bit, 11 in this case.
The above discussion shows how to get single-error correction with the Hamming code. One can also get double-error detection by using a single extra check bit, which is in position 0. (All other positions are handled as above.) The check equation in this case covers all bits, including the new bit in position 0. In case of a single error, this new check will fail. If only the new equation fails, but none of the others, then the position in error is the new 0th check bit, so a single error of this new bit can also be corrected. In case of two errors, the overall check (using position 0) will pass, but at least one of the other check equations must fail. This is how one detects a double error. In this case there is not enough information present to say anything about the positions of the two bits in error. Three or more errors at the same time can show up as no error, as two errors detected, or as a single error that is ``corrected'' with a bogus correction.
Notice that the Hamming code without the extra 0th check bit would correct a double error in some bogus position as if it were a single error. Thus the extra check bit and the double error detection are very important for this code. Notice also that the check bits themselves will also be corrected if one of them is transmitted in error (without any other errors).
The Hamming code can accommodate any number of data bits, but it is interesting to list the maximum size for each number of check bits. The table below includes the overall check bit, so that this is the full binary Hamming code, including double error detection.
|Check bits||Max Data bits||Max Total size|
For example, with 64 bits or 8 bytes, one gets 7 bytes of data (plus 1 bit) and uses 1 byte for the check bits (actually, only 7 bits). Thus an error-prone storage or transmission system would only need to devote 1 out of each 8 bytes 12.5% to error correction/detection