Introduction & Block Codes, Weight, and Distance#

Pretzel, Oliver (1992). Error-Correcting Codes and Finite Fields. Oxford University Press, Applied Mathematics and Computing Science Series. (Cited as ECFF.)




The theory of error-correcting codes deals with the general problem of transmitting messages reliably, whether in space (wires) or in time (storage). In the classical theory no assumption is made about the nature of the message to be transmitted, so there are no statistics on which to base a guess of the correct message. Instead, redundancy is built into the message at the transmitter so that the receiver can use that redundancy to correct errors. This goal can be achieved with high probability but not with certainty. A weakness of the classical theory is that errors occurring before the transmitter encodes and transmits the message cannot be detected, much less corrected.




Strings#


Alphabet#

Definition: Alphabet

A message is composed of symbols (or characters) from a fixed finite set called the alphabet \(A\). All alphabets are assumed to have a null character which may be designated by \(0\).

In the case of English, the alphabet is composed of uppercase and lowercase letters, digits, punctuation marks, and whitespace characters. The null character is the space.

Binary Alphabet#

Definition: Binary Alphabet

An alphabet with just one symbol is useless. So the next simplest alphabet is the binary alphabet \(\textbf{B}\) consisting of the two symbols \(0\) and \(1\):

\(\textbf{B} = \{ 0, 1 \}\)

The elements of \(\textbf{B}\) are called bits.




The Channel Model#


The situation to be modeled

Information is transmitted via a channel. The channel is prone to errors. The distorted information is processed at the receiving end to restore the original message as nearly as possible. Information is passed through the channel in separate lumps (discrete channel) or in a continuously varying form (continuous channel).

Examples of channels

  • Modulator (translates information into electrical signals), transmitter, receiver, demodulator: radio, television, satellite

  • Cable: computer file transfer

  • Storage medium + the means of storing and retrieving information from it (errors can occur either by a failure of the storage or reading device, or by deterioration of the medium)

  • Language

The random error channel is one of the simplest models of a channel.

Definition: Random Error Channel

A channel is called a random error channel if for each pair of distinct symbols \((a, b)\) of the alphabet there is a fixed probability \(p_{(a, b)}\) that when \(a\) is transmitted \(b\) is received.

The main point of this definition is that \(p_{(a, b)}\) does not depend on anything else, such as whether the previous symbol was correctly transmitted or not. For simplicity we assume that \(p_{(a, b)}\) is independent of the symbols \(a\) and \(b\) given that \(a \ne b\).

Definition: Burst Error Channel

A channel is called a burst error channel if an error in one symbol increases the probability that the following symbol is also corrupted, so that errors tend to occur in bursts rather than independently.

Definition: Erasure

An erasure occurs when the receiver knows that a particular symbol is unreliable (the position of a possible error is known) but not what the transmitted symbol was. Erasures are easier to handle than ordinary errors because only the value of the symbol, not its location, has to be recovered.

Definition: Symmetric Random Error Channel

A random error channel is called symmetric if the probabilities \(p_{(a, b)}\) are the same for all possible choices of pairs \((a, b)\) with \(a \ne b\).
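To make the channel model concrete, here is a minimal Python sketch (the function name and interface are my own, not from the text) of a symmetric random error channel: each symbol is corrupted independently with probability \(p\), and a corrupted symbol is replaced by one of the other alphabet symbols chosen uniformly, which makes all \(p_{(a, b)}\) equal.

```python
import random

def transmit(word, p, alphabet=(0, 1)):
    """Pass a word through a symmetric random error channel.

    Each symbol is corrupted independently with probability p; a
    corrupted symbol is replaced by one of the *other* alphabet
    symbols, chosen uniformly, so all p_(a, b) are equal.
    """
    received = []
    for symbol in word:
        if random.random() < p:
            received.append(random.choice([s for s in alphabet if s != symbol]))
        else:
            received.append(symbol)
    return tuple(received)

print(transmit((1, 0, 0, 1, 1, 0), p=0.1))  # e.g. (1, 1, 0, 1, 1, 0)
```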




Coding for the binary symmetric channel#

We assume the discrete symmetric random error channel.

Definition: Symbol Error Probability

The symbol error probability \(p\) is the probability that an error occurs in a single symbol (i.e., that a single symbol is received in error).

It can be assumed that

\(p \lt 1/2\)

as follows. If it were the case that \(p \gt 1/2\) then the received symbol would be more likely to be incorrect than correct. The solution is to flip every received symbol (e.g., if we receive a \(1\) we would interpret it as a \(0\) and vice versa). By flipping, the error probability becomes \(1 - p\) and satisfies \(1 - p \lt 1/2\). Thus, the problem can always be reduced to a channel with \(p \lt 1/2\) via symbol-flipping.

The assumption that \(p \lt 1/2\) is a standard simplification in communication theory because it represents the only useful scenario. A channel with \(p \gt 1/2\) can always be preprocessed into one with error probability \(1 - p \lt 1/2\) by symbol-flipping, and a channel with \(p = 1/2\) transmits noise rather than information. Error-correcting codes such as Hamming codes, Reed-Solomon codes, and modern low-density parity-check (LDPC) codes are all designed under the assumption that \(p \lt 1/2\). (Coin-flip analogy: if a coin shows heads less than half the time, tails is the better guess; if it shows heads more than half the time, simply redefine “success” as tails, effectively flipping the probabilities.)

If \(p = 1/2\) then the output of the channel is independent of the input and we might as well stop transmitting.
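The flipping argument is easy to check empirically. A rough Monte Carlo sketch (my own illustration, assuming the binary alphabet):

```python
import random

p, trials, errors = 0.9, 100_000, 0  # a channel worse than a coin flip
for _ in range(trials):
    sent = random.randint(0, 1)
    received = 1 - sent if random.random() < p else sent
    decoded = 1 - received           # flip every received symbol
    errors += decoded != sent        # wrong only if the channel did NOT flip
print(errors / trials)               # close to 1 - p = 0.1
```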

Definition: Encoder

The encoder takes an input signal called the message and modifies it in order to make it possible to detect and perhaps also correct any errors that the channel is likely to induce. At the other end of the channel is a decoder which retrieves the original message. Before the decoder is applied, the error processor attempts to correct or detect errors in the received message. According to the circumstances, it may modify the received message to enable the decoder to translate it (error correction), or it may send an error signal in which case the decoder will ignore part of the incoming message (error detection). Often the decoder and error processor are lumped together and the error signal is called a decoding failure. But it is better to keep them separate conceptually, even though in some implementations it is natural to combine them.

                          +--------------------------+
                          |                          |
                          |  Random error generator  |
                          |                          |
                          +--------------------------+
                                       |                                                            +-----------+
                                       |                                                            |           |  Message
         +-----------+                 |                      +-------------------+---------------->|  Decoder  |----------->
Message  |           |  Signal         v   Distorted signal   |                   |  code word u'   |           |  word x'
-------->|  Encoder  |--------------->|+|-------------------->|  Error processor  |                 +-----------+
word x   |           |  code word u        received word v    |                   |
         +-----------+                                        +-------------------+----------------------------------------->
                                                                                     v + error signal



How do we assess the performance of a code over a given channel? The basis for this assessment is the Hamming distance: the number of places in which two words differ.

The worst-case error-processing performance of a code is determined by the minimum distance between code words. Then some elementary probability theory can be used to assess the performance of a code.




Block Codes#

                                                         +--------------------------+
                                                         |                          |
                                                         |  Random error generator  |
                                                         |                          |
                                                         +--------------------------+
                                                                      |                                                            +-----------+
                                                                      |                                                            |           |  Message
           +----------------+           +-----------+                 |                      +-------------------+---------------->|  Decoder  |----------->
  Message  |                |  Message  |           |  Signal         v   Distorted signal   |                   |  code words u'  |           |  words x'
---------->|  Preprocessor  |---------->|  Encoder  |--------------->|+|-------------------->|  Error processor  |                 +-----------+
  string   |                |  words x  |           |  code words u       received words v   |                   |
           +----------------+           +-----------+                                        +-------------------+----------------------------------------->
                                                                                                                    v + error signal

Definition: block#

The encoder’s preprocessor divides the input message of arbitrary length into blocks (words) of a fixed number of symbols \(m\) and then the encoder translates each word into a codeword of fixed length \(n\). Such a code is called a block code.

Definition: \(A\)-word

If \(A\) is an alphabet then an \(A\)-word of length \(n\) is a sequence of \(n\) symbols from \(A\).

  1. The set of \(A\)-words of length \(n\) is denoted by \(A^n\).

  2. If \(A\) has \(q\) symbols then there are \(q\) choices for the symbol in each place in an \(A\)-word of length \(n\) and so the total number of such words is \(q^n\).

  3. Thus \(|A^n| = |A|^n = q^n\).

Definition: block code#

Definition: Block Code

An \((n, m)\)-block code \(C\) over the alphabet \(A\) of size \(q\) consists of a set of precisely \(q^m\) code words in \(A^n\).

\(n\) is called the code’s block length.

\(m\) is called the code’s rank.

\(m/n\) is called the code’s rate; since \(C \subseteq A^n\) contains \(q^m\) words, \(m \le n\) and hence \(m/n \le 1\).

We require precisely \(q^m\) codewords to ensure that an encoder exists. There are \(q^m\) possible message words and each must correspond to a distinct code word. Any further code words are not used and may as well be discarded.

For example, a binary code of rank \(m\) must have \(2^m\) code words.
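As a concrete illustration (the repetition code is a standard example, though it is not defined in the text above), the binary \((3, 1)\)-code consisting of the two words \(000\) and \(111\):

```python
# The binary (3, 1) repetition code: rank m = 1, block length n = 3.
C = {(0, 0, 0), (1, 1, 1)}
q, m, n = 2, 1, 3
assert len(C) == q ** m   # a code of rank m must have exactly q^m code words
print("rate:", m / n)     # 1/3
```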

Definition: encoder#

Definition: Encoder

An encoder \(E\) for \(C\) is a map from \(A^m\) to \(C\).

\(\boxed{E : A^m \to C \quad\quad x \mapsto u}\)

It translates any \(A\)-word \(x\) of length \(m\) into a code word \(u = E(x)\). The encoder is a bijection so that each message word corresponds to one and only one codeword and each codeword represents a unique message word.

Often, the encoder preserves the message word \(x\) as the first part of the code word \(u = E(x)\). Such an encoder is called standard or systematic, in which case the code word is divided into message symbols and check symbols and the decoder simply strips the check symbols.

Definition: decoder#

Definition: Decoder

The corresponding decoder \(D\) is the inverse map of \(E\).

\(\boxed{D : C \to A^m \quad\quad u \mapsto x}\)

It takes every code word \(u = E(x)\) back to \(x\).
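A sketch of a systematic encoder/decoder pair, using a single parity-check symbol as the check part (this particular \((4, 3)\)-code is my own example, not one from the text):

```python
def encode(x):
    """Systematic encoder E for a binary (4, 3)-code: the message word
    x is preserved as the first part of the code word, and one check
    symbol (the parity of the message bits) is appended."""
    return x + (sum(x) % 2,)

def decode(u):
    """The corresponding decoder D, the inverse map of E: it simply
    strips the check symbol."""
    return u[:3]

x = (1, 0, 1)
u = encode(x)
print(u)                 # (1, 0, 1, 0): message symbols + check symbol
assert decode(u) == x    # D(E(x)) = x
```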


In a binary symmetric channel the error-processing capabilities of the coding system do not depend on the encoder and decoder but only on the set of code words because these are all that the channel sees; the choice of encoder and decoder is thus only a matter of practical convenience.

Most of coding theory is concerned with (1) the construction of codes \(C\) and (2) efficient error processors.


Definition: received word#

When errors occur in transmission, the receiver reads a word \(v\) although the transmitter sent a word \(u\).

Definition: Received Word

If \(u = (u_1, u_2, \dotsc, u_n)\) and \(v = (v_1, v_2, \dotsc, v_n)\) are words in \(A^n\) then we refer to \(u_j\) as the entry of \(u\) in place \(j\), and we say that \(v\) differs from \(u\) in place \(j\) if \(u_j \ne v_j\).

The word \(v\) to be analyzed by the error processor is called the received word.

(In this context the words position and location are synonyms for the word place.)

Definition: error of weight \(k\)#

Definition: Error of weight \(k\)

If the received word \(v\) differs from the transmitted one in \(k\) places we say that an error of weight \(k\) occurred (or that \(k\) errors occurred).

Suppose \(u\) is transmitted and \(v\) is received.

\( \begin{aligned} u &= (1, \textcolor{red}{0}, 0, 1, \textcolor{red}{1}, 0) \\ v &= (1, \textcolor{red}{1}, 0, 1, \textcolor{red}{0}, 0) \\ \end {aligned} \)

Then an error of weight \(2\) has occurred.

Definition: Hamming distance and weight#

It is useful to formalize the notion of differing places between words by regarding the number of places in which two words differ as a distance between them.

Definition: Hamming distance and weight

The Hamming distance \(d(u, v)\) between two words \(u\) and \(v\) is the number of entries in which they differ.

The Hamming weight \(\text{wt}(u)\) of \(u\) is the number of non-null entries in \(u\).

If \(\textbf{0} = (0, \dotsc, 0)\) then \(\text{wt}(u) = d(u, \textbf{0})\).
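Both notions are one-liners in code. A sketch in Python (names are my own):

```python
def hamming_distance(u, v):
    """d(u, v): the number of places in which u and v differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def weight(u, null=0):
    """wt(u): the number of non-null entries of u."""
    return sum(a != null for a in u)

u = (1, 0, 0, 1, 1, 0)
v = (1, 1, 0, 1, 0, 0)
print(hamming_distance(u, v))                      # 2 (the weight-2 error above)
print(weight(u) == hamming_distance(u, (0,) * 6))  # True: wt(u) = d(u, 0)
```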

Definition: distance function#

The Hamming distance satisfies the properties of a distance function.

Definition: Distance Function

A function \(f(x, y)\) on pairs of elements of a set \(X\) is a distance function if it satisfies the following conditions.

  1. \(f(x, y)\) is always a non-negative real number.

  2. \(f(x, y) = 0\) iff \(x = y\).

  3. \(f(x, y) = f(y, x)\)

  4. Triangle Inequality: For any three elements \(x, y, z \in X\) it is the case that \(f(x, z) \le f(x, y) + f(y, z)\)

Proof

Conditions 1, 2, and 3 follow from the definition of the Hamming distance. Condition 4:

Let

\( \begin{aligned} x &= (x_1, \dotsc, x_n) &\textcolor{#50C878}{= (1, 0, 1, 1, 0, 1, 0, 0)} \\ y &= (y_1, \dotsc, y_n) &\textcolor{#50C878}{= (0, 0, 1, 0, 1, 1, 0, 1)} \\ z &= (z_1, \dotsc, z_n) &\textcolor{#50C878}{= (0, 1, 1, 0, 0, 0, 1, 1)} \\ \end {aligned} \)

Then \(d(x, z)\) is the number of places in which \(x\) and \(z\) differ. If we denote the set of these places by \(U\) then

\(d(x, z) = |U| = |\{ i \mid x_i \ne z_i \}| \textcolor{#50C878}{= |\{ 0, 1, 3, 5, 6, 7\}| = 6}\)

Let

\( \begin{aligned} S &= \{ i \mid x_i \ne z_i \land x_i = y_i \} \textcolor{#50C878}{= \{ 1, 5, 6 \}} \\ T &= \{ i \mid x_i \ne z_i \land x_i \ne y_i \} \textcolor{#50C878}{= \{ 0, 3, 7 \}} \\ \end {aligned} \)

Then \(U\) is the disjoint union of \(S\) and \(T\). Thus

\(d(x, z) = |S| + |T|\)

It is immediate from the definition of \(d(x, y)\) that \(x\) differs from \(y\) in all the places in \(T\). Thus

\(|T| \le d(x, y)\)

On the other hand if \(i \in S\) then \(y_i = x_i \ne z_i\). Thus

\(|S| \le d(y, z)\)

\(d(x, z) = |S| + |T| \le d(x, y) + d(y, z)\)

\(\blacksquare\)
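The worked example in the proof can be verified mechanically (a quick check, not part of the original proof):

```python
x = (1, 0, 1, 1, 0, 1, 0, 0)
y = (0, 0, 1, 0, 1, 1, 0, 1)
z = (0, 1, 1, 0, 0, 0, 1, 1)

d = lambda a, b: sum(p != q for p, q in zip(a, b))
U = {i for i in range(len(x)) if x[i] != z[i]}  # {0, 1, 3, 5, 6, 7}
S = {i for i in U if x[i] == y[i]}              # {1, 5, 6}
T = {i for i in U if x[i] != y[i]}              # {0, 3, 7}

assert d(x, z) == len(S) + len(T) == 6          # U is the disjoint union of S, T
assert len(T) <= d(x, y) and len(S) <= d(y, z)  # hence d(x,z) <= d(x,y) + d(y,z)
```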

Definition: error processor (and error detector)#

An error processor \(P\) for \(C\) could just test each word \(v\) it receives to see if it is a code word or not and send an error signal for a non-code word, or it could attempt to correct certain non-code words. To give a flexible formal definition, we assume that when it receives a word \(v\) the processor puts out a word \(u\) and a signal (“good” or “bad”). The signal says whether the processor is putting out a code word or not.

Definition: Error Processor

An error processor \(P\) for \(C\) is a map that accepts a received word \(v\) of length \(n\) and produces a pair \((a, u)\) where \(a\) takes on two values (“good” or “bad”) and \(u\) is a word of length \(n\). The signal \(a\) has the value “good” when \(u\) is a code word and “bad” otherwise. An error processor that always leaves the received word unchanged is called an error detector and an error processor that always produces a code word is called perfect.

An error processor must start by testing the received word \(v\) to see if it is a code word or not. If \(v\) is a code word then the error processor’s job is done. It has no means of knowing what the message was and so it can do no better than accept the word it received as correct. It will transmit it unchanged with a “good” signal. This means that error patterns that distort one code word into another code word are undetectable and thus incorrectable.

Definition: minimum distance of a code#

Suppose that we want to be able to detect all errors of weight at most \(s\). It is necessary that any two distinct code words be at Hamming distance at least \(s + 1\). It therefore makes sense to introduce a name for the smallest possible distance between distinct code words.

Definition: Minimum Distance

Let \(C\) be an \((n, m)\)-code. The minimum distance \(d(C)\) of \(C\) is the smallest Hamming distance between distinct code words of \(C\).

This measure is so important that we sometimes call attention to it by describing \(C\) as an \((n, m, d)\)-code.
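For small codes \(d(C)\) can be computed by brute force over all pairs of code words (a sketch; for large codes this is infeasible):

```python
from itertools import combinations

def minimum_distance(C):
    """d(C): the smallest Hamming distance between distinct code words."""
    d = lambda u, v: sum(a != b for a, b in zip(u, v))
    return min(d(u, v) for u, v in combinations(C, 2))

# The binary (3, 1) repetition code is a (3, 1, 3)-code.
print(minimum_distance([(0, 0, 0), (1, 1, 1)]))  # 3
```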

Claim: condition for error detection#

The condition for error detection derived above is both necessary and sufficient.

Claim

Let \(C\) be a code. Then it is possible for an error processor for \(C\) to detect all errors of weight \(\le s\) iff \(d(C) \ge s + 1\).

\(\boxed{\exists D_{(C, s)} \iff d(C) \ge s + 1}\)

+-----------+
|        s  |  1
|     +-----|-----+
|     u     |     v
+-----------+

Proof

First Direction \(\lnot Q \implies \lnot P\)

If two code words \(u\) and \(v\) are at a distance at most \(s\) then one can be distorted into the other by an error of weight at most \(s\). In that case no error processor for \(C\) can detect all errors of weight at most \(s\).

Second Direction \(Q \implies P\)

Conversely if any two code words are at distance at least \(s + 1\) then any error of weight \(s\) will distort a code word into a non-code word. An error detector can check whether a received word is a code word or not, for instance by looking it up in a table. Thus all errors of weight at most \(s\) are detectable.

\(\blacksquare\)
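A sketch of the error detector used in the proof: it leaves the received word unchanged and merely looks it up in a table of code words (my own Python rendering):

```python
def error_detector(C):
    """Error processor that never changes the received word and
    signals whether it is a code word (table lookup)."""
    table = set(C)
    def process(v):
        v = tuple(v)
        return ("good", v) if v in table else ("bad", v)
    return process

detect = error_detector([(0, 0, 0), (1, 1, 1)])  # d(C) = 3, so s = 2
print(detect((0, 0, 0)))  # ('good', (0, 0, 0))
print(detect((0, 1, 0)))  # ('bad', (0, 1, 0)): a weight-1 error, detected
```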

Claim: condition for error correction#

There is a similar condition for error correction. Suppose we wish to be able to correct all single errors. Then given a received word and the information that an error of weight one occurred there must be only one code word that could have been transmitted. In other words no two code words \(u\) and \(v\) may be at distance at most \(1\) from the same word \(w\). Thus by the triangle inequality the code must have minimum distance at least \(3\).

A similar argument works for larger errors. If we are to be able to correct all errors of weight up to \(t\) then given a word and the information that an error of weight at most \(t\) occurred there must be a unique code word that could have been transmitted. So no two code words may be at distance at most \(t\) from the same word \(w\). Thus the code has minimum distance at least \(2t + 1\).

Claim

There exists an error processor for the code \(C\) that corrects all errors of weight up to \(t\) iff \(C\) has minimum distance at least \(2t + 1\).

\(\boxed{\exists P_{(C, t)} \iff d(C) \ge 2t + 1}\)

+-----------+     +-----------+
|        t  |  1  |  t        |
|     +-----|-----|-----+     |
|     u     |     |     v     |
+-----------+     +-----------+

Proof

First Direction \(\lnot Q \implies \lnot P\)

Suppose the code contains two code words \(u\) and \(v\) at distance at most \(2t\). Let \(w\) be a word that agrees with \(u\) at all places that \(u\) and \(v\) agree. Further let \(w\) agree with \(u\) at the first \(t\) places where \(u\) and \(v\) disagree and with \(v\) in the remaining places where \(u\) and \(v\) disagree. (If \(d(u, v) \lt t\) then take \(w = u\).) Then \(d(u, w) \le t\) and \(d(v, w) \le t\). Now suppose \(w\) is received together with the information that at most \(t\) errors occurred. Then either \(u\) or \(v\) could have been transmitted (and possibly even some other code word). There is no way that from the given information an error processor can decide with certainty which code word was transmitted. So it will fail to correct some errors of weight \(\le t\).

Second Direction \(Q \implies P\)

Conversely suppose that the code has minimum distance \(2t + 1\) and a word \(w\) is received together with the information that an error of weight at most \(t\) has occurred. If there were two code words \(u\) and \(v\) at distance at most \(t\) from \(w\) then by the triangle inequality \(d(u, v) \le 2t\), contradicting our hypothesis. Thus there is a unique code word \(u\) at distance at most \(t\) from \(w\) and we can deduce that \(u\) must have been transmitted.

For example, take \(t = 1\) and suppose the code words \(u\) and \(v\) below satisfy \(d(u, v) = 4 \ge 2t + 1\). If \(w\) is received with the information that at most one error occurred, then since \(d(u, w) = 3\) and \(d(v, w) = 1\), the word \(v\) is the unique code word at distance \(\le t\) from \(w\), so \(v\) must have been transmitted.

\( \begin{aligned} u &= (1, 0, 1, 1, 0, 1, 0, 0) \\ v &= (0, 0, 1, 0, 1, 1, 0, 1) \\ w &= (0, 0, 1, 0, 0, 1, 0, 1) \\ \end {aligned} \)

\(\blacksquare\)
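A sketch of an error processor that realizes this claim by searching for a code word within distance \(t\) of the received word (my own rendering; uniqueness is guaranteed by \(d(C) \ge 2t + 1\)):

```python
def error_corrector(C, t):
    """Error processor that corrects all errors of weight <= t,
    assuming d(C) >= 2t + 1 so that a code word within distance t
    of the received word is unique when it exists."""
    d = lambda u, v: sum(a != b for a, b in zip(u, v))
    def process(v):
        v = tuple(v)
        for u in C:
            if d(u, v) <= t:
                return ("good", u)  # the unique nearest code word
        return ("bad", v)           # more than t errors: detected only
    return process

correct = error_corrector([(0, 0, 0), (1, 1, 1)], t=1)
print(correct((0, 1, 0)))  # ('good', (0, 0, 0)): weight-1 error corrected
```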


The minimum distance \(d(C)\) determines the worst-case error-detecting and -correcting capabilities of code \(C\).


Claim: condition for simultaneous error detection and correction#

It may be that we do not just want to detect all errors up to a given weight, or that correcting all errors that lie within the theoretical capabilities of the code is too time-consuming or expensive. It is possible to take a hybrid approach: correct errors of weight up to some (usually small) value \(t\) and at the same time detect errors of weight up to \(t + s\).

Claim

A code \(C\) can correct all errors of weight up to \(t\) and at the same time detect all errors of weight up to \(s + t\) iff \(d(C) \ge 2t + s + 1\).

\(\boxed{\exists P_{(C, s, t)} \iff d(C) \ge 2t + s + 1}\)

+-----------------------+
|                       |
|     +-----------+     |     +-----------+
|     |        t  |  s  |  1  |  t        |
|     |     +-----|-----|-----|-----+     |
|     |     u     |     |     |     v     |
|     +-----------+     |     +-----------+
|                       |
+-----------------------+

Informally this says that error correction costs about twice as much as error detection: with \(d(C)\) fixed, each additional error corrected (increasing \(t\) by one) reduces by two the weight \(s\) of the additional errors that can merely be detected. The reason is that if error correction is employed, an error that pushes a code word \(u\) close enough to another code word \(v\) will cause the error processor to choose \(v\) rather than content itself with reporting that the received word is erroneous.

Proof

Consider an error processor for the given code \(C\) that works as follows: if \(w\) is received and there is a unique code word \(u\) with \(d(u, w) \le t\) then correct \(w\) to that code word. Otherwise send an error signal.

First Direction \(Q \implies P\)

Assume \(d(C) \gt 2t + s\). Then by the error correction condition this error processor will correct errors of weight \(\le t\) successfully.

Suppose that \(u\) is transmitted and \(w\) is received, where

\(t \lt d(u, w) \le t + s\) (thus \(s \ge 1\))

(the number of errors exceeds the error-correction capability of the code but does not exceed its error-detection capability)

We must also show that the error processor will not correct the received word to any other code word: if the received word fell within the correction radius of another code word, the triangle inequality would violate our assumption about the minimum distance of the code.

Suppose \(v\) were a code word with \(d(v, w) \le t\). Since \(d(u, w) \gt t\) we would have \(v \ne u\), and by the triangle inequality

\( \begin{aligned} d(u, w) &\le t + s \\ d(v, w) &\le t \\ d(u, v) \le d(u, w) + d(v, w) &\le 2t + s \\ \end {aligned} \)

contradicting our assumption that \(d(C) \gt 2t + s\). Thus there is no such code word and the error processor will send an error signal.

Second Direction \(\lnot Q \implies \lnot P\)

Suppose that

\(d(C) \le 2t + s\)

If \(d(C) \le 2t\) then by the error correction condition \(C\) cannot correct all errors of weight \(\le t\). So we may assume that

\(d(C) \gt 2t\)

\(\textcolor{#50C878}{2t + 1 \le d(C) \le 2t + s}\)

Then by the triangle inequality there is never more than one code word at distance \(\le t\) from any received word \(w\).

We can see this in the following way.

Let \(u\) and \(v\) be code words and \(w\) any word. Then

\(2t + 1 \le d(C) \le d(u, v) \le d(u, w) + d(w, v)\)

so if \(d(u, w) \le t\) then \(t + 1 \le d(w, v)\). In other words, \(d(u, w)\) and \(d(v, w)\) cannot both be \(\le t\).

Let \(u\) and \(v\) be two code words with \(2t + 1 \le d(u, v) \le 2t + s\).

In other words, let the difference between \(u\) and \(v\) fall within the error detection capability of the code.

Divide the places in which \(u\) and \(v\) differ into two sets \(S\) and \(T\) with

\(|S| \le t \lt |T| \le t + s\)

In other words

\(d(u, v) = |S| + |T| \le 2t + s\)

This makes sense since \(2t + s\) is the maximum difference between \(u\) and \(v\).

Define \(w\) so that \(w\) agrees with \(u\) and \(v\) outside \(S \cup T\), with \(u\) on \(S\), and with \(v\) on \(T\).

\( \begin{aligned} S &= \{ i \mid w_i = u_i \ne v_i \} && |S| = d(v, w) \\ T &= \{ i \mid u_i \ne v_i = w_i \} && |T| = d(u, w) \\ \end {aligned} \)

Then

\(d(v, w) \le t \lt d(u, w) \le s + t\)

Thus if \(u\) is transmitted and \(w\) is received, an error of weight \(\le s + t\) has occurred. Yet an error processor that corrects all errors of weight \(\le t\) will not send an error signal. Instead it will correct \(w\) to \(v\) because, as shown, \(v\) is the only code word with \(d(v, w) \le t\). Thus the code cannot successfully detect all errors of weight up to \(s + t\).

\(\blacksquare\)
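The error processor constructed at the start of the proof can be sketched directly (my own rendering, exercised here on the \((5, 1)\) repetition code with \(d(C) = 5 = 2 \cdot 1 + 2 + 1\), i.e. \(t = 1\), \(s = 2\)):

```python
def hybrid_processor(C, t):
    """Correct to the unique code word within distance t when one
    exists; otherwise send an error signal. If d(C) >= 2t + s + 1
    this corrects all errors of weight <= t and detects all errors
    of weight <= t + s."""
    d = lambda u, v: sum(a != b for a, b in zip(u, v))
    def process(v):
        v = tuple(v)
        near = [u for u in C if d(u, v) <= t]  # at most one element
        return ("good", near[0]) if near else ("bad", v)
    return process

p = hybrid_processor([(0,) * 5, (1,) * 5], t=1)
print(p((0, 0, 0, 0, 1)))  # ('good', (0, 0, 0, 0, 0)): 1 error corrected
print(p((0, 0, 0, 1, 1)))  # ('bad', (0, 0, 0, 1, 1)): 2 errors detected
```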

Let \(C\) be a code with block length \(n = 64\) and minimum distance \(d(C) = 10\).

\( \begin{aligned} 10 &\ge s + 1 \iff 9 \ge s && \text{error detection up to 9 (theoretical limit)} \\ 10 &\ge 2t + 1 \iff 9/2 \ge t && \text{error correction up to 4 (wasteful because only requires d(C) = 9)} \\ 10 &\ge 2t + s + 1 && t = 1, s + t = 8 \text{ (theoretical limit)} \\ &&& t = 2, s + t = 7 \text{ (theoretical limit)} \\ &&& t = 3, s + t = 6 \text{ (theoretical limit)} \\ &&& t = 4, s + t = 5 \text{ (theoretical limit)} \\ \end {aligned} \)
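The table of trade-offs for a given minimum distance can be enumerated directly (a small sketch reproducing the calculation above):

```python
def tradeoffs(d):
    """All (t, s + t) pairs permitted by d(C) >= 2t + s + 1."""
    for t in range((d - 1) // 2 + 1):
        s = d - 1 - 2 * t  # the largest admissible s for this t
        print(f"correct up to t = {t}, detect up to s + t = {s + t}")

tradeoffs(10)
# correct up to t = 0, detect up to s + t = 9   (pure detection)
# correct up to t = 1, detect up to s + t = 8
# correct up to t = 2, detect up to s + t = 7
# correct up to t = 3, detect up to s + t = 6
# correct up to t = 4, detect up to s + t = 5
```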

A coding scheme does not have to use the full capability of the code. Practical considerations may make it necessary to limit the operation of the error processor.




Shannon’s Theorem#

Shannon’s theorem establishes the theoretical optimum for average coding performance.