## Efficient Reed-Solomon encoding / erasure decoding

Reed-Solomon code are "optimal" code over the erasure channel in the sense that they do not require any overhead. However, their decoding algorithm are slower than the ones for Low density based code. But, on modern hardware, I believe the speed is sufficient for many applications and the "optimal" behavior can simplify many aspect of the implementation.

You can find here an open source implementation in C of various Reed-Solomon encoder and erasure decoder. The approach used is basically described in Efficient erasure decoding of Reed-Solomon codes but there is many more things involved in the actual implementation. I am currently writing an extended version of this paper that will explain most of the algorithms implemented here.

I believe that if someone is interested in using theses codes, many ideas can be found in the following sources in order to build the most efficient algorithm for the needed parameters. Right now there is actually three different stand alone archive for the different algorithms. I know this is not really clean, but I lack time to improve this.

reed_solomon.fast.tgz
reed_solomon.fermat.tgz (New version, faster and one initialisation bug corrected. Also include a fast product computation. It require C++ for the complex class but can be easily converted to C. I will do it when I have time.)

A brief description of what you will find inside follows. Each archive also comes with a README and a sample program to get an idea of how it works and of the encoding decoding speed on your architecture. An older version of some of the code can still be downloaded here.

### General overview

The implementation was done with forward error coding (fec) of Internet packets in mind. But we believe it can be used in any application where one wants to use systematic Reed-Solomon erasure coding.

Our Reed-Solomon code is defined by a few parameters. In general we use a finite field of the form GF(2^n_field) except for the FFT based algorithm that work on GF(2^16+1). We use a systematic Reed-solomon code of dimension K and length N. In our implementation, each packets is S bytes long and the first K packets contain the information. The code is such that we can recover the information from any set of K packets amongst the N.

We give in the following table some timing on an Intel Core 2 CPU 6320 1.86GHz for some algorithm and code parameters (K,N). The number represent a speed in MByte/s and correspond to the information size divided by the encoding time for the first number and the decoding time for the second. The field used is GF(2^16) except for the FFT version that work on GF(2^16+1) which gives rise to some problems that we will discuss later.

(128, 256) (512, 1024) (16384, 32768)
Karatsuba 12.26/6.725.68/2.970.77/0.39
FFT Niedermayer ~3.93~4.24~2.82
FFT lagrange 9.68/4.798.62/4.276.17/2.73
FWT normal 1.6/1.61.47/1.471.09/1.1
FWT sse2 5.97/6.105.68/5.763.79/3.94

Remark that these timing are just here to give an idea of the performance. There is many version of the algorithms and many option for a given set of parameters. For instance, the Karatsuba implementation is really fast here because it is especially designed for these rate 1/2 code. We could actually optimize similarly the FWT version to divide the encoding time by roughly two.

To give meaningful measure, we actually generate a random message of a few MB into memory and encode/decode it using many code blocs. Remark that if we do not work on GF(2^8) or GF(2^16), the packets have to be packed/unpacked into symbols before we can apply the functions we provide. Actually, in some places the timing is wrong if we use other finite field because I do not take into account this is the computation (I should correct this).

You will find here many quadratic algorithm for encoding and decoding. I don't use exactly the same algorithms as in the literature, but they are quite close. The code is defined on the finite field GF(2^n_field).

I implemented various method for the multiplication in the field. There is the direct approach, the approach using the exponential/logarithm table. An original approach using only one table and finally the XOR code approach presented in the paper "An XOR-Based Erasure-Resilient Coding Scheme" by J. Blo, M. Kalfane, M. Karpinski, R. Karp, M. Luby, and D. Zuckerman. The last two approaches are the fastest ones with some tradeoff with the packet size and the field size.

There is also various quadratic algorithm. There is an almost classical one, but you can also find an incremental version that do some work after each received packet. Finally, the fastest algorithm in many case is actually sub-quadratic and is based on a Karatsuba version of the fast algorithm.

### Fast archive (FWT)

In this archive, I implemented the fast algorithm described in the paper Efficient erasure decoding of Reed-Solomon codes. It work on any field of the form GF(2^n_field) and use only Fast Walsh Transform (FWT). This version is optimized for GF(2^16). There is also an sse2 version of the algorithm that is quite faster than the normal one.

I think this algorithm is a good choice for long Reed-Solomon code. It uses a bit of memory but it is really clean and everything fit nicely into 16 bits. Moreover it really makes use of the vectorial instruction which is more difficult to do with the finite field FFT based algorithms.

### Fermat archive (FFT)

The algorithm there is the one with the smallest asymptotic theoretical complexity (N log N). But the are based on FFT over a finite field, which is difficult to implement and a bit slow in general. They could work on any field but we implemented them only on GF(2^16+1) because we have here a really fast FFT in the field. Remark that 2^16+1 is the last known prime Fermat number.

Remark that this code is not as optimized as the other and is not completely usable in real life, because we use 32 bits per field symbol. If one want to use it, a method to avoid one symbol and code a field symbol into 16 bits need to be implemented. You can contact me to have information on this. Maybe the easiest way is to postprocess each packets to find a non used symbol and to include in the packet a field saying that this symbol is actually 2^16.

Remark finally that it is also possible to use a classical approach to get an O(N log N) decoding algorithm based on FFT. I will explain this in the new version of the paper. In the meanwhile, I have some slides on the subject, see this. An implementation of something similar by Michael Niedermayer can be found at this address. This is a really nice implementation of the standard error and erasure decoder of Reed-Solomon code but it can actually be simplified a lot when there is only erasures involved.