In this dissertation, we present a method for constructing BCH and RS encoders/decoders, with special emphasis on minimizing power consumption and increasing throughput in order to reduce the energy-delay product.
A new long polynomial division algorithm over finite fields, based on lookahead of the partial remainder (LAPR), is proposed. Because the algorithm is based on partial division over an orthogonal group basis and on a lookahead technique that exploits the linearity of finite-field arithmetic, polynomial multiplications can be eliminated completely, which greatly increases throughput. The inherent regularity and feed-forward nature of the algorithm allow it to be fully pipelined. When pipelined, its throughput is one quotient and one remainder per clock cycle regardless of the degree of the dividend polynomial, which is orders of magnitude faster than the conventional architecture using an LFSR (linear feedback shift register). An area-efficient sequential architecture based on LAPR is also presented. Although the throughput of the sequential architecture is lower than that of the pipelined one, it is, to the best of our knowledge, still higher than that of any previously reported division architecture. Both architectures are shown to be efficient, regular, and easily expandable, and hence naturally suited to VLSI implementation. We verified the general validity of the LAPR-based division algorithm by mathematical derivation and by simulation. To evaluate the performance of the proposed division architectures relative to the conventional LFSR-based one, we designed three widely used BCH/RS coding applications, (1) a (32,28) RS encoder, (2) a (63,51) BCH encoder, and (3) a syndrome generator for a (63,51) BCH decoder, in the COMPASS ASIC development environment using a 0.8 μm double-metal CMOS technology. Experimental results for the three benchmark circuits show that, at identical throughput, the pipelined LAPR-based architectures consume about 32, 65, and 67 times less power, respectively, than the conventional LFSR-based ones. The corresponding improvements for the sequential LAPR-based architectures are 14, 22, and 28 times.
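As a rough software illustration of why linearity permits lookahead, the following sketch compares bit-serial division over GF(2), which handles one dividend bit per step as an LFSR does, with a version that absorbs k bits per step using only XORs of precomputed remainders. It is not the LAPR architecture itself. The function names, the lookahead depth `k`, and the choice of x^12 + x^10 + x^8 + x^5 + x^4 + x^3 + 1 as the (63,51) BCH generator polynomial are assumptions made for this example.

```python
# Assumed generator of the (63,51) BCH code:
# x^12 + x^10 + x^8 + x^5 + x^4 + x^3 + 1
G = 0b1010100111001
DEG = 12


def remainder_serial(bits, g=G, deg=DEG):
    """Remainder of the dividend (bits, highest degree first) modulo g.

    One dividend bit is consumed per step, mirroring an LFSR divider.
    """
    r = 0
    for b in bits:
        r = (r << 1) | b
        if r >> deg:          # degree reached deg: subtract (XOR) g
            r ^= g
    return r


def lookahead_table(k, g=G, deg=DEG):
    """table[i] = x^(i+k) mod g, for every bit position i of the remainder."""
    return [remainder_serial([1] + [0] * (i + k), g, deg) for i in range(deg)]


def remainder_lookahead(bits, k, g=G, deg=DEG):
    """Same remainder, but k dividend bits are absorbed per step.

    By linearity, (r * x^k + chunk) mod g is the XOR of chunk with the
    table entries selected by the set bits of r. There is no feedback
    inside a step and no polynomial multiplication.
    """
    if not 1 <= k <= deg or len(bits) % k:
        raise ValueError("k must be in 1..deg and divide len(bits)")
    table = lookahead_table(k, g, deg)
    r = 0
    for pos in range(0, len(bits), k):
        chunk = 0
        for b in bits[pos:pos + k]:
            chunk = (chunk << 1) | b
        nxt = chunk               # degree < k <= deg, already reduced
        for i in range(deg):
            if (r >> i) & 1:
                nxt ^= table[i]
        r = nxt
    return r
```

For systematic encoding, the parity bits are the remainder of the 51 message bits followed by 12 zeros. With a 63-bit dividend, `remainder_lookahead(dividend, 7)` needs 9 steps where `remainder_serial(dividend)` needs 63, and both return the same remainder.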
We propose a construction method for a four-block pipelined multi-error-correcting RS decoder that incorporates a superscalar dual GFCU (Galois field computation unit). With the SSGFCU (superscalar Galois field computation unit), the coefficients of the error locator polynomial can be computed systematically with maximum resource utilization and lower latency. By designing the SGU (syndrome generation unit) with the LAPR sequential division architecture, the latency of the SGU is reduced to one half of that of the conventional LFSR-based architecture. Error magnitude evaluation and error correction are likewise performed on a superscalar datapath with reduced latency. The lower latencies of the SSGFCU and the SSEMECU (superscalar error magnitude evaluation and correction unit) mean that these units can be pipelined with the SGU and the CESU (Chien's error location searching unit). The overall latency is then a constant 4N clock cycles, where N is the codeword length, so the RS decoder can correct multiple errors while sustaining a throughput up to the clock frequency (50 Mbyte/s at 50 MHz) without increasing the internal clock speed. Because the proposed RS decoder architecture provides high-speed decoding, it can be used in multi-error-correction applications that demand high speed and/or low power, such as digital TV, DBS, DVD, and DVCR.
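To make the role of the syndrome generation stage concrete, the following sketch computes the syndromes of a (32,28) RS codeword in software. It is a plain Horner-rule model, not the SGU or the LAPR divider. The field polynomial x^8 + x^4 + x^3 + x^2 + 1, the generator roots α^0 to α^3, and all function names are assumptions made for this example.

```python
PRIM = 0x11D   # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 2      # assumed primitive element of GF(2^8)


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add, reducing by PRIM."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return p


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def rs_encode(msg, n_par=4):
    """Systematic encoding: append the remainder of msg(x) * x^n_par mod g(x)."""
    g = [1]                                   # g(x), highest degree first
    for j in range(n_par):
        root = gf_pow(ALPHA, j)
        # multiply g(x) by (x + root)
        g = [a ^ gf_mul(b, root) for a, b in zip(g + [0], [0] + g)]
    rem = list(msg) + [0] * n_par
    for i in range(len(msg)):                 # long division by the monic g(x)
        c = rem[i]
        if c:
            for k in range(1, len(g)):
                rem[i + k] ^= gf_mul(g[k], c)
    return list(msg) + rem[len(msg):]


def syndromes(received, n_syn=4):
    """S_j = r(alpha^j) for j = 0..n_syn-1, highest-degree symbol first."""
    out = []
    for j in range(n_syn):
        root = gf_pow(ALPHA, j)
        s = 0
        for sym in received:                  # Horner's rule
            s = gf_mul(s, root) ^ sym
        out.append(s)
    return out
```

Under these assumptions, every syndrome of an error-free codeword is zero. A single corrupted symbol makes S_0 equal to the error magnitude, and the later decoder stages use the remaining syndromes to locate and correct it.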