This thesis describes a three-stage pipelined floating-point unit for a Very Long Instruction Word (VLIW) digital signal processor (DSP) targeted at 3D graphics, focusing on its floating-point arithmetic logic unit (FALU) and floating-point reciprocal unit (FREC).
The FALU has two operation modes, Normal mode and Twin mode; to support Twin mode, the FALU hardware is designed to be splittable. The FREC requires ROM tables, and this thesis covers the optimization of those tables with respect to the number of execution clock cycles.
As a result, in single precision the FALU achieves up to twice the performance with 15% additional hardware. With only a small amount of hardware for the FREC, division becomes about two times faster.
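The trade-off between ROM table size and execution cycles noted for the FREC can be illustrated with a small sketch. It assumes a table-seeded Newton-Raphson reciprocal, which is a common scheme but is not stated here to be the method used in this thesis; the function names, table layout, and bit widths are all hypothetical, and floating-point Python arithmetic stands in for the fixed-width hardware datapath.

```python
# Sketch only: a table-seeded Newton-Raphson reciprocal (assumed scheme,
# not necessarily the FREC design). All names and sizes are hypothetical.

SINGLE_PRECISION_EPS = 2.0 ** -24  # target relative error for a 24-bit significand


def build_recip_table(index_bits):
    """Model the ROM: reciprocal of the midpoint of each interval of [1, 2)."""
    size = 1 << index_bits
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def recip_mantissa(m, table, index_bits, iterations):
    """Approximate 1/m for a mantissa m in [1, 2)."""
    index = int((m - 1.0) * (1 << index_bits))  # top fraction bits address the ROM
    x = table[index]                            # seed value read from the table
    for _ in range(iterations):
        x = x * (2.0 - m * x)                   # each step squares the relative error
    return x


def max_rel_error(index_bits, iterations, samples=4096):
    """Worst relative error |m * approx - 1| over a uniform sample of [1, 2)."""
    table = build_recip_table(index_bits)
    worst = 0.0
    for k in range(samples):
        m = 1.0 + k / samples
        approx = recip_mantissa(m, table, index_bits, iterations)
        worst = max(worst, abs(approx * m - 1.0))
    return worst


def iterations_needed(index_bits, target=SINGLE_PRECISION_EPS, limit=8):
    """Smallest number of refinement steps that reaches the target error."""
    for n in range(limit + 1):
        if max_rel_error(index_bits, n) < target:
            return n
    raise ValueError("target not reached within the iteration limit")


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, "index bits:", iterations_needed(bits), "refinement steps")
```

Under these assumptions, a larger table gives a more accurate seed and therefore needs fewer refinement steps, which is the kind of size-versus-cycle balance that a ROM table optimization has to resolve.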