The block matching algorithm is the most popular motion estimation technique in image sequence coding. In this paper, we propose a VLSI architecture for implementing a recently proposed fast block matching algorithm called MRMCS. The proposed architecture consists of a basic unit based on a systolic array and two shift register arrays, and it covers a search range of -32 to +31. By reusing the basic unit repeatedly, we can reduce the number of gates. To implement the basic unit, one of several conventional systolic arrays can be selected by trading off speed against hardware cost.
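For orientation, the following is a minimal sketch of plain full-search block matching with a sum-of-absolute-differences (SAD) cost. It is not the MRMCS algorithm and not the proposed architecture. It only illustrates the kind of computation that a systolic-array basic unit accelerates. The function names, the 16×16 default block size, and the skipping of candidates that fall outside the reference frame are assumptions made for illustration; only the -32 to +31 search range is taken from the text.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). Frames are lists of rows."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, lo=-32, hi=31):
    """Exhaustively test every displacement in [lo, hi] x [lo, hi] and return
    (cost, dx, dy) for the lowest SAD, or None if no candidate fits the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            # Assumption: candidates reaching outside the reference frame are skipped.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

With the defaults, one block requires up to 64 × 64 = 4096 candidate positions, each costing 256 absolute differences, which is why fast algorithms and dedicated hardware are of interest.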
In this paper, the architecture for the basic unit is selected so that both the hardware cost and the size of the internal memory are minimized. The proposed architecture is fast enough for low bit-rate applications (frame size of 352×288, 30 frames/sec) and can be implemented with fewer than 20,000 gates. Moreover, by simply modifying the basic unit, the architecture can be applied to higher bit-rate applications with a frame size of 720×480 at 30 frames/sec.
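To give a rough sense of the throughput implied by the two frame formats, the block rate can be estimated as below. This is a back-of-the-envelope sketch that assumes 16×16 blocks, a size that is common in video coding standards but is not stated here. The function name is hypothetical.

```python
def blocks_per_second(width, height, fps, block=16):
    """Number of block-matching operations required per second, assuming the
    frame dimensions are exact multiples of the (assumed) block size."""
    return (width // block) * (height // block) * fps


low_rate = blocks_per_second(352, 288, 30)   # 22 * 18 * 30 blocks per second
high_rate = blocks_per_second(720, 480, 30)  # 45 * 30 * 30 blocks per second
```

Under this assumption the 352×288 format requires 11,880 blocks per second and the 720×480 format requires 40,500, about 3.4 times as many. This is consistent with the need to modify the basic unit for the higher bit-rate case.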