Cluster systems employ high-speed interconnection networks such as Myrinet to achieve high performance and scalability. Because communication between the host and the network can be a bottleneck in such systems, recent research has produced highly efficient communication layers. However, their performance varies widely. Furthermore, these communication layers expose different APIs, which makes it difficult to run common applications on a cluster system.
Our goal is to build a high-level communication layer that offers both high performance and ease of use. High performance can be obtained by building on a high-performance low-level communication layer such as VMMC while adding little overhead, and ease of use can be achieved by supporting a standard API such as MPI.
We implement MPI-VMMC by adding a communication sub-layer between VMMC and MPI. This sub-layer consists of a pair of send and receive queues. To achieve high performance, we implemented several optimization techniques: lazy pointers to reduce network traffic, transfer redirection to reduce receiver overhead, and separate control messages to reduce sender overhead. As a result, MPI-VMMC achieves a maximum bandwidth of 90.7 Mbytes/sec, which is about 95% of the base layer's bandwidth. MPI-VMMC is one of the fastest MPI implementations on Myrinet.
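The following is a minimal sketch of how a lazily updated queue pointer might reduce network traffic. It assumes one possible reading of the lazy pointer technique: the receiver reports its read position to the sender only after consuming a batch of messages, instead of after every message. The class `LazyRingQueue`, its fields, and the `update_threshold` parameter are hypothetical names chosen for illustration and are not taken from MPI-VMMC or VMMC; the queue is simulated in a single process rather than across a network.

```python
class LazyRingQueue:
    """Simulated fixed-size message queue with a lazily reported read pointer.

    Hypothetical illustration only; not the MPI-VMMC implementation.
    """

    def __init__(self, capacity, update_threshold):
        self.capacity = capacity
        self.update_threshold = update_threshold
        self.slots = [None] * capacity
        self.tail = 0             # total messages written by the sender
        self.head = 0             # total messages read by the receiver
        self.reported_head = 0    # sender's (possibly stale) view of head
        self.pointer_updates = 0  # simulated network messages carrying head

    def send(self, msg):
        """Enqueue msg; return False if the sender believes the queue is full."""
        if self.tail - self.reported_head >= self.capacity:
            return False
        self.slots[self.tail % self.capacity] = msg
        self.tail += 1
        return True

    def receive(self):
        """Dequeue one message, or return None if the queue is empty."""
        if self.head == self.tail:
            return None
        msg = self.slots[self.head % self.capacity]
        self.head += 1
        # Lazy update: tell the sender about freed slots only once a
        # whole batch has been consumed (assumed policy).
        if self.head - self.reported_head >= self.update_threshold:
            self.reported_head = self.head
            self.pointer_updates += 1
        return msg


def count_pointer_updates(update_threshold, n_messages=8, capacity=8):
    """Send and receive n_messages; return how many pointer updates occurred."""
    q = LazyRingQueue(capacity, update_threshold)
    for i in range(n_messages):
        q.send(i)
    for _ in range(n_messages):
        q.receive()
    return q.pointer_updates
```

In this sketch, a threshold of 1 reports the pointer after every message, while a larger threshold sends fewer pointer updates at the cost of the sender seeing freed slots later. The trade-off shown here is a property of the sketch and says nothing about the measured behavior of MPI-VMMC.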