Single Disk I/O (SDIO) can bring fast and reliable distributed storage to a cluster computing environment. Until now, the data distribution scheme used for SDIO has mainly been RAID level 0 or RAID level 1. By using a RAID level 5 scheme for SDIO instead, one can achieve good performance, good reliability, and high disk space utilization simultaneously. However, RAID level 5 also has weak points: the network overhead problem, the small write problem, and the write sharing problem.
We propose four approaches that solve these problems, or at least alleviate them: Datanode-based Parity Computation, Distributed Parity Computation, Parity Queuing, and Parity Cumulating. The first two address the network overhead problem. Ultimately, Distributed Parity Computation reduces the number of network messages to two by dividing the parity computation into two parts, one performed on the data node and the other on the parity node. Parity Queuing attacks the small write problem by delaying two of the four disk access operations until disk idle time. Parity Cumulating reduces the buffer requirement of Parity Queuing and the number of operations deferred to idle time.
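To make the division of work concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes the standard RAID level 5 small-write identity (new parity = old parity XOR old data XOR new data), a single stripe with one parity block, and that the two deferred disk operations are the parity read and the parity write. The names `DataNode`, `ParityNode`, `enqueue`, and `flush_when_idle` are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


class DataNode:
    """Holds one data block (hypothetical model of a data node)."""

    def __init__(self, block: bytes):
        self.block = block

    def write(self, new_data: bytes) -> bytes:
        # First half of the parity computation, done where the old data lives:
        # delta = old_data XOR new_data. Only this delta travels to the parity node.
        delta = xor_blocks(self.block, new_data)
        self.block = new_data
        return delta


class ParityNode:
    """Holds the parity block of one stripe (hypothetical model)."""

    def __init__(self, parity: bytes):
        self.parity = parity
        self.pending = None  # buffered delta, not yet applied to disk

    def enqueue(self, delta: bytes) -> None:
        # Queuing: defer the parity read and write instead of doing them now.
        # Cumulating (assumed): fold deltas for the same parity block into one
        # buffer entry, so the buffer does not grow with every small write.
        if self.pending is None:
            self.pending = delta
        else:
            self.pending = xor_blocks(self.pending, delta)

    def flush_when_idle(self) -> None:
        # Second half of the parity computation, done on the parity node:
        # new_parity = old_parity XOR accumulated delta.
        if self.pending is not None:
            self.parity = xor_blocks(self.parity, self.pending)
            self.pending = None
```

In this sketch a small write costs one message carrying the new data to the data node and one carrying the delta to the parity node. Because XOR is associative, any number of queued deltas for the same parity block collapse into a single buffer entry and a single parity read and write at idle time.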
We evaluated each approach with sequential data and with random data, comparing against the Naive Method, the simplest design with no optimization techniques. Distributed Parity Computation reduced disk I/O operation time by 12% relative to the Naive Method, and Parity Queuing reduced it by a further 11% relative to Distributed Parity Computation. Parity Cumulating reduces the parity buffer size of Parity Queuing: by 33% with sequential data and by 3% with random data.