Cache coherence protocols are designed to eliminate data inconsistency in shared-memory multiprocessor systems. Two major coherence schemes, write-invalidate and write-update, along with many variations of them, have been proposed and evaluated.
However, most coherence protocols, including these two, are devised by analyzing patterns of write operations. They therefore overlook the effects of read operations, even though a read after a write or a series of consecutive reads can strongly influence system performance.
We propose a reader-initiated cache protocol that introduces four new read primitives to enhance the performance of shared-memory multiprocessor systems. These primitives are inserted into parallel applications in place of ordinary read instructions where appropriate. They reduce both the total execution time and the number of messages transferred by the applications; that is, they decrease coherence overhead.
Performance evaluation using execution-driven simulation shows that execution time decreases by up to 12 percent when each primitive is placed appropriately. The effectiveness of the primitives is explored through code analysis, which is assumed to be performed by the application programmer.