The distributed shared memory (DSM) across a network of workstations (NoW) gains wide interests due to good scalability and cost/performance empowered by easy programming paradigm. Among various implementation approaches, page-based DSM utilizes an existing paged virtual memory scheme for providing shared memory view. Page-based DSM systems, however, suffer from coherence overhead mainly due to false sharing since they use a large page as a coherence unit. The optimal page size is required to satisfy communication efficiency as well as computational locality which are dynamically affected by application characteristics. Therefore, a fixed-size page cannot satisfy various applications even if it is small as a cache line size.
At the same time, for some applications an invalidation-based protocol is preferred while for others an update-based protocol is more desirable. Even within a single application, different portion of shared data may be better handled by one protocol or the other. That means, a single protocol cannot satisfy various applications. Many previous researches use multiple coherence protocols according to a programmer's annotation. Unfortunately, it is very difficult for every programmer to understand dynamic behavior of applications. Consequently, a desirable DSM system should be able to dynamically maintain the page size and the coherence protocol to satisfy the sharing behavior of a given application without disturbing programmers.
In this thesis, a mechanism which can dynamically detect data sharing patterns and apply multiple coherence protocols is proposed. Data sharing patterns can be largely divided into a migratory fashion and a producer-consumer fashion. The proposed detection mechanism can distinguish the page of each pattern from the pages showing both characteristics regardless data is associated with a synchronization primitive or not. Three software-only coherence protocols, BCP (Buddy Coherence Protocol), ERP (Exclusive Read Protocol) and GUP (Group Update Protocol), are also proposed to optimize performance for each case.
BCP is devised to support multiple page sizes in case that multiple patterns are detected on a single page. In BCP, the address of a remote access and the address of the most recent local access is compared. If they are to the different halves of a page, BCP considers it as false sharing and demotes the page to two subpages of equal size. If two contiguous pages belong to the same node, BCP promotes two pages to a superpage to reduce the number of the following coherence activities.
ERP works for the migratorily shared page to reduce the number of write requests and invalidations. It does not make a copy but migrates a migratory page for a remote read request. GUP reduces the number of read requests for producer-consumer sharing. It updates the copies of the all would-be consumers when a remote read request happens.
Performance of the proposed mechanism is compared to the single writer / multiple writer write-invalidate protocol using various fixed-size pages. Our simulations show that the proposed mechanism reduces execution time by upto 87% for some application when a large page size is given and the single writer protocol is used. It also outperforms almost all the best cases of the single writer write-invalidate protocols using fixed-size pages. BCPs improve performance by 25% for some applications when compared against them. When compared to the multiple writer write-invalidate protocol especially devised to guarantee no false sharing, BCPs improve performance to 93% of the multiple writers' case for some application without any annotation.
The sharing pattern detection mechanism and the suggested protocols are simple enough to be easily implemented on existing systems, and do not preclude other DSM techniques such as relaxed memory models. Also the right size of coherence unit is dynamically determined by the system, and thus one of the most difficult decisions in DSM design is resolved. Though our approach is assumed to be implemented in page-based DSM, we believe that it will be effective for the systems in other categories of DSMs.