The adaptive granularity (AG) architecture, which integrates bulk transfer into the shared-memory paradigm, achieves high performance and exploits spatial locality through memory replication. However, AG is based on sequential consistency and therefore cannot achieve maximal performance. In this thesis, we apply release consistency to AG and use invalidation inheritance to exploit more parallelism. The Trojan simulator was modified to support the release consistency model and used for simulation. Performance was evaluated with FFT, LU, and Radix. With release consistency, AG performed 10% to 30% better than the original AG. Although AG achieves high performance by replicating remote data, the data is replicated in local memory, so accesses to it still eventually incur cache misses. As a second piece of work, to reduce the performance degradation caused by these cache misses, we apply prefetching with both hardware-controlled and software-controlled approaches. Hardware-controlled prefetching is limited to locally replicated data, and it improved overall performance only marginally. Software-controlled prefetching improved performance by 2.5% to 18%, depending on the characteristics of the application, but it suffered from software overhead. As future research, we plan to propose a more efficient prefetching scheme for AG that reduces this software overhead.