The efficiency of a parallel program is sensitive to the grain size and scheduling scheme. While their tradeoff is well understood, the effect of their interaction with underlying hardware structure is still under investigation. The role of cache memory becomes more important as the memory latency of shared memory multiprocessor increases. In this thesis, we investigate the effect of the grain size and scheduling scheme on the cache memory. For the measurement, a set of parallel programs are executed on an execution-driven simulator, and measured metricsinclude cache hit ratio and execution time. The simulation results indicate that for the algorithm we have studied synchronization overhead is not siginificant, and the shared bus is not a bottleneck of a shared-bus multiprocessor. For the task granularity, while smaller task granularity allows larger parallelism, it requries more synchronization overhead. While larger task granularity decreases the synchronization overhead, it limits the exploitation of the available parallelism.