KAIST Library

Main bibliographic information
HPMR : Prefetching and pre-shuffling in shared mapreduce computation environment = 맵-리듀스 공유 사용자 환경에서의 프리패칭과 프리셔플링기법
Title / Author	HPMR : Prefetching and pre-shuffling in shared mapreduce computation environment = 맵-리듀스 공유 사용자 환경에서의 프리패칭과 프리셔플링기법 / Sang-Won Seo.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2010].

Holdings
Registration number	8021848
Location / Call number	Academic Cultural Complex (Culture Hall), preservation stacks / MCS 10048
Status	Available (not for loan)

Abstract

MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Many companies, including Yahoo!, Amazon, and Facebook, use Hadoop to perform various data mining tasks on large-scale data sets such as user search logs and visit logs. In these cases, it is very common for multiple users to share the same computing resources because of practical considerations about cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, so it fails to guarantee high performance in a shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve overall performance in the shared environment while retaining compatibility with native Hadoop. The proposed schemes are implemented in native Hadoop-0.18.3 as a plug-in component called HPMR (High Performance MapReduce Engine). Our evaluation on the Yahoo! Grid platform with three different workloads and seven types of test sets from Yahoo! shows that HPMR reduces the execution time by up to 73%.

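MapReduce is a programming model that provides a simple interface for large-scale data-processing applications such as machine learning, data mining, and scientific simulation, enabling them to run computations in parallel. Google first proposed the model but did not release its implementation. In response, Hadoop, an implementation of the MapReduce model, emerged as a jointly developed open-source project led by large Internet companies such as Yahoo!, Amazon, and Facebook, and it remains widely used. Hadoop is typically used to build clusters that process data at roughly petabyte scale, and such clusters are generally environments in which multiple users share resources. The problem is that Hadoop itself was implemented for batch processing and therefore assumes that users do not share resources. As a result, as shared environments become increasingly common, it is difficult for Hadoop to guarantee MapReduce performance. To address this problem, this thesis proposes prefetching and pre-shuffling techniques. Both techniques are implemented as a plug-in for Hadoop 0.18.3, which maintains compatibility with existing Hadoop systems. The Hadoop system with this plug-in applied is called HPMR (High Performance MapReduce Engine). Experiments with a variety of workloads on an actual Yahoo! cluster showed a performance improvement of up to 73%.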

Other bibliographic information
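Call number	MCS 10048
Physical description	vi, 34 p. : ill. ; 26 cm
Language	English
General note	Author's name in Korean: 서상원; advisor's name in English: Seung-Ryoul Maeng; advisor's name in Korean: 맹승렬
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Department of Computer Science,
Bibliography note	References: p. 32-34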
