Korea Advanced Institute of Science and Technology (KAIST) Library

Main bibliographic information
Accelerated resource scaling mechanisms for energy efficient deep learning cluster with power budget constraint = 딥 러닝 클러스터의 에너지 비용 절감을 위한 자원 스케일링 가속화 기법
Title / Author	Accelerated resource scaling mechanisms for energy efficient deep learning cluster with power budget constraint = 딥 러닝 클러스터의 에너지 비용 절감을 위한 자원 스케일링 가속화 기법 / Dong-Ki Kang.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2019].

Holdings information

Registration number	8034704
Location / Call number	Academic Cultural Complex (Cultural Hall) preservation stacks / DEE 19056
Status	Available (not for loan)

Abstract

Recently, demand from multiple service users for heterogeneous applications (web service processing, web streaming, image processing, big data analytics) with diverse resource usage patterns (processor-intensive, memory-intensive, and I/O-intensive) has increased rapidly in high performance computing (HPC) clusters and data centers. Large-scale computing resources are being built for high-speed processing of parallel tasks that require computation times of several days or more because they operate on millions of raw data items. However, as the computing performance of clusters (e.g., data centers) has rapidly improved, the associated power consumption has also exploded. Currently, data centers alone account for 1.3% of the world's total energy consumption, or 270 TWh, and this share is expected to rise to 8% in 2020. This energy usage is a major obstacle to expanding data center infrastructure and satisfying user service quality. Therefore, the energy cost of cluster (data center) operation is becoming the most important issue in resource management in terms of economic benefit. In modern HPC clusters, deep learning (DL) tasks for artificial intelligence services are emerging as major workloads in addition to existing HPC tasks and web service processing. Because of the high computational complexity of DL tasks, there is growing interest in GPU-enabled server racks for HPC clusters dedicated to fast processing of data-rich tasks. Compared to conventional CPU devices, the thousands of stream processing (SP) cores contained in a single GPU chip offer, through parallel processing, high processing speed for training deep neural network (DNN) models, which requires repetitive data processing.
However, despite improvements in FLOPS-per-watt efficiency, GPU-based clusters still consume a non-negligible amount of power compared to CPU-based clusters. Therefore, the power consumption of GPU devices is recognized as a key factor in managing the energy efficiency of entire clusters. In this dissertation, we propose an elaborate and scalable power control algorithm and an associated modeling approach to achieve energy-efficient GPU-enabled DL clusters. The first objective of the proposed approach is to enable sophisticated and highly scalable power control for GPU-enabled DL clusters given a limited power budget. In particular, we present a theoretical model transformation that effectively separates the overall control problem so that the optimal control decision can be found in real time for large-scale clusters under a limited power budget. The second objective is to ensure the service level agreements (SLAs) required by each DL service user while taking into account dynamically fluctuating electricity costs and renewable power capacity. The proposed algorithm can guarantee stable energy consumption and acceptable service latency for large-scale clusters even when environmental variables are uncertain. The technical contributions of this dissertation fall into two parts. The first is an accelerated power control algorithm for energy-efficient deep learning processing in GPU-enabled data centers. We present GPU-architecture-agnostic statistical modeling and energy cost minimization techniques for GPU-enabled DL clusters. We design models of DL processing power consumption and processing performance for heterogeneous GPU servers that do not depend on a particular GPU architecture. Using these models, we achieve real-time online estimation of GPU model parameters by adopting a recursive least squares (RLS) approach. Moreover, we propose a highly scalable GPU power control algorithm based on dual acceleration.
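The online parameter estimation mentioned above can be illustrated with a minimal recursive least squares sketch. The linear power model (power = idle power + slope × clock frequency), the class name `RecursiveLeastSquares`, and every numeric value are assumptions made for illustration; the abstract does not state the model form actually used in the dissertation.

```python
import random


class RecursiveLeastSquares:
    """Online estimator for a linear model y ~ theta . x (illustrative only)."""

    def __init__(self, n_params, forgetting=1.0, init_cov=1e3):
        self.n = n_params
        self.theta = [0.0] * n_params            # current parameter estimate
        # Large initial covariance means the starting guess is trusted very little.
        self.P = [[init_cov if i == j else 0.0 for j in range(n_params)]
                  for i in range(n_params)]
        self.forgetting = forgetting             # 1.0 means no forgetting

    def update(self, x, y):
        """Fold one measurement (x, y) into the estimate and return it."""
        n = self.n
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - sum(self.theta[i] * x[i] for i in range(n))  # error before update
        self.theta = [self.theta[i] + gain[i] * error for i in range(n)]
        # P stays symmetric, so gain_i * Px_j equals (gain x^T P)_ij.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.forgetting
                   for j in range(n)] for i in range(n)]
        return self.theta


# Hypothetical GPU power model: power_watts = idle + slope * clock_ghz.
random.seed(0)
true_idle, true_slope = 40.0, 80.0
estimator = RecursiveLeastSquares(n_params=2)
for _ in range(200):
    clock_ghz = random.uniform(0.6, 1.8)
    measured = true_idle + true_slope * clock_ghz + random.gauss(0.0, 0.5)
    estimator.update([1.0, clock_ghz], measured)

print("estimated idle power and slope:", estimator.theta)
```

Each update costs a fixed amount of work per sample, which is what makes this kind of estimator suitable for refreshing model parameters while tasks are running.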
We use the Lagrangian dual decomposition technique to divide a large-scale power control problem into small sub-problems and to enable distributed, parallel computation of the optimal control decision. In particular, we exploit Lipschitz continuity to maximize the theoretical acceleration of dual convergence to the optimal control solution. Our proposed distributed control architecture enables run-time control optimization within a few seconds for large-scale GPU-enabled clusters containing hundreds of servers, simply by adding local power controllers. The second contribution is a set of MACRO and MICRO time-scale-based resource management schemes for energy-efficient deep learning services. We present hierarchical time-scale management for energy-efficient data centers with low switching overheads for dynamic right sizing (DRS). We propose a multi-time-scale (MACRO and MICRO, MAMI) approach that integrates DRS and frequency scaling (FS) to efficiently reduce the energy consumption of idle and active servers in data centers. Under MAMI-based resource management, we can apply sophisticated FS to data centers while minimizing DRS switching overheads in response to environmental variables such as electricity market price, renewable power capacity, and given service quality requirements. Moreover, we propose a stochastic joint optimization technique to manage the risk of prediction errors. This multi-scenario-based method mitigates the risk cost (undesirable energy consumption cost and unacceptable service latency cost) caused by uncertainty in environmental parameters such as electricity market price and renewable power capacity. Through stochastic optimization based on multiple-scenario generation, we can derive a stable resource management decision that copes efficiently with the various cases that may occur in practice.
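The multi-scenario idea described above can be sketched as choosing one decision that minimizes the probability-weighted cost over several forecast scenarios. Everything in this sketch is an assumption for illustration: the function names, the cost model (grid energy cost plus a penalty per unit of unserved load), and the scenario values are hypothetical and are not taken from the dissertation.

```python
def expected_cost(n_servers, scenarios, server_kw, workload,
                  rate_per_server, sla_penalty):
    """Probability-weighted cost of running n_servers (hypothetical cost model).

    scenarios: list of (probability, electricity_price, renewable_kw) tuples.
    """
    # Load that cannot be served in time stands in for the latency risk cost.
    unserved = max(0.0, workload - n_servers * rate_per_server)
    cost = 0.0
    for prob, price, renewable_kw in scenarios:
        # Only the power not covered by renewables is bought from the grid.
        grid_kw = max(0.0, n_servers * server_kw - renewable_kw)
        cost += prob * (price * grid_kw + sla_penalty * unserved)
    return cost


def best_server_count(max_servers, **model):
    """Enumerate candidate server counts and keep the cheapest in expectation."""
    return min(range(1, max_servers + 1),
               key=lambda n: expected_cost(n, **model))


# Two equally likely forecasts: cheap power with ample renewables,
# or expensive power with little renewable capacity.
model = dict(
    scenarios=[(0.5, 0.10, 20.0), (0.5, 0.30, 5.0)],
    server_kw=1.0,
    workload=100.0,
    rate_per_server=10.0,
    sla_penalty=1.0,
)
n_best = best_server_count(30, **model)
print("active servers:", n_best, "expected cost:", expected_cost(n_best, **model))
```

Brute-force enumeration is used here only to keep the sketch short; it does not scale to the joint DRS and FS decisions the abstract describes.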
Moreover, we present a logarithm-based model transformation that converts the non-convex control optimization problem into a convex one, so that it can be solved with a conventional optimizer. To investigate the performance of the contributions in this dissertation, we deployed various DNN models such as AlexNet, ResNet, VGGNet, and GoogLeNet. We built a lab-scale testbed consisting of multiple GPU servers based on the NVIDIA Pascal architecture (GTX1060/1080), multiple power controllers, and a coordinator. The proposed statistical DL power and performance modeling method yields highly accurate control decisions for invoked heterogeneous DL tasks; that is, it minimizes deadline violations of each invoked DL task while satisfying the dynamic power budget constraint. Our dual acceleration approach based on Lipschitz continuity guarantees a run-time optimal solution within a few seconds by using multiple local power controllers for large-scale clusters (more than 200 GPU servers). Subsequently, we evaluate the proposed MAMI-scale resource management and stochastic joint optimization approach for energy-efficient large-scale data centers using real trace data retrieved from the Measurement and Instrumentation Data Center (MIDC) and the Federal Energy Regulatory Commission (FERC). We implemented the proposed system using the Keras deep learning framework. Our proposed MAMI/stochastic approach achieves a 25% energy cost saving over existing meta-heuristics, such as a constrained genetic algorithm (GA) and reference-point-based power control methods, while satisfying a range of quality-of-service requirements from tight to loose service latency.
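The dual decomposition with local power controllers described in the abstract can be illustrated with a small sketch. It assumes a hypothetical quadratic cost per server, 0.5 * weight * (power - desired)^2, and a single shared budget constraint; the function names and all values are illustrative and are not the dissertation's formulation. The coordinator runs plain projected gradient ascent on the dual price with step 1/L, where L is a Lipschitz constant of the dual gradient. The momentum-based acceleration the abstract refers to is not implemented here.

```python
def local_power_decision(price, desired, weight, p_min, p_max):
    """Local controller: minimize 0.5*weight*(p - desired)**2 + price*p on [p_min, p_max]."""
    return min(p_max, max(p_min, desired - price / weight))


def allocate_power(desired, weights, budget, p_min, p_max,
                   max_iters=1000, tol=1e-9):
    """Coordinator: adjust one shared price until total power fits the budget.

    Assumes the budget is feasible, i.e. budget >= len(desired) * p_min.
    """
    # Each local response changes by at most 1/weight per unit of price, so the
    # dual gradient (total power minus budget) is Lipschitz with constant L.
    lipschitz = sum(1.0 / w for w in weights)
    price = 0.0
    for _ in range(max_iters):
        # The sub-problems are independent and could run in parallel.
        powers = [local_power_decision(price, d, w, p_min, p_max)
                  for d, w in zip(desired, weights)]
        violation = sum(powers) - budget
        new_price = max(0.0, price + violation / lipschitz)  # project onto price >= 0
        converged = abs(new_price - price) < tol
        price = new_price
        if converged:
            break
    powers = [local_power_decision(price, d, w, p_min, p_max)
              for d, w in zip(desired, weights)]
    return powers, price


# Three hypothetical GPU servers that together want 600 W under a 450 W budget.
powers, price = allocate_power(
    desired=[150.0, 200.0, 250.0], weights=[1.0, 2.0, 4.0],
    budget=450.0, p_min=50.0, p_max=300.0)
print("per-server power caps:", powers, "dual price:", price)
```

Only the scalar price travels between the coordinator and the local controllers, which is why adding controllers spreads the computation without enlarging the coordinator's problem.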

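In high-performance clusters and data centers, demand from multiple service users for applications (web service processing, image and video processing, big data processing) that exhibit diverse resource usage patterns (processor-clock-bound, memory-bandwidth-bound, and storage-I/O-bound work) is increasing rapidly. Large-scale computing resources are being built to efficiently handle workloads ranging from user transaction requests that must be processed within a few milliseconds to parallel tasks that process millions of raw data items and require computation times of several days or more. However, as the computing performance of data centers has rapidly improved, the associated power consumption has also grown explosively. Currently, data centers account for 1.3% of the world's total energy use, or 270 TWh, and this share is projected to soar to 8% by 2020. This energy use causes an enormous increase in data center operating costs and is a major obstacle to expanding data center infrastructure and satisfying user service quality. Energy efficiency is therefore becoming the most important issue for data center operators, ahead of the traditional key issues of security and availability. Recently, in addition to existing HPC tasks and web service processing, deep learning (DL) training tasks for artificial intelligence services have emerged as a major data center workload. Interest in GPU-based clusters is growing because of the need for fast processing of data-rich tasks such as deep learning training. Compared to conventional CPU devices, the thousands of stream processing (SP) cores contained in a single GPU chip can provide high-speed parallel processing for training large deep neural network (DNN) models, which requires repetitive data processing. However, along with the rapid improvement in deep learning processing performance, GPU-based clusters incur undesirable power consumption. Despite improvements in power efficiency per FLOP, GPU-based clusters still consume more power than CPU-based clusters. The power consumed by GPU devices is therefore becoming a major issue in managing the energy efficiency of the whole data center. Methods for efficiently reducing data center energy consumption fall into three broad categories. The first is dynamic right sizing (DRS), which powers off some of the servers that make up the data center. On average, data center servers are busy only 10 to 30% of the time and remain idle for most of the rest. DRS can effectively reduce the energy spent maintaining the idle state, but overly frequent power changes incur ON/OFF transition overhead, which causes additional energy consumption and undesirable server downtime. The second is processor frequency scaling (FS), which dynamically changes the clock rate of the servers. This makes it possible to adjust the power allocated to each task running in the data center in response to a temporary increase in the electricity price or a restriction of the defined power budget.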
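FS can reduce the dynamic energy consumption of each server at fine granularity and has low transition overhead, but it cannot affect the energy consumption of idle servers. The third is the energy storage system (ESS) approach, which increases real-time electricity purchases while the price is low and stores the surplus, or stores power supplied by renewable generators. This can efficiently lower energy costs when user service requests surge or unexpected power costs arise, but energy storage equipment is expensive to deploy and the amount of power it can store is limited. We achieve energy-efficient deep learning task management based on the first and second methods, which require no additional modules in the data center. In this dissertation, we propose elaborate and scalable energy consumption control modeling and algorithms for providing energy-efficient deep learning training tasks and services in data centers built on deep learning clusters. The first objective of the proposed approach is to enable elaborate yet scalable power control for GPU-based deep learning clusters under a limited power budget. In particular, we propose a theoretical approach that effectively distributes the problem so that optimal control is possible in real time for a large number of servers sharing a power budget. The second objective is to guarantee the service level agreement (SLA) required by each deep learning service user while accounting for dynamically fluctuating electricity costs and renewable power capacity. The proposed approach can guarantee stable energy consumption and reasonable service latency even under prediction uncertainty in environmental variables. The technical contributions of this dissertation are as follows.
1. Accelerated distributed power control for energy-efficient deep learning processing in data centers
- GPU-device-agnostic statistical modeling and energy cost optimization: For heterogeneous GPU servers and deep learning tasks with differing power consumption and processing performance characteristics, we designed a model that derives precise power consumption and performance levels without depending on a specific GPU architecture. We also enabled real-time online model profiling based on the recursive least squares (RLS) method.
- Scalable GPU power control based on dual acceleration: We proposed a Lagrangian dual decomposition technique that divides a large-scale power control problem into small sub-problems that can be computed in a distributed manner. We also maximized the theoretical convergence rate of the dual optimization based on Lipschitz continuity. The proposed distributed architecture enables run-time control optimization within a few seconds for hundreds or more GPU servers simply by adding local controllers.
2. MACRO- and MICRO-scale resource management for energy-efficient deep learning services
- MACRO/MICRO-scale management for minimizing DRS overhead: We proposed an integrated dynamic right sizing (DRS) and frequency scaling (FS) technique that efficiently reduces the energy consumption of idle and active GPU servers. Based on MACRO/MICRO-scale management, it applies precise frequency scaling while minimizing DRS state transition overhead in response to given user service requirements.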
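- Stochastic joint optimization for managing prediction error risk: We proposed a multi-scenario-based stochastic optimization technique that minimizes the increase in data center operating cost and the degree of service quality violation caused by uncertainty in the fluctuating electricity market price and renewable energy capacity. A heuristic scenario generation technique makes it possible to derive resource management decisions that respond efficiently to the various forecast values that may occur.
To evaluate the performance of the proposed techniques, we trained various DNN models such as AlexNet, ResNet, VGGNet, and GoogLeNet. We built a lab-scale testbed composed of multiple GPU servers based on the NVIDIA Pascal architecture (GTX1060/1080), multiple power controllers, and a coordinator. The proposed statistical deep learning modeling technique showed high control accuracy and minimized deadline violations of each task, both for invoked deep learning tasks that had not been profiled in advance and under a dynamically changing power budget. The Lipschitz-continuity-based dual acceleration technique guaranteed a run-time optimal solution within a few seconds for more than 200 GPU servers using only a small number of local controllers. We also collected large-scale real data on electricity market prices and renewable energy capacity from the U.S. Measurement and Instrumentation Data Center (MIDC) and the Federal Energy Regulatory Commission (FERC) and used it to evaluate the energy savings of the proposed techniques. We implemented the proposed techniques on the Keras deep learning framework and achieved a 25 percent energy cost saving over existing metaheuristic techniques without violating a range of service quality requirements.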

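Additional bibliographic information

Call number	DEE 19056
Physical description	ix, 148 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 강동기. Advisor's name in English: Chan-Hyun Youn. Advisor's name in Korean: 윤찬현.
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering,
Bibliography note	References : p. 132-144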
