Korea Advanced Institute of Science and Technology (KAIST) Library

Main Bibliographic Information
(An) energy-efficient multiple-DNN training processor with active input-output dual zero skipping = 이중 희소성 응용 고효율 다중 DNN 학습 프로세서
Title / Author	(An) energy-efficient multiple-DNN training processor with active input-output dual zero skipping = 이중 희소성 응용 고효율 다중 DNN 학습 프로세서 / Sanghoon Kang.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2022].
Online Access	Full text not publicly available

Holdings

Registration No.	8039249
Location / Call No.	Academic Cultural Complex (Library), 2F, Theses / DEE 22039
Status	Available (not for loan)

Abstract

Many deep neural network (DNN) based services are moving to the edge for on-device intelligence, which has spurred research on energy-efficient DNN accelerators. At the same time, as artificial intelligence (AI) functionality advances, model architectures are becoming more complex and now incorporate multiple DNNs in a single AI model. A generative adversarial network (GAN) is one example: its multiple-DNN architecture supports advanced applications. This dissertation presents GANPU, an energy-efficient multiple-DNN training processor for GANs. It enables on-device training of GANs on performance-limited and battery-limited mobile devices without sending user-specific data to servers, which avoids the associated privacy concerns. Training GANs requires a massive amount of computation, so it is difficult to accelerate on a resource-constrained platform. In addition, the networks and layers in a GAN show widely varying operational characteristics, which makes it difficult to optimize the processor's core and bandwidth allocation. For higher throughput and energy efficiency, this dissertation proposes three key features. First, adaptive spatio-temporal workload multiplexing maintains high utilization when accelerating multiple DNNs in a single GAN model. Second, to take advantage of ReLU sparsity during both inference and training, a dual-sparsity exploitation architecture skips the redundant computations caused by zeros in the input and output features. Third, an exponent-only ReLU speculation algorithm, together with its lightweight processing element architecture, estimates the locations of output feature zeros during inference with minimal hardware overhead. Fabricated in a 65 nm process, GANPU achieves an energy efficiency of 75.68 TFLOPS/W for 16-bit floating-point computation, which is 4.85x higher than the state of the art. As a result, GANPU enables on-device training of GANs with high energy efficiency.

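Recently, many deep neural network (DNN) based services have come to run directly on edge devices, and this trend has driven active research on energy-efficient DNN accelerators. In addition, as AI functionality becomes more sophisticated, the structural complexity of AI models is increasing, for example through the integration of several DNNs in one model. The generative adversarial network (GAN) is one such example, and its multiple-DNN structure makes advanced applications possible. This dissertation proposes GANPU, an energy-efficient multiple-DNN training processor for GANs. GANPU enables GAN training on devices with limited performance and battery capacity, so data that could identify a user need not be sent to a server, which removes that source of privacy leakage. GAN training requires a very large number of operations and is difficult to accelerate in a resource-constrained environment. Furthermore, because computational density differs greatly across the networks and layers inside a GAN, it is difficult to allocate the processor's core and bandwidth resources optimally. For high throughput and energy efficiency, this dissertation presents three techniques. The proposed adaptive spatio-temporal workload partitioning achieves high resource utilization when accelerating multiple DNNs. A dual-sparsity exploitation architecture is proposed to use the sparsity produced by the ReLU activation function in both inference and training; it skips the zero operations in both input and output operands, improving energy efficiency and throughput. In addition, a floating-point-based output sparsity prediction algorithm and its compute architecture locate output zeros with little resource consumption. GANPU was fabricated in a 65 nm process and achieves an energy efficiency of up to 75.68 TFLOPS/W for 16-bit floating-point computation, 4.85 times higher than the previous best. In conclusion, GANPU enables on-device GAN training with high energy efficiency.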
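
The dual zero skipping described in the abstract can be illustrated with a small software sketch. The record gives no implementation details, so everything below is an assumption made for illustration and not the GANPU design: the function names are hypothetical, "exponent-only" is interpreted here as keeping only the sign and power-of-two exponent of each operand, and a single fully connected layer stands in for the real workload. Because the speculation is approximate, an output guessed to be zero may in fact be slightly positive.

```python
import math


def exponent_only(x):
    """Keep the sign and power-of-two exponent of x, dropping the mantissa.

    Assumed interpretation of "exponent-only"; not taken from the dissertation.
    """
    if x == 0.0:
        return 0.0
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.copysign(math.ldexp(0.5, e), x)  # sign(x) * 2**(e - 1)


def speculate_zero(inputs, weights):
    """Guess whether ReLU(dot(inputs, weights)) is zero from coarse operands."""
    approx = sum(
        exponent_only(a) * exponent_only(w)
        for a, w in zip(inputs, weights)
        if a != 0.0
    )
    return approx <= 0.0


def dual_skip_layer(inputs, weight_rows):
    """Return (ReLU outputs, number of full-precision MACs performed)."""
    # Input skipping: only nonzero input activations are ever multiplied.
    nonzero = [(i, a) for i, a in enumerate(inputs) if a != 0.0]
    outputs, macs = [], 0
    for row in weight_rows:
        # Output skipping: drop the whole row if the output is guessed to be zero.
        if speculate_zero(inputs, row):
            outputs.append(0.0)
            continue
        acc = 0.0
        for i, a in nonzero:
            acc += a * row[i]
            macs += 1
        outputs.append(max(acc, 0.0))  # ReLU
    return outputs, macs


if __name__ == "__main__":
    inputs = [0.0, 1.5, 0.0, 2.0]
    weight_rows = [[0.3, 0.5, -0.2, 0.25], [0.1, -1.0, 0.4, -0.5]]
    print(dual_skip_layer(inputs, weight_rows))
```

In this toy case a dense evaluation would need 8 multiply-accumulates; input skipping removes the two zero activations, and output skipping removes the second row entirely, leaving 2. The count ignores the cost of the speculation itself, which the abstract describes as small in hardware.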

Other Bibliographic Information
Call No.	DEE 22039
Physical description	vii, 125 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 강상훈; Advisor's name in English: Hoi-Jun Yoo; Advisor's name in Korean: 유회준
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering,
Bibliography note	References : p. 108-125
Subject	Deep neural network; Deep learning; Neural processing unit; DNN accelerator; On-device intelligence; Multiple-DNN acceleration; Sparsity exploitation; Generative adversarial network
