Korea Advanced Institute of Science and Technology (KAIST) Library

Main bibliographic information
Deep robust face state estimation for intelligent systems = 지능형 시스템을 위한 강인한 얼굴 정보 시각 인지 방법
Title / Author	Deep robust face state estimation for intelligent systems = 지능형 시스템을 위한 강인한 얼굴 정보 시각 인지 방법 / Byungtae Ahn.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2018].

Holdings
Registration no.	8032615
Location / call number	Academic Cultural Complex (Cultural Hall), Preservation Stacks / DRE 18004
Status	Available (not for loan)

Abstract

Face-related tasks are now very important in human-robot interaction (HRI) areas such as humanoid robots, home service robots, remote control, and user preference analysis, and in advanced driver assistance systems (ADAS). An intelligent system monitors whether the driver is attending to the road ahead, identifies whether a robot's user is ready to give it orders, and reports where customers look most often. As a result, information about the user's head pose and gaze is very important in the service robot industry. To this end, the algorithm should be robust to varied environments and run in real time. In this dissertation, we present deep learning based methods for face detection, head pose estimation, and gaze estimation. We also present the use of synthetic data to obtain a sufficient amount of training data. The dissertation consists of the following three topics.

First, we present a multi-task deep neural network that performs multi-view face detection, bounding box refinement, and head pose estimation. We apply it to intelligent vehicle application scenarios to verify the proposed algorithm. Driver inattention is one of the main causes of traffic accidents. To avoid such accidents, an advanced driver assistance system that passively monitors the driver's activities is needed. In this dissertation, we present a novel method to estimate head pose from a monocular camera. The proposed algorithm is based on a multi-task learning deep neural network that uses a small grayscale image. The network jointly detects multi-view faces and estimates head pose even under poor environmental conditions such as illumination change, vibration, large pose change, and occlusion. We also propose a multi-task learning method that does not become biased toward a specific task when the tasks are trained on different datasets. Moreover, to enrich the training dataset, we establish the RCVFace dataset, which has accurate head poses. The proposed framework outperforms state-of-the-art approaches quantitatively and qualitatively, with a mean head pose error of less than 4° in real time. The algorithm applies to driver monitoring systems, which are crucial for driver safety.

Second, we extend the deep convolutional neural network for pose estimation to 3D model retrieval and pose estimation of industrial components. To achieve this, we propose a method to construct and use synthetic data in a virtual space. To increase the quantity and quality of training data, we define our simulation space in the near-infrared (NIR) band and use the quasi-Monte Carlo (MC) method for scalable photorealistic rendering of manufactured components. Two types of convolutional neural network (CNN) architectures are trained on these synthetic data and a relatively small amount of real data. The first CNN model seeks the most discriminative information and uses it to classify industrial components with fine-grained shape attributes. Once a 3D model is identified, one of the category-specific CNNs is tested for pose regression in the second phase. The mixed data for learning object categories is useful for domain adaptation and for the attention mechanism in our system. We validate our data-driven method with 88 component models, and the experimental results are demonstrated qualitatively. The CNNs trained under various conditions of mixed data are also analyzed quantitatively.

Finally, we propose a deep convolutional neural network for gaze estimation with free head pose. Prior to that, we present a gaze estimation method for a fixed head pose that does not require user calibration. A typical gaze estimator needs an explicit personal calibration stage with many discrete fixation points. This limitation can be resolved by mapping multiple eye images to the corresponding saliency maps of a video clip during an implicit calibration stage. Compared with previous calibration-free methods, our approach clusters eye images with a Gaussian mixture model (GMM) to increase calibration accuracy and reduce training redundancy. The eye feature vectors that represent the eye images undergo soft clustering with the GMM, and the corresponding saliency maps are aggregated in the same way. The GMM-based soft clustering boosts the accuracy of the Gaussian process regression that maps eye feature vectors to gaze directions given this constructed data. The experimental results show an increase in gaze estimation accuracy compared with previous work on calibration-free methods. Furthermore, the proposed head-pose-free gaze estimation method uses only a small grayscale image, without special equipment such as an IR illumination device. The proposed 3D gaze estimation method differs from existing methods that estimate only 2D coordinates on a specific screen. Because it estimates gaze direction in space, it applies to a wider range of uses, such as driver assistance systems, psychological analysis, and marketing. To enrich the training dataset, we establish and release a synthetic dataset (SynFace) that has accurate head poses, gaze directions, and facial landmarks. The proposed method outperforms state-of-the-art methods, with a mean error of less than 4°.
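
The soft clustering step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes scalar eye features, a mixture whose parameters are already given rather than fitted, and saliency maps stored as flat lists. The function names `responsibilities` and `aggregate_maps` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Soft assignment of feature x to each mixture component (sums to 1)."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def aggregate_maps(features, maps, weights, means, variances):
    """Responsibility-weighted average of the per-frame maps, one per component.

    Assumption: each frame contributes to every cluster in proportion to its
    soft assignment, instead of being assigned to a single cluster.
    """
    k = len(weights)
    n_cells = len(maps[0])
    sums = [[0.0] * n_cells for _ in range(k)]
    mass = [0.0] * k
    for x, frame_map in zip(features, maps):
        r = responsibilities(x, weights, means, variances)
        for c in range(k):
            mass[c] += r[c]
            for i in range(n_cells):
                sums[c][i] += r[c] * frame_map[i]
    return [[s / mass[c] for s in sums[c]] for c in range(k)]


if __name__ == "__main__":
    # Toy mixture with two equally weighted, unit-variance components.
    weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    features = [-1.0, 1.0]            # one scalar eye feature per frame
    maps = [[1.0, 0.0], [0.0, 1.0]]   # one flattened saliency map per frame
    print(aggregate_maps(features, maps, weights, means, variances))
```

In this sketch each aggregated map is a convex combination of the frame maps, so a frame whose feature lies between two clusters contributes to both rather than being forced into one.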

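Korean abstract (translated)

Today, face-related tasks are studied as a very important topic in human-robot interaction (HRI) areas such as humanoid robots, home service robots, remote control, and user preference analysis, and in advanced driver assistance systems (ADAS) for intelligent vehicles. An intelligent system monitors whether the driver is watching the road ahead while driving, determines whether a robot's user is ready to give it a command, and reports where customers look most often. Accordingly, information about the user's head pose and gaze has become very important in the service robot industry. To this end, the algorithm must be robust to varied environments and run in real time. This work proposes deep learning based methods for face detection, head pose estimation, and gaze estimation. To secure a sufficient amount of data, we also build a synthetic dataset with accurate head poses, gaze directions, and facial landmarks, and propose a domain adaptation technique between real images and synthetic data. The dissertation consists of the following three topics.

First, we propose a multi-task deep neural network for multi-view face detection and head pose estimation. We apply it to intelligent vehicle application scenarios to verify the proposed algorithm. Driver inattention while driving is one of the main causes of traffic accidents. Such accidents could be avoided with an advanced driver assistance system that continuously monitors the driver's activities. This dissertation presents a new method for estimating head pose from a monocular camera. The proposed algorithm uses a deep neural network based on multi-task learning that operates on a small grayscale image. The network detects faces and simultaneously estimates head pose even under poor environmental conditions such as illumination change, vibration, large pose change, and occlusion. We also propose a multi-task learning method that is not biased toward a specific dataset. To enrich the training dataset, we built the RCVFace dataset, which consists of real images with accurate head poses. The proposed framework outperforms state-of-the-art methods quantitatively and qualitatively, with a mean error of less than 4° in real time. The algorithm can be applied to intelligent systems, including driver monitoring systems that are important for driver safety.

Second, we extend the deep neural network for pose estimation to 3D model retrieval and pose estimation of industrial components. To this end, we present a method for constructing and using synthetic data of industrial components in a virtual space. To increase the quantity and quality of training data, we define the simulation space in the near-infrared (NIR) band and use the Monte Carlo method for photorealistic rendering of the components. Deep neural network architectures are trained on these synthetic data together with a relatively small amount of real data. The first deep neural network model finds the most discriminative information in a component and uses it to classify industrial components by fine-grained shape attributes. Once the 3D model is identified, one of the category-specific networks estimates the pose in the second stage. We validate the method with 88 component models and demonstrate the experimental results qualitatively. We also quantitatively analyze deep neural networks trained under various conditions of mixed data.

Finally, we propose a deep neural network for gaze estimation under free head pose. Before that, we also present a gaze estimation method for a fixed head pose that requires no user calibration. Typical gaze estimation methods go through a per-user calibration stage in which the user looks at a screen of many discrete fixation points. The method presented here handles this calibration stage implicitly by mapping the saliency maps of a video clip to the corresponding eye images. Compared with previous calibration-free methods, our approach clusters eye images using a Gaussian mixture model (GMM) to increase calibration accuracy and reduce redundancy. The eye feature vectors that represent the eye images are clustered with the GMM, and the saliency maps are aggregated accordingly. The GMM-based soft clustering improves the accuracy of the regressor, which maps eye feature vectors to gaze directions using the constructed data. Experimental results showed an increase in gaze estimation accuracy compared with previous work on calibration-free methods. Furthermore, the proposed gaze estimation method under free head pose takes only a small grayscale image as input, without special equipment such as an IR illumination device. Unlike existing methods that estimate only 2D coordinates on a specific screen, the proposed 3D gaze estimation method estimates gaze direction in space, so it has a wider range of applications, such as driver assistance systems, psychological analysis, and marketing. We also propose a domain adaptation method to reduce the feature gap between different data domains. By using a domain adversarial neural network, the method maps the feature spaces of real images and synthetic data into a single common space. For a richer training dataset, we built a synthetic dataset (SynFace) with accurate head poses, gaze directions, and facial landmarks. The proposed method outperformed state-of-the-art methods, with a mean error of less than 4°.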
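
The abstract reports a mean error below 4° but does not say how the error is computed. One common convention for 3D gaze, assumed here purely for illustration, is the angle between the predicted and ground-truth direction vectors, averaged over samples. The function names below are hypothetical.

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between two 3-D direction vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, true))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, trues):
    """Average angular error over paired predictions and ground-truth directions."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up directions for illustration only; not data from the dissertation.
    preds = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    trues = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
    print(mean_angular_error_deg(preds, trues))
```

Head pose error is often reported differently, for example as the mean absolute difference per Euler angle, so the figure in the abstract should not be read as necessarily using this definition.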

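Other bibliographic information
Call number	DRE 18004
Physical description	vi, 78 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 안병태 ; Advisor's name in English: In So Kweon ; Advisor's name in Korean: 권인소 ; Published in: "Real-time Head Pose Estimation Using Multi-task Deep Neural Network". Robotics and Autonomous Systems, v.103, pp.1-12 (2018)
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Robotics Program
Bibliography note	References : p. 65-74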
