KAIST Library

Main bibliographic information
심층 역 강화 학습 = Deep inverse reinforcement learning
Title / Author	심층 역 강화 학습 = Deep inverse reinforcement learning / Jung-Pyo Hong (홍정표).
Publication	[Daejeon : KAIST, 2016].

Holdings
Accession number	8029240
Location	Academic Cultural Complex (학술문화관), preservation stacks
Call number	MCS 16039
Status	Available (not for loan)

Abstract

We consider the Inverse Reinforcement Learning (IRL) problem of finding the optimal reward function given an expert's trajectories, under the assumption that the environment is a Markov Decision Process. Current state-of-the-art IRL algorithms use feature functions to represent the environment and assume that the reward function being sought is linear in the feature space. This assumption is easily violated in complex problem domains. A typical example is when states are represented as images, as in the Arcade Learning Environment (ALE). In this paper, we introduce a novel algorithm that combines convolutional neural networks with relative entropy IRL. Our main contribution is that the algorithm automatically achieves feature construction for the reward function from raw images. Our results on the ALE domain show that the algorithm can successfully recover reward functions for both the expert's trajectories and human players' preferences.

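The inverse reinforcement learning problem is to find, in an environment assumed to be a Markov decision process, a reward function that explains a given expert's behavior policy well. Currently known inverse reinforcement learning algorithms assume that feature functions representing the environment are given, and that the reward function is linear in the space of these feature functions. As problems become more complex, however, this assumption breaks down, for example when the environment state is given as images, as in the Arcade Learning Environment (ALE). This thesis attempts to address the problem by combining convolutional neural networks with the relative entropy inverse reinforcement learning algorithm. The deep inverse reinforcement learning algorithm automatically constructs features for the reward function from images. Experimental results on ALE show that the algorithm can effectively recover the expert's behavior policy and human preferences in the form of a reward function.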

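Additional bibliographic information

Call number	MCS 16039
Physical description	iv, 20 p. : illustrations ; 30 cm
Language	Korean
General note	Author's name in English: Jung-Pyo Hong. Advisor's name in Korean: 김기응. Advisor's name in English: Kee-Eung Kim.
Thesis	Thesis (Master's) - KAIST : School of Computing,
Bibliography note	References: p. 17-18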
