KAIST Library

Main bibliographic information
음성 합성 시스템을 위한 심층 오토 인코더 기반 스펙트럼 포락선의 저차원 표현 = Reduced dimensional representation of spectral envelope using deep auto-encoder for speech synthesis
Title / Author	음성 합성 시스템을 위한 심층 오토 인코더 기반 스펙트럼 포락선의 저차원 표현 = Reduced dimensional representation of spectral envelope using deep auto-encoder for speech synthesis / Heejin Choi (최희진).
Publication	[Daejeon : KAIST, 2018].

Holdings information

Registration number	8032110
Location / Call number	Academic Cultural Complex, preservation stacks / MEE 18091
Status	Available (not for loan)

Abstract

This thesis proposes a deep auto-encoder structure that extracts a robust spectral feature for speech synthesis. Conventional mel-cepstral analysis represents low-band speech efficiently but has difficulty reflecting full-band speech. When mel-generalized cepstral (MGC) coefficients are converted back to a spectral envelope, full-band speech cannot be reconstructed completely, and the information loss grows when speech is analyzed at a high sampling frequency. To solve these problems, we propose a spectral feature modeling method that replaces mel-cepstral analysis and bark-cepstral analysis. The technique compresses the high-dimensional spectral envelope of full-band speech into a low-dimensional feature in a data-driven way, without degradation. The proposed spectral feature is a vector that contains the energy value of the spectral information together with the bottleneck features of the auto-encoder. We present the experimental analyses, and their results, used to find the optimized auto-encoder structure and data preprocessing method for extracting the low-dimensional spectral feature required by a speech synthesis system. In analysis-by-synthesis experiments, the proposed auto-encoder gave a lower spectral envelope reconstruction error than conventional mel-cepstral analysis for narrow-band as well as full-band speech. In a comparison of LSTM-based speech synthesis systems, the proposed spectral feature produced more natural synthesized speech. The preference test showed a larger difference between the proposed method and mel-cepstral analysis for speech with a high sampling frequency, and we found that mel-cepstral analysis has a larger compression loss in the high band. We therefore confirmed that the proposed method improves the quality of synthesized speech by using a data-driven approach that preserves full-band spectral information.

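This thesis proposes a deep auto-encoder structure that extracts spectral feature vectors for statistical parametric speech synthesis systems. Conventional mel-cepstral analysis represents low-band information efficiently but has difficulty reflecting high-band information. In particular, the information loss is greater when speech with a high sampling frequency is analyzed. To solve these problems, this thesis proposes a spectral feature vector modeling method that replaces mel-cepstral analysis and bark-cepstral analysis. The method compresses all frequency bands of the spectrum into a low-dimensional spectral feature vector. The high-dimensional spectral information obtained with the WORLD vocoder is compressed by the deep auto-encoder into a robust low-dimensional intermediate feature vector, and the proposed spectral feature vector consists of the auto-encoder's bottleneck feature vector together with the energy value of the spectral information. We present the experimental analyses, and their results, carried out to find the optimized auto-encoder structure and data preprocessing method for extracting the low-dimensional spectral feature vector required by a speech synthesis system. When speech was analyzed and resynthesized with conventional mel-cepstral analysis and with the proposed method, the proposed method gave better reconstruction than mel-cepstral analysis in the low band as well as the high band. In addition, when the performance of an LSTM-based speech synthesis system was compared using low-dimensional spectral feature vectors extracted by mel-cepstral analysis and by the proposed method, the proposed method generated more natural synthesized speech. The preference test showed a larger difference between the proposed method and mel-cepstral analysis for speech with a high sampling frequency, which indicates that mel-cepstral analysis, although efficient at representing low-band information, suffers greater compression loss in the high band. This confirms that the proposed method, a data-driven approach that compresses the spectrum while preserving the information in all bands, improves the quality of synthesized speech.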
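The feature described in the abstract, bottleneck activations of an auto-encoder combined with an energy value, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the layer sizes, the 513-bin envelope, the 40-dimensional bottleneck, the tanh activations, and the choice of the mean log-power as the energy term. The weights are random and untrained, and the input is synthetic, so the sketch only shows how such a vector might be assembled and inverted, not the reported results.

```python
import numpy as np

def init_layers(sizes, rng):
    """Random (untrained) weights and zero biases for a fully connected stack."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply tanh on hidden layers and keep the final layer linear."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def extract_feature(envelope, encoder):
    """Map (frames, bins) power envelopes to (frames, 1 + bottleneck) features."""
    log_env = np.log(envelope + 1e-10)
    # Assumed energy definition: per-frame mean of the log-power envelope.
    energy = log_env.mean(axis=1, keepdims=True)
    shape = log_env - energy            # energy-normalised envelope shape
    code = forward(shape, encoder)      # bottleneck activations
    return np.concatenate([energy, code], axis=1)

def reconstruct(feature, decoder):
    """Invert the feature: decode the shape, then restore the energy offset."""
    energy, code = feature[:, :1], feature[:, 1:]
    return np.exp(forward(code, decoder) + energy)

rng = np.random.default_rng(0)
n_bins, bottleneck = 513, 40            # hypothetical dimensions
encoder = init_layers([n_bins, 256, bottleneck], rng)
decoder = init_layers([bottleneck, 256, n_bins], rng)

# Synthetic stand-in for spectral envelope frames produced by a vocoder.
envelope = rng.uniform(1e-4, 1.0, size=(8, n_bins))
feature = extract_feature(envelope, encoder)
restored = reconstruct(feature, decoder)
```

In a real system the encoder and decoder would be trained jointly to minimise the envelope reconstruction error, and only the low-dimensional `feature` rows would be passed to the acoustic model.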

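Other bibliographic information

Call number	MEE 18091
Physical description	iv, 60 p. : illustrations ; 30 cm
Language	Korean
General note	Author name in English : Heejin Choi; Advisor name in Korean : 한민수; Advisor name in English : Minsoo Hahn
Thesis	Thesis (Master's) - KAIST : School of Electrical Engineering,
Bibliography note	References : p. 56-58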
