Korea Advanced Institute of Science and Technology (KAIST) Library

Main Bibliographic Information
HiPhi-GAN : improving neural vocoder by fixing the phase constant = HiPhi-GAN : 위상 상수를 고정하여 신경망 보코더를 개선하는 방법
Title / Author	HiPhi-GAN : improving neural vocoder by fixing the phase constant = HiPhi-GAN : 위상 상수를 고정하여 신경망 보코더를 개선하는 방법 / Jaryong Lee.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings

Registration No.	8040817
Location / Call Number	Academic Cultural Complex (Library), 2F, Theses / MCS 23002
Status	Available (not for loan)

Abstract

Most modern neural vocoders generate a waveform from a mel-spectrogram, one of several acoustic features. A mel-spectrogram carries only magnitude information and is independent of the phase constant. In other words, if x_ϕ denotes the waveform x phase-transformed by ϕ, then the mel-spectrograms obtained from x_ϕ are identical for every real ϕ. Conversely, a neural vocoder that generates a waveform from a mel-spectrogram alone is confused during training, because x_ϕ can be the ground truth for any ϕ. In this paper, we propose a universal vocoder consisting of two stages: one that avoids this confusion by fixing ϕ to ϕ̄, which guarantees a unique ground truth (x_ϕ̄), and one that generates a full-band waveform according to the fixed ϕ̄. The stages are named the phase synchronizer and the waveform upsampler, respectively. The proposed neural vocoder, HiPhi-GAN, solves the existing problems of slow inference speed, poor audio quality in the mid-to-high band, and frequent phasing errors.
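
The ambiguity described in the abstract can be illustrated with a small numerical sketch. It is not taken from the thesis. It assumes that "phase transform by ϕ" means rotating every frequency component of the waveform by the same constant ϕ, and it uses the magnitude of a whole-signal FFT as a simplified stand-in for a mel-spectrogram (which is built from framed STFT magnitudes passed through a mel filterbank). The helper names `phase_shift` and `magnitude_spectrum` are hypothetical.

```python
import numpy as np


def phase_shift(x, phi):
    """Return x with every frequency component rotated by the constant phase phi.

    Assumption: this is one plausible reading of "phase transform by phi";
    the catalog record does not give the thesis's exact definition.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    rotated = spectrum * np.exp(1j * phi)
    # The DC bin (and the Nyquist bin when n is even) must stay real for a
    # real-valued signal, so those bins are left unrotated.
    rotated[0] = spectrum[0]
    if n % 2 == 0:
        rotated[-1] = spectrum[-1]
    return np.fft.irfft(rotated, n=n)


def magnitude_spectrum(x):
    """Magnitude of the one-sided FFT, a simplified stand-in for a mel-spectrogram."""
    return np.abs(np.fft.rfft(x))


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)      # toy "waveform"
x_phi = phase_shift(x, phi=1.0)    # same content, different phase constant

# Compare the magnitude features and the raw waveforms.
same_magnitude = np.allclose(magnitude_spectrum(x), magnitude_spectrum(x_phi))
same_waveform = np.allclose(x, x_phi)
print(same_magnitude, same_waveform)
```

In this sketch the two waveforms differ sample by sample while their magnitude spectra match, so a model that sees only the magnitude feature has no way to tell which waveform it is expected to reproduce. With framed (windowed) analysis, as in a real mel-spectrogram pipeline, the match may hold only approximately; that case is not checked here.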

Additional Bibliographic Information

Call Number	MCS 23002
Physical Description	iii, 29 p. : ill. ; 30 cm
Language	English
General Note	Author's name in Korean: 이자룡; Advisor's name in English: Daeyoung Kim; Advisor's name in Korean: 김대영
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Computing,
Bibliography Note	References : p. 26-27
Subject	Neural vocoder; Universal vocoder; Phase transform; Generative adversarial network; Diffusion model; Real-time speech synthesis; Real-time voice conversion
