KAIST Library

Main Bibliographic Information
Singing melody extraction using multi-column deep neural networks = 다중 심층 신경망을 사용한 가창 멜로디 추출
Title / Author	Singing melody extraction using multi-column deep neural networks = 다중 심층 신경망을 사용한 가창 멜로디 추출 / Sangeun Kum.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2016].

Holdings
Registration number	8029981
Location / Call number	Academic Cultural Complex (Culture Hall) preservation stacks / MGCT 16020
Status	Available (not for loan)

Abstract

As the music market has grown, so has the demand for new services such as cover song identification and query by humming. These services search for songs by melody, so extracting the melody, particularly from the singing voice, is essential to implementing them. In this thesis, we focus on algorithms that extract the singing melody from audio signals. Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. The majority of melody extraction algorithms compute a saliency function over pitch candidates or separate the melody source from the mixture, whereas data-driven approaches based on classification have rarely been explored. We present a classification-based approach to singing melody extraction using multi-column deep neural networks. In the proposed model, each neural network is trained to predict a pitch label of the singing voice from the spectrogram, but the networks' outputs have different pitch resolutions. The melody contour is inferred by combining the outputs of the networks. We apply Viterbi decoding based on a hidden Markov model to capture long-term temporal information. Our system also includes a singing voice detector, an additional deep neural network that selects the frames containing singing voice. It is trained with singing voice activity labels and the output of the melody extraction networks. To take advantage of the data-driven approach, we also augment the training data by pitch-shifting the audio content and modifying the pitch labels accordingly. We train the model on the RWC dataset and part of the MedleyDB dataset and evaluate it on the ADC 2004, MIREX 2005, and MIR-1k datasets. Across several experimental settings, we show incremental improvements in melody prediction. Lastly, we compare our best result with those of previous state-of-the-art methods.
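The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `viterbi_smooth`, the Gaussian-shaped transition penalty, and the `sigma` parameter are assumptions chosen for illustration, and the input is assumed to be frame-wise pitch-class probabilities such as a classifier might output.

```python
import math


def viterbi_smooth(posteriors, sigma=1.0):
    """Return the best-scoring pitch-class path for frame-wise posteriors.

    posteriors: list of frames, each a list of per-class probabilities.
    sigma: hypothetical width (in classes) of the transition penalty;
           not a value taken from the thesis.
    """
    n_classes = len(posteriors[0])
    eps = 1e-12  # avoids log(0)

    # Transition scores: jumps between distant pitch classes cost more.
    # Rows are left unnormalized, so these are scores, not log-probabilities.
    log_trans = [
        [-((i - j) ** 2) / (2.0 * sigma ** 2) for j in range(n_classes)]
        for i in range(n_classes)
    ]

    # Uniform prior over the first frame (a constant, so it is omitted).
    score = [math.log(p + eps) for p in posteriors[0]]
    backptr = []

    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_classes):
            best_i = max(range(n_classes),
                         key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + math.log(frame[j] + eps))
        score = new_score
        backptr.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_classes), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a sequence in which one frame briefly favors a distant pitch class, the decoded path stays on the neighboring frames' class, whereas a per-frame argmax would follow the outlier.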

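As the music market grows, the need for new music services is also increasing. Examples include services that identify the title of a song being played and services that search for a song by humming. Extracting the melody is important for implementing them. This thesis focuses specifically on singing melody extraction. Singing melody extraction is the task of determining the pitch changes of a human singing voice in polyphonic music, where several sound sources are mixed. Estimating pitch in monophonic music is not difficult, but extracting the melody from polyphonic music remains hard to solve. Existing melody extraction algorithms for polyphonic music mainly use a saliency function or separate the vocal source to find the melody. This thesis proposes a classification-based method that extracts the melody using multi-column deep neural networks. Each network learns from the pitch information of the singing voice together with the spectrum. To make the training of the deep neural networks more effective, the pitch of the training data was shifted to enlarge the dataset. The pitch labels have a different pitch resolution for each deep neural network. Combining the outputs of the individual networks yields a result with high accuracy and resolution. That result is post-processed with Viterbi decoding based on a hidden Markov model to obtain the optimal melody contour with temporal information taken into account. Afterwards, a singing voice detector is applied independently to determine the singing segments of the audio, and only the singing segments of the predicted melody are selected to form the final melody contour. The singing voice detector is an additional deep neural network, and each network uses the spectrum, the melody extraction output, and voice presence labels as training data. The deep neural networks for melody extraction were trained on the RWC dataset and a part of the MedleyDB dataset that contains vocal songs. The system's performance was evaluated on the ADC2004, MIREX2005, and MIR-1k datasets. Several experiments confirm that melody extraction accuracy improves incrementally. This thesis proposes an algorithm that extracts the melody with a new data-driven approach using multi-column deep neural networks, and also compares it with state-of-the-art results.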
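The label handling described in the abstract (shifting the pitch label along with the audio, and giving each network its own pitch resolution) can be sketched as follows. The function names, the `min_midi` offset, and the convention that 0 Hz marks an unvoiced frame are assumptions for illustration, not details taken from the thesis.

```python
import math


def shift_pitch_label(f0_hz, semitones):
    """Pitch label in Hz after the audio is shifted by `semitones`.

    A value of 0 is assumed to mark an unvoiced frame and is left unchanged.
    """
    if f0_hz <= 0:
        return 0.0
    return f0_hz * 2.0 ** (semitones / 12.0)


def pitch_to_class(f0_hz, steps_per_semitone, min_midi=36.0):
    """Quantize a pitch in Hz to a class index at the given resolution.

    steps_per_semitone: 1 gives semitone classes, 2 gives quarter-tone
    classes, and so on. min_midi is a hypothetical lower bound of the
    pitch range. Returns None for unvoiced frames.
    """
    if f0_hz <= 0:
        return None
    midi = 69.0 + 12.0 * math.log2(f0_hz / 440.0)  # A4 = 440 Hz = MIDI 69
    return round((midi - min_midi) * steps_per_semitone)
```

For example, `pitch_to_class(440.0, 1)` and `pitch_to_class(440.0, 4)` map the same pitch to class indices on a semitone grid and on a grid four times finer, which is one way networks trained on the same audio could receive labels of different resolution.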

Other Bibliographic Information
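Call number	MGCT 16020
Physical description	v, 39 p. : illustrations ; 30 cm
Language	English
General note	Author name in Korean: 금상은; Advisor name in English: Juhan Nam; Advisor name in Korean: 남주한
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Graduate School of Culture Technology,
Bibliography note	References : p. 33-36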
