Korea Advanced Institute of Science and Technology (KAIST) Library

Main Bibliographic Information
Improving note-level singing transcription with phonetic information and note label refinement = 발음 정보와 음표 레이블 정제를 통한 노트 단위 가창 채보 개선
Title / Author	Improving note-level singing transcription with phonetic information and note label refinement = 발음 정보와 음표 레이블 정제를 통한 노트 단위 가창 채보 개선 / Sangeon Yong.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2024].

Holdings

Registration number	8042483
Location / Call number	Academic Cultural Complex (Library), 2nd floor, Theses / DGCT 24007
Status	Available (not for loan)

Abstract

In this thesis, we present a novel method that leverages phonetic information to enhance the performance of note-level singing transcription models. Note-level singing transcription is the task of detecting time-aligned musical notes from a given singing voice. Previous studies detected notes using hidden Markov models with audio features extracted by signal processing-based methods, but the accuracy of these methods has been relatively low. Recently, deep learning-based approaches have been proposed to improve performance, but because large-scale, well-annotated data is scarce, performance still lags behind that for musical instruments with sufficient data. Generating a substantial amount of annotated data is particularly challenging for singing voices: unlike the piano, there is no automated tool for recording notes, and synthesizing singing voices through virtual instruments is comparatively difficult. Moreover, manual annotation is both time-consuming and expensive due to the expressive characteristics of singing voices, and there are no definitive standard criteria for annotation. Considering these constraints, we conducted a comparative analysis of publicly available datasets annotated according to different standards and performed experiments to evaluate the impact of refined annotation on singing transcription. Based on the results of this analysis, we developed a dataset to facilitate a more accurate evaluation of singing transcription performance. Additionally, we propose a model that capitalizes on phonetic information to improve the performance of singing transcription, and we extend the proposed model to the scenario with background music by utilizing a melody extraction model and a speech recognition model pre-trained on large-scale datasets.
Our approach builds on the facts that singing voices are associated with lyrics and that phonetic information is relatively less affected by the rich expressiveness of singing voices. We discuss the impact and limitations of integrating phonetic information into singing transcription and introduce methods to overcome these limitations, thereby further enhancing the accuracy of singing transcription.

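This thesis introduces an automatic transcription methodology that uses phonetic information to improve the performance of note-level singing transcription models. Note-level singing transcription is the task of finding time-aligned, note-level musical notes from a given singing voice. Previously, singing transcription mainly relied on methods that extract audio information such as pitch with signal processing-based algorithms and then detect notes with hidden Markov models, but these achieved rather low accuracy. Recently, deep learning-based approaches have been proposed and performance has improved, but due to a lack of suitable data, performance remains lower than for other instruments with sufficient data. For singing transcription in particular, there is no tool that can record notes automatically as with a piano, and synthesis through virtual instruments is also comparatively difficult, so generating a large amount of annotated data is hard. Because singing is richly expressive and has many variables such as pronunciation, manual annotation also takes considerable time and cost, and no clear standard criteria for such annotation have been established. With this in mind, we compared and analyzed existing public datasets annotated according to different criteria and conducted performance comparison experiments to determine how precisely refined annotation affects singing transcription. Based on the results of this analysis, we also built a dataset for more precise evaluation of singing transcription performance. In addition, focusing on the fact that typical singing voices carry phonetic information and that this information is relatively less affected by the rich expressiveness of singing, we propose a model that uses phonetic information to improve singing transcription performance, and we extend the proposed model to work in the presence of background music by using a melody extraction model and a speech recognition model trained on large-scale datasets. Through this, we discuss the impact and limitations of phonetic information in singing transcription, and methods for overcoming those limitations to further improve singing transcription.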

Other Bibliographic Information
Call number	DGCT 24007
Physical description	v, 90 p. : ill. ; 30 cm
Language	English
General note	Korean spelling of author's name : 용상언 ; Advisor's name in English : Juhan Nam ; Advisor's name in Korean : 남주한 ; Appeared in : "A Phoneme-Informed Neural Network Model For Note-Level Singing Transcription". IEEE International Conference on Acoustics, Speech and Signal Processing, (2023) ; Including appendix
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Graduate School of Culture Technology,
Bibliography note	References : p. 79-88
Subject	Singing transcription ; Music information retrieval ; Phoneme recognition ; Deep learning ; Transfer learning ; 가창 채보 ; 음악 정보 검색 ; 음소 분류 ; 딥러닝 ; 전이 학습
