KAIST Library

Main bibliographic information
Learning hierarchical representations for music classification = 음악 분류를 위한 계층적 표현 학습
Title / Author	Learning hierarchical representations for music classification = 음악 분류를 위한 계층적 표현 학습 / Jong Pil Lee.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2017].

Holdings

Registration no.	8031306
Location / Call number	Academic Cultural Complex (Cultural Hall), preservation stacks / MGCT 17016
Status	Available (not for loan)

Abstract

Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging differs from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pre-trained convolutional networks separately and aggregate them all together over a long audio clip. Finally, we feed them into fully-connected networks and make final predictions of the tags. Our experiments show that combining multi-level and multi-scale features is highly effective in music auto-tagging and that the proposed method outperforms previous state-of-the-art methods. We further show that the proposed architecture is useful in transfer learning.

Furthermore, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has recently been explored successfully in the image, text and speech domains. This approach has been applied to musical signals as well but has not yet been fully explored. To this end, we propose sample-level deep convolutional neural networks (DCNNs), which learn representations from very small grains of waveforms (e.g. 2 or 3 samples), beyond typical frame-level input representations. Our experiments show how deep architectures with sample-level filters improve the accuracy in music auto-tagging, and they provide results comparable to previous state-of-the-art performance on the MagnaTagATune dataset and the Million Song Dataset. In addition, we visualize the filters learned in each layer of a sample-level DCNN to identify hierarchically learned features and show that they are sensitive to log-scaled frequency along the layers, much like the mel-frequency spectrogram that is widely used in music classification systems.
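
To illustrate the idea of sample-level filters mentioned in the abstract, the following is a minimal sketch, not the architecture from the thesis. It stacks 1D convolutions whose filters span only 3 samples and shows how the time axis shrinks layer by layer until a single feature vector summarizes the whole input. The function names (`strided_conv1d`, `sample_level_stack`), the stride equal to the filter width, the channel count, the ReLU, and the random untrained weights are all assumptions made for this illustration.

```python
import numpy as np


def strided_conv1d(x, w, stride):
    """Valid 1D cross-correlation followed by ReLU.

    x: array of shape (time, in_channels)
    w: array of shape (width, in_channels, out_channels)
    """
    width = w.shape[0]
    n_out = (x.shape[0] - width) // stride + 1
    out = np.empty((n_out, w.shape[2]))
    for t in range(n_out):
        window = x[t * stride : t * stride + width]  # (width, in_channels)
        # Sum over the filter width and input channels for every output channel.
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)


def sample_level_stack(waveform, n_layers, width=3, channels=4, seed=0):
    """Apply n_layers of width-sample filters (hypothetical settings).

    The weights are random, so the features are meaningless; only the
    shapes are of interest here.
    """
    rng = np.random.default_rng(seed)
    x = waveform[:, None]  # raw samples as a single input channel
    lengths = [x.shape[0]]
    for _ in range(n_layers):
        w = rng.standard_normal((width, x.shape[1], channels)) * 0.1
        x = strided_conv1d(x, w, stride=width)
        lengths.append(x.shape[0])
    return x, lengths


if __name__ == "__main__":
    features, lengths = sample_level_stack(np.zeros(3**5), n_layers=5)
    print(lengths)         # [243, 81, 27, 9, 3, 1]
    print(features.shape)  # (1, 4)
```

Under these assumptions each layer divides the time length by 3, so five layers cover 3^5 = 243 raw samples. A frame-level model would typically cover a comparable span with one much wider first-layer filter.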

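Music auto-tagging is often handled in much the same way as image classification, by treating the 2D audio spectrogram as image data. However, music auto-tagging differs from image classification in that the tags are highly diverse and vary in their level of abstraction. With this in mind, we propose a convolutional neural network (CNN)-based architecture that incorporates multi-level and multi-scale features. The architecture is trained in three steps. First, we perform supervised learning to extract local audio features using a combination of CNNs with different input sizes. Second, we extract audio features from each layer of the pre-trained CNNs separately and summarize all of the features extracted from the audio of a single song. Finally, we feed them into a DNN and make the final tag predictions. Our experiments show that combining multi-level and multi-scale features is highly effective for music auto-tagging, and the proposed method outperformed the previous state of the art. We also show that the proposed architecture is useful for transfer learning. Furthermore, approaches that learn hierarchical representations from raw data using deep CNNs have recently been studied successfully in the image, text, and speech domains. This approach has also been applied to music signals but has not yet been fully studied. To this end, we propose sample-level deep CNNs that take very small grains of the waveform (e.g., 2 or 3 samples) as input, going beyond networks that take typical frame-level representations as input. Our experiments show that architectures with sample-level filters improve music auto-tagging performance and provide results comparable to the previous state of the art on the MagnaTagATune dataset and the Million Song Dataset. In addition, we visualize the filters learned at each layer of the sample-level DCNN to identify the hierarchically learned features and show that they are sensitive to log-scaled frequency along the layers, like the mel-frequency spectrogram widely used in music classification systems.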

Other bibliographic information
Call number	MGCT 17016
Physical description	iv, 28 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 이종필; Advisor's name in English: Ju Han Nam; Advisor's name in Korean: 남주한; Published in: "Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging," IEEE Signal Processing Letters (2017); Published in: "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms," Sound and Music Computing Conference (2017)
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Graduate School of Culture Technology,
Bibliography note	References: p. 23-26
