Bibliographic Information
Deep neural network for document clustering and speech recognition = 문서 클러스터링과 음성인식을 위한 심층신경망 연구
Title / Author: Deep neural network for document clustering and speech recognition = 문서 클러스터링과 음성인식을 위한 심층신경망 연구 / Hyungbae Jeon.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 2016].

Holdings Information

Registration Number: 8029767
Location / Call Number: Academic Cultural Complex (Cultural Center), Preservation Stacks, DBIS 16012
Status: Available (not for loan)

Abstract

We address semantic language modeling, deep neural networks, and spontaneous speech recognition. A conventional language model uses very limited history information, yet natural language carries a speaker's intention that extends over the full sentence. Semantic analysis is commonly used to find the speaker's intention, and there have recently been studies investigating its use in language modeling. Our hypothesis is that combining semantic information with the original lexical information improves the language model; we also want to model the long-distance dependency of semantic information. We employ the information provided by a semantic analyzer to enhance the language model used in an automatic speech recognition system: semantic information lets us bias the recognizer toward sentences that are more meaningful within our domain. We introduce several ways to use semantic information in language modeling. A shallow semantic parser extracts a concept sequence from each sentence, and phrasal context-free grammars (CFGs) are designed to define the semantic concepts. To model the joint probability of semantic information, we employ maximum entropy modeling, which tightly integrates lexical and semantic features into a unified semantic language model.

Second, two new methods are proposed for the unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. In the training phase, training documents are clustered by Latent Dirichlet Allocation (LDA), and a domain-specific LM is trained for each cluster. In the test phase, an adapted LM is formed as a linear mixture of the trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weight values assigned to the trained domain-specific LMs; the clustering and weight-estimation algorithms therefore inherit the reliability of the trained LDA model. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis (LSA), non-negative matrix factorization (NMF), and LDA with n-gram counting.

Third, LDA is adopted for efficient layer-by-layer pre-training of deep neural networks and applied to document classification tasks. Starting from the word-document matrix, the first layer learns the topic representation of the documents with a generative LDA model, which is then converted into an approximate feed-forward network by the pseudo-inverse of the learned word-topic matrix. The rectified linear unit (ReLU) is incorporated to generate non-negative output activations, and additional LDA-based layers are stacked on these output activations. A single- or two-layer feed-forward network is added at the end and trained by a supervised learning algorithm, after which the whole network is trained again for fine-tuning. This LDA-based initialization was applied to a document classification task with 10 different random initializations; compared with other initializations such as random, stacked auto-encoders, and stacked single hidden layers with supervised learning, it demonstrated much better performance, with both smaller mean false recognition rates and smaller standard deviations. Minimal sketches of the three methods above follow.
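As a concrete illustration of the maximum entropy formulation, the sketch below scores a word against a history that carries both a lexical and a semantic-concept feature. This is a minimal sketch, not the thesis implementation: the feature templates, the concept inventory, and the training of the weights `lam` (e.g., by L-BFGS with normalization over the vocabulary) are all assumptions.

```python
import math

def maxent_lm_prob(word, prev_word, prev_concept, lam, vocab):
    """p(word | history) under a maximum entropy model whose features
    combine a lexical bigram with a semantic-concept bigram."""
    def score(w):
        # Active features for candidate word w (hypothetical templates).
        feats = [("lex", prev_word, w), ("sem", prev_concept, w)]
        return sum(lam.get(f, 0.0) for f in feats)
    z = sum(math.exp(score(w)) for w in vocab)  # partition function Z(history)
    return math.exp(score(word)) / z
```

The single-sentence LM adaptation can be sketched in the same spirit: a trained LDA model supplies topic mixture weights for the recognized sentence, and those weights mix the per-cluster LMs. The fixed-point inference loop and the unigram-level mixing below are simplifications; the thesis works with full n-gram LMs, and the variable names (`phi`, `domain_lms`) are hypothetical.

```python
import numpy as np

def topic_weights(word_ids, phi, alpha=0.1, iters=50):
    """Infer a topic distribution for one sentence from a trained LDA
    word-topic matrix phi of shape (topics, vocab), assumed smoothed
    (no all-zero columns), by a simple fixed-point EM over the words."""
    K = phi.shape[0]
    theta = np.full(K, 1.0 / K)                       # uniform initialization
    for _ in range(iters):
        resp = theta[:, None] * phi[:, word_ids]      # (K, N) responsibilities
        resp /= resp.sum(axis=0, keepdims=True)
        theta = resp.sum(axis=1) + alpha              # smoothed topic counts
        theta /= theta.sum()
    return theta

def adapted_lm(word_ids, phi, domain_lms):
    """Adapted LM = linear mixture of domain-specific LMs (topics, vocab),
    weighted by the LDA topic probabilities of the test sentence."""
    return topic_weights(word_ids, phi) @ domain_lms  # (vocab,)
```

For the LDA-based pre-training, the key step is turning the generative word-topic matrix into a feed-forward recognition layer. A minimal sketch, assuming the learned matrix maps topics to words as a (vocab, topics) array: its Moore-Penrose pseudo-inverse approximates the inverse mapping, and a ReLU keeps the activations non-negative so another LDA layer can be stacked on top.

```python
import numpy as np

def lda_to_layer(word_topic):
    """Convert a learned (vocab, topics) word-topic matrix into a
    feed-forward layer x -> relu(W x) with W = pinv(word_topic)."""
    W = np.linalg.pinv(word_topic)           # (topics, vocab)
    return lambda x: np.maximum(0.0, W @ x)  # ReLU: non-negative activations

# Stacking (sketch): run LDA again on these activations, convert with
# pinv as above, append a small supervised output network, then
# fine-tune the whole stack with backpropagation.
```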
Finally, we study speech recognition that is robust to variations in speaking rate. We propose a new deep neural network structure designed for this robustness: a max-pooling layer is added to the DNN, and a weight-coupling constraint is imposed at that layer so the model tolerates different speaking rates. The proposed DNN is evaluated on a conversational telephony database (the Switchboard corpus), and we study its speaking-rate invariance; a sketch of the pooling idea follows.
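The abstract names the ingredients (a max-pooling layer with coupled weights) but not the exact coupling scheme, so the following is only one plausible reading: a single weight matrix is shared (coupled) across differently time-resampled views of the input window, and max-pooling over the views lets the layer respond to whichever rate matches best. Everything beyond "max-pooling plus coupled weights" is an assumption.

```python
import numpy as np

def resample(frames, rate):
    """Nearest-neighbor time-warping of a (T, D) feature window."""
    T = frames.shape[0]
    idx = np.clip(np.round(np.arange(T) * rate).astype(int), 0, T - 1)
    return frames[idx]

def rate_maxpool_layer(frames, W, b, rates=(0.8, 1.0, 1.25)):
    """One coupled weight matrix W of shape (hidden, T*D) is applied to
    each rate-warped view; the elementwise max over views gives a
    response that is less sensitive to speaking rate."""
    outs = [W @ resample(frames, r).ravel() + b for r in rates]
    return np.maximum(0.0, np.max(np.stack(outs), axis=0))
```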

This thesis studies semantic language modeling, deep neural networks, and spontaneous speech recognition. Conventional language models cannot capture semantic information or the speaker's intention, yet modeling and understanding such information is clearly necessary for recognizing human conversation and for natural human-machine communication. First, semantic information is attached to sentences through semantic analysis and integrated into the language model, and the resulting improvement in language-model performance is examined. Second, language model adaptation based on Latent Dirichlet Allocation (LDA) is examined: documents are automatically clustered by topic, a domain-specific language model is trained for each cluster, and at recognition time the domain similarity is computed from the recognition result alone to build a language model for rescoring. Third, an LDA-based DNN initialization method is studied in the text-data domain. Finally, a new DNN structure is proposed to resolve the performance degradation caused by speaking-rate variation, one of the many problems arising in spontaneous speech recognition.

Other Bibliographic Information
Call Number: DBIS 16012
Physical Description: v, 52 p. : illustrations ; 30 cm
Language: English
General Note: Author's name in Korean: 전형배
Advisor's name in English: Soo-Young Lee
Advisor's name in Korean: 이수영
Published in: "Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation," ETRI Journal, v. 38, no. 3, pp. 487-493 (2016)
Thesis: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering,
Bibliographic Note: References: p. 46-50