Main bibliographic information
단어와 클래스 기반의 한국어 언어 모델링 = Word and class-based language modeling for Korean
Title / Author 단어와 클래스 기반의 한국어 언어 모델링 = Word and class-based language modeling for Korean / 김길연.
Publication [Daejeon : Korea Advanced Institute of Science and Technology, 2002].

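Holdings information

Registration number: 8013074
Location: Academic Cultural Complex (Cultural Hall), preservation stacks
Call number: MCS 02003
Status: Available (not for loan)

Registration number: 9008792
Location: Seoul thesis shelves
Call number: MCS 02003 c. 2
Status: Available (not for loan)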

Abstract

Statistical language modeling (SLM) attempts to capture the regularities of natural language by estimating probability distributions over linguistic units such as words, sentences, and whole documents. SLM is crucial for a wide variety of language technology applications: speech recognition, document classification, information retrieval, part-of-speech (POS) tagging, machine translation, and many more. In this thesis, we construct word (morpheme) and class-based n-gram language models for Korean, and we verify their effectiveness through thorough experiments on a text corpus of about 20 million POS-tagged units. For word-based n-gram modeling, we compare Katz's backoff method with Kneser-Ney smoothing. For class-based modeling, we compare a POS-based class model with automatically clustered class models. Lastly, we combine the word and class-based language models by linear interpolation. The results show that Kneser-Ney smoothing outperforms the widely used Katz backoff technique, that the automatically derived word class model is better than the POS-based class model, and that the combined model performs best. Our results lay an experimental foundation for statistical language modeling of Korean and can be used in various applications such as speech recognition and document classification. All of these techniques, including the n-gram counting, smoothing, clustering, and evaluation algorithms, were implemented by the author and are released into the public domain as the KLM toolkit. Further investigation is needed to incorporate more sophisticated linguistic information (dependency grammar, semantic coherence, etc.) into the statistical language model.
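
As a rough illustration of the combination step described in the abstract, the sketch below linearly interpolates a word bigram model with a class bigram model, where the class model factors as P(class of w | class of previous word) times P(w | class of w). It is not taken from the KLM toolkit. The function names, the toy romanized tokens, the class labels (N, J, V, E), and the weight lam=0.7 are all assumptions made for the example, and add-one smoothing stands in for Katz backoff or Kneser-Ney smoothing to keep the code short.

```python
from collections import Counter


def train(tokens, word2class):
    """Collect the counts both models need from one token sequence."""
    classes = [word2class[w] for w in tokens]
    return {
        "uni": Counter(tokens),                    # word counts
        "ctx": Counter(tokens[:-1]),               # word counts as bigram context
        "bi": Counter(zip(tokens, tokens[1:])),    # word bigram counts
        "c_uni": Counter(classes),                 # class counts
        "c_ctx": Counter(classes[:-1]),            # class counts as bigram context
        "c_bi": Counter(zip(classes, classes[1:])),
        "word2class": word2class,
    }


def p_word(m, prev, w):
    """Word bigram P(w | prev) with add-one smoothing (a stand-in, see lead-in)."""
    vocab_size = len(m["uni"])
    return (m["bi"][(prev, w)] + 1) / (m["ctx"][prev] + vocab_size)


def p_class(m, prev, w):
    """Class bigram: P(c(w) | c(prev)) * P(w | c(w))."""
    c_prev, c = m["word2class"][prev], m["word2class"][w]
    n_classes = len(m["c_uni"])
    transition = (m["c_bi"][(c_prev, c)] + 1) / (m["c_ctx"][c_prev] + n_classes)
    emission = m["uni"][w] / m["c_uni"][c]
    return transition * emission


def p_interp(m, prev, w, lam=0.7):
    """Linear interpolation of the word and class models; lam is an assumed weight."""
    return lam * p_word(m, prev, w) + (1 - lam) * p_class(m, prev, w)


# Toy data (hypothetical; the class labels are not the KAIST POS tag set).
tokens = ["hakgyo", "e", "ga", "da", "jip", "e", "o", "da"]
word2class = {"hakgyo": "N", "jip": "N", "e": "J", "ga": "V", "o": "V", "da": "E"}

model = train(tokens, word2class)
print(p_interp(model, "e", "ga"))
```

In practice the interpolation weight would be tuned on held-out data rather than fixed, and each component model would use the smoothing method under comparison.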

Other bibliographic information
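Call number MCS 02003
Physical description iv, 46 p. : illustrations ; 26 cm
Language Korean
General note Appendix - 1, KAIST part-of-speech tag set
Author name in English : Kil-Youn Kim
Advisor name in Korean : 최기선
Advisor name in English : Key-Sun Choi
Thesis Thesis (Master's) - Korea Advanced Institute of Science and Technology : Computer Science,
Bibliography note References : p. 44-45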