서지주요정보
확률 의존문법 학습 = Probabilistic dependency grammar induction
서명 / 저자 확률 의존문법 학습 = Probabilistic dependency grammar induction / 이승미.
발행사항 [대전 : 한국과학기술원, 1999].
Online Access 원문보기 원문인쇄

소장정보

등록번호

8009905

소장위치/청구기호

학술문화관(문화관) 보존서고

DCS 99002

휴대폰 전송

도서상태

이용가능(대출불가)

사유안내

반납예정일

등록번호

9006217

소장위치/청구기호

서울 학위논문 서가

DCS 99002 c. 2

휴대폰 전송

도서상태

이용가능(대출불가)

사유안내

반납예정일

리뷰정보

초록정보

Syntax analysis is a primary part of most natural language processing systems. Successful syntax analysis requires syntax grammar to be effective and broad-coverage. Because manual acquisition of such a grammar is known to be very difficult and time consuming, there has been many efforts to automatically learn a grammar from a large body of text corpus. Most of the efforts so far aim at learning of phrase structure grammar and partially succeeded. However, languages with (partial) free word order characteristic such as Korean are known to be described well in dependency grammar rather than in phrase structure grammar. But, efforts toward automatic acquisition of dependency grammar are rare. This thesis presents a mechanism of corpus-based automatic induction of probabilistic dependency grammar (PDG). The key part of the work is the probabilistic parameter reestimation algorithm of dependency grammar. Inside-outside algorithm is widely used as a probabilistic parameter reestimation algorithm for context-free phrase structure grammars. This is not directly applicable to PDG because they are not based on constituents but on dependency relations between arbitrary two words. This thesis presents a reestimation algorithm which is a variation of the inside-outside algorithm adapted to PDG. The non-constituent objects, complete-link and complete-sequence, are defined as basic units for dependency structures, and the inside-outside probabilities of the objects are developed. The algorithm reestimates the probabilities of inter-word dependencies through the computation of the inside-outside probabilities of the non-constituent objects. It utilizes a CYK-style chart and has O($n^3$) time complexity, where n is the number of words in the sequence to be processed. We experimented the PDG reestimation algorithm on learning of Korean probabilistic dependency grammar. It showed that the reestimation algorithm learned a stable grammar from an initial grammar with a reduction of 65% to 78% in grammar size depending on the size of training corpus. The correctness of the learned grammar was tested for unseen test sentences. For the best-1 parses of the test sentences, the grammar showed average 62.82% of parsing accuracy. We also present a robust best-n parsing algorithm for PDG. PDG best-n parsing is based on dynamic programming method utilizing CYK-style chart. It constructs the best-n complete-links and complete-sequences, from bottom up and left-to-right. Parsing robustness is guaranteed by smoothing the observed dependency probabilities to prevent any dependency relation from being assigned zero probability. The automatically learned dependency grammar can be applied to language modeling. Language modeling based on dependency grammar can be more effective than the previous n-gram approach and the phrase structure grammar-based approach. Dependency grammar can represent long distance dependencies through the hierarchical structure from head-dependent relation between words, which is not supported in n-gram models. Acquisition of an effective dependency grammar can be automated by the proposed PDG reestimation algorithm. Dependency grammar-based language modeling can avoid the training corpus overfitting problem of n-gram models, and can be less affected by the structural-sparseness problem of the phrase structure grammar-based approach. Experiments on Korean raw corpus showed that the proposed dependency grammar-based model performs better than n-gram models at 11% to 11.5% reductions in test corpus entropy.

서지기타정보

서지기타정보
청구기호 {DCS 99002
형태사항 viii, 112 p. : 삽화 ; 26 cm
언어 한국어
일반주기 저자명의 영문표기 : Seung-Mi Lee
지도교수의 한글표기 : 최기선
지도교수의 영문표기 : Key-Sun Choi
학위논문 학위논문(박사) - 한국과학기술원 : 전산학과,
서지주기 참고문헌 : p. 104-107
QR CODE

책소개

전체보기

목차

전체보기

이 주제의 인기대출도서