Main bibliographic information
조상관계에 기반한 한국어의 확률적 의존구조 분석 = Probabilistic dependency parsing for Korean based on ascending dependency
Title / Author: 조상관계에 기반한 한국어의 확률적 의존구조 분석 = Probabilistic dependency parsing for Korean based on ascending dependency / Seo, Kwang-Jun.
Author: 서광준 ; Seo, Kwang-Jun
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 1999].
Online Access: Restricted (full text available after login)

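Holdings

Registration no.: 8009912
Location / Call number: Academic Cultural Complex (학술문화관), preservation stacks / DCS 99009
Status: Available; can be borrowed
Due date: (none listed)

Registration no.: 9006224
Location / Call number: Seoul thesis shelves / DCS 99009 c. 2
Status: Available; can be borrowed
Due date: (none listed)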

Abstract

Dependency parsing analyzes a sentence by linking individual words with dependency relations. It is known to suit languages with frequent ellipsis or variable word order, such as Korean, Japanese, and Russian. However, because its syntactic rules are simple, dependency parsing suffers from serious structural ambiguity. Probabilistic (statistical) disambiguation chooses among uncertain candidates using probabilities estimated from a large amount of empirical data. This thesis proposes an efficient computational model that estimates the probability of a dependency tree for structural disambiguation.

Some previous probabilistic models improve accuracy with parameters derived from word order, such as distance or the part-of-speech tag of the left sister. In variable word order languages, however, the values of these parameters may show no consistent pattern, so they may contribute nothing to disambiguation. The proposed model instead uses the ascending dependency: the relationship between a word and its ascending heads (its ancestors) in the dependency hierarchy. A dependency relation expresses the influence of the head on the existence of the dependent in a sentence, so the ascending dependency expresses the recursive influence of all ancestors on the existence of the dependent. Because this parameter is based on dependency rather than on word order, the proposed model is designed to resolve the syntactic ambiguity that arises in variable word order languages. To apply the model to Korean dependency parsing, a dependency grammar (KAIST Dependency Grammar) was defined to describe Korean syntax in units of word-phrases. An efficient dependency parsing algorithm was also provided that uses the probabilistic model to produce the most probable analysis of a given sentence.
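The idea of scoring a tree through each word's ascending heads can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything below is an assumption made for illustration: the tree probability is taken as a product of P(dependent tag | tags of up to k ancestors), estimated by relative frequency with add-alpha smoothing. The names `ancestors`, `events`, `train`, `log_prob`, and the parameters `alpha` and `vocab` are hypothetical.

```python
import math
from collections import Counter

ROOT = -1  # hypothetical marker: head index of the sentence root


def ancestors(heads, i, k):
    """Return the indices of up to k ascending heads of word i, parent first."""
    chain = []
    h = heads[i]
    while h != ROOT and len(chain) < k:
        chain.append(h)
        h = heads[h]
    return chain


def events(tags, heads, k):
    """Yield (dependent tag, tuple of ancestor tags) for every non-root word."""
    for i, h in enumerate(heads):
        if h != ROOT:
            yield tags[i], tuple(tags[a] for a in ancestors(heads, i, k))


def train(treebank, k=2):
    """Count events and their contexts for relative-frequency estimation."""
    joint, context = Counter(), Counter()
    for tags, heads in treebank:
        for dep, anc in events(tags, heads, k):
            joint[(dep, anc)] += 1
            context[anc] += 1
    return joint, context


def log_prob(tags, heads, joint, context, k=2, alpha=1.0, vocab=50):
    """Log-probability of a tree; add-alpha smoothing is an assumption."""
    total = 0.0
    for dep, anc in events(tags, heads, k):
        num = joint[(dep, anc)] + alpha
        den = context[anc] + alpha * vocab
        total += math.log(num / den)
    return total


# Toy data (invented): two competing analyses of a three-word sentence.
flat = (["N", "N", "V"], [2, 2, ROOT])     # both nouns depend on the verb
chained = (["N", "N", "V"], [1, 2, ROOT])  # first noun depends on the second
joint, context = train([flat, flat, chained], k=2)
print(log_prob(*flat, joint, context) > log_prob(*chained, joint, context))
```

Note that the conditioning context here depends only on the chain of heads above a word, not on linear distance or on neighbouring words, which is the property the abstract emphasizes for variable word order languages.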
A dependency-annotated corpus (the KAIST dependency tree bank) was also built for supervised training of the model. KAIST Dependency Grammar is based on dependency theory (Tesnière, 1959). It adopts two heuristics, the head post-positioning property and the no-crossing property, to reduce structural ambiguity. The parsing algorithm combines the Right-to-Left parsing algorithm (Kim, C.H. et al., 1993) with the Best-First-Search algorithm (Dean, 1995). Although the algorithm has exponential time complexity, it is applicable to various models and always finds the optimal result under the probabilistic measure. The KAIST dependency tree bank consists of over 30,000 sentences. It was built by converting the KAIST tree bank, whose annotation scheme is a phrase structure grammar over morpheme units. It may serve as an important resource for Korean language processing as well as for theoretical linguistics.

Two experiments were performed using the KAIST dependency tree bank. The first compared two previous models, (Collins, 1996) and (Eisner, 1996), with the proposed model. On unseen test sentences, the proposed model was more accurate than Collins' model by 1.9 % and than Eisner's model by 2.8 %. The second experiment examined the effect of the number of ascending heads on parsing accuracy; the model that considers two ascendants performed better than the others on the KAIST dependency tree bank. Analysis of the experimental results suggested several possible improvements to the parsing system. Two of them were implemented in this thesis: using the predicative particle as a parameter, and considering the co-occurrence of the same syntactic categories in parallel constructions. With these improvements, the system showed slightly better accuracy.
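The two grammar heuristics and the search strategy named in the abstract can be sketched together. This is not the algorithm of the thesis or of Kim et al. (1993), whose details the abstract does not give; it is a minimal illustration under stated assumptions. Words are attached from right to left, every head must lie to the right of its dependent (head post-positioning), and arcs may not cross. Under those two constraints, the only possible heads for word i are word i+1 and the ancestors of word i+1. The function names and the `arc_logp(i, j, heads)` scoring callback are hypothetical; the callback must return log-probabilities (values at most 0), so that the first complete analysis taken from the priority queue is the most probable one.

```python
import heapq
import itertools
import math

ROOT = -1  # hypothetical marker: head index of the sentence root


def is_head_final(heads):
    """Head post-positioning: every head lies to the right of its dependent."""
    return all(h == ROOT or h > i for i, h in enumerate(heads))


def is_non_crossing(heads):
    """No-crossing: no two arcs (a, b) and (c, d) with a < c < b < d."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h != ROOT]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)


def candidate_heads(heads, i):
    """Allowed heads for word i: word i+1 and all ancestors of word i+1."""
    out = []
    j = i + 1
    while j != ROOT:
        out.append(j)
        j = heads[j]
    return out


def best_first_parse(n, arc_logp):
    """Return (heads, log-probability) of the best tree over n words."""
    tie = itertools.count()  # tie-breaker so the heap never compares states
    start = (None,) * (n - 1) + (ROOT,)  # the last word is the root
    queue = [(0.0, next(tie), n - 2, start)]
    while queue:
        cost, _, i, heads = heapq.heappop(queue)
        if i < 0:  # every word is attached: cheapest complete analysis
            return list(heads), -cost
        for j in candidate_heads(heads, i):
            new = heads[:i] + (j,) + heads[i + 1:]
            step = -arc_logp(i, j, new)  # non-negative cost of this arc
            heapq.heappush(queue, (cost + step, next(tie), i - 1, new))
    return None, -math.inf


# Toy scores (invented) for a three-word sentence.
table = {(1, 2): 1.0, (0, 1): 0.2, (0, 2): 0.8}
heads, logp = best_first_parse(3, lambda i, j, heads: math.log(table[(i, j)]))
print(heads, is_head_final(heads), is_non_crossing(heads))
```

Because attachment proceeds from right to left, all ancestors of a newly attached word are already fixed when its arc is scored, so a scoring callback could consult the chain of ascending heads through the `heads` argument. The search remains exponential in the worst case, in line with the complexity noted in the abstract.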

Other bibliographic information

Call number: DCS 99009
Physical description: ix, 89 p. : illustrations ; 26 cm
Language: Korean
General note: Appendix: Korean part-of-speech tags
Author name in English: Kwang-Jun Seo
Advisor name in Korean: 최기선
Advisor name in English: Key-Sun Choi
Published in: "A Probabilistic Model for Dependency Parsing Considering Ascending Dependencies". Literary and Linguistic Computing. Oxford University Press, vol. 13, no. 2, pp. 59-63
Thesis: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science,
Bibliography note: References: p. 85-89
Subjects: 의존문법
의존구조 분석
확률적 언어처리
Dependency grammar
Dependency parsing
Probabilistic language processing