A binary grammar with composite labels for normalizing tree representations (Korean title: 트리 표현 정규화를 위한 이진 문법과 복합 레이블링)
Title / Author: A binary grammar with composite labels for normalizing tree representations / Seong-Yong Kim (김성용).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 2003].
The syntactic analysis of a language requires a grammar that captures the syntactic characteristics of that language. Korean is an agglutinative language in which grammatical morphemes play syntactic roles in a sentence by being concatenated to a preceding construct. We propose a format for a binary phrase structure grammar with composite labels. The grammar adopts binary rules so that the dependency between the two sub-trees of a node can be represented in the node's label. The label is composed of two attributes extracted from the sub-trees, so that it represents the compositional information of the tree. Composite labels are generated from part-of-speech (POS) tags by an automatic labelling algorithm. Because the proposed rule description scheme keeps the compositional information of a tree in its label, the appropriateness of a construct can easily be decided from its label. The proposed rule description is binary and uses only POS information, so it can readily be used in dependency grammar and applied to other languages as well. It can also be applied to normalizing various tree representations and constructing a unified syntactic corpus, which in turn allows syntactic information to be extracted from different types of syntactic corpora in a uniform way. We implement a tool that transforms syntactic descriptions into normalized ones based on the proposed scheme. The scheme can also be used for syntactic analysis, where it performs better than previous syntactic descriptions of Korean corpora. In best-1 context-free cross-validation on 31,080 sentences of a tree-tagged corpus, the labelled precision is 79.30%, which outperforms phrase structure grammar and dependency grammar by 5% and 4%, respectively. This shows that the proposed rule description scheme is effective for parsing Korean.
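The following is a minimal, hypothetical sketch of the general idea in the abstract. It shows bottom-up labelling of a binary tree, where each node's label is composed of one attribute taken from each sub-tree, and then shows how a construct can be accepted or rejected by inspecting its labels. This record does not specify which attributes the thesis actually extracts. The sketch therefore assumes that a sub-tree's attribute is the POS tag of its right-most morpheme, motivated by Korean grammatical morphemes attaching at the right edge. The classes `Leaf` and `Node`, the functions `attribute`, `label_tree`, `labels` and `is_well_formed`, the `+` separator, the `ALLOWED_LABELS` set and the tag names (NNG, JKB, VV, EF) are all illustrative assumptions. They are not the thesis's algorithm or the KAIST tag set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    morpheme: str
    pos: str  # POS tag of the morpheme (tag names here are illustrative)


@dataclass
class Node:
    left: Tree
    right: Tree
    label: Optional[str] = None  # composite label, filled in by label_tree


Tree = Union[Leaf, Node]

# Hypothetical inventory of licensed composite labels.
ALLOWED_LABELS = {"NNG+JKB", "VV+EF", "JKB+EF"}


def attribute(tree: Tree) -> str:
    """Assumed attribute of a sub-tree: the POS tag of its right-most morpheme."""
    while isinstance(tree, Node):
        tree = tree.right
    return tree.pos


def label_tree(tree: Tree) -> Tree:
    """Label every binary node bottom-up with 'left-attribute+right-attribute'."""
    if isinstance(tree, Node):
        label_tree(tree.left)
        label_tree(tree.right)
        tree.label = f"{attribute(tree.left)}+{attribute(tree.right)}"
    return tree


def labels(tree: Tree) -> list[str]:
    """Collect node labels in pre-order (leaves carry no label)."""
    if isinstance(tree, Leaf):
        return []
    return [tree.label] + labels(tree.left) + labels(tree.right)


def is_well_formed(tree: Tree) -> bool:
    """Accept a construct only if every composite label in it is licensed."""
    return all(label in ALLOWED_LABELS for label in labels(tree))


if __name__ == "__main__":
    # "학교 에 가 ㄴ다" as [[학교 에] [가 ㄴ다]], with illustrative tags.
    t = label_tree(Node(Node(Leaf("학교", "NNG"), Leaf("에", "JKB")),
                        Node(Leaf("가", "VV"), Leaf("ㄴ다", "EF"))))
    print(labels(t))          # ['JKB+EF', 'NNG+JKB', 'VV+EF']
    print(is_well_formed(t))  # True
```

Because each label depends only on the two children, the well-formedness check reduces to a set lookup per node. This is the property the abstract points to when it says that the appropriateness of a construct can be decided from its label.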


Call number: DCS 03023
Physical description: vi, 63 p. : illustrations ; 26 cm
Language: Korean
General note: Appendices: A. Generation of composite labels for KAIST POS tags. - B. Error types in the corpus as revealed by syntactic labels
Author name (English): Seong-Yong Kim
Advisor (Korean): 최기선
Advisor (English): Key-Sun Choi
Published in: "Automatic generation of composite labels using POS tags for parsing Korean". International Journal of Computer Processing of Oriental Languages, v.16 no.2 (2003)
Published in: "Normalizing syntactic structures using POS tags and binary rules". IEICE Transactions on Information and Systems, v.E86-D no.10 (2003)
Thesis: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science,
Bibliography note: References: p. 58-63