The syntactic analysis of a language requires a grammar that captures the syntactic characteristics of that language. Korean is an agglutinative language in which grammatical morphemes take on syntactic roles in a sentence by being concatenated to a preceding construct.
We propose a binary phrase structure grammar format with composite labels. The grammar uses binary rules so that the dependency between the two sub-trees can be represented in the label of the tree. The label is composed of two attributes extracted from each sub-tree, so that it represents the compositional information of the tree. The composite labels are generated from part-of-speech tags by an automatic labelling algorithm.
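The idea of deriving a composite label from the two sub-trees of a binary rule can be illustrated with a minimal sketch. It is not the labelling algorithm of this work: the `Tree` class, the `attribute` and `composite_label` functions, and in particular the rule that an internal node passes up the attribute of its right child are hypothetical assumptions made only for illustration, and the part-of-speech tags (`NNG`, `JKB`, `VV`, `EF`) are sample tags.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tree:
    """A binary tree over part-of-speech-tagged morphemes (hypothetical type)."""
    pos: Optional[str] = None        # set for leaves only
    left: Optional["Tree"] = None    # set for internal nodes only
    right: Optional["Tree"] = None   # set for internal nodes only


def attribute(tree: Tree) -> str:
    """Return the attribute a sub-tree contributes to its parent's label.

    ASSUMPTION: a leaf contributes its POS tag, and an internal node
    contributes the attribute of its right child. The actual extraction
    rule is not specified in the text above.
    """
    if tree.pos is not None:
        return tree.pos
    if tree.right is None:
        raise ValueError("node has neither a POS tag nor a right child")
    return attribute(tree.right)


def composite_label(tree: Tree) -> Tuple[str, str]:
    """Build the composite label of a binary node from its two sub-trees."""
    if tree.left is None or tree.right is None:
        raise ValueError("composite labels are defined for binary nodes only")
    return (attribute(tree.left), attribute(tree.right))


# Noun + case particle, then verb + final ending (illustrative tags only).
noun_phrase = Tree(left=Tree(pos="NNG"), right=Tree(pos="JKB"))
verb_phrase = Tree(left=Tree(pos="VV"), right=Tree(pos="EF"))
sentence = Tree(left=noun_phrase, right=verb_phrase)

print(composite_label(noun_phrase))  # ('NNG', 'JKB')
print(composite_label(sentence))     # ('JKB', 'EF')
```

Because every label is computed only from the part-of-speech tags below it, the same function can in principle relabel any binarized tree, which is the property the normalization use case relies on.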
Because the proposed rule description scheme keeps the compositional information of a tree in its label, the appropriateness of a construct can easily be decided from its label. The rule description is binary and uses only part-of-speech information, so it can readily be used in dependency grammar and applied to other languages. It can also be applied to normalizing various tree representations and constructing a unified syntactic corpus, which allows syntactic information to be extracted from various types of syntactic corpora in a uniform way. We implement a tool that transforms syntactic descriptions into a normalized form based on the proposed scheme.
The proposed scheme can also be used for syntactic analysis, where it performs better than previous syntactic descriptions on a Korean corpus. In best-1 context-free cross validation on 31,080 sentences of a tree-tagged corpus, the labelled precision is 79.30%, which outperforms phrase structure grammar and dependency grammar by 5% and 4%, respectively. This result indicates that the proposed rule description scheme is effective for parsing Korean.
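For readers unfamiliar with the metric, labelled precision can be sketched as follows. The sketch assumes the usual PARSEVAL-style definition, in which a predicted constituent counts as correct when its label and span both match a gold constituent; the exact evaluation procedure behind the figures above is not described here, and the function name, the `Constituent` type, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# A constituent is (label, start, end) over token positions (hypothetical type).
Constituent = Tuple[str, int, int]


def labelled_precision(predicted: Iterable[Constituent],
                       gold: Iterable[Constituent]) -> float:
    """Fraction of predicted constituents whose label and span match gold."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: each gold constituent can be matched only once.
    matched = sum((pred_counts & gold_counts).values())
    total = sum(pred_counts.values())
    return matched / total if total else 0.0


# Made-up example: three of the four predicted constituents match gold.
gold = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 2, 3)]
predicted = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]

print(labelled_precision(predicted, gold))  # 0.75
```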