Dependency parsing analyzes a sentence by linking individual words with dependency relations. Dependency parsing is known to fit well with elliptical or variable-word-order languages such as Korean, Japanese, and Russian. However, it suffers from a serious structural ambiguity problem caused by its simple syntactic rules.
Probabilistic/statistical disambiguation is decision making among uncertain candidates based on probabilities estimated from observations over a large amount of empirical data. This thesis proposes an efficient computational model that estimates the probability of a dependency tree for structural disambiguation.
Some previous probabilistic models used parameters derived from word order, such as the distance between words or the part-of-speech tag of the left sister, in order to enhance their accuracy. However, in variable-word-order languages these parameter values may not follow any consistent pattern, so the parameters may play little role in disambiguation.
The proposed model uses the ascending dependency to improve accuracy. The ascending dependency is the relationship between a word and its ascending heads, that is, the chain of heads above it in the dependency hierarchy. A dependency relationship expresses the influence of the head on the existence of the dependent in a sentence, so the ascending dependency implies the recursive influence of all ascendants on the existence of the dependent. Because this parameter is based on dependency rather than word order, the proposed model is designed to resolve the syntactic ambiguity occurring in variable-word-order languages.
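The idea can be sketched in code. The following is a minimal, hypothetical illustration, not the thesis model itself: a tree is represented as a map from each word index to its head index, the ascending heads of a word are collected by walking up that map, and a tree is scored as the product of each word's probability conditioned on its k ascending heads. The estimator `prob` is assumed to come from supervised training on a treebank.

```python
def ascending_heads(tree, i, k):
    """Return up to k ascending heads of word i (head, grandparent, ...).
    `tree` maps each word index to its head index (None for the root)."""
    heads = []
    h = tree[i]
    while h is not None and len(heads) < k:
        heads.append(h)
        h = tree[h]
    return heads

def tree_score(words, tree, prob, k=2):
    """Score a candidate tree as the product of P(word | its k ascending
    heads). `prob(word, context)` is a hypothetical estimator trained on
    a dependency tree bank; higher scores mean more probable trees."""
    score = 1.0
    for i, w in enumerate(words):
        context = tuple(words[h] for h in ascending_heads(tree, i, k))
        score *= prob(w, context)
    return score
```

With k = 1 this reduces to an ordinary head-dependent model; larger k conditions each word on more ascendants, which is the parameter varied in the second experiment below.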
To apply the proposed model to Korean dependency parsing, a dependency grammar (KAIST Dependency Grammar) was defined to describe Korean syntax at the word-phrase level. An efficient dependency-parsing algorithm was also developed to produce the most probable analysis of a given sentence under the probabilistic model. In addition, a dependency-structure-annotated corpus (the KAIST dependency tree bank) was built for supervised training of the model.
KAIST Dependency Grammar is based on dependency theory (Tesnière, 1959). It adopts two heuristics, the head post-positioning (head-final) property and the no-crossing (projectivity) property, in order to reduce structural ambiguity.
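The two heuristics can be stated as simple checks on a candidate tree. The sketch below assumes the same head-map representation as above and is an illustration of the two properties, not code from the thesis:

```python
def is_head_final(tree):
    """Head post-positioning: every dependent precedes its head."""
    return all(h is None or i < h for i, h in tree.items())

def is_projective(tree):
    """No-crossing: no two dependency links cross each other."""
    arcs = [(min(i, h), max(i, h)) for i, h in tree.items() if h is not None]
    for a, b in arcs:
        for c, d in arcs:
            # Arcs cross when exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if a < c < b < d:
                return False
    return True
```

Any candidate tree failing either check can be pruned before scoring, which is how such heuristics cut down the structural ambiguity the parser must resolve.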
The parsing algorithm combines the Right-to-Left parsing algorithm (Kim, C.H. et al., 1993) with the Best-First-Search algorithm (Dean, 1995). Although the algorithm has exponential worst-case time complexity, it is applicable to various models and always finds the optimal result under the probabilistic measure.
The KAIST dependency tree bank consists of over 30,000 sentences. It was built by conversion from the KAIST tree bank, whose annotation scheme is a phrase structure grammar with a morpheme unit. It may serve as an important knowledge source for resolving problems in Korean language processing as well as in theoretical linguistics.
In this thesis, two experiments were performed using the KAIST dependency tree bank. The first experiment evaluated the performance of two previous models, those of (Collins, 1996) and (Eisner, 1996), against the proposed model. The results showed that the proposed model is more accurate than Collins' model by 1.9% and than Eisner's model by 2.8% on unseen test sentences. The second experiment examined the effect of the number of ascending heads on parsing accuracy. The results showed that the model considering two ascendants worked better than the others when applied to the KAIST dependency tree bank.
Analysis of the experimental results suggested several possible improvements to the proposed parsing system. Two of them were adopted in this thesis: using the predicative particle as a parameter and considering the co-occurrence of the same syntactic categories in parallel constructions. With these changes, the system showed slightly better accuracy.