A practical natural language parser must exhibit robust behavior for an extragrammatical user input. In this thesis, a robust parser with recovery mechanism is proposed to handle extragrammatical sentences, adopting a least-error recognition algorithm. For any input sentence, this algorithm can generate at least one parse tree with least number of errors. And, it can be adapted to any parser which uses a context free grammar as its driver.
However, since the algorithm assumes all possible errors including insertion, deletion, and mutation of constituents, the efficiency is degraded, and too many parse tree are generated. Hence, two kinds of heuristic rules are proposed here in order to improve efficiency and reduce the number of parse trees.
The first kind of these rules are used to assign proper weight to each hypothetical error edge to select the most promising edge during parsing. With these heuristic rules, syntactic analysis can be efficiently and successfully finished without processing all the edges produced by the generic least-error recognition algorithm.
The second type of heuristic rules using clue symbol are introduced to analyze not grammatically but heuristically inserted phrases and parallel phrases. So, we can resolve some ambiguity problems and raise the accuracy of recovery.
The empirical result shows that the accuracy of recovery of this parser ranges 79% ∼ 92%, and the number of resulting syntactic trees is decreased by 40% ∼ 90% from that of generic least-error recognizer.