Main Bibliographic Information
코퍼스와 사전을 이용한 동사 의미 분별 = Verb sense disambiguation using corpus and dictionary
Title / Author: 코퍼스와 사전을 이용한 동사 의미 분별 = Verb sense disambiguation using corpus and dictionary / 조정미 (Jeong-Mi Cho).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 1998].

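Holdings

Accession no.: 8009250
Location / Call no.: 학술문화관 (Academic Cultural Complex), Preservation Stacks / DCS 98017
Status: Available; can be borrowed

Accession no.: 9005076
Location / Call no.: Seoul, Thesis Stacks / DCS 98017 c. 2
Status: Available; can be borrowed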

Abstract

This thesis presents a method for automatic word sense disambiguation using a part-of-speech (POS) tagged corpus and a machine-readable dictionary (MRD). Word sense disambiguation (WSD) is the problem of assigning the appropriate sense to an ambiguous word based on its context. WSD algorithms can be categorized by how they overcome the knowledge acquisition bottleneck and the problem of data sparseness. Knowledge-based WSD methods use information from an MRD or a thesaurus such as LDOCE or WordNet, whereas corpus-based WSD methods use information learned by training on untagged or tagged text corpora. We describe a method that attacks both the knowledge acquisition bottleneck and the problem of data sparseness using a POS-tagged corpus and an MRD.

Typical corpus-based WSD methods depend critically on manual sense tagging, a laborious and time-consuming process; the need for a sense-tagged corpus is thus itself an instance of the knowledge acquisition bottleneck. We circumvent this problem by acquiring selectional restriction knowledge from a POS-tagged corpus. The selectional restrictions that predicates impose on their arguments provide useful information for resolving sense ambiguity. However, several phenomena in Korean, namely (1) postposition shifts caused by auxiliaries, (2) multiple surface forms of a compound predicate, and (3) the omission of case components, degrade the quality of the acquired selectional restriction knowledge. We define corpus normalization as a process that transforms such negative factors in a corpus into forms suited to extracting the target knowledge while preserving the integrity of the corpus, and we develop corpus normalization rules to prevent the negative effects of these phenomena. Verb sense disambiguation is then performed by clustering the objects of an ambiguous verb using the normalized selectional restriction knowledge.
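The abstract names the Korean phenomena but does not list the normalization rules themselves. As a rough illustration only, the Python sketch below extracts (noun, case, verb) triples from a POS-tagged sentence and applies one hypothetical rule for phenomenon (1): with the desiderative auxiliary 싶다, an object can surface with the nominative marker (사과가 먹고 싶다, "(I) want to eat an apple") instead of the accusative (사과를 먹고 싶다). The Sejong-style tags (NNG, NNP, JKS, JKO, VV, EC, VX) and the functions `wants_auxiliary`, `shift_to_object`, `extract_triples`, and `count_triples` are assumptions, not the thesis's tag set or code; phenomena (2) and (3) are not handled.

```python
from collections import Counter

# Assumed Sejong-style tags; the tag set used in the thesis may differ.
NOUN_TAGS = {"NNG", "NNP"}
CASE_MARKERS = {"JKS": "subj", "JKO": "obj"}  # nominative, accusative markers


def wants_auxiliary(sentence, verb_index):
    """True if the verb at verb_index is followed by the ending 고 and the auxiliary 싶."""
    return sentence[verb_index + 1 : verb_index + 3] == [("고", "EC"), ("싶", "VX")]


def shift_to_object(args):
    """Hypothetical rule for the 싶다 postposition shift.

    If the predicate has no accusative argument, the nominative argument
    nearest the verb is treated as the object. This is a heuristic: it
    misfires when that noun really is the subject (e.g. 철수가 먹고 싶다).
    """
    if any(case == "obj" for _, case in args):
        return args
    for j in range(len(args) - 1, -1, -1):
        if args[j][1] == "subj":
            return args[:j] + [(args[j][0], "obj")] + args[j + 1 :]
    return args


def extract_triples(sentence):
    """Return (noun, case, verb) triples from one sentence of (morpheme, tag) pairs.

    A noun followed by a case marker is kept as a pending argument and
    attached to the next main verb (VV).
    """
    triples, pending, noun = [], [], None
    for i, (morph, tag) in enumerate(sentence):
        if tag in NOUN_TAGS:
            noun = morph
        elif tag in CASE_MARKERS and noun is not None:
            pending.append((noun, CASE_MARKERS[tag]))
            noun = None
        elif tag == "VV":
            if wants_auxiliary(sentence, i):
                pending = shift_to_object(pending)
            triples.extend((arg, case, morph) for arg, case in pending)
            pending, noun = [], None
    return triples


def count_triples(corpus):
    """Aggregate normalized triples over a corpus (a list of tagged sentences)."""
    counts = Counter()
    for sentence in corpus:
        counts.update(extract_triples(sentence))
    return counts


corpus = [
    # 사과가 먹고 싶다: the object surfaces with the nominative marker 가
    [("사과", "NNG"), ("가", "JKS"), ("먹", "VV"), ("고", "EC"), ("싶", "VX"), ("다", "EF")],
    # 철수가 사과를 먹었다: ordinary accusative object
    [("철수", "NNP"), ("가", "JKS"), ("사과", "NNG"), ("를", "JKO"),
     ("먹", "VV"), ("었", "EP"), ("다", "EF")],
]
print(count_triples(corpus))
```

On these two sentences, (사과, obj, 먹) is counted twice, rather than once as an object and once as a misleading subject of 먹다, which is the kind of reduction in wrong data that normalization is meant to achieve.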
Because a word's dictionary definitions are likely to be good indicators of the senses they define, each verb sense into which objects are clustered is defined as a sense entry in the dictionary. Experiments show that the corpus normalization rules are correct with a precision of over 93% and are significantly effective in reducing erroneous and sparse data; as a result, both the recall and the precision of WSD improve.

Corpus-based WSD methods also suffer from data sparseness. Traditionally, this problem is approached by estimating the probability of unobserved cooccurrences from the cooccurrences actually observed in the training corpus, either by smoothing the observed frequencies or by class-based methods. We address it in two ways. First, we replace the all-or-none indicator of cooccurrence in the noun distribution with a graded similarity measure based on both noun and verb distributions: nouns are considered similar if they appear as objects of similar verbs, and verbs are similar if they take similar nouns as objects. Using both distributions reduces data sparseness more than using the noun distribution alone. Second, we classify the nouns that appear in the corpus using IS-A relations extracted from dictionary definitions. Experiments show that using both noun and verb distributions improves recall by about 22%, and using the dictionary improves precision by about 25%.
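The abstract describes the graded measure (nouns are similar if they are objects of similar verbs, verbs are similar if they take similar nouns) without giving a formula. The Python sketch below is one possible reading, assuming cosine similarity and a fixed number of mutually reinforcing passes; it is not the thesis's measure. It then assigns an object noun to the verb sense whose seed nouns, here imagined as objects mentioned in each dictionary definition, are most similar. The toy data (English glosses of objects of the ambiguous Korean verb 쓰다, "write" / "wear"), the seed lists, and the functions `mutual_similarity` and `assign_sense` are hypothetical, and the IS-A classification step is not shown.

```python
import math


def identity(n):
    """n-by-n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def smooth(rows, sim):
    """Spread each row's mass over similar items: out[j] = sum_k row[k] * sim[k][j]."""
    size = len(sim)
    return [[sum(row[k] * sim[k][j] for k in range(size)) for j in range(size)]
            for row in rows]


def mutual_similarity(nouns, verbs, counts, iterations=2):
    """Iteratively compute noun-noun and verb-verb similarity matrices.

    counts maps (noun, verb) to how often the noun occurs as the verb's object.
    On each pass, nouns are compared by their verb distributions smoothed with
    the previous verb similarities, and verbs by their noun distributions
    smoothed with the previous noun similarities.
    """
    m = [[counts.get((n, v), 0) for v in verbs] for n in nouns]  # noun x verb
    mt = [list(col) for col in zip(*m)]                            # verb x noun
    noun_sim, verb_sim = identity(len(nouns)), identity(len(verbs))
    for _ in range(iterations):
        nv, vn = smooth(m, verb_sim), smooth(mt, noun_sim)
        noun_sim = [[cosine(a, b) for b in nv] for a in nv]
        verb_sim = [[cosine(a, b) for b in vn] for a in vn]
    return noun_sim, verb_sim


def assign_sense(noun, senses, nouns, noun_sim):
    """Return the sense whose seed nouns are most similar to `noun` (None if all scores are 0)."""
    i = nouns.index(noun)
    best, best_score = None, 0.0
    for sense, seeds in senses.items():
        score = max((noun_sim[i][nouns.index(s)] for s in seeds if s in nouns), default=0.0)
        if score > best_score:
            best, best_score = sense, score
    return best


# Hypothetical toy data: objects of other verbs, glossed in English.
nouns = ["letter", "poem", "novel", "hat", "cap"]
verbs = ["read", "send", "publish", "buy", "take_off"]
counts = {("letter", "read"): 2, ("letter", "send"): 3, ("poem", "read"): 1,
          ("poem", "publish"): 2, ("novel", "publish"): 2, ("hat", "buy"): 1,
          ("hat", "take_off"): 2, ("cap", "take_off"): 1}
senses = {"write": ["letter", "poem"], "wear": ["hat"]}  # hypothetical seed nouns for 쓰다

noun_sim, _ = mutual_similarity(nouns, verbs, counts)
for obj in ["novel", "cap"]:
    print(obj, "->", assign_sense(obj, senses, nouns, noun_sim))
```

After one pass, "novel" and "letter" share no verb and their similarity is 0.0; after the second pass, "publish" has become similar to "read" through "poem", so "novel" also gains a nonzero similarity to "letter". This illustrates how the verb side of the measure can soften sparseness. The sketch assigns "novel" to the write sense and "cap" to the wear sense.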

Other Bibliographic Information

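Call number: DCS 98017
Physical description: vi, 93 p. : ill. ; 26 cm
Language: Korean
General note: Appendices: A, Experimental target words and their dictionary sense sets. - B, Sense disambiguation results for each verb
Author name (English): Jeong-Mi Cho
Advisor name (Korean): 김길창
Advisor name (English): Gil-Chang Kim
Journal publication: "Automatic Acquisition of N(oun)-C(ase)-P(redicate) Information from POS-Tagged Corpus". Computer Processing of Oriental Languages. Oriental Languages Computer Society, vol. 11, no. 2, pp. 191-204 (1997)
Thesis note: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science,
Bibliography note: References: p. 77-82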
Subjects: Word sense disambiguation
Corpus-based
Dictionary-based
Statistical-based
Semantic analysis