Main bibliographic information
Automatic generation of Japanese-Korean bilingual lexicon using alignment method (정렬기법을 이용한 일한 대역사전 자동 생성)
Title / Author: Automatic generation of Japanese-Korean bilingual lexicon using alignment method (정렬기법을 이용한 일한 대역사전 자동 생성) / 김태완 (Tae-Wan Kim).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 1999].

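Holdings

Registration no.: 8009906
Location: Academic Cultural Complex (Cultural Complex), closed stacks
Call number: DCS 99003
Status: Available (not for loan)

Registration no.: 9006218
Location: Seoul thesis shelves
Call number: DCS 99003 c. 2
Status: Available (not for loan)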

Abstract

Every natural language processing system relies on dictionaries suited to its purpose, and these dictionaries play an important role. Developing them accounts for more than 50% of system development cost. Yet dictionaries have so far been built manually, relying on human intuition and on close analysis of a small amount of real data. The low performance of natural language processing systems stems largely from the lack of consistency and generality in dictionary data constructed this way. Solving this problem requires a knowledge acquisition method that can obtain consistent and general dictionary data from a large amount of real linguistic data.

This thesis presents a method for automatically generating a Japanese-Korean bilingual lexicon by applying alignment techniques to a large, sentence-aligned parallel corpus taken from real-world use. The method can build lexicon entries for verbal words and for frozen expressions. A frozen expression here means an expression composed of several words that occur together. Alignment of verbal words and frozen expressions was not attempted in previous research. The method does not use natural language processing resources such as a morphological analyzer, tagger, or parser, which avoids the additional cost and effort of developing and maintaining them. The only linguistic resources it uses are the grammatical words of Korean and Japanese.

The proposed method first constructs part of the bilingual lexicon by aligning words written in Chinese characters in the Japanese text with the corresponding Korean words. These alignments establish anchor points between the Japanese and Korean texts. This step turns the sentence-aligned Japanese-Korean corpus into a segment-aligned corpus whose aligned units are smaller than sentences.

Next, candidate word strings are extracted separately from the Korean and Japanese texts, and meaningless Korean candidate word strings are removed. This elimination step exploits two observations: "text is more informative than a sentence" and "the words will occur in fixed form in the text". The Dice coefficient is used to calculate the alignment probability between Korean and Japanese word strings. Finally, Korean and Japanese frozen expressions are extracted from their respective texts using a statistical method that can control the frequency of the frozen expressions to be extracted, and the extracted Japanese and Korean frozen expressions are then aligned using the Dice coefficient. The Japanese-Korean bilingual lexicon is generated by merging the aligned words and frozen expressions produced by the steps above.

The precision of aligning Japanese words written in Chinese characters with the corresponding Korean words reaches 100% because of the inherent properties of this alignment. In word-level alignment, the precision for nominal words and verbal words was 66.3% and 63.6%, respectively. For frozen expressions, precision is difficult to calculate because the number and accuracy of the extracted expressions vary with the frequency threshold applied. When the frequency values for extracting Japanese and Korean frozen expressions were set to 20 and 5, respectively, the precision was 37%.
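As an illustration of the Dice-coefficient step mentioned in the abstract, the following is a minimal sketch rather than the thesis's actual implementation. It assumes the standard definition Dice(x, y) = 2 f(x, y) / (f(x) + f(y)), where f(x) and f(y) count the aligned segments containing each candidate and f(x, y) counts the segments containing both. The function names `dice_scores` and `best_alignments`, the `threshold` parameter, and the toy data are hypothetical; the candidate strings are assumed to have been extracted already.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_pairs):
    """Score every co-occurring (Japanese, Korean) candidate pair.

    aligned_pairs is a list of (japanese_candidates, korean_candidates),
    one entry per aligned segment (hypothetical input format).
    """
    ja_freq, ko_freq, pair_freq = Counter(), Counter(), Counter()
    for ja_cands, ko_cands in aligned_pairs:
        # Count each candidate at most once per segment.
        ja_set, ko_set = set(ja_cands), set(ko_cands)
        ja_freq.update(ja_set)
        ko_freq.update(ko_set)
        pair_freq.update(product(ja_set, ko_set))
    # Dice(x, y) = 2 * f(x, y) / (f(x) + f(y))
    return {
        (ja, ko): 2.0 * n / (ja_freq[ja] + ko_freq[ko])
        for (ja, ko), n in pair_freq.items()
    }


def best_alignments(scores, threshold=0.5):
    """Keep the highest-scoring Korean candidate for each Japanese one.

    The threshold is an assumed cut-off, not a value from the thesis.
    """
    best = {}
    for (ja, ko), score in scores.items():
        if score >= threshold and score > best.get(ja, ("", 0.0))[1]:
            best[ja] = (ko, score)
    return best


# Toy aligned segments (invented for illustration only).
pairs = [
    (["学校", "行く"], ["학교", "가다"]),
    (["学校", "見る"], ["학교", "보다"]),
    (["本", "見る"], ["책", "보다"]),
]
scores = dice_scores(pairs)
best = best_alignments(scores)
```

In this toy data, 学校 and 학교 appear in exactly the same two segments, so their score is 1.0, while 学校 and 가다 share only one segment and score 2/3. A real corpus would need far more segments for the scores to be meaningful.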

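Other bibliographic information

Call number: DCS 99003
Physical description: vi, 99 p. : illustrations ; 26 cm
Language: Korean
General notes: Author name in English: Tae-Wan Kim
Advisor name in Korean: 최기선
Advisor name in English: Key-Sun Choi
Published in: "Analysis of Current Commercial Japanese to Korean Machine Translation Systems and Suggestions for Future Development". Journal of Natural Language Processing, vol. 5, no. 4, pp. 127-148 (1998)
Dissertation: Doctoral thesis - Korea Advanced Institute of Science and Technology : Department of Computer Science
Bibliography note: References: p. 95-99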