Bibliographic Information
다국어 정보검색을 위한 영-한 음차 표기 및 복원 모델 = An English-Korean transliteration and retransliteration model for cross-lingual information retrieval
Title / Author: 다국어 정보검색을 위한 영-한 음차 표기 및 복원 모델 = An English-Korean transliteration and retransliteration model for cross-lingual information retrieval / 이재성 (Jae-Sung Lee).
Author: 이재성 ; Lee, Jae-Sung
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 1999].
Online Access: Restricted (full text available after login)

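Holdings

Accession no.: 8009920
Location: Academic Cultural Complex (문화관), preservation stacks
Call number: DCS 99017
Status: Available (can be borrowed)

Accession no.: 9006232
Location: Seoul campus, thesis shelves
Call number: DCS 99017 c. 2
Status: Available (can be borrowed)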

Abstract

Korean documents contain many transliterations of foreign loan words. They are usually proper nouns and technical terms that play important roles in information retrieval. In cross-lingual information retrieval, these transliterations are a barrier to automatic term translation because they usually do not appear in a dictionary. Moreover, the same foreign word is transliterated in various ways across documents, which makes automatic transliteration for cross-lingual information retrieval more difficult. Transliteration from English to Korean is assumed to be done in two ways: (1) automatically extracting the pronunciation from the English letters of a word and then converting it to a Korean word, and (2) directly converting the English letters to a Korean word. In this thesis, the first is called the pivot method and the second the direct method. In addition to these two methods, a hybrid method is proposed, and the three methods are compared with one another. For a fair comparison, a statistical transliteration model (STM) is proposed, which automatically learns transliteration rules from bilingual word-aligned data and introduces pronunciation units to reflect the different sound structures of the two languages. The pivot method uses the STM to produce pronunciations from English letters in the first stage, and uses the Korean standard conversion rules for foreign word transliteration to convert the pronunciations to Korean characters in the second stage. The direct method is implemented with the STM, and the hybrid method collects the results of the two methods and selects the one with the higher probability. Experiments were performed for both a transliteration process and a retransliteration process; the former converts English words to Korean words, and the latter converts transliterated Korean words back to the original English words.
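
The selection step of the hybrid method described above can be sketched as follows. This is only a minimal illustration, not the thesis implementation: the names `hybrid_transliterate`, `toy_pivot` and `toy_direct`, the toy lookup tables, the probability values, and the tie-breaking rule are all assumptions made for the example.

```python
from typing import Callable, Tuple

# A candidate is (Korean transliteration, probability assigned by the method).
Candidate = Tuple[str, float]
Method = Callable[[str], Candidate]


def hybrid_transliterate(word: str, pivot: Method, direct: Method) -> Candidate:
    """Run both methods and keep the candidate with the higher probability."""
    pivot_result = pivot(word)
    direct_result = direct(word)
    # Assumption: ties go to the direct method; the abstract does not say.
    if pivot_result[1] > direct_result[1]:
        return pivot_result
    return direct_result


# Hypothetical stand-ins for the two methods; the probabilities are invented.
def toy_pivot(word: str) -> Candidate:
    return {"data": ("데이터", 0.42)}.get(word, ("", 0.0))


def toy_direct(word: str) -> Candidate:
    return {"data": ("데이타", 0.35)}.get(word, ("", 0.0))


best = hybrid_transliterate("data", toy_pivot, toy_direct)
print(best)
```

With these toy values the pivot candidate is selected because its probability is higher. Comparing scores this way presumes that the two methods produce probabilities on a comparable scale.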
For the transliteration experiment, transliteration accuracy, variation coverage, and information retrieval effectiveness are used as performance measures. For the retransliteration experiment, the accuracy of recovering the original English word is used. The experiments showed that the hybrid method was the best on all criteria, and that the direct method performed slightly better than the pivot method in all tests except the information retrieval test. In conclusion, the experiments showed that various transliterations are used in Korean documents, and that the hybrid method of transliteration and retransliteration was the most effective for retrieving the various transliterations and the related documents in a cross-lingual information retrieval system.
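
The abstract does not define its measures, so the following is only one plausible reading of transliteration accuracy and variation coverage, written as a sketch. The function names, the definitions themselves, and the toy data are assumptions; the thesis may compute these measures differently.

```python
from typing import Dict, List, Set


def word_accuracy(predictions: Dict[str, str],
                  references: Dict[str, Set[str]]) -> float:
    """Fraction of source words whose top prediction is an accepted reference."""
    if not references:
        return 0.0
    hits = sum(1 for word, refs in references.items()
               if predictions.get(word) in refs)
    return hits / len(references)


def variation_coverage(candidates: Dict[str, List[str]],
                       variants: Dict[str, Set[str]]) -> float:
    """Fraction of attested variants that appear among the generated candidates."""
    total = sum(len(v) for v in variants.values())
    if total == 0:
        return 0.0
    covered = sum(len(v & set(candidates.get(word, [])))
                  for word, v in variants.items())
    return covered / total


# Toy data (invented): variants of each English word seen in Korean text.
variants = {"data": {"데이터", "데이타"}, "digital": {"디지털", "디지탈"}}
predictions = {"data": "데이터", "digital": "디지틀"}
candidates = {"data": ["데이터", "데이타"], "digital": ["디지털"]}

accuracy = word_accuracy(predictions, variants)
coverage = variation_coverage(candidates, variants)
print(accuracy, coverage)
```

On this toy data one of two top predictions matches an attested form, and three of the four attested variants are generated.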

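Other Bibliographic Information

Call number: DCS 99017
Physical description: iv, 85 p. : ill. ; 26 cm
Language: Korean
General notes: Author name in English : Jae-Sung Lee
Advisor name in Korean : 최기선
Advisor name in English : Key-Sun Choi
Dissertation note: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science,
Bibliography note: References : p. 81-85
Subjects: 음차표기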
피봇방식
직접방식
혼합방식
다국어 정보검색
Transliteration
Pivot method
Direct method
Hybrid method
Cross-lingual information retrieval