Main bibliographic information
한국어 정보검색에서 외래어와 영어로 인한 단어불일치문제의 해결 = A resolution of word mismatch problem caused by foreign word transliterations and English words in Korean information retrieval
Title / Author 한국어 정보검색에서 외래어와 영어로 인한 단어불일치문제의 해결 = A resolution of word mismatch problem caused by foreign word transliterations and English words in Korean information retrieval / 강병주.
Author 강병주 ; Kang, Byung-Ju
Publication [Daejeon : Korea Advanced Institute of Science and Technology, 2001].

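Holdings

Registration number 8012606
Location Academic Cultural Complex (학술문화관), preservation stacks
Call number DCS 01016
Status Available, loanable
Due date (none listed)

Registration number 9007696
Location Seoul, thesis shelves
Call number DCS 01016 c. 2
Status Available, loanable
Due date (none listed)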

Abstract

In Korean text, the use of English words, with or without accompanying phonetic transliterations, is growing rapidly. To make matters worse, the Korean transliteration of a given English word may vary greatly. The mixed use of English words and their various Korean transliterations within the same document or document collection can cause a severe word mismatch problem in Korean information retrieval. When the user query and the document text use different transliterations, simple word matching cannot retrieve the document. It also fails when the query uses a Korean transliteration and the document contains the English word, or vice versa. To resolve the word mismatch problem, it is necessary to find equivalence classes among English words and their various Korean transliterations. However, constructing these equivalence classes is not easy because of the inherent difficulties of the problem. There are two possible approaches. One is to transform, i.e. back-transliterate, foreign words into their original English words and to use the English words as canonical forms for indexing and querying. The other, which is proposed in this thesis, is to transliterate English words into Korean and to construct equivalence classes among foreign words by measuring the phonetic similarities among them. We call the former the back-transliteration approach and the latter the transliteration approach. The back-transliteration approach appears more convincing, since the original English word is unique whereas its Korean equivalent can be transliterated in multiple ways. However, the back-transliteration approach is more difficult to implement in practice than the transliteration approach.
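The word mismatch described above, and the way equivalence classes resolve it at indexing and query time, can be illustrated with a minimal sketch. The equivalence classes, the sample documents, and all function names below are assumptions made for illustration; they are not taken from the thesis, which derives its classes automatically.

```python
from collections import defaultdict

# Illustrative, hand-written equivalence classes (an assumption for this
# sketch, not data from the thesis). Each class holds an English word and
# common Korean transliterations of it.
EQUIVALENCE_CLASSES = [
    {"digital", "디지털", "디지탈"},
    {"data", "데이터", "데이타"},
]


def build_canonical_map(classes):
    """Map each class member to one representative term."""
    canonical = {}
    for members in classes:
        representative = min(members)  # any fixed choice works; min() is deterministic
        for term in members:
            canonical[term] = representative
    return canonical


def normalize(token, canonical):
    """Lowercase the token and replace it with its class representative, if any."""
    token = token.lower()
    return canonical.get(token, token)


def build_index(docs, canonical):
    """Build an inverted index from normalized term to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token, canonical)].add(doc_id)
    return index


def search(query, index, canonical):
    """Return sorted ids of documents containing the normalized query term."""
    return sorted(index.get(normalize(query, canonical), set()))


# Toy collection mixing an English word and two transliterations of it.
DOCS = {
    1: "디지탈 카메라",    # "digital camera", one transliteration
    2: "Digital camera",
    3: "디지털 방송",      # "digital broadcasting", another transliteration
    4: "아날로그 방송",    # "analog broadcasting"
}

canonical = build_canonical_map(EQUIVALENCE_CLASSES)
plain_index = build_index(DOCS, {})          # simple word matching
class_index = build_index(DOCS, canonical)   # matching on class representatives

print(search("디지털", plain_index, {}))         # [3]
print(search("디지털", class_index, canonical))  # [1, 2, 3]
```

With simple word matching the query reaches only the document that uses the same transliteration; with class representatives it also reaches the variant transliteration and the English form. Which member serves as the representative is arbitrary in this sketch.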
The greater implementation difficulty of the back-transliteration approach follows from three observations: (1) back-transliteration is inherently more difficult than transliteration; (2) Korean text generally contains many more foreign words than English words; and (3) English multi-word terms are much harder to handle in the back-transliteration approach than in the transliteration approach. Based on these three observations, we argue that our proposed transliteration approach is better suited to resolving the word mismatch problem than the previously proposed back-transliteration approach. The results of our information retrieval experiments supported this argument. Implementing either approach is far from easy, since both require very good solutions to the following largely unsolved problems: foreign word extraction, automatic transliteration and back-transliteration, and phonetic similarity comparison between foreign words. Low performance in any one of these processing modules greatly degrades the final accuracy of the equivalence class construction. In this thesis we proposed a solution for each of the tasks of foreign word extraction, automatic Korean-English transliteration and back-transliteration, Korean phonetic similarity comparison, and Korean-English character alignment. Automatic character alignment is essential for automatically generating the training examples for automatic transliteration and back-transliteration. Our character alignment algorithm was highly accurate, but the solutions for the other tasks were not good enough. Hence the generated equivalence classes turned out to be too poor for practical application. We concluded that, for practical use in Korean information retrieval, more effective solutions must be sought for foreign word extraction, automatic transliteration and back-transliteration, and Korean phonetic similarity comparison.
In the current situation, a realistic approach that avoids harming information retrieval performance is to decide more conservatively whether a word belongs to an equivalence class.
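One way to picture such a conservative membership decision is a strict similarity threshold combined with a rule that a word must be similar to every member of a class before it joins. The sketch below is only an illustration under stated assumptions: edit distance over decomposed Hangul jamo stands in for the thesis's phonetic similarity measure (which is not described in this record), and the threshold values, word list, and function names are hypothetical.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def jamo_similarity(w1, w2):
    """Similarity in [0, 1] over jamo sequences.

    A stand-in for a real phonetic similarity measure (assumption):
    NFD normalization splits each Hangul syllable into its jamo.
    """
    a = unicodedata.normalize("NFD", w1)
    b = unicodedata.normalize("NFD", w2)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def build_classes(words, threshold):
    """Greedy, conservative grouping.

    A word joins the first class in which it is similar enough to EVERY
    existing member; otherwise it starts a new class of its own.
    """
    classes = []
    for word in words:
        for members in classes:
            if all(jamo_similarity(word, m) >= threshold for m in members):
                members.append(word)
                break
        else:
            classes.append([word])
    return classes


# Hypothetical input: variants of "digital" and "data", plus "design".
WORDS = ["디지털", "디지탈", "데이터", "데이타", "디자인"]

print(build_classes(WORDS, 0.80))
# [['디지털', '디지탈'], ['데이터', '데이타'], ['디자인']]
print(build_classes(WORDS, 0.85))
# [['디지털', '디지탈'], ['데이터'], ['데이타'], ['디자인']]
```

Raising the threshold trades recall for precision: fewer true variants are merged, but fewer unrelated words end up in the same class, which is the direction a conservative decision takes.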

Other bibliographic information

Call number DCS 01016
Physical description v, 130 p. : illustrations ; 26 cm
Language Korean
General notes Author's name in English : Byung-Ju Kang
Advisor's name in Korean : 최기선
Advisor's name in English : Key-Sun Choi
Published in : "Effective foreign word extraction for Korean information retrieval". Information processing and management
Published in : "Two approaches for the resolution of word mismatch problem caused by English word and various Korean transliterations in Korean information retrieval". Journal of computer processing of Oriental languages
Thesis note Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Computer Science,
Bibliography note References : p. 126-130
Subject information retrieval (정보검색)
foreign words (외래어)
word mismatch problem (단어불일치문제)