Main bibliographic information
Title: 자연언어 정보 검색에서 상호 정보를 이용한 2단계 문서 순위 결정 방법 = Two-level document ranking methods using mutual information in natural language information retrieval
Author: 강현규 (Hyun-Kyu Kang)
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 1997].

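Holdings

Copy 1
Registration number: 8007237
Location: Academic Cultural Complex (학술문화관), preservation stacks
Call number: DCS 97010
Status: Available (not for loan)

Copy 2
Registration number: 9003973
Location: Seoul, thesis shelves
Call number: DCS 97010 c. 2
Status: Available (not for loan)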

Abstract

Information retrieval is the process of selecting related pieces of information according to the information needs specified in a query. However, a major role of information retrieval systems is not just to generate a set of relevant documents, but to help determine which documents are most likely to be relevant to the given requirements. The similarity between a query and each document can be computed so that the retrieved documents are ranked in descending order of query-document similarity. Users can then minimize the time spent finding useful information by reading the top-ranked documents first. Ranking techniques are used to find the documents in a collection that are most likely to be relevant to the user's query. Occasionally, however, documents are retrieved whose contexts are not consistent with the query. It is common practice in linguistics to classify words not only on the basis of their meanings but also on the basis of their co-occurrence with other words. Generally speaking, subjects respond more quickly than normal to the word 'nurse' if it follows a highly associated word such as 'doctor'. The word 'doctor' is associated with 'nurse', 'sick', 'health', 'medicine', 'hospital', 'man', 'sickness', and so forth. Mutual information is a measure of the strength of the relation between one word and another. We therefore re-evaluate the relation between the terms in each retrieved document and the terms in the query. In this paper, we discuss a model of a natural language information retrieval system that is based on a two-level document ranking method using mutual information. At the first level, we retrieve documents by using automatically constructed index terms. To index for the first-level retrieval, we construct an inverted file of index terms.
For indexing, a typical complex term-weighting scheme, the best fully weighted system, uses a cosine-normalized tf × idf weight (term frequency times inverse document frequency) for document terms and an enhanced but unnormalized tf × idf factor for the queries. Documents are ranked by similarity, which is calculated as the inner product between document and query. At the second level, we reorder the retrieved documents by using mutual information. As the information for second-level reordering, we construct an inverted file of mutual information and an inverted file of document terms. In the second-level reordering, we reorder the documents retrieved at the first level using our newly developed formulas based on the mutual information value, co-occurrence term normalization, document normalization, and/or a combination of these normalizations. Basically, we want to improve retrieval effectiveness by reordering the document ranking with the two-level document ranking method using mutual information. An empirical study was conducted using a Korean encyclopedia with 23,113 entries (10 MB of text), 45 natural language queries collected from end users, and relevance information selected by experts. We show that our method achieves a considerable improvement in retrieval effectiveness over a traditional linear searching method. We also analyse seven newly developed formulas that reorder the retrieved documents, and among them we recommend the one formula that dominates the others in terms of retrieval effectiveness. Since our method can improve precision while preserving recall, we believe that the two-level document ranking method using mutual information is a good candidate for post-enhancement after traditional linear search ranking, relevance feedback, data fusion, or query expansion.
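
Note: the two-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not state the seven reordering formulas, so the second-level score here (mean pointwise mutual information between query terms and document terms, estimated from document-level co-occurrence) is an assumed stand-in. The query weight is plain tf × idf rather than the "enhanced" variant, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_index(docs):
    """Return cosine-normalized tf x idf document vectors and the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: tf * idf[t] for t, tf in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors, idf


def first_level(query, vectors, idf):
    """Rank matching documents by inner product with an unnormalized
    tf x idf query (plain tf x idf; an assumption, not the enhanced form)."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [sum(q.get(t, 0.0) * w for t, w in vec.items()) for vec in vectors]
    hits = [i for i, s in enumerate(scores) if s > 0]
    return sorted(hits, key=lambda i: scores[i], reverse=True)


def mutual_information(docs):
    """Pointwise MI, log2(P(x, y) / (P(x) P(y))), from document co-occurrence."""
    n = len(docs)
    df, pairs = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        pairs.update(combinations(terms, 2))  # keys are (a, b) with a < b
    return {(a, b): math.log2(c * n / (df[a] * df[b]))
            for (a, b), c in pairs.items()}


def second_level(query, docs, ranked, mi):
    """Reorder first-level hits by mean MI between query and document terms.

    Assumed stand-in score: pairs that never co-occur contribute 0, and
    dividing by the number of pairs acts as a simple document normalization.
    """
    def score(i):
        links = [mi.get((min(q, t), max(q, t)), 0.0)
                 for q in set(query) for t in set(docs[i]) if q != t]
        return sum(links) / len(links) if links else 0.0
    return sorted(ranked, key=score, reverse=True)


# Hypothetical toy corpus; each document is a list of index terms.
docs = [text.split() for text in [
    "doctor nurse hospital",
    "doctor hospital medicine",
    "river bank water",
    "bank money loan",
]]
query = "bank loan".split()

vectors, idf = tfidf_index(docs)
mi = mutual_information(docs)
first = first_level(query, vectors, idf)
reranked = second_level(query, docs, first, mi)
```

Because the second level only permutes the list produced by the first level, the set of retrieved documents is unchanged. This matches the abstract's claim that reordering can raise precision near the top of the ranking while preserving recall.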

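Other bibliographic information

Call number: DCS 97010
Physical description: vi, 94 p. : illustrations ; 26 cm
Language: Korean
General note: Appendices: A, Natural language queries. - B, Relevance information for the natural language queries
Author name in English: Hyun-Kyu Kang
Advisor name in Korean: 최기선
Advisor name in English: Key-Sun Choi
Thesis: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science
Bibliography note: References : p. 78-89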