Main Bibliographic Information
정보검색에서 벡터공간 검색과 클러스터 분석을 통한 문서 순위 결정 모델 = A document ranking model based on vector space retrieval and cluster analysis in information retrieval
Title / Author: 정보검색에서 벡터공간 검색과 클러스터 분석을 통한 문서 순위 결정 모델 = A document ranking model based on vector space retrieval and cluster analysis in information retrieval / 이경순 (Kyung-Soon Lee).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 2001].

Holdings

Registration number: 8012640

Location / Call number: Academic Cultural Complex (Munhwagwan), Preservation Stacks / DCS 01023

Status: Available (not for loan)

Abstract

The main idea of this thesis is to combine vector space retrieval and cluster analysis so that documents relevant to a query become more accessible. Vector space retrieval guarantees that the retrieved documents contain the query terms, while cluster analysis, which groups related documents, provides a way of taking into account the context in which the terms appear in a document. This idea is applied to two settings: information retrieval and cross-language information retrieval.

In information retrieval (IR), the document ranking model first builds hierarchical document clusters based on the similarities among documents. For a user query, the model computes query-document similarities with vector space retrieval, selects clusters from the hierarchy according to the distribution of the retrieved documents, and computes the similarities between the query and the centroids of the selected clusters. The new similarity of each retrieved document is a weighted sum of its vector-space similarity and its cluster-based similarity, and these new similarities determine the document ranking. The model was evaluated on the ETRI-KEMONG test collection of Korean text, where it achieved a maximum improvement of 67.16% over vector space retrieval under several term-weighting schemes.

In cross-language information retrieval (CLIR), a Korean query is first translated into an English query using bilingual dictionaries, and the ranking model then combines vector space retrieval with cluster analysis based on topic detection. The model computes the similarities between the translated query and the documents with vector space retrieval, builds clusters incrementally with topic detection by processing the retrieved documents in descending order of similarity, and computes the similarities between the query and the centroids of these clusters.
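The final scoring step of the IR model can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the thesis's implementation: documents and queries are plain term-weight dictionaries, similarity is cosine, a flat document-to-cluster mapping stands in for the clusters selected from the hierarchy, and the mixing weight `alpha` is hypothetical. The names `cosine`, `centroid`, and `rerank` are illustrative only.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Mean of a list of sparse term-weight vectors."""
    total: dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def rerank(query, docs, doc_cluster, alpha=0.5):
    """Rank retrieved documents by a weighted sum of two similarities.

    docs:        document id -> term-weight vector
    doc_cluster: document id -> cluster id (a flat stand-in for the clusters
                 selected from the hierarchy; an assumption, not the thesis)
    alpha:       hypothetical weight of the vector-space similarity
    """
    # Group member vectors by cluster and score each centroid against the query.
    members: dict[int, list] = {}
    for d, c in doc_cluster.items():
        members.setdefault(c, []).append(docs[d])
    cluster_sim = {c: cosine(query, centroid(vs)) for c, vs in members.items()}

    scores = {}
    for d, vec in docs.items():
        vsm = cosine(query, vec)
        if vsm == 0.0:
            continue  # not retrieved: the document shares no term with the query
        clus = cluster_sim.get(doc_cluster.get(d), 0.0)
        scores[d] = alpha * vsm + (1 - alpha) * clus
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    query = {"apple": 1.0, "pie": 1.0}
    docs = {
        "d1": {"apple": 1.0, "pie": 1.0},
        "d2": {"apple": 1.0, "phone": 1.0},
        "d3": {"pie": 1.0, "recipe": 1.0},
        "d4": {"phone": 1.0, "battery": 1.0},
    }
    doc_cluster = {"d1": 0, "d3": 0, "d2": 1, "d4": 1}
    print(rerank(query, docs, doc_cluster))  # ['d1', 'd3', 'd2']
```

In the toy example, d2 and d3 have the same vector-space similarity to the query, but d3 ranks higher because the centroid of its cluster is closer to the query; d4 shares no query term and is not retrieved.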
New similarities are then computed in the same way as in IR and determine the document ranking. The model was evaluated on the TREC-6 CLIR test collection of English text. Compared with vector space retrieval, it improved performance by 7.39% for queries disambiguated using term co-occurrence information, and by 16.34% even for translated queries without any ambiguity resolution. This result shows that cluster analysis helps resolve ambiguity.

The IR and CLIR models differ in specifics, such as the order of analysis and the way clusters are created, and query ambiguity is generally higher in CLIR than in IR; nevertheless, the proposed models improved performance in both settings. These results indicate that each cluster itself provides context for a query. Moreover, since the results were obtained on both Korean and English text, the proposed models could also be applied to retrieval in other languages.
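The incremental clustering step of the CLIR model resembles single-pass clustering as used in topic detection. The following Python sketch approximates it under stated assumptions and is not the thesis's exact procedure: documents are term-weight dictionaries, similarity is cosine, and the join `threshold` is hypothetical. Retrieved documents are visited in descending order of query similarity; each joins the closest existing cluster if its similarity to that cluster's centroid reaches the threshold, and otherwise starts a new cluster.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_clusters(query, docs, threshold=0.3):
    """Single-pass clustering of retrieved documents in retrieval order.

    `threshold` is a hypothetical parameter, not a value from the thesis.
    Returns a mapping from document id to cluster index.
    """
    # Retrieved documents share at least one term with the query and are
    # visited from most to least similar (ties keep their input order).
    sims = {d: cosine(query, v) for d, v in docs.items()}
    order = sorted((d for d in docs if sims[d] > 0), key=sims.get, reverse=True)

    sums: list[dict[str, float]] = []  # running term-weight sums per cluster
    sizes: list[int] = []              # number of members per cluster
    assignment: dict[str, int] = {}
    for d in order:
        vec = docs[d]
        best, best_sim = -1, -1.0
        for i, total in enumerate(sums):
            center = {t: w / sizes[i] for t, w in total.items()}
            s = cosine(vec, center)
            if s > best_sim:
                best, best_sim = i, s
        if best < 0 or best_sim < threshold:
            # No sufficiently similar cluster yet: start a new one.
            sums.append({})
            sizes.append(0)
            best = len(sums) - 1
        for t, w in vec.items():
            sums[best][t] = sums[best].get(t, 0.0) + w
        sizes[best] += 1
        assignment[d] = best
    return assignment


if __name__ == "__main__":
    query = {"apple": 1.0, "pie": 1.0}
    docs = {
        "d1": {"apple": 1.0, "pie": 1.0},
        "d2": {"apple": 1.0, "phone": 1.0},
        "d3": {"pie": 1.0, "recipe": 1.0},
        "d4": {"phone": 1.0, "battery": 1.0},
    }
    print(incremental_clusters(query, docs))  # {'d1': 0, 'd2': 0, 'd3': 1}
```

Because clusters grow in retrieval order, the assignment depends on which documents are ranked first: here d2 joins d1's cluster before d3 is seen, which shifts the centroid away from d3 so that d3 starts its own cluster. The resulting cluster similarities would then be combined with the vector-space similarities as described for the IR model.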

Additional Bibliographic Information

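Call number: DCS 01023
Physical description: vii, 97 p. : ill. ; 26 cm
Language: Korean
General notes: Author's name in English: Kyung-Soon Lee
Advisor's name in Korean: 최기선
Advisor's name in English: Key-Sun Choi
Published in: Information Processing & Management, v.37 no.1, pp. 1-14 (2001)
Dissertation note: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Computer Science
Bibliography note: References: p. 83-91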