Main bibliographic information
공기기반 용어간 유사도를 이용한 정보검색 질의확장 비교 연구 = A comparison of query expansion using collocation-based inter-term similarity measures in information retrieval
Title / Author: 공기기반 용어간 유사도를 이용한 정보검색 질의확장 비교 연구 = A comparison of query expansion using collocation-based inter-term similarity measures in information retrieval / 김명철 (Myoung-Cheol Kim).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 1999].

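Holdings information

Registration number: 8009907
Location / call number: Academic Cultural Complex (Cultural Hall), preservation stacks / DCS 99004
Status: Available (not for loan)

Registration number: 9006219
Location / call number: Seoul, thesis shelves / DCS 99004 c. 2
Status: Available (not for loan)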

Abstract

In this paper, I compare collocation-based similarity measures, namely the Jaccard, Dice, and Cosine measures, for selecting additional search terms in query expansion. I also consider two further measures: Average Conditional Probability (ACP) and Normalized Mutual Information (NMI). ACP is the mean of the two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All of these measures are functions of the two terms' individual frequencies and their collocation frequency, but they differ in how they combine those quantities. The chosen measure changes the order of the additional search terms and their weights, and therefore has a strong influence on retrieval performance.

For the experiments, I used two standard Korean test collections: the KTSET and the ETRI-Kemong SET (EKSET). I applied the five inter-term similarity measures to all terms that collocate with a query term and sorted those terms in descending order of similarity value. In this way, an additional search term list was prepared for each of the five measures. When a user submits a query, the documents are retrieved from the inverted file and ranked. The original query is then expanded from the additional search term list prepared beforehand. To compare the pure retrieval effectiveness of the similarity measures, the query is expanded automatically, one term at a time in list order. In document ranking, the document weights are determined by the normalized tf·idf weighting scheme.

The main results of this study are summarized as follows:

- In analyzing the properties of the similarity measures, the additional search terms selected by the Jaccard, Dice, and Cosine measures include more frequent terms with lower similarity values than those selected by ACP or NMI.
- In the overall assessment of query expansion, the Jaccard, Dice, and Cosine measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency.
- Collocation-based inter-term similarity seems more useful in large collections, because the collocation information is sufficient to discriminate relevant documents from non-relevant ones. Likewise, more additional search terms are needed to reach the peak improvement in a large collection.
- A high threshold value might cut off most of the additional search terms, because the similarity values are very low; it therefore seems better not to use a threshold.
- To improve retrieval effectiveness in query expansion with an inter-term similarity measure, the similarity value should be used as the query weight of the additional search term, and a similarity model between a query and an additional search term is necessary.

In further research, I will present more general ways of selecting and weighting the additional search terms in query expansion. Furthermore, I will integrate the global, static inter-term similarity model with local, dynamic relevance feedback for better retrieval effectiveness.
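As an illustration of how such measures can reorder candidate terms, the following is a minimal Python sketch under stated assumptions. It uses the textbook forms of Jaccard, Dice, and Cosine over frequency counts, and takes ACP as the mean of the two conditional probabilities as described in the abstract. The abstract does not state how NMI is normalized, so NMI is left out. The function names, the `rank_candidates` helper, and all counts are hypothetical and are not taken from the thesis.

```python
import math

# fx, fy: frequencies of the query term and the candidate term.
# fxy: their collocation (co-occurrence) frequency.

def jaccard(fx, fy, fxy):
    return fxy / (fx + fy - fxy)

def dice(fx, fy, fxy):
    return 2 * fxy / (fx + fy)

def cosine(fx, fy, fxy):
    return fxy / math.sqrt(fx * fy)

def acp(fx, fy, fxy):
    # Mean of the two conditional probabilities, estimated as fxy/fx and fxy/fy.
    return 0.5 * (fxy / fx + fxy / fy)

def rank_candidates(query_freq, candidates, measure):
    """Sort candidate terms in descending order of similarity to the query term.

    candidates maps a term to (term frequency, collocation frequency).
    """
    scored = [
        (term, measure(query_freq, fy, fxy))
        for term, (fy, fxy) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical counts: query term frequency 100 and three candidate terms.
query_freq = 100
candidates = {"a": (50, 20), "b": (400, 40), "c": (10, 5)}

for name, measure in [("jaccard", jaccard), ("dice", dice),
                      ("cosine", cosine), ("acp", acp)]:
    ranked = rank_candidates(query_freq, candidates, measure)
    print(name, [term for term, _ in ranked])
```

With these toy counts, Jaccard, Dice, and Cosine rank the frequent term `b` above the rare term `c`, while ACP ranks `c` above `b`. This only shows that the choice of measure can change the order of the expansion list; it says nothing about the actual collections used in the thesis.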

Other bibliographic information
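Call number: DCS 99004
Physical description: vi, 66 p. : illustrations ; 26 cm
Language: Korean
General note: Appendices : A, Queries of the test collections. - B, Examples of candidate expansion term lists
Author name in English : Myoung-Cheol Kim
Advisor name in Korean : 최기선
Advisor name in English : Key-Sun Choi
Thesis: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science,
Bibliography note: References : p. 53-56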