Main bibliographic information
정보검색과 데이타베이스 관리 시스템의 밀결합을 위한 역색인 구조와 질의 최적화 = Inverted index structures and query optimization for tight coupling of information retrieval with database management systems
Title / Author: 정보검색과 데이타베이스 관리 시스템의 밀결합을 위한 역색인 구조와 질의 최적화 = Inverted index structures and query optimization for tight coupling of information retrieval with database management systems / 박병권.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 1998].

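Holdings

Registration no.: 8009253
Location / call number: Academic Cultural Complex (문화관), preservation stacks / DCS 98020
Status: Available (not for loan)

Registration no.: 9005079
Location / call number: Seoul, thesis shelves / DCS 98020 c. 2
Status: Available (not for loan)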


Abstract

A document contains not only text content, such as the abstract and the body, but also attributes, such as its serial number, creation date, and author names. A document query can therefore be processed based on either the text content or the attribute data. We call the former a content-based query and the latter an attribute-based query. Traditionally, content-based queries are supported by an information retrieval system, while attribute-based queries are supported by a database management system (DBMS). To support a query that combines both types, which we simply call an integrated query, we need to integrate information retrieval with the DBMS. The integration is classified into loose coupling and tight coupling. In loose coupling, the information retrieval system and the DBMS exist independently, and a separate integration module processes the integrated query. In tight coupling, the information retrieval system does not exist independently; instead, the DBMS processes the integrated query by extending its query processor with information retrieval capability. Thus, in loose coupling, the integrated query is processed only through the interface of each system's query processor, whereas in tight coupling it is processed internally by the query processor, which can directly access the index and data files it needs. As a result, tight coupling outperforms loose coupling in processing integrated queries. Processing an integrated query in a tightly-coupled DBMS requires a new technique. Conventional information retrieval systems retrieve the documents containing the keywords given in the query; we call this keyword retrieval. A tightly-coupled DBMS, however, also needs to select the documents containing the keywords from among those satisfying an attribute-based condition; we call this keyword-docid retrieval.
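The difference between the two retrieval types can be illustrated with a small Python sketch. The toy collection `DOCS`, the helper `build_inverted_index`, and all function names are assumptions introduced here for illustration; they are not taken from the thesis. The index is a plain keyword-to-posting-list mapping, so keyword-docid retrieval degenerates into a scan of the posting list.

```python
from collections import defaultdict

# Hypothetical toy collection: docid -> (attributes, text).
DOCS = {
    1: ({"year": 1996}, "hypermedia system based on the dexter model"),
    2: ({"year": 1998}, "inverted index structures for information retrieval"),
    3: ({"year": 1998}, "query optimization in a database management system"),
    4: ({"year": 1997}, "information retrieval and database coupling"),
}

def build_inverted_index(docs):
    """Map each keyword to a posting list of docids in ascending order."""
    index = defaultdict(list)
    for docid in sorted(docs):
        _, text = docs[docid]
        for word in set(text.split()):
            index[word].append(docid)
    return index

def keyword_retrieval(index, keyword):
    """Keyword retrieval: all docids whose text contains the keyword."""
    return list(index.get(keyword, []))

def keyword_docid_retrieval(index, keyword, docid):
    """Keyword-docid retrieval: does this particular document contain
    the keyword? On a plain posting list this is a linear scan."""
    return docid in index.get(keyword, [])

def integrated_query(docs, index, keyword, attr_pred):
    """Evaluate the attribute-based condition first, then test each
    surviving document with keyword-docid retrieval."""
    candidates = [d for d, (attrs, _) in docs.items() if attr_pred(attrs)]
    return [d for d in candidates
            if keyword_docid_retrieval(index, keyword, d)]
```

For example, `integrated_query(DOCS, build_inverted_index(DOCS), "retrieval", lambda a: a["year"] == 1998)` combines an attribute-based condition on the hypothetical `year` attribute with a content-based condition on one keyword.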
In this paper, we show that the conventional inverted index structures commonly used to process content-based queries in information retrieval systems cannot support keyword-docid retrieval efficiently. We propose a new inverted index structure called the cascading index, in which a leaf node of one index points to the root node of another index. Through analysis and experiments, we show that the cascading index efficiently supports keyword-docid retrieval as well as keyword retrieval. Next, we present techniques for optimizing integrated queries using the cascading index in a tightly-coupled DBMS. In particular, because the cascading index supports keyword-docid retrieval efficiently, we gain an additional strategy for processing the content-based query, which allows further optimization of integrated queries. We address two aspects of query optimization. First, we can process integrated queries with a different method based on keyword-docid retrieval: the method checks whether each document obtained from the attribute-based query satisfies the content-based query. We show that this method performs best when only a small number of documents satisfy the attribute-based query but a large number satisfy the content-based query. Second, when the content-based query is processed before or in parallel with the attribute-based query, keyword-docid retrieval makes many new strategies available for processing the content-based query itself. For instance, if the content-based query is a boolean query with two keywords, we can process it by checking, with keyword-docid retrieval, whether each document containing one keyword also contains the other. We show that this method performs best when one keyword occurs in only a small number of documents and the other in a large number of documents.
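A minimal Python sketch of the two-keyword case follows. It is not the thesis's implementation: the class `CascadingIndex` only mimics the idea of a keyword index whose entries point to a second, docid-ordered index, with a sorted list and binary search standing in for the second-level index. The cost estimate in `choose_and_strategy` is an assumed toy model, not a formula from the thesis.

```python
from bisect import bisect_left
from math import log2

class CascadingIndex:
    """Toy two-level index: each keyword entry points to its own
    sorted docid list (a stand-in for a second-level index)."""

    def __init__(self):
        self._sub = {}  # keyword -> sorted list of docids

    def add(self, keyword, docid):
        postings = self._sub.setdefault(keyword, [])
        pos = bisect_left(postings, docid)
        if pos == len(postings) or postings[pos] != docid:
            postings.insert(pos, docid)

    def keyword_retrieval(self, keyword):
        return list(self._sub.get(keyword, []))

    def keyword_docid_retrieval(self, keyword, docid):
        # Binary search in the per-keyword docid list.
        postings = self._sub.get(keyword, [])
        pos = bisect_left(postings, docid)
        return pos < len(postings) and postings[pos] == docid

    def doc_count(self, keyword):
        return len(self._sub.get(keyword, []))

def and_by_merge(index, kw_a, kw_b):
    """Conventional strategy: intersect the two full posting lists."""
    a, b = index.keyword_retrieval(kw_a), index.keyword_retrieval(kw_b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def and_by_probe(index, kw_a, kw_b):
    """Strategy from the abstract: for each document containing the
    rarer keyword, check the other keyword by keyword-docid retrieval."""
    rare, frequent = sorted((kw_a, kw_b), key=index.doc_count)
    return [d for d in index.keyword_retrieval(rare)
            if index.keyword_docid_retrieval(frequent, d)]

def choose_and_strategy(index, kw_a, kw_b):
    """Assumed toy cost model: probing costs small * log2(large),
    merging costs small + large."""
    small, large = sorted((index.doc_count(kw_a), index.doc_count(kw_b)))
    probe_cost = small * log2(large + 1)
    merge_cost = small + large
    return "probe" if probe_cost < merge_cost else "merge"
```

Under this assumed model, probing is chosen when one keyword occurs in few documents and the other in many, which matches the condition stated in the abstract; when both keywords are similarly frequent, merging is chosen.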
Finally, as an application of the tightly-coupled DBMS, we propose an architecture for hypermedia systems with information retrieval capability. When the database holds a large number of hypermedia documents, query capability is essential for locating a desired document, because navigating through hyperlinks alone takes too long. Conventional hypermedia systems use file systems for storage and therefore cannot efficiently support hypermedia database queries. The proposed architecture stores and manages hypermedia documents in a database, using the DBMS tightly coupled with information retrieval as its storage system, and thereby allows both content-based and attribute-based retrieval. In summary, the new inverted index structure proposed in this paper efficiently supports keyword-docid retrieval, and keyword-docid retrieval in turn makes different strategies available for optimizing the processing of integrated queries. We believe this work is the first attempt to optimize a query that integrates content-based and attribute-based queries in a tightly-coupled DBMS using keyword-docid retrieval.

Other bibliographic information

Call number: DCS 98020
Physical description: ix, 81 p. : illustrations ; 26 cm
Language: Korean
General notes: Author's name in English: Byung-Kwon Park
Advisor's name in Korean: 황규영
Advisor's name in English: Kyu-Young Whang
Published in: "An Object-Oriented Hypermedia System Based on the Dexter Reference Model and the MHEG Standard". IEICE Trans. on Information and Systems. The Institute of Electronics, Information and Communication Engineers, vol. E79-D, no. 6, pp. 687-694 (1996)
Thesis: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science,
Bibliography note: References: p. 73-81