This thesis describes an automatic document retrieval system that analyzes the contents of documents and search requests expressed in natural language. In answer to each request, it returns the documents that appear most relevant, even when they are not indexed by the exact terms of the request.
A number of procedures for organizing an indexing system that can improve overall system performance are studied. Among these are function word elimination, stem dictionary and suffix list construction, word decomposition, thesaurus construction, and phrase generation. Methods for minimizing dictionary storage and for speeding up document search through document clustering are also examined.
A practical implementation of the automatic document retrieval system is presented. The system provides four different processing methods, which can be used not only to simulate an actual operating environment but also to test the effectiveness of the various available processing methods. The resulting retrieval performance of the system is reported, demonstrating the usefulness of each method.
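This thesis studies the design and implementation of an automatic information retrieval system that analyzes the contents of documents and search requests written in natural language and retrieves the documents most closely related to each request. It also studies the effectiveness of the system.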
To increase system effectiveness, procedures for function word elimination, stem dictionary construction, word decomposition, thesaurus construction, and phrase dictionary construction were written and combined into an indexing system. The stem dictionary was built in split dictionary form to save storage, and document clustering was explored as a way to reduce query processing time. In particular, introducing the discrimination value model into the design of the indexing system, the most important part of the system, made it possible to increase system effectiveness.
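As an illustration of the discrimination value model mentioned above, the following is a minimal sketch and not the implementation used in this thesis. It assumes the common formulation in which a term's discrimination value is the change in document space density (here, the average pairwise cosine similarity) when that term is removed from the index. The function names `cosine`, `density`, and `discrimination_values`, and the choice of cosine similarity, are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average pairwise similarity over all document pairs (needs >= 2 docs)."""
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def discrimination_values(docs):
    """For each term k, return density-without-k minus density-with-k.

    A positive value means that removing the term packs the documents
    closer together, so the term helps tell documents apart.
    A negative value means the term makes documents look more alike.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [[w for t, w in enumerate(d) if t != k] for d in docs]
        values.append(density(reduced) - base)
    return values


# Hypothetical toy collection: term 0 occurs in every document,
# terms 1 to 3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
dv = discrimination_values(docs)
```

In this toy collection the term shared by every document receives a negative value, while each term that occurs in only one document receives a positive value. Under this assumed formulation, that is the property that makes the measure useful for choosing index terms.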