This paper presents a prototype of an approximate information retrieval system based on a clustering technique. An approximate information retrieval system helps users browse the results of their queries. Once a user selects a document from the search results that suits his/her purpose, the system shows other information similar to that document.
Such a system has two advantages. First, it reflects the user's relevance feedback on the search results. Second, it searches using a whole document, which contains many more useful keywords than a user's short query. However, because it uses all of a document's keywords to seek information, it usually retrieves too many documents, many of which are unrelated to the user's search purpose. To solve this problem, we present a prototype based on word co-occurrence information and a clustering technique. Word co-occurrence information helps resolve problems caused by homonyms and synonyms in documents, and clustering gives the user more compact and refined search results.
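The retrieval-then-cluster idea can be illustrated with a small sketch. This is not the prototype's implementation, and several choices are assumptions: documents are already reduced to keyword lists, a "co-occurrence" means two keywords appearing in the same document, similarity is Jaccard overlap, and clustering is a simple single-pass leader clustering, since the method summarized here does not name a specific algorithm. The toy collection `DOCS`, the `min_docs` parameter, and the thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document is already reduced to keywords.
DOCS = {
    "d1": ["bank", "loan", "interest", "rate"],
    "d2": ["bank", "loan", "mortgage", "rate"],
    "d3": ["bank", "river", "flood", "water"],
    "d4": ["river", "flood", "water", "rain"],
    "d5": ["interest", "rate", "loan", "inflation"],
}


def keyword_features(docs):
    """Baseline: single-keyword indexing (one keyword set per document)."""
    return {d: set(kws) for d, kws in docs.items()}


def pair_features(docs, min_docs=2):
    """Index each document by the unordered keyword pairs that co-occur in it.

    Pairs found in fewer than `min_docs` documents are dropped to limit the
    number of pairs that must be managed (an illustrative, assumed rule).
    """
    raw = {d: set(combinations(sorted(set(kws)), 2)) for d, kws in docs.items()}
    doc_freq = Counter(p for pairs in raw.values() for p in pairs)
    return {d: {p for p in pairs if doc_freq[p] >= min_docs} for d, pairs in raw.items()}


def jaccard(a, b):
    """Overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_id, features, threshold):
    """Query by document: rank every other document by similarity to query_id."""
    q = features[query_id]
    hits = [(d, jaccard(q, f)) for d, f in features.items() if d != query_id]
    hits = [(d, s) for d, s in hits if s > 0 and s >= threshold]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def leader_clusters(doc_ids, features, threshold):
    """Single-pass leader clustering of the retrieved documents."""
    clusters = []  # list of (leader_id, member_ids)
    for d in doc_ids:
        for leader, members in clusters:
            if jaccard(features[d], features[leader]) >= threshold:
                members.append(d)
                break
        else:  # no leader was similar enough: d starts a new cluster
            clusters.append((d, [d]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    kw = keyword_features(DOCS)
    pf = pair_features(DOCS)
    print([d for d, _ in retrieve("d1", kw, 0.1)])  # ['d2', 'd5', 'd3']
    hits = retrieve("d1", pf, 0.1)
    print([d for d, _ in hits])  # ['d2', 'd5']
    print(leader_clusters([d for d, _ in hits], pf, 0.5))  # [['d2'], ['d5']]
```

In this toy example, `d3` shares only the homonym "bank" with the selected document `d1`, so the single-keyword baseline retrieves it. None of `d3`'s keyword pairs match a pair in `d1`, however, so the pair index excludes it, which mirrors the homonym argument above.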
An empirical study was conducted on a collection of 194 newspaper articles classified by experts. Compared with a traditional approximate information retrieval system based on single-keyword indexing, the prototype considerably improves precision while preserving recall, and it reduces the size of the retrieved result set.
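Precision and recall follow their standard definitions. Precision is the fraction of retrieved documents that are relevant. Recall is the fraction of relevant documents that are retrieved. The minimal sketch below assumes that a document counts as relevant when it belongs to the same expert class as the selected document. The class labels and result lists are hypothetical and are not data from the study.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, retrieval size) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, len(retrieved)


if __name__ == "__main__":
    # Hypothetical expert classes; relevant = same class as the selected document.
    expert_class = {"d2": "finance", "d3": "weather", "d4": "weather", "d5": "finance"}
    relevant = [d for d, c in expert_class.items() if c == "finance"]
    print(evaluate(["d2", "d5", "d3"], relevant))  # precision 2/3, recall 1.0, size 3
    print(evaluate(["d2", "d5"], relevant))  # precision 1.0, recall 1.0, size 2
```

Dropping an irrelevant document from the result set raises precision and shrinks the retrieval size without lowering recall, which is the kind of improvement the comparison above reports.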
In future research, I will apply standard test corpora to generalize the prototype. In addition, I will weigh the costs of managing co-occurrence word pairs and building clusters against the benefits of improved precision and reduced retrieval size.