Given the ranked lists of documents returned by multiple search engines in response to a query, the problem of metasearch is to combine these lists in a way that optimizes the performance of the combination.
User queries can be classified into two types. The first seeks as many web documents as possible that explain a given topic; this type of web search is called the “topic relevance task”. The second seeks the entry page of the web site maintained by a given organization; this is called the “entry page finding task”.
However, past research in metasearch has been restricted to combining the results of the topic relevance task. For that task, it is known that the more engines retrieve a document, the higher the rank the combining function should assign to it.
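The principle that a document retrieved by more engines should rank higher can be illustrated with a minimal sketch. The function name `fuse_by_engine_count`, the linear rank-to-score mapping, and the multiplication by the number of retrieving engines (in the style of CombMNZ) are assumptions chosen for illustration; they are not necessarily the combining function used in this thesis.

```python
from collections import defaultdict


def fuse_by_engine_count(ranked_lists):
    """Fuse ranked lists so that documents found by more engines rank higher.

    Each element of ranked_lists is one engine's result list, best first,
    with no duplicate documents inside a single list (an assumption).
    """
    score_sum = defaultdict(float)  # sum of rank-based scores per document
    hits = defaultdict(int)         # number of engines that returned the document

    for ranking in ranked_lists:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            # Hypothetical score: 1.0 for the top result, decreasing linearly.
            score_sum[doc] += (n - position) / n
            hits[doc] += 1

    # Multiply the summed score by the number of engines that agree.
    fused = {doc: score_sum[doc] * hits[doc] for doc in score_sum}

    # Highest fused score first; ties are broken by document identifier.
    return sorted(fused, key=lambda d: (-fused[d], d))
```

In this sketch, a document that appears in every list outranks one that appears at the top of a single list, because its summed score is multiplied by a larger engine count.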
In this thesis, we combine the results of multiple engines for the entry page finding task. A site is an organized collection of web documents. To combine entry page finding results, we extend the unit of source information from a document to the site to which the document belongs. We add this site information to the existing combining function and obtain improved performance.
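One way to picture the extension from documents to sites is the sketch below. It rests on several assumptions that the abstract does not specify: a site is approximated by the host name of the URL, site evidence is the number of engines that returned any page from that site, and the hypothetical parameter `site_weight` controls how strongly that evidence is added to the document-level engine count. The actual formulation in the thesis may differ.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Assumption: a site is identified by the host name of the URL."""
    return urlsplit(url).hostname or url


def fuse_with_site_support(ranked_lists, site_weight=1.0):
    """Fuse ranked URL lists using document-level and site-level agreement."""
    doc_score = defaultdict(float)    # sum of rank-based scores per URL
    doc_engines = defaultdict(set)    # engines that returned this exact URL
    site_engines = defaultdict(set)   # engines that returned any page of the site

    for engine_id, ranking in enumerate(ranked_lists):
        n = len(ranking)
        for position, url in enumerate(ranking):
            # Hypothetical score: 1.0 for the top result, decreasing linearly.
            doc_score[url] += (n - position) / n
            doc_engines[url].add(engine_id)
            site_engines[site_of(url)].add(engine_id)

    fused = {}
    for url, score in doc_score.items():
        # Document agreement plus weighted site agreement.
        support = len(doc_engines[url]) + site_weight * len(site_engines[site_of(url)])
        fused[url] = score * support

    # Highest fused score first; ties are broken by URL.
    return sorted(fused, key=lambda u: (-fused[u], u))
```

Setting `site_weight=0` reduces the sketch to document-level agreement only, which makes it easy to compare rankings with and without site information on the same input lists.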