Recently, similarity search has been playing an important role in multimedia databases. However, a conventional approach for similarity search, partitioning index structures which partition vector spaces recursively, perform poorly as dimensionality increases. Moreover, even a linear scan outperforms the partitioning index structures when the dimensionality exceeds around ten. This phenomenon is often called the `dimensionality curse.' The VA-file, that is proposed as an alternative to the partitioning index structures, is a flat file containing approximations for object vectors, and is used as a filter. The VA-file retains good performance in high dimensionality.
In this paper, we present a generalization of similarity search algorithms of the partitioning index structures and analyze the generalized algorithm. From the analysis, we find out a fact that the dimensionality curse arises from long diameters of minimum bounding regions (MBRs) in high dimensional vector spaces. We then provide a cost model for the similarity search in the VA-file. The performance of the VA-file varies as a function of the number of bits b, used for an approximation. Therefore, it is very important to determine b in order to optimize the performance of the VA-file. Finally, we analyze the performance behavior of the VA-file, and suggest the guideline to determine b that produces the best performance.