Most of today's data is not stored in relational database systems: instead, it is found in text documents, SGML or HTML documents, spreadsheets, nonstandard data formats, etc. Research on semi-structured data has focused on extending database techniques to handle such data, by proposing new data models, query languages, schema formalisms, and optimization techniques.
Various query languages have been proposed lately for querying semi-structured data, whose data model is a rooted and labeled graph. The simplest kinds of queries on such data are those which traverse paths described by regular path expressions. An ordinary example of semi-structured data is web sites, where the pages correspond to nodes in the graph and the hyperlinks correspond to labeled edges: a query with a regular path expression traverses the graph entirely or partially.
In this paper, we address the problem of query processing on distributed semi-structured databases. In our setting the nodes of the database are stored at a number of sites and the edges can be either local (with both ends in the same site) or cross (with ends in two distinct sites). We propose three algorithms: path query reduction, path query diffusion with normal termination, and path query diffusion with user abortion. We also show the correctness of termination detection of path query diffusion algorithms.