As more documents are published in XML, generating relational schemas for storing XML documents in relational databases is becoming increasingly important. This thesis describes a technique for producing a relational schema from an XML Schema, the schema language recently recommended as a standard by the W3C.
Compared with the DTD-based inlining technique, the originality of this technique lies in handling the inherent complexity and flexibility of XML Schema. Because the XML Schema language defines the structure of a class of XML documents by means of types, two new data structures, the schema graph and the type graph, are constructed to hold the intermediate results of parsing an XML Schema. Relational tables are then composed by traversing these graphs, and this thesis proposes new algorithms for that traversal. Before the graphs are constructed, some preliminary work is needed, such as mapping the simple types to RDBMS data types and simplifying the schema.
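To make the graph traversal concrete, the following is a minimal sketch of inlining over a type graph, not the thesis's actual algorithm. It assumes a type graph encoded as a dict from complex type name to its child element declarations, and it does not model the separate schema graph, anonymous types, abstract types, or xsi:type. The names `SIMPLE_TYPE_MAP`, `Element`, `Table`, and `build_tables`, and the particular SQL types chosen, are hypothetical. Single-occurrence children are inlined into the parent table; repeated or recursive children get their own table with a `parent_id` foreign key.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a few XML Schema built-in simple types to SQL types.
SIMPLE_TYPE_MAP = {
    "xs:string": "VARCHAR(255)",
    "xs:int": "INTEGER",
    "xs:decimal": "DECIMAL",
    "xs:boolean": "BOOLEAN",
    "xs:date": "DATE",
}


@dataclass
class Element:
    name: str
    type_name: str          # a key of SIMPLE_TYPE_MAP or a complex type name
    repeated: bool = False  # True when maxOccurs > 1


@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)  # (column_name, sql_type) pairs


def build_tables(type_graph, root):
    """Traverse a type graph (complex type name -> child elements) from root.

    Non-repeated children are inlined into the current table; repeated or
    recursive children get their own table with a parent_id foreign key.
    """
    tables = {}

    def new_table(elem, has_parent, visiting):
        if elem.name in tables:
            return  # table already created, e.g. for a recursive reference
        table = Table(elem.name, [("id", "INTEGER")])
        if has_parent:
            table.columns.append(("parent_id", "INTEGER"))  # FK to the parent row
        tables[elem.name] = table
        if elem.type_name in SIMPLE_TYPE_MAP:
            table.columns.append(("value", SIMPLE_TYPE_MAP[elem.type_name]))
        else:
            inline(elem.type_name, table, "", visiting | {elem.type_name})

    def inline(type_name, table, prefix, visiting):
        for child in type_graph[type_name]:
            if child.repeated or child.type_name in visiting:
                new_table(child, True, visiting)
            elif child.type_name in SIMPLE_TYPE_MAP:
                table.columns.append((prefix + child.name, SIMPLE_TYPE_MAP[child.type_name]))
            else:
                # Single-occurrence complex child: flatten its leaves into this table.
                inline(child.type_name, table, prefix + child.name + "_",
                       visiting | {child.type_name})

    new_table(root, False, frozenset())
    return tables
```

For example, a `book` element of a `BookType` with a `title`, a repeated `author`, and a single `publisher` yields a `book` table holding `title`, `publisher_name`, and `publisher_city`, plus a separate `author` table linked through `parent_id`. A single `parent_id` column is a simplification; a table reachable from several parent tables would need a way to tell them apart.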
The technique just described, Basic Extended Inlining, handles several subtle features of XML Schema, such as anonymous types, abstract types, and xsi:type. However, Basic Extended Inlining may produce undesirable results, such as very large tables for large schemas. This thesis therefore proposes Hybrid Extended Inlining, which uses heuristics to cope with such problems.
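The abstract does not say which heuristics Hybrid Extended Inlining uses. As an illustration only, here is one hypothetical heuristic of that kind, not necessarily the thesis's: a single-occurrence complex child is inlined only if the parent table would stay within an assumed `max_columns` limit; otherwise it is split into its own table. The names `SIMPLE_TYPES`, `Element`, `inlined_width`, `plan_tables`, and `max_columns` are all hypothetical, and simple-typed children are always inlined regardless of the limit.

```python
from dataclasses import dataclass

# Hypothetical set of simple types treated as single columns.
SIMPLE_TYPES = {"xs:string", "xs:int", "xs:decimal", "xs:boolean", "xs:date"}


@dataclass
class Element:
    name: str
    type_name: str          # a member of SIMPLE_TYPES or a complex type name
    repeated: bool = False  # True when maxOccurs > 1


def inlined_width(type_graph, type_name, visiting=frozenset()):
    """Columns a complex type would add to its parent table if fully inlined."""
    width = 0
    for child in type_graph[type_name]:
        if child.repeated or child.type_name in visiting | {type_name}:
            continue  # repeated or recursive children get their own table
        if child.type_name in SIMPLE_TYPES:
            width += 1
        else:
            width += inlined_width(type_graph, child.type_name, visiting | {type_name})
    return width


def plan_tables(type_graph, root, max_columns):
    """Return {table_name: column_names}, splitting complex children that would
    push the current table past max_columns into tables of their own."""
    tables = {}

    def new_table(elem, has_parent, visiting):
        if elem.name in tables:
            return  # already created, e.g. for a recursive reference
        cols = ["id"] + (["parent_id"] if has_parent else [])
        tables[elem.name] = cols
        if elem.type_name in SIMPLE_TYPES:
            cols.append("value")
        else:
            fill(elem.type_name, cols, "", visiting | {elem.type_name})

    def fill(type_name, cols, prefix, visiting):
        for child in type_graph[type_name]:
            if child.repeated or child.type_name in visiting:
                new_table(child, True, visiting)
            elif child.type_name in SIMPLE_TYPES:
                cols.append(prefix + child.name)
            elif len(cols) + inlined_width(type_graph, child.type_name, visiting) <= max_columns:
                fill(child.type_name, cols, prefix + child.name + "_",
                     visiting | {child.type_name})
            else:
                new_table(child, True, visiting)  # too wide: give it its own table

    new_table(root, False, frozenset())
    return tables
```

With a small limit, a wide `review` child of a `book` is moved to its own table, while with a generous limit everything is inlined into one `book` table. The threshold trades fewer joins against narrower tables; a real heuristic might also weigh query patterns or element frequency.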
We have implemented a prototype system based on the proposed technique. Experimental results show that the prototype generates schemas suitable for storing data-centric XML documents.