As the importance of privacy protection becomes widely recognized and the popularity of XML databases rapidly increases, privacy protection in XML databases is becoming an important research issue. XML data can be stored in XML databases or disseminated through the internet. In this dissertation, we address the issues of privacy protection for both stored and disseminated XML data.
The Hippocratic database model recently proposed by Agrawal et al. incorporates privacy protection capabilities into relational databases. Because this model is based on relational databases, it must be extended before it can be applied to the XML databases widely used today.
In the first part of this dissertation, we extend the Hippocratic database model to the Hippocratic XML database model so that it can manage XML data, and we present an efficient access control mechanism under this model. In contrast to relational data, XML data have tree-like hierarchies. To manage these hierarchies, we extend and formally define the concepts presented in the Hippocratic database model.
To implement the Hippocratic XML database model, we need an efficient access control mechanism. Existing access control mechanisms for XML data are inefficient because they traverse the XML data tree to determine the accessibility of elements and, in the worst case, may access the whole tree. To solve this problem, we propose two new concepts for the access control mechanism in this model: 1) the authorization index combined with the nearest neighbor search technique and 2) the dynamic predicate (DP). The authorization index, which is implemented using a multi-dimensional index, allows us to efficiently find the authorization that applies to an element, namely the one implied by the authorization granted on its nearest ancestor, using the nearest neighbor search technique. A DP is a novel concept representing a dynamically constructed condition that determines the accessibility of elements during query execution. DPs allow us to effectively integrate authorization checking into the query plan so that unauthorized elements are excluded in the process of query execution. Using synthetic and real data, we have performed extensive experiments comparing the query processing time of the proposed mechanism with those of existing access control mechanisms. The experimental results show that the proposed access control mechanism improves the processing time significantly over earlier methods reported in the literature, including the top-down access control strategy, the bottom-up access control strategy, and the compressed accessibility map (CAM) method.
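The two ideas in the preceding paragraph can be illustrated with a minimal sketch. It is not the mechanism proposed in the dissertation: a plain dictionary keyed by element path stands in for the multi-dimensional authorization index, and a walk over path prefixes stands in for the nearest neighbor search. The names `nearest_authorization`, `make_predicate`, and `run_query`, the path-tuple representation of elements, and the deny-by-default rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: an element is identified by its path from the root.
Path = Tuple[str, ...]


def nearest_authorization(auths: Dict[Path, bool], path: Path,
                          default: bool = False) -> bool:
    """Return the sign of the explicit authorization on the nearest
    ancestor-or-self of `path`; fall back to `default` (assumed: deny)."""
    for depth in range(len(path), 0, -1):
        sign = auths.get(path[:depth])  # dictionary lookup stands in for the index
        if sign is not None:
            return sign
    return default


def make_predicate(auths: Dict[Path, bool]) -> Callable[[Path], bool]:
    """Build, at query time, a condition that decides element accessibility."""
    return lambda path: nearest_authorization(auths, path)


def run_query(elements: Iterable[Path], tag: str,
              allowed: Callable[[Path], bool]) -> List[Path]:
    """Evaluate a tag query; the predicate is applied inside the scan, so
    unauthorized elements never reach the result."""
    return [p for p in elements if p[-1] == tag and allowed(p)]


# Illustrative data (assumed): grant on the root, deny on the diagnosis subtree.
auths = {("hospital",): True, ("hospital", "patient", "diagnosis"): False}
elements = [
    ("hospital",),
    ("hospital", "patient"),
    ("hospital", "patient", "name"),
    ("hospital", "patient", "diagnosis"),
    ("hospital", "patient", "diagnosis", "code"),
]
visible_names = run_query(elements, "name", make_predicate(auths))
visible_codes = run_query(elements, "code", make_predicate(auths))
```

In this sketch the `name` element inherits the grant from the root, while the `code` element inherits the denial from its nearer ancestor `diagnosis`, so only the former appears in a query result.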
Dissemination of XML data on the internet could breach the privacy of data providers unless access to the disseminated XML data is carefully controlled. Recently, methods using encryption have been proposed for such access control. However, these methods do not address query processing performance. A query processor cannot identify the contents of encrypted XML data unless the data are decrypted. This limitation incurs the overhead of decrypting parts of the XML data that do not contribute to the query result.
In the second part of this dissertation, we propose the notion of query-aware decryption for efficient processing of queries against encrypted XML data. Query-aware decryption allows us to decrypt only those parts that contribute to the query result. For this purpose, we disseminate an encrypted XML index along with the encrypted XML data. This index, when decrypted, tells us where the query results are located in the encrypted XML data, thus avoiding unnecessary decryption of other parts of the data. Since this index is much smaller than the encrypted XML data, the cost of decrypting it is negligible compared with the cost of unnecessarily decrypting the data itself. The experimental results show that our method improves query processing performance by up to 6.1 times over earlier methods reported in the literature. Finally, we formally prove that dissemination of the encrypted XML index does not compromise security.
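The idea of query-aware decryption can be sketched as follows, under strong simplifying assumptions. The cipher is a toy XOR keystream derived from SHA-256, used only to keep the example self-contained; it is not secure and is not the encryption scheme of the dissertation. The index layout (tag name mapped to block positions), the single shared key, and the names `toy_cipher`, `publish`, and `query` are likewise hypothetical.

```python
import hashlib
import json
from typing import List, Tuple


def toy_cipher(key: bytes, nonce: int, data: bytes) -> bytes:
    """XOR `data` with a SHA-256-derived keystream. Toy stand-in for a real
    cipher (NOT secure); the same call both encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + nonce.to_bytes(4, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def publish(key: bytes, fragments: List[Tuple[str, str]]) -> Tuple[bytes, List[bytes]]:
    """Encrypt each (tag, xml) fragment as its own block and build an
    encrypted index mapping each tag to the positions of its blocks."""
    index = {}
    blocks = []
    for i, (tag, xml) in enumerate(fragments):
        index.setdefault(tag, []).append(i)
        blocks.append(toy_cipher(key, i + 1, xml.encode("utf-8")))
    enc_index = toy_cipher(key, 0, json.dumps(index).encode("utf-8"))
    return enc_index, blocks


def query(key: bytes, enc_index: bytes, blocks: List[bytes],
          tag: str) -> Tuple[List[str], int]:
    """Decrypt the small index first, then decrypt only the blocks it points
    to. Returns the matching fragments and the number of blocks decrypted."""
    index = json.loads(toy_cipher(key, 0, enc_index).decode("utf-8"))
    hits = index.get(tag, [])
    results = [toy_cipher(key, i + 1, blocks[i]).decode("utf-8") for i in hits]
    return results, len(hits)


# Illustrative data (assumed).
key = b"demo-key"
fragments = [
    ("name", "<name>Kim</name>"),
    ("diagnosis", "<diagnosis>flu</diagnosis>"),
    ("name", "<name>Lee</name>"),
]
enc_index, blocks = publish(key, fragments)
names, decrypted = query(key, enc_index, blocks, "name")
```

Here a query for `name` decrypts the index plus two of the three data blocks; the `diagnosis` block is never decrypted because the index shows it cannot contribute to the result.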
In summary, we have discussed the issues of privacy protection for both XML data stored in databases and XML data disseminated through the internet. For stored XML data, we have proposed the Hippocratic XML database model and presented an access control mechanism using the notion of the dynamic predicate. For disseminated XML data, we have proposed an access control mechanism using the notion of query-aware decryption. We have verified the effectiveness of the proposed mechanisms through extensive experiments. We believe that the proposed model and mechanisms provide a practical framework that can be implemented in commercial XML DBMSs.
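Recently, as awareness of the importance of privacy protection has grown considerably and the use of XML databases has increased rapidly, privacy protection in XML databases has emerged as an important research issue. XML data may be stored in XML databases or disseminated through the internet. This dissertation addresses the issues that arise in privacy protection for both stored and disseminated XML data. The Hippocratic database model recently proposed by Agrawal et al. is a database model that adds privacy protection capabilities to relational databases. Because the Hippocratic database model is based on relational databases, it must be extended before it can be applied to the XML databases that are widely used today. In the first part of this dissertation, we extend the Hippocratic database model to the Hippocratic XML database model so that it can be applied to XML databases, and we present an efficient access control method under this model. Unlike relational data, XML data have a tree-shaped hierarchical structure. Thus, the Hippocratic database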