Database exploration refers to activities that examine a database thoroughly to acquire potentially useful information. In particular, data mining, which discovers knowledge that is implicit but obtainable through systematic data processing, is one of the essential constituents of database exploration. Data mining grows in importance because the rapidly increasing size of data makes direct exposure of raw records less and less helpful. In addition, the treatment of fuzzy information must be incorporated into database exploration facilities to cope with the ubiquitous fuzziness of real-world domains and, in turn, to provide more effective functionality.
In this thesis, we investigate database exploration techniques that accommodate fuzzy information. First, the Level-1 Fuzzy Relational Data Model (FRDM-1) is proposed as a theoretically clear framework for processing fuzzy queries. Against a large amount of data, it is hard to formulate a crisp query that exactly reflects the user's data request. Fuzzy querying capability is regarded as a basic form of database exploration, since users can express their data requests in their own subjective, flexible linguistic terms. Furthermore, the ranked answer to a fuzzy query provides useful information for understanding the content of the database. Second, an interactive top-down data mining process for database summarization is devised. The process exploits fuzzy domain knowledge to hypothesize discovery targets and to evaluate the validity of each hypothesis. Third, the top-down data mining process is extended to discover inter-attribute relationships. Finally, the data mining process is integrated with FRDM-1 to allow more flexibility in the user's exploration requests.
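The idea of a fuzzy query with a ranked answer can be illustrated with a minimal sketch. It is not the FRDM-1 semantics defined in the thesis: the membership function `young`, the helper `fuzzy_select`, the threshold parameter, and the sample records are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def young(record: Record) -> float:
    """Hypothetical linguistic term 'young': degree 1 up to age 25,
    falling linearly to 0 at age 45 (an assumed shape, not from the thesis)."""
    age = float(record["age"])
    return max(0.0, min(1.0, (45.0 - age) / 20.0))


def fuzzy_select(
    db: List[Record],
    predicate: Callable[[Record], float],
    threshold: float = 0.0,
) -> List[Tuple[float, Record]]:
    """Grade every record by the fuzzy predicate, drop records whose degree
    does not exceed the threshold, and return the rest ranked by degree."""
    graded = [(predicate(r), r) for r in db]
    kept = [(degree, r) for degree, r in graded if degree > threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data.
employees: List[Record] = [
    {"name": "Kim", "age": 23},
    {"name": "Lee", "age": 35},
    {"name": "Park", "age": 50},
    {"name": "Choi", "age": 29},
]

ranked = fuzzy_select(employees, young)
for degree, record in ranked:
    print(f"{record['name']}: {degree:.2f}")
```

Unlike a crisp condition such as `age < 30`, the answer keeps partially matching records and orders them by how well they satisfy the term, which is the sense in which a ranked answer reveals something about the database content.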
FRDM-1 is established by two basic query languages, namely Level-1 Fuzzy Relational Algebra (FRA-1) and Level-1 Fuzzy Relational Calculus (FRC-1). In addition, two advanced query languages that facilitate the expression of vagueness, namely Fuzzy Selective Relational Algebra (FSRA) and Fuzzy Selective Relational Calculus (FSRC), are derived. Furthermore, we show that the extended semantics of various relational operators can be easily incorporated into the proposed FRDM-1. This strongly supports the claims that FRC-1 (and also FRA-1) can serve as an expressiveness measure for fuzzy query languages and that FRDM-1 can be regarded as a useful framework for various fuzzy databases.
We define a concept tuple as a representation form for a database summary that includes fuzzy concepts. The validity of a concept tuple with respect to a given database, i.e., its support degree, is measured as the ratio of the number of supporting tuples to the total cardinality of the database. The adopted fuzzy domain knowledge guides the construction and evaluation of concept tuples, while pruning unnecessary hypothesis derivations without missing any significant concept tuples. Based on Shannon's information theory, we also present an informativeness measure for distinguishing concept tuples that deliver much information to users. Explanatory rules are defined in terms of concept tuples, and their validity is measured in two ways. One is the support degree, which measures the statistical significance of the explanatory rule; the other is the confidence factor, which measures rule strength. The support degree of an explanatory rule corresponds to the support degree of the conjunction of its antecedent and consequent concept tuples, and the confidence factor is defined as the fraction of positive instances among relevant instances.
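The support degree and confidence factor described above can be sketched on toy data. This is only an illustration under stated assumptions, not the thesis's formal definitions: a record's degree of support is taken as the minimum membership over the attributes of a concept tuple, "supporting tuples" are counted by summing these degrees, and "relevant instances" are read as the records that match the antecedent. All names (`ramp_up`, `match`, `support_degree`, `rule_measures`) and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Membership = Callable[[float], float]
ConceptTuple = Dict[str, Membership]  # attribute name -> fuzzy concept


def ramp_up(lo: float, hi: float) -> Membership:
    """Membership that is 0 at or below lo, 1 at or above hi, linear between."""
    def mu(x: float) -> float:
        return max(0.0, min(1.0, (x - lo) / (hi - lo)))
    return mu


def match(concept: ConceptTuple, record: Record) -> float:
    """Degree to which a record supports a concept tuple (assumed: minimum)."""
    return min((mu(record[attr]) for attr, mu in concept.items()), default=1.0)


def support_degree(concept: ConceptTuple, db: List[Record]) -> float:
    """Summed support degrees divided by the cardinality of the database."""
    return sum(match(concept, r) for r in db) / len(db)


def rule_measures(
    antecedent: ConceptTuple, consequent: ConceptTuple, db: List[Record]
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    # Positive instances: records supporting antecedent AND consequent.
    positive = sum(min(match(antecedent, r), match(consequent, r)) for r in db)
    # Relevant instances: records supporting the antecedent (an assumption).
    relevant = sum(match(antecedent, r) for r in db)
    support = positive / len(db)
    confidence = positive / relevant if relevant > 0 else 0.0
    return support, confidence


# Hypothetical fuzzy concepts and records.
old: ConceptTuple = {"age": ramp_up(40, 60)}
well_paid: ConceptTuple = {"salary": ramp_up(50, 100)}
db: List[Record] = [
    {"age": 60, "salary": 100},
    {"age": 50, "salary": 60},
    {"age": 40, "salary": 100},
    {"age": 30, "salary": 50},
]

support, confidence = rule_measures(old, well_paid, db)
print(f"support={support:.3f}, confidence={confidence:.3f}")
```

In this sketch a rule can have high confidence but low support when few records match the antecedent at all, which is why the two measures are reported separately.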