Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. Frequently, however, the constructed trees are complex, with hundreds of nodes, and thus difficult to comprehend, a fact that calls into question an often-cited benefit of decision trees: that they are easy to interpret.
In this thesis, we address the problem of constructing "simple" decision trees with few nodes that are easy for humans to interpret. By permitting users to specify constraints on tree size or accuracy, and then building the "best" tree that satisfies those constraints, we ensure that the final tree is both easy to understand and accurate. We develop novel branch-and-bound algorithms for pushing the constraints into the building phase of classifiers, pruning early those tree nodes that cannot possibly satisfy the constraints.
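The constraint-pushing idea can be sketched as follows. This is a simplified illustration under assumed details, not the thesis algorithms: the greedy Gini-based splitting, the names (`build_tree`, `best_split`), and the specific node-budget accounting are all assumptions made for the sketch. The key point it shows is that a branch whose remaining budget cannot accommodate a further split is turned into a leaf during construction, rather than being grown and pruned afterwards.

```python
# Sketch (illustrative, not the thesis algorithm): grow a greedy decision
# tree under a user-specified node-count constraint, pruning early any
# branch whose budget cannot afford another split.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature, threshold) pair minimizing weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            li = [i for i, r in enumerate(rows) if r[f] <= t]
            ri = [i for i in range(len(rows)) if i not in li]
            if not li or not ri:
                continue
            score = (len(li) * gini([labels[i] for i in li]) +
                     len(ri) * gini([labels[i] for i in ri])) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, budget):
    """Grow a tree using at most `budget` nodes. A split costs three nodes
    (the internal node plus two children), so a branch that cannot afford
    a split is cut to a leaf immediately: the size constraint is pushed
    into the building phase instead of applied after the tree is built."""
    majority = Counter(labels).most_common(1)[0][0]
    if budget < 3 or len(set(labels)) == 1:   # prune early or node is pure
        return ("leaf", majority)
    split = best_split(rows, labels)
    if split is None:                          # no impurity-reducing split
        return ("leaf", majority)
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i in range(len(rows)) if i not in li]
    half = (budget - 1) // 2                   # share remaining budget
    left = build_tree([rows[i] for i in li], [labels[i] for i in li], half)
    right = build_tree([rows[i] for i in ri], [labels[i] for i in ri], half)
    return ("split", f, t, left, right)

def count_nodes(tree):
    if tree[0] == "leaf":
        return 1
    return 1 + count_nodes(tree[3]) + count_nodes(tree[4])

def predict(tree, row):
    while tree[0] == "split":
        _, f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree[1]
```

A real constraint-pushing classifier would use lower bounds on achievable accuracy to prune in a branch-and-bound fashion; the fixed budget-halving here is merely the simplest way to guarantee the size constraint holds by construction.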
Our experimental results with real-life and synthetic data sets demonstrate that significant performance speedups and reductions in the number of nodes expanded can be achieved by incorporating knowledge of the constraints into the building step, as opposed to applying the constraints after the entire tree is built.