
On the induction of decision trees for multiple concept learning.

dc.contributor.author: Fayyad, Usama Mohammad
dc.contributor.advisor: Irani, Keki B.
dc.date.accessioned: 2016-08-30T16:56:08Z
dc.date.available: 2016-08-30T16:56:08Z
dc.date.issued: 1991
dc.identifier.uri: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:9208535
dc.identifier.uri: https://hdl.handle.net/2027.42/128810
dc.description.abstract: We focus on developing improvements to algorithms that generate decision trees from training data. This dissertation makes four contributions to the theory and practice of the top-down, non-backtracking induction of decision trees for multiple concept learning. First, we provide formal results for determining when one generated tree is better than another. We consider several performance measures on decision trees and show that the most important measure to minimize is the number of leaves. Notably, we derive a probabilistic relation between the number of leaves of the decision tree and its expected error rate. The second contribution deals with improving tree generation by avoiding problems inherent in the currently popular approaches to tree induction. We formulate the algorithms GID3 and GID3*, which are capable of grouping irrelevant attribute values into subsets rather than branching on them individually. We empirically demonstrate that better trees are obtained. Third, we present results applicable to the binary discretization of continuous-valued attributes using the information entropy minimization heuristic. The results serve to give a better understanding of the entropy measure, to point out desirable properties that justify its usage in a formal sense, and to improve the efficiency of evaluating continuous-valued attributes for cut point selection (a sketch of this heuristic appears after the record below). We then extend the binary discretization algorithm to derive multiple-interval quantizations. We justify our criterion for deciding the intervals using decision-theoretic principles. Empirical results demonstrate improved efficiency and show that the multiple-interval discretization algorithm allows GID3* to find better trees. Finally, we analyze the merits and limitations of using the entropy measure (and others from the family of impurity measures) for attribute selection. We argue that the currently used family of measures is not particularly well-suited for attribute selection. We motivate and formulate a new family of measures: C-SEP. The new algorithm, O-BTREE, which uses a selection measure from this family, is empirically demonstrated to produce better trees. Ample experimental results are provided to demonstrate the utility of the above contributions by applying them to synthetic and real-world problems. Some applications come from our involvement in the automation of semiconductor manufacturing techniques.
dc.format.extent: 263 p.
dc.language: English
dc.language.iso: EN
dc.subject: Concept
dc.subject: Decision
dc.subject: Induction
dc.subject: Machine Learning
dc.subject: Multiple
dc.subject: Trees
dc.title: On the induction of decision trees for multiple concept learning.
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Applied Sciences
dc.description.thesisdegreediscipline: Artificial intelligence
dc.description.thesisdegreediscipline: Computer science
dc.description.thesisdegreediscipline: Electrical engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/128810/2/9208535.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
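
The third contribution described in the abstract, entropy-minimization discretization of continuous-valued attributes, is concrete enough to sketch. The Python below is a minimal illustration of the general heuristic, not code from the dissertation: the function names (class_entropy, best_binary_cut) are hypothetical, and the skip over non-boundary candidate cuts reflects the efficiency result the abstract mentions (only cut points where the class label changes between adjacent sorted examples need be evaluated).

    import math
    from collections import Counter

    def class_entropy(labels):
        # Shannon entropy (in bits) of the class distribution in `labels`.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def best_binary_cut(values, labels):
        # Choose the cut point on a continuous attribute that minimizes the
        # weighted average class entropy of the two induced intervals.
        # Hypothetical helper for illustration only.
        pairs = sorted(zip(values, labels))
        n = len(pairs)
        best_cut, best_e = None, float("inf")
        for i in range(1, n):
            prev_v, prev_l = pairs[i - 1]
            cur_v, cur_l = pairs[i]
            # Only "boundary points" (adjacent examples with distinct values
            # and different classes) can host an optimal cut, so skip the rest.
            if prev_v == cur_v or prev_l == cur_l:
                continue
            left = [l for _, l in pairs[:i]]
            right = [l for _, l in pairs[i:]]
            e = (i / n) * class_entropy(left) + ((n - i) / n) * class_entropy(right)
            if e < best_e:
                best_cut, best_e = (prev_v + cur_v) / 2, e
        return best_cut, best_e

    # Example: one continuous attribute with two classes; the best cut falls
    # at the midpoint 6.5, which separates the classes perfectly (entropy 0).
    cut, e = best_binary_cut([1.0, 2.0, 3.0, 10.0, 11.0], ["a", "a", "a", "b", "b"])

Applying the same cut selection recursively inside each resulting interval, under a stopping criterion such as the decision-theoretic one the abstract describes, yields the multiple-interval quantization that the dissertation extends this method to.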

