A co-training algorithm for multi-view data with applications in data fusion

Culp, Mark; Michailidis, George

dc.contributor.author	Culp, Mark	en_US
dc.contributor.author	Michailidis, George	en_US
dc.date.accessioned	2009-07-06T15:37:11Z
dc.date.available	2010-08-02T17:56:56Z	en_US
dc.date.issued	2009-06	en_US
dc.identifier.citation	Culp, Mark; Michailidis, George (2009). "A co-training algorithm for multi-view data with applications in data fusion." Journal of Chemometrics 23(6): 294-303. <http://hdl.handle.net/2027.42/63041>	en_US
dc.identifier.issn	0886-9383	en_US
dc.identifier.issn	1099-128X	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/63041
dc.description.abstract	In several scientific applications, data are generated from two or more diverse sources (views) with the goal of predicting an outcome of interest. Often the outcome is not associated with any single view; however, combining the measurements from all views may yield a more predictive classifier. For example, consider a drug discovery application in which individual molecules are described partly by several assay screens based on diverse profiles and partly by their chemical structural fingerprints. A common classification problem is to determine whether a molecule is associated with a particular disease. In this paper, a co-training algorithm is developed that uses data from diverse sources to predict the common class variable. Novel enhancements for variable importance, robustness to a mislabeled class variable, and handling of unbalanced classes are applied to the motivating data set, showing that the approach attains strong performance and provides useful diagnostics for data analysis. In addition, the approach is compared on real data with a data fusion framework based on partial least squares (PLS). An R package implementing the proposed approach is provided as Supporting Information. Copyright © 2003 John Wiley & Sons, Ltd.	en_US
dc.format.extent	38428 bytes
dc.format.extent	3118 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	text/plain
dc.publisher	John Wiley & Sons, Ltd.	en_US
dc.subject.other	Chemistry	en_US
dc.subject.other	Analytical Chemistry and Spectroscopy	en_US
dc.title	A co-training algorithm for multi-view data with applications in data fusion	en_US
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow	en_US
dc.subject.hlbsecondlevel	Chemical Engineering	en_US
dc.subject.hlbsecondlevel	Chemistry	en_US
dc.subject.hlbsecondlevel	Materials Science and Engineering	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.peerreviewed	Peer Reviewed	en_US
dc.contributor.affiliationum	Department of Statistics, University of Michigan, Ann Arbor, MI, USA	en_US
dc.contributor.affiliationother	Department of Statistics, West Virginia University, Morgantown, WV, USA	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/63041/1/cem_1233_sm_suppmaterial.pdf
dc.identifier.doi	10.1002/cem.1233	en_US
dc.identifier.source	Journal of Chemometrics	en_US
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: cem_1233_sm_suppmaterial.pdf
Size: 37.52KB
Format: PDF
