Liberal or Conservative: Evaluation and Classification with Distribution as Ground Truth.

Zhou, Daniel Xiaodan

dc.contributor.author	Zhou, Daniel Xiaodan	en_US
dc.date.accessioned	2013-06-12T14:16:54Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2013-06-12T14:16:54Z
dc.date.issued	2013	en_US
dc.date.submitted	2013	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/97979
dc.description.abstract	The ability to classify the political leaning of a large number of articles and items is valuable to both academic research and practical applications. The challenge, though, lies not only in developing innovative classification algorithms, which constitutes the “classifier” theme of this thesis, but also in defining the “ground truth” of items’ political leaning, eliciting labels when labelers do not agree, and evaluating classifiers with unreliable labeled data, which together constitute the “ground truth” theme. The “ground truth” theme argues for using distributions (e.g., 0.6 conservative, 0.4 liberal) instead of labels (e.g., conservative, liberal) as the underlying ground truth of items’ political leaning, treating disagreements among labelers not as human errors but as useful information reflecting the distribution of people’s subjective opinions. Empirical data demonstrate that these distributions are dispersed: there are many items on which labelers simply do not agree. Therefore, mapping distributions into single labels requires more than just majority vote. Nor can one assume that labels from a few labelers are reliable, because a different small sample of labelers might yield a very different picture. However, even though individual labeled items are not reliable, simulation suggests that we may still reliably evaluate and rank classifiers, as long as we have a large number of labeled items for evaluation. The optimal approach is to collect one label per item across many items (e.g., 1,000 to 3,000) for evaluation. The “classifier” theme proposes the LabelPropagator algorithm, which propagates the political leaning of known articles and users to the target nodes in order to classify them.
LabelPropagator achieves higher accuracy than alternative classifiers based on text analysis, suggesting that a relatively small number of labeled people and stories, together with a large number of people-to-item votes, can be used to classify the remaining people and items. An article’s source is useful as an input for propagation, while text similarities, users’ friendships, and “href” links to articles are not.	en_US
dc.language.iso	en_US	en_US
dc.subject	Political Leaning Classification	en_US
dc.subject	Classification Algorithm Evaluation	en_US
dc.title	Liberal or Conservative: Evaluation and Classification with Distribution as Ground Truth.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Information	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Resnick, Paul J.	en_US
dc.contributor.committeemember	Chen, Jowei	en_US
dc.contributor.committeemember	Adar, Eytan	en_US
dc.contributor.committeemember	Mei, Qiaozhu	en_US
dc.contributor.committeemember	Sami, Rahul	en_US
dc.subject.hlbsecondlevel	Science (General)	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/97979/1/mrzhou_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)
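The abstract's evaluation claim, that single unreliable labels can still rank classifiers reliably when many items are labeled, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's own simulation: each item's ground truth is a probability of being labeled conservative, drawn from a hypothetical Beta(0.7, 0.7) distribution; a prediction is scored against that distribution by the share of labelers expected to agree with it; and the helper names `expected_accuracy`, `one_label_accuracy`, and `simulate`, the flip rate, and the seed are all hypothetical.

```python
import random
from typing import Dict, Sequence, Tuple

# Convention (assumption): 1 = conservative, 0 = liberal.


def expected_accuracy(preds: Sequence[int], p_cons: Sequence[float]) -> float:
    """Score predictions against distributional ground truth.

    Predicting conservative on item i earns p_cons[i] (the share of labelers
    expected to agree); predicting liberal earns 1 - p_cons[i].
    """
    total = sum(p if y == 1 else 1.0 - p for y, p in zip(preds, p_cons))
    return total / len(preds)


def one_label_accuracy(preds: Sequence[int], p_cons: Sequence[float],
                       rng: random.Random) -> float:
    """Observed accuracy when each item gets ONE label sampled from its distribution."""
    hits = 0
    for y, p in zip(preds, p_cons):
        label = 1 if rng.random() < p else 0
        hits += int(y == label)
    return hits / len(preds)


def simulate(n_items: int, flip_rate: float,
             seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Compare a majority-of-distribution classifier with a noisier one.

    Returns {name: (expected_accuracy, observed_one_label_accuracy)}.
    """
    rng = random.Random(seed)
    # Hypothetical dispersed distributions: many items are neither clearly 0 nor 1.
    p_cons = [rng.betavariate(0.7, 0.7) for _ in range(n_items)]
    majority = [1 if p >= 0.5 else 0 for p in p_cons]
    # A weaker hypothetical classifier: flips the majority answer at random.
    noisy = [1 - y if rng.random() < flip_rate else y for y in majority]
    return {
        name: (expected_accuracy(preds, p_cons),
               one_label_accuracy(preds, p_cons, rng))
        for name, preds in (("majority", majority), ("noisy", noisy))
    }


if __name__ == "__main__":
    for n in (30, 3000):
        print(n, simulate(n, flip_rate=0.3, seed=1))
```

Each single-label accuracy is an unbiased estimate of the distributional accuracy, and its spread shrinks roughly with the square root of the number of items. With 30 items the observed values scatter more widely around their expected values; with 3,000 they stay close, so the ordering of the two classifiers is stable. The sketch does not reproduce the thesis's comparison of labeling budgets.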

Files in this item

Name: mrzhou_1.pdf
Size: 45.25 MB
Format: PDF
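The LabelPropagator idea described in the abstract, spreading known leanings from a few labeled people and stories across many people-to-item votes, can be sketched as generic iterative label propagation on a bipartite user-item graph. This is a minimal sketch under assumptions, not the thesis's algorithm: leanings are scores in [0, 1] (1 = conservative), each unlabeled node repeatedly takes the mean score of its already-scored neighbors, seed nodes stay clamped, and the function name `propagate`, the `u:`/`i:` node prefixes, and the fixed iteration count are hypothetical. Additional inputs such as an article's source are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def propagate(votes: Iterable[Tuple[str, str]],
              seeds: Dict[str, float],
              n_iter: int = 50) -> Dict[str, float]:
    """Spread leaning scores (0 = liberal, 1 = conservative) over a vote graph.

    votes: (user, item) pairs; user and item names must not collide.
    seeds: nodes with known leaning; their scores are never changed.
    Returns scores for the seeds and every node within n_iter hops of a seed.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for user, item in votes:
        neighbors[user].add(item)
        neighbors[item].add(user)

    scores = dict(seeds)
    for _ in range(n_iter):
        updated = dict(scores)  # synchronous update: read old scores, write new
        for node, nbrs in neighbors.items():
            if node in seeds:
                continue  # clamp labeled nodes
            known = [scores[n] for n in nbrs if n in scores]
            if known:
                updated[node] = sum(known) / len(known)
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: a chain s1 - alice - s2 - bob - s3 - carol - s4.
    votes = [("u:alice", "i:s1"), ("u:alice", "i:s2"),
             ("u:bob", "i:s2"), ("u:bob", "i:s3"),
             ("u:carol", "i:s3"), ("u:carol", "i:s4")]
    seeds = {"i:s1": 1.0, "i:s4": 0.0}  # one known conservative, one known liberal story
    for node, score in sorted(propagate(votes, seeds).items()):
        print(f"{node}: {score:.3f}")
```

On this toy chain the scores settle close to a linear interpolation between the two seeds (alice near 5/6, bob near 1/2, carol near 1/6), the fixed point where every unlabeled node equals the mean of its neighbors. The thesis's method may weight edges, normalize, or stop differently.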
