Error Metrics for Large-Scale Digitization

Conway, Paul; Bronicki, Jacqueline

dc.contributor.author	Conway, Paul
dc.contributor.author	Bronicki, Jacqueline
dc.date.accessioned	2013-08-12T01:59:04Z
dc.date.available	2013-08-12T01:59:04Z
dc.date.issued	2012-09-10
dc.identifier.uri	https://hdl.handle.net/2027.42/99520
dc.description.abstract	The paper summarizes the methodology utilized in an ongoing project that is exploring quality issues in the large-scale digitization of books by third-party vendors – such as Google and the Internet Archive – that are preserved in the HathiTrust Digital Library. The paper describes the research foundation for the project and the model of digitization error that frames the data gathering effort. The heart of the paper is an overview of the metrics and methodologies developed in the project to apply the error model to statistically valid random samples of digital book-surrogates that represent the full range of source volumes digitized by Google and other third-party vendors. Proportional and systematic sampling of page-images within each 1,000-volume sample produced a study set of 356,217 page images. Using custom-built web-enabled database systems, teams of trained coders have recorded perceived error in page-images on a severity scale of 0-5 for up to eleven possible errors. The paper concludes with a summary of ongoing research and the potential for future research derived from the present effort.	en_US
dc.description.sponsorship	National Science Foundation	en_US
dc.description.sponsorship	Institute for Museum and Library Services	en_US
dc.language.iso	en_US	en_US
dc.publisher	National Science Foundation	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.subject	Digitization, Error Measurement	en_US
dc.title	Error Metrics for Large-Scale Digitization	en_US
dc.type	Working Paper	en_US
dc.subject.hlbsecondlevel	Information and Library Science
dc.subject.hlbtoplevel	Social Sciences
dc.contributor.affiliationum	University of Michigan Library	en_US
dc.contributor.affiliationumcampus	Ann Arbor	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/99520/1/C8 Conway-Bronicki Digitization Error Metrics 2012.pdf
dc.identifier.source	Proceedings of the UNC/NSF Workshop Curating for Quality	en_US
dc.owningcollname	Information, School of (SI)
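
The abstract mentions proportional and systematic sampling of page-images within each volume but does not give the procedure. The following is a minimal sketch of one common way to combine the two ideas, not the authors' method: the sample size is proportional to the volume's page count, and pages are then taken at a fixed interval from a random start. The function name `systematic_sample` and the 10% sampling fraction are hypothetical and chosen only for illustration.

```python
import random


def systematic_sample(n_pages, fraction, rng):
    """Return sorted 1-based page numbers picked by systematic sampling.

    Assumption (not from the paper): the number of pages drawn is
    proportional to the volume length, with at least one page per volume.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if n_pages <= 0:
        return []
    # Proportional allocation: longer volumes contribute more pages.
    k = max(1, round(n_pages * fraction))
    # Fixed sampling interval and a random starting offset within it.
    step = n_pages / k
    start = rng.random() * step
    # Convert each offset to a 1-based page number, clamped to the last page.
    return [min(n_pages, int(start + i * step) + 1) for i in range(k)]


# Illustrative use: a hypothetical 300-page volume sampled at 10%.
rng = random.Random(0)
pages = systematic_sample(300, 0.1, rng)
print(len(pages), pages[:5])
```

Because the interval is never shorter than one page, the selected page numbers are distinct and spread evenly across the volume, which is the usual reason to prefer systematic over simple random selection for page-level inspection.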

Files in this item

Name: license_rdf
Size: 1.5KB
Format: application/rdf+xml

Name: C8 Conway-Bronicki Digitization ...
Size: 413.1KB
Format: PDF
Description: Main article

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States
