Error Metrics for Large-Scale Digitization

dc.contributor.author: Conway, Paul
dc.contributor.author: Bronicki, Jacqueline
dc.date.accessioned: 2013-08-12T01:59:04Z
dc.date.available: 2013-08-12T01:59:04Z
dc.date.issued: 2012-09-10
dc.identifier.uri: https://hdl.handle.net/2027.42/99520
dc.description.abstract: The paper summarizes the methodology utilized in an ongoing project that is exploring quality issues in the large-scale digitization of books by third-party vendors – such as Google and the Internet Archive – that are preserved in the HathiTrust Digital Library. The paper describes the research foundation for the project and the model of digitization error that frames the data gathering effort. The heart of the paper is an overview of the metrics and methodologies developed in the project to apply the error model to statistically valid random samples of digital book-surrogates that represent the full range of source volumes digitized by Google and other third party vendors. Proportional and systematic sampling of page-images within each 1,000-volume sample produced a study set of 356,217 page images. Using custom-built web-enabled database systems, teams of trained coders have recorded perceived error in page-images on a severity scale of 0-5 for up to eleven possible errors. The paper concludes with a summary of ongoing research and the potential for future research derived from the present effort. [en_US]
dc.description.sponsorship: National Science Foundation [en_US]
dc.description.sponsorship: Institute for Museum and Library Services [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: National Science Foundation [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Digitization, Error Measurement [en_US]
dc.title: Error Metrics for Large-Scale Digitization [en_US]
dc.type: Working Paper [en_US]
dc.subject.hlbsecondlevel: Information and Library Science
dc.subject.hlbtoplevel: Social Sciences
dc.contributor.affiliationum: University of Michigan Library [en_US]
dc.contributor.affiliationumcampus: Ann Arbor [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/99520/1/C8 Conway-Bronicki Digitization Error Metrics 2012.pdf
dc.identifier.source: Proceedings of the UNC/NSF Workshop Curating for Quality [en_US]
dc.owningcollname: Information, School of (SI)



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States
