
Automating the Capture of Data Transformation Metadata from Statistical Analysis Software

DC Field | Value | Language
dc.contributor.author | Alter, George |
dc.contributor.author | Donakowski, Darrell |
dc.contributor.author | Gager, Jack |
dc.contributor.author | Heus, Pascal |
dc.contributor.author | Hunter, Carson |
dc.contributor.author | Ionescu, Sanda |
dc.contributor.author | Iverson, Jeremy |
dc.contributor.author | Jagadish, H V |
dc.contributor.author | Lagoze, Carl |
dc.contributor.author | Lyle, Jared |
dc.contributor.author | Mueller, Alexander |
dc.contributor.author | Revheim, Sigbjørn |
dc.contributor.author | Richardson, Matthew |
dc.contributor.author | Risnes, Ørnulf |
dc.contributor.author | Seelam, Karunakara |
dc.contributor.author | Smith, Dan |
dc.contributor.author | Smith, Tom |
dc.contributor.author | Song, Jie |
dc.contributor.author | Vaidya, Yashas Jaydeep |
dc.contributor.author | Voldsater, Ole |
dc.date.accessioned | 2020-07-06T18:33:59Z |
dc.date.available | 2020-07-06T18:33:59Z |
dc.date.issued | 2020-07-06 |
dc.identifier.uri | https://hdl.handle.net/2027.42/156014 |
dc.description.abstract | The C2Metadata (“Continuous Capture of Metadata for Statistical Data”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. The C2Metadata Project creates a metadata workflow paralleling the data management process by deriving provenance information from scripts used to manage and transform data. C2Metadata differs from most previous data provenance initiatives by documenting transformations at the variable level rather than describing a sequence of opaque programs. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. Better data documentation makes research more transparent and expands the discovery and re-use of research data. | en_US
dc.description.sponsorship | National Science Foundation grant ACI-1640575 | en_US
dc.language.iso | en_US | en_US
dc.subject | metadata, data sharing, statistical analysis | en_US
dc.title | Automating the Capture of Data Transformation Metadata from Statistical Analysis Software | en_US
dc.type | Article | en_US
dc.subject.hlbsecondlevel | Statistics and Numeric Data |
dc.subject.hlbtoplevel | Social Sciences |
dc.contributor.affiliationum | Inter-university Consortium for Political and Social Research, Center for Political Studies, Computer Science and Engineering | en_US
dc.contributor.affiliationother | Colectica Inc., Metadata Technology North America Inc., Norwegian Centre for Research Data, NORC | en_US
dc.contributor.affiliationumcampus | Ann Arbor | en_US
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/156014/3/Automating_metadata_capture_v15.pdf |
dc.identifier.orcid | 0000-0003-3823-4972 | en_US
dc.identifier.name-orcid | Alter, George; 0000-0003-3823-4972 | en_US
dc.owningcollname | Inter-university Consortium for Political and Social Research (ICPSR) |
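The abstract describes translating statistical-software scripts into a language-neutral intermediate form and then deriving "variable lineages" from that form. The Python sketch below illustrates the idea under stated assumptions; it is not the C2Metadata translator and does not follow the SDTL schema. The helper names `translate_stata` and `variable_lineage`, the record fields (`command`, `target`, `sources`, `old`, `new`, `variables`) and the command labels (`Compute`, `Rename`, `DropVariables`) are all hypothetical, and the regex-based parser accepts only three toy Stata forms (`generate`, `rename`, `drop`).

```python
import re
from typing import Any, Dict, List, Set

# Hypothetical SDTL-like record: a plain dict. The command labels and field
# names are illustrative only; they are not taken from the SDTL specification.
Command = Dict[str, Any]

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Identifiers in an expression, skipping names immediately followed by "(",
# which this toy parser treats as function calls such as log(x).
VAR_IN_EXPR = re.compile(rf"\b{IDENT}\b(?!\s*\()")


def translate_stata(line: str) -> Command:
    """Translate one toy Stata command into a language-neutral record.

    Only `generate new = expr`, `rename old new` and `drop v1 v2 ...` are
    supported; string literals, macros and if/in qualifiers are not.
    """
    line = line.strip()
    m = re.fullmatch(rf"gen(?:erate)?\s+({IDENT})\s*=\s*(.+)", line)
    if m:
        target, expr = m.groups()
        sources = sorted(set(VAR_IN_EXPR.findall(expr)))
        return {"command": "Compute", "target": target,
                "sources": sources, "expression": expr}
    m = re.fullmatch(rf"ren(?:ame)?\s+({IDENT})\s+({IDENT})", line)
    if m:
        return {"command": "Rename", "old": m.group(1), "new": m.group(2)}
    m = re.fullmatch(rf"drop\s+({IDENT}(?:\s+{IDENT})*)", line)
    if m:
        return {"command": "DropVariables", "variables": m.group(1).split()}
    raise ValueError(f"unsupported command: {line!r}")


def variable_lineage(commands: List[Command], variable: str) -> Set[str]:
    """Return the input variables that `variable` ultimately derives from.

    Walks the straight-line script backwards, replacing each derived name
    with the names it was computed or renamed from.
    """
    frontier: Set[str] = {variable}
    for cmd in reversed(commands):
        if cmd["command"] == "Compute" and cmd["target"] in frontier:
            frontier.discard(cmd["target"])
            frontier.update(cmd["sources"])
        elif cmd["command"] == "Rename" and cmd["new"] in frontier:
            frontier.discard(cmd["new"])
            frontier.add(cmd["old"])
        # DropVariables does not change where a surviving variable came from.
    return frontier


if __name__ == "__main__":
    script = """
    gen income = wage + bonus
    gen log_inc = log(income)
    rename log_inc lninc
    drop bonus
    """
    commands = [translate_stata(line) for line in script.strip().splitlines()]
    for cmd in commands:
        print(cmd)
    # lninc <- log_inc <- income <- {wage, bonus}
    print(sorted(variable_lineage(commands, "lninc")))  # ['bonus', 'wage']
```

Walking the script backwards lets each derived name be replaced by its inputs, so `lninc` resolves to `wage` and `bonus` even though `bonus` is later dropped. A production translator would need a full grammar for each statistical language rather than regular expressions.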

