Automating the Capture of Data Transformation Metadata from Statistical Analysis Software

dc.contributor.author Alter, George
dc.contributor.author Donakowski, Darrell
dc.contributor.author Gager, Jack
dc.contributor.author Heus, Pascal
dc.contributor.author Hunter, Carson
dc.contributor.author Ionescu, Sanda
dc.contributor.author Iverson, Jeremy
dc.contributor.author Jagadish, H V
dc.contributor.author Lagoze, Carl
dc.contributor.author Lyle, Jared
dc.contributor.author Mueller, Alexander
dc.contributor.author Revheim, Sigbjørn
dc.contributor.author Richardson, Matthew
dc.contributor.author Risnes, Ørnulf
dc.contributor.author Seelam, Karunakara
dc.contributor.author Smith, Dan
dc.contributor.author Smith, Tom
dc.contributor.author Song, Jie
dc.contributor.author Vaidya, Yashas Jaydeep
dc.contributor.author Voldsater, Ole
dc.date.accessioned 2020-07-06T18:33:59Z
dc.date.available 2020-07-06T18:33:59Z
dc.date.issued 2020-07-06
dc.identifier.uri http://hdl.handle.net/2027.42/156014
dc.description.abstract The C2Metadata (“Continuous Capture of Metadata for Statistical Data”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. The C2Metadata Project creates a metadata workflow paralleling the data management process by deriving provenance information from scripts used to manage and transform data. C2Metadata differs from most previous data provenance initiatives by documenting transformations at the variable level rather than describing a sequence of opaque programs. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. Better data documentation makes research more transparent and expands the discovery and re-use of research data. en_US
dc.description.sponsorship National Science Foundation grant ACI-1640575 en_US
dc.language.iso en_US en_US
dc.subject metadata, data sharing, statistical analysis en_US
dc.title Automating the Capture of Data Transformation Metadata from Statistical Analysis Software en_US
dc.type Article en_US
dc.subject.hlbsecondlevel Statistics and Numeric Data
dc.subject.hlbtoplevel Social Sciences
dc.contributor.affiliationum Inter-university Consortium for Political and Social Research, Center for Political Studies, Computer Science and Engineering en_US
dc.contributor.affiliationother Colectica Inc., Metadata Technology North America Inc., Norwegian Centre for Research Data, NORC en_US
dc.contributor.affiliationumcampus Ann Arbor en_US
dc.description.bitstreamurl https://deepblue.lib.umich.edu/bitstream/2027.42/156014/3/Automating_metadata_capture_v15.pdf
dc.identifier.orcid 0000-0003-3823-4972 en_US
dc.identifier.name-orcid Alter, George; 0000-0003-3823-4972 en_US
dc.owningcollname Inter-university Consortium for Political and Social Research (ICPSR)