Automating the Capture of Data Transformation Metadata from Statistical Analysis Software

dc.contributor.author Alter, George; Donakowski, Darrell; Gager, Jack; Heus, Pascal; Hunter, Carson; Ionescu, Sanda; Iverson, Jeremy; Jagadish, H V; Lagoze, Carl; Lyle, Jared; Mueller, Alexander; Revheim, Sigbjørn; Richardson, Matthew; Risnes, Ørnulf; Seelam, Karunakara; Smith, Dan; Smith, Tom; Song, Jie; Vaidya, Yashas Jaydeep; Voldsater, Ole
dc.date.accessioned 2020-07-06T18:33:59Z
dc.date.available 2020-07-06T18:33:59Z
dc.date.issued 2020-07-06
dc.description.abstract The C2Metadata (“Continuous Capture of Metadata for Statistical Data”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. The C2Metadata Project creates a metadata workflow paralleling the data management process by deriving provenance information from scripts used to manage and transform data. C2Metadata differs from most previous data provenance initiatives by documenting transformations at the variable level rather than describing a sequence of opaque programs. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. Better data documentation makes research more transparent and expands the discovery and re-use of research data. en_US
dc.description.sponsorship National Science Foundation grant ACI-1640575 en_US
dc.language.iso en_US en_US
dc.subject metadata, data sharing, statistical analysis en_US
dc.title Automating the Capture of Data Transformation Metadata from Statistical Analysis Software en_US
dc.type Article en_US
dc.subject.hlbsecondlevel Statistics and Numeric Data
dc.subject.hlbtoplevel Social Sciences
dc.contributor.affiliationum Inter-university Consortium for Political and Social Research, Center for Political Studies, Computer Science and Engineering en_US
dc.contributor.affiliationother Colectica Inc., Metadata Technology North America Inc., Norwegian Centre for Research Data, NORC en_US
dc.contributor.affiliationumcampus Ann Arbor en_US
dc.identifier.orcid 0000-0003-3823-4972 (Alter, George) en_US
dc.owningcollname Inter-university Consortium for Political and Social Research (ICPSR)
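
The abstract describes documenting transformations at the variable level and building "variable lineages" from them. The sketch below illustrates that idea only in outline. The record layout (`command`, `reads`, `writes`), the variable names, and the helper functions are all hypothetical and are not the actual SDTL schema or any C2Metadata tool; they assume each transformation step can be reduced to the variables it reads and the variables it writes.

```python
from collections import defaultdict

# Hypothetical, simplified transformation records. These are NOT the real
# SDTL schema; each record only names a command, the variables it reads,
# and the variables it writes.
steps = [
    {"command": "Compute", "reads": ["income", "hh_size"], "writes": ["income_pc"]},
    {"command": "Recode", "reads": ["income_pc"], "writes": ["income_group"]},
    {"command": "Rename", "reads": ["age"], "writes": ["age_years"]},
]


def build_lineage(steps):
    """Map each written variable to the set of variables it directly depends on."""
    parents = defaultdict(set)
    for step in steps:
        for target in step["writes"]:
            parents[target].update(step["reads"])
    return parents


def trace_sources(variable, parents):
    """Return every upstream variable that `variable` depends on, directly or indirectly."""
    seen = set()
    stack = [variable]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = build_lineage(steps)
print(sorted(trace_sources("income_group", parents)))
```

Under these assumptions, tracing `income_group` walks back through `income_pc` to the original `income` and `hh_size` variables, which is the kind of variable-level audit trail that a sequence of opaque program names cannot provide.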