CONTENTS
---------
* Dataset Title
* Dataset Creators
* Contact Information
* Research Overview
* Methods
* File Inventory
* Definition of Terms and Variables
* Suggested citation

DATASET TITLE
-------------
Stewardship Gap Project Bibliography Data

DATASET CREATORS
----------------
York, Jeremy
Gutmann, Myron
Berman, Francine

CONTACT INFORMATION
-------------------
jjyork@umich.edu

RESEARCH OVERVIEW
-----------------
The data were collected as part of the Stewardship Gap project, an
18-month study funded by the Alfred P. Sloan Foundation and led by Co-PIs
Myron Gutmann and Francine Berman to investigate how research data and
creative outputs supported by public or non-profit funding in the United
States are being stewarded.

These data were collected between August 6, 2015 and July 18, 2016 as
part of a literature search of sources about research data stewardship
and relate most directly to our work describing “What We Know About the
Stewardship Gap” (e.g., [1]). In this work, we categorized “gaps” in
stewardship identified in the literature, how the gaps were related to
one another, and efforts to measure or develop metrics for measuring the
gaps.

METHODS
-------
We identified the works listed in the "Stewardship Gap Bibliography
Data.xlsx" file through a variety of methods, including searching for
topics related to stewardship and curation in and across databases (e.g.,
using services such as Google Scholar and cross-database aggregation
services such as Summon), and analyzing cited references in relevant
articles, reports, and projects. The works have a geographic bias toward
North America and Europe and a language bias toward works written in
English.

We identified a total of 404 works as having potential relevance to the
project. These works were then reviewed again according to specific
criteria. Works that met the criteria were then assigned to the relevant
sample (A, B, or C). In the spreadsheet, an empty cell in a sample's
column indicates that the work was not assigned to that sample.

The first sample (Sample A) was a body of 87 works, including literature
reviews, reports, and empirical research, that we analyzed to discover
what scholars and practitioners identify as challenges to data
stewardship. We conducted descriptive coding of this sample, from which
we identified three levels of stewardship gap “areas” and “sub-areas.” At
the highest level we defined six gap areas, which we describe in our
written research (e.g., [1]). We arrived at these by distilling 14
broader gap areas, which we aggregated from 56 more granular gap
sub-areas. The 14 broader gap areas and the 56 sub-areas are represented
in the Stewardship Gap Project Bibliography Data.

The second sample (Sample B) comprised 74 works selected from the 87
works in Sample A. In addition to identifying challenges to data
stewardship, the authors of these 74 works also identified relationships
between the challenges (e.g., challenges that cause or exacerbate
others). We interpreted these relationships according to the gap areas we
identified to better understand the potential impact that our 14 gap
areas could have on one another.

The third sample (Sample C) is a set of 142 works, some of which are also
included in Sample A and Sample B, that explicitly sought to measure
stewardship gap areas and sub-areas, or to articulate metrics for
measuring them. Sample C excluded reports and other works that, for
instance, articulated strategies or ideas for addressing stewardship
challenges but did not conduct empirical research to measure at least one
of our identified gap areas or theoretical research to identify what
might be measured. We limited works in Sample C for the most part to
those dealing explicitly with research data (as opposed, for example, to
preservation of digitized cultural materials), though there were a few
others.
These include studies that investigated the total amount of digital
information, studies targeted toward digital curation skills broadly (but
that include consideration for research data), and some studies that
investigated public sector or government information.

We conducted initial stages of coding of all samples using a combination
of spreadsheets and the Web-based tool Workflowy. We subsequently kept
track of article codes using spreadsheets and a MySQL database associated
with a Drupal 7 website.

FILE INVENTORY
--------------
Stewardship Gap Bibliography Data.xlsx
Readme.txt

DEFINITION OF TERMS AND VARIABLES
---------------------------------
Column A: Citations of works included in the literature review.

Column B: Works included in Sample A, in which we found evidence of the
stewardship gap areas we identified. Works in the sample are indicated by
the term "gap_evidence".

Column C: Works included in Sample B, in which we identified
relationships between stewardship gap areas. Works in the sample are
indicated by the term "gap_relationships".

Column D: Works included in Sample C that sought to measure stewardship
gap areas. Works in the sample are indicated by cells containing gap
areas. In each entry, the term before the colon indicates which of the 14
broad gap areas we identified and the term after the colon indicates
which of the 56 sub-areas we identified. Entries are separated from one
another by commas, e.g., "Knowledge: Amount of data, IT: Lack of
infrastructure".

Column E: Works included in Sample C that provided metrics for measuring
stewardship gap areas. Works in the sample are indicated by cells
containing gap areas. The syntax and meaning of the terms in the cells
are the same as in Column D.

Column F: As we describe in our research (e.g., [1]), we categorized
works in Sample C on the one hand as being either "measurement" or
"metrics" studies, and on the other as being either "targeted" or "wider"
studies.
“Targeted” studies focused on one or two closely related gap areas, such
as Knowledge or Sharing and Access. “Wider” studies investigated several
different gap areas at once, often in the context of an institutional
study. Column F indicates whether the cited work was a measurement study,
a metrics study, or both, and whether it was a targeted study, a wider
study, or both. This determination was only made for works in Sample C.

Column G: Indicates which of the 14 high-level or broad gap areas we
found evidence of in the cited work. This determination was only made for
works in Sample C.

Column H: At times we found multiple publications associated with the
same project. Our procedure was to record findings only once if no new
dimensions of the study were described in additional publications. The
values in this column vary between the name of the actual project (e.g.,
the 4C Project or RECODE) and generic taxonomy terms (e.g., Duplicate 3).
To identify publications from the same study, look for citations that
share the same value (e.g., all citations coded with Duplicate 3 or with
4C Project).

The 14 broad gap areas we identified are below.
Abbreviations used in the full list of 56 gap sub-areas (immediately
following) are in parentheses:

Culture
Knowledge
Responsibility (Resp)
Commitment (Commit)
Sustainability Planning (SP)
Legal / Policy (LP)
Funding
Collaboration (Collab)
Human Resources (HR)
Infrastructure and Tools (IT)
Curation, Management, Preservation (CMP)
Sharing and Access (SA)
Discovery (Disc)
Reuse

The full list of 56 gap sub-areas:

Culture: Archive mandates and objectives
Culture: Data definition
Culture: Demand for data
Culture: Evaluation of quality
Culture: Identifying what is valuable
Culture: Intellectual property
Culture: Research and development culture
Culture: Sharing attitudes and practices
Culture: Standards
Culture: Stewardship priority
Knowledge: Amount of data
Knowledge: Challenges of enabling data reuse
Knowledge: Costs of stewardship
Knowledge: How to preserve
Knowledge: Infrastructure for stewardship
Knowledge: Provenance and authenticity
Knowledge: Reuse possibilities
Knowledge: Where to deposit data
Resp: Conduct stewardship activities
Resp: Coordinate stewardship activities
Resp: Support stewardship activities
Commit: Duration of commitment
Commit: Extent of commitment
Commit: Lack of commitment
SP: Business and economic models
SP: Design and staffing of organizations
SP: Dynamic and adaptable infrastructure
SP: Lack of strategy and planning
LP: Deficiencies that inhibit stewardship, access, and use
LP: Incentives that support stewardship, access, and use
LP: Institutional structures and pressures
LP: Lack of consistency and alignment
Funding: Imbalance in funding
Funding: Lack of funding
Collab: Challenges forming partnerships
Collab: Lack of collaboration
Collab: Lack of critical mass
Collab: Support structures
HR: Lack of people
HR: Lack of skills
HR: Lack of support for data management
HR: Unequal access to resources and expertise
HR: Uneven distribution of skills
IT: Different timescales of infrastructure development and maturity
IT: Difficulty meeting generalized and special needs
IT: Lack of infrastructure
IT: Lack of tools
CMP: Difficulty establishing the trustworthiness of curated data
CMP: Difficulty maintaining the integrity of data over time
CMP: Difficulty managing data for reuse
CMP: Fragmented data management
CMP: Insufficient data curation or management
CMP: Tradeoffs between data management for short or long term
SA: Sharing and Access
Disc: Discovery
Reuse: Reuse

A full description and a list of indicators for each of the broad gap
areas can be found in our paper "What Do We Know About The Stewardship
Gap?" [1].

SUGGESTED CITATION FOR THIS DATA SET
------------------------------------
York, J., Gutmann, M., Berman, F., 2018. Stewardship Gap Project
Bibliography Data. University of Michigan.
DOI: http://dx.doi.org/10.7302/Z2ZW1J47

[1] York, J., Gutmann, M., Berman, F., 2016. What Do We Know About The
Stewardship Gap? https://deepblue.lib.umich.edu/handle/2027.42/122726
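
For readers who want to process the gap-area cells in Columns D and E
programmatically, the following is a minimal sketch of one way to split a
cell into (broad area, sub-area) pairs. It is not part of the dataset or
of the project's own tooling. It assumes the cells follow the syntax
given in the Column D definition above, that they use the abbreviations
listed in this file, and that any trailing period can be dropped. The
names GAP_AREAS and parse_gap_cell are hypothetical. Because some
sub-area names contain commas themselves (e.g., "LP: Deficiencies that
inhibit stewardship, access, and use"), the sketch splits only on commas
that are followed by a known abbreviation and a colon.

```python
import re

# Abbreviation -> full name, copied from the list of 14 broad gap areas
# in this README. Using these as cell prefixes is an assumption.
GAP_AREAS = {
    "Culture": "Culture",
    "Knowledge": "Knowledge",
    "Resp": "Responsibility",
    "Commit": "Commitment",
    "SP": "Sustainability Planning",
    "LP": "Legal / Policy",
    "Funding": "Funding",
    "Collab": "Collaboration",
    "HR": "Human Resources",
    "IT": "Infrastructure and Tools",
    "CMP": "Curation, Management, Preservation",
    "SA": "Sharing and Access",
    "Disc": "Discovery",
    "Reuse": "Reuse",
}

# Split only on commas followed by a known prefix and a colon, so that
# commas inside a sub-area name do not start a new entry.
_PREFIXES = "|".join(re.escape(prefix) for prefix in GAP_AREAS)
_ENTRY_SEPARATOR = re.compile(rf",\s*(?=(?:{_PREFIXES}):)")


def parse_gap_cell(cell):
    """Return a list of (broad area, sub-area) pairs for one cell.

    An empty or missing cell yields an empty list, matching the README's
    statement that empty cells mean the work is not in the sample.
    """
    if cell is None or not str(cell).strip():
        return []
    text = str(cell).strip().rstrip(".")  # assumption: drop a final period
    pairs = []
    for entry in _ENTRY_SEPARATOR.split(text):
        prefix, _, sub_area = entry.partition(":")
        prefix = prefix.strip()
        # Unknown prefixes are kept as-is rather than rejected.
        pairs.append((GAP_AREAS.get(prefix, prefix), sub_area.strip()))
    return pairs
```

The function takes the raw value of a single cell, however it was read
from the spreadsheet, so it does not depend on any particular spreadsheet
library. If the actual cells use a different separator or spell out the
full area names, the prefix table and the separator pattern would need to
be adjusted.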