Title: Stewardship Gap Project Bibliography Data
York, J., Gutmann, M., Berman, F. (2018). Stewardship Gap Project Bibliography Data [Data set]. University of Michigan - Deep Blue. https://doi.org/10.7302/Z2ZW1J47
* Dataset Title
* Dataset Creators
* Contact Information
* Research Overview
* File Inventory
* Definition of Terms and Variables
* Suggested citation
Stewardship Gap Project Bibliography Data
The data were collected as part of the Stewardship Gap project, an 18-month study funded by the Alfred P. Sloan Foundation and led by Co-PIs Myron Gutmann and Francine Berman to investigate how research data and creative outputs supported by public or non-profit funding in the United States are being stewarded. These data were collected between August 6, 2015 and July 18, 2016 as part of a literature search of sources about research data stewardship, and relate most directly to our work describing "What We Know About the Stewardship Gap" (e.g., York, Gutmann, and Berman, 2016). In this work, we categorized the "gaps" in stewardship identified in the literature, the relationships among those gaps, and efforts to measure the gaps or to develop metrics for measuring them.
We identified the works listed in the "Stewardship Gap Bibliography Data.xlsx" file through a variety of methods, including searching for topics related to stewardship and curation within and across databases (e.g., using Google Scholar and cross-database aggregation services such as Summon) and analyzing cited references in relevant articles, reports, and projects. The works are biased geographically toward North America and Europe, and toward works written in English.
We identified a total of 404 works as potentially relevant to the project. We then reviewed these works again according to specific criteria and assigned those that met the criteria to the relevant sample (A, B, or C). In the spreadsheet, an empty cell in a sample column indicates that the work was not assigned to that sample.
The first sample (Sample A) was a body of 87 works, including literature reviews, reports, and empirical research that we analyzed to discover what scholars and practitioners identify as challenges to data stewardship.
We conducted descriptive coding of this sample, from which we identified three levels of stewardship gap "areas" and "sub-areas." At the highest level we defined six gap areas, which we describe in our written research (e.g., York, Gutmann, and Berman, 2016). We arrived at these by distilling 14 broader gap areas, which we aggregated from 56 more granular gap sub-areas. The 14 broader gap areas and the sub-areas are represented in the Stewardship Gap Project Bibliography Data.
The second sample (Sample B) comprised 74 works selected out of the 87 works from Sample A. In addition to identifying challenges to data stewardship, the authors of these 74 works also identified relationships between the challenges (e.g., challenges that cause or exacerbate others). We interpreted these relationships according to the gap areas we identified to better understand the potential impact that our 14 gap areas could have on one another.
The third sample (Sample C) is a set of 142 works, some of which are included in Sample A and Sample B, that explicitly sought to measure stewardship gap areas and sub-areas, or articulate metrics for measuring them. Sample C excluded reports and other works that, for instance, articulated strategies or ideas for addressing stewardship challenges but did not conduct empirical research to measure at least one of our identified gap areas or theoretical research to identify what might be measured.
We limited works in Sample C for the most part to those dealing explicitly with research data (as opposed, for example, to preservation of digitized cultural materials), though there were a few others. These include studies that investigated the total amount of digital information, studies targeted toward digital curation skills broadly (but that include consideration for research data) and some studies that investigated public sector or government information.
We conducted initial stages of coding of all samples using a combination of spreadsheets and the Web-based tool Workflowy. We subsequently kept track of article codes using spreadsheets and a MySQL database associated with a Drupal 7 website.
Stewardship Gap Bibliography Data.xlsx
DEFINITION OF TERMS AND VARIABLES
Column A: Citations of works included in the literature review
Column B: Works included in Sample A, in which we found evidence of the stewardship gap areas we identified. Works in the sample are indicated by the term "gap_evidence."
Column C: Works included in Sample B, in which we identified relationships between stewardship gap areas. Works in the sample are indicated by the term "gap_relationships."
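As a rough illustration of how the flags in Columns B and C might be read, the sketch below counts Sample A and Sample B members and lists any work flagged for Sample B but not Sample A. Because Sample B was selected from Sample A, that list would be expected to be empty. The function names, the tuple layout, and the citations in the usage lines are hypothetical, and loading the .xlsx file is left to whatever spreadsheet library is at hand; this has not been run against the deposited file.

```python
def sample_flags(col_b, col_c):
    """Return (in_sample_a, in_sample_b) for one spreadsheet row."""
    def has(cell, term):
        # Empty cells may arrive as None, "" or a non-string value.
        return isinstance(cell, str) and cell.strip() == term
    return has(col_b, "gap_evidence"), has(col_c, "gap_relationships")


def summarize_samples(rows):
    """rows: iterable of (citation, col_b, col_c) tuples (assumed layout)."""
    count_a = count_b = 0
    b_not_in_a = []
    for citation, col_b, col_c in rows:
        in_a, in_b = sample_flags(col_b, col_c)
        count_a += in_a
        count_b += in_b
        if in_b and not in_a:
            b_not_in_a.append(citation)
    return count_a, count_b, b_not_in_a


# Placeholder rows, not taken from the data set.
example_rows = [
    ("Work 1", "gap_evidence", "gap_relationships"),
    ("Work 2", "gap_evidence", None),
    ("Work 3", None, None),
]
print(summarize_samples(example_rows))
```

If the file matches the description above, the first two numbers would be 87 and 74.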
Column D: Works included in Sample C that sought to measure stewardship gap areas. Works in the sample are indicated by cells containing gap areas. Within each entry, the first term (before the colon) indicates which of the 14 broad gap areas applies and the second term indicates which of the 56 sub-areas applies. Multiple entries in a cell are separated by commas, e.g., "Knowledge: Amount of data, IT: Lack of infrastructure."
Column E: Works included in Sample C that provided metrics for measuring stewardship gap areas. Works in the sample are indicated by cells containing gap areas. The syntax and meaning of the terms in the cells is the same as Column D.
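The cell syntax used in Columns D and E can be split into (area, sub-area) pairs with a short routine such as the sketch below. One caveat is that some sub-area names themselves contain commas (e.g., "LP: Deficiencies that inhibit stewardship, access, and use"), so this sketch splits only on commas that are followed by a known area prefix and a colon. The prefix list is copied from the sub-area list later in this README and may not cover every value in the file; the function name is hypothetical.

```python
import re

# Area prefixes as they appear in the sub-area list of this README
# (assumption: cells use exactly these spellings).
AREA_PREFIXES = ("Culture", "Knowledge", "Resp", "Commit", "SP", "LP",
                 "Funding", "Collab", "HR", "IT", "CMP", "SA")

# Split on a comma only when what follows is "<known prefix>:".
_ENTRY_SPLIT = re.compile(r",\s*(?=(?:%s):)" % "|".join(AREA_PREFIXES))


def parse_gap_cell(cell):
    """Return a list of (area, sub_area) pairs; empty cells give []."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    pairs = []
    for entry in _ENTRY_SPLIT.split(cell.strip()):
        area, _, sub_area = entry.partition(":")
        pairs.append((area.strip(), sub_area.strip()))
    return pairs


print(parse_gap_cell("Knowledge: Amount of data, IT: Lack of infrastructure"))
# [('Knowledge', 'Amount of data'), ('IT', 'Lack of infrastructure')]
```

A plain split on every comma would be simpler but would break sub-area names that contain commas into fragments.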
Column F: As we describe in our research (e.g., York, Gutmann, and Berman, 2016), we categorized works in Sample C on the one hand as "measurement" or "metrics" studies, and on the other as "targeted" or "wider" studies. "Targeted" studies focused on one or two closely related gap areas, such as Knowledge or Sharing and Access. "Wider" studies investigated several different gap areas at once, often in the context of an institutional study.
Column F indicates whether the cited work was a measurement or metrics study, or both, and whether it was a targeted or wider study, or both. This determination was only made for works in Sample C.
Column G: Indicates which of the 14 high-level or broad gap areas we found evidence of in the cited work. This determination was only made for works in Sample C.
Column H: At times we found multiple publications associated with the same project. Our procedure was to record findings only once if no new dimensions of the study were described in additional publications. The values in this column are either the name of the actual project (e.g., the 4C Project or RECODE) or a generic label (e.g., Duplicate 3). To identify publications that report on the same study, look for citations that share the same value (e.g., all citations coded with Duplicate 3, or all coded with 4C Project).
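One possible way to find publications that belong to the same study is to group citations by their Column H value, as in the sketch below. The function name and the citations in the usage lines are hypothetical placeholders; only the labels "4C Project" and "Duplicate 3" come from the description above.

```python
from collections import defaultdict


def group_by_project(rows):
    """Group citations that share a Column H label.

    rows: iterable of (citation, col_h) pairs (assumed layout).
    Rows with an empty Column H cell are skipped.
    """
    groups = defaultdict(list)
    for citation, label in rows:
        if isinstance(label, str) and label.strip():
            groups[label.strip()].append(citation)
    return dict(groups)


# Placeholder citations, not taken from the data set.
example_rows = [
    ("Citation A", "4C Project"),
    ("Citation B", "Duplicate 3"),
    ("Citation C", "4C Project"),
    ("Citation D", None),
]
for label, citations in group_by_project(example_rows).items():
    print(label, citations)
```

Matching here is exact after trimming whitespace, so any spelling variants of a label in the file would need to be normalized first.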
The 14 broad gap areas we identified are below. Abbreviations used in the full list of 56 gap areas (immediately following) are in parentheses:
Sustainability Planning (SP)
Legal / Policy (LP)
Human Resources (HR)
Infrastructure and Tools (IT)
Curation, Management, Preservation (CMP)
Sharing and Access (SA)
All 56 gap areas include:
Culture: Archive mandates and objectives
Culture: Data definition
Culture: Demand for data
Culture: Evaluation of quality
Culture: Identifying what is valuable
Culture: Intellectual property
Culture: Research and development culture
Culture: Sharing attitudes and practices
Culture: Stewardship priority
Knowledge: Amount of data
Knowledge: Challenges of enabling data reuse
Knowledge: Costs of stewardship
Knowledge: How to preserve
Knowledge: Infrastructure for stewardship
Knowledge: Provenance and authenticity
Knowledge: Reuse possibilities
Knowledge: Where to deposit data
Resp: Conduct stewardship activities
Resp: Coordinate stewardship activities
Resp: Support stewardship activities
Commit: Duration of commitment
Commit: Extent of commitment
Commit: Lack of commitment
SP: Business and economic models
SP: Design and staffing of organizations
SP: Dynamic and adaptable infrastructure
SP: Lack of strategy and planning
LP: Deficiencies that inhibit stewardship, access, and use
LP: Incentives that support stewardship, access, and use
LP: Institutional structures and pressures
LP: Lack of consistency and alignment
Funding: Imbalance in funding
Funding: Lack of funding
Collab: Challenges forming partnerships
Collab: Lack of collaboration
Collab: Lack of critical mass
Collab: Support structures
HR: Lack of people
HR: Lack of skills
HR: Lack of support for data management
HR: Unequal access to resources and expertise
HR: Uneven distribution of skills
IT: Different timescales of infrastructure development and maturity
IT: Difficulty meeting generalized and special needs
IT: Lack of infrastructure
IT: Lack of tools
CMP: Difficulty establishing the trustworthiness of curated data
CMP: Difficulty maintaining the integrity of data over time
CMP: Difficulty managing data for reuse
CMP: Fragmented data management
CMP: Insufficient data curation or management
CMP: Tradeoffs between data management for short or long term
SA: Sharing and Access
A full description and a list of indicators for each of the broad gap areas can be found in our paper "What Do We Know About The Stewardship Gap?" (York, Gutmann, and Berman, 2016).
SUGGESTED CITATION FOR THIS DATA SET
York, J., Gutmann, M., Berman, F., 2018. Stewardship Gap Project Bibliography Data. University of Michigan. DOI: http://dx.doi.org/10.7302/Z2ZW1J47
 York, J., Gutmann, M., Berman, F., 2016. What Do We Know About The Stewardship Gap? https://deepblue.lib.umich.edu/handle/2027.42/122726