Guide to Creating Documentation for Your Dataset in Deep Blue Data
Research Data Services – firstname.lastname@example.org
Why documentation is important
To fully understand, trust, and make use of a dataset created by someone else, researchers often need more than the data files: they need a detailed understanding of the context in which the dataset was created; the methods used to collect, process, and analyze the data; and what the values and terms in the data signify. Best practices in research data curation point to including documentation within a dataset itself as a dependable way to convey comprehensive information about the data, and the research that produced it, to interested members of academic communities and the general public.
Some of this information may be provided in journal articles, monographs, or other texts reporting research findings. However, because they are typically written to describe the methods and results, research publications often leave out key details about the data itself, including how data files are structured; definitions of headings, variables, and other terms used in the data; and how the individual components of the dataset fit together. Even when this information is provided in a research publication, the text may not be easily accessible to those seeking to understand and make use of the data.
What we mean by documentation and metadata
Some data curation experts and practitioners use the terms documentation and metadata interchangeably. Deep Blue Data distinguishes between the two because, within the repository, they serve different purposes.
Metadata here refer to the high-level information that researchers provide when depositing data in Deep Blue Data and include the Title, Creator, Description, Methods, Keywords, and other descriptive information. Metadata help people who may be interested in a dataset to find it and understand it well enough to decide whether or not they would like to download it from Deep Blue Data.
The information included in documentation is more expansive than what is provided in the metadata. While metadata help users to discover and identify datasets of interest, documentation provides a richer understanding of the data and establishes trust in it. Ideally, the documentation would be comprehensive enough to enable others to reuse the data for new projects or to reproduce the research to verify its findings. Documentation can come in many forms depending on the nature of the data and the research; examples include a codebook, a field notebook, or a README file.
What we recommend including in documentation
To facilitate usability of the datasets on Deep Blue Data, Research Data Services requests that researchers include documentation to accompany the data either in separate files (see this readme as an example) or embedded within the data files themselves. Ideally, documentation would be produced throughout the research process, making it easy to compile and combine with the data before deposit, but we recognize that this does not always happen. Research Data Services offers educational programming and consultation on developing documentation to accompany a dataset deposited into Deep Blue Data or elsewhere. Please contact us if you would like to take advantage of our services.
While the practice of creating documentation for data is broadly supported (see bibliography below), the recommended format and content vary between fields, datasets, and repositories. We would be happy to work with you to identify best practices for documenting data in your field of study and to consider how these practices might be applied to your dataset.
Though we acknowledge that practices and needs for describing research data will be determined case by case, the Research Data Services team recommends that documentation cover the following areas at minimum:
- Research Overview: A summary of the subject and purpose of the research, as well as who conducted the research, where and when it took place, and the research funding sources.
- Methods: An account of how the research was conducted, with a focus on how data were collected, processed, and analyzed. Sampling procedures, instruments, and software used should be described to convey how data were produced and transformed.
- File Inventory: A list of each of the files included in the dataset, including a brief statement of what each file contains and its purpose. An explanation of the organization of the files, including relationships between files, should be considered as well.
- Definition of Terms and Variables: A glossary that specifies clearly and without jargon the meaning of ambiguous terms, obscure procedures, variables, and/or units that appear throughout the dataset. This may take the form of a list of definitions, a data dictionary in a spreadsheet, or comment lines distributed through a program.
- Use and Access: (if needed) Instructions on how to open, run, or make use of the data files.
- Suggested Citation: (if needed) A suggested citation for the dataset encourages people to give you attribution for your work.
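As one possible starting point, the areas listed above can be laid out as a plain-text README skeleton. The Python sketch below is illustrative only and is not a Deep Blue Data tool or required format: the function name build_readme, the [TODO] placeholders, and the underlined heading layout are all assumptions made for this example. It writes one heading per recommended area and pre-fills the File Inventory with the files found in a dataset folder.

```python
from pathlib import Path

# Headings follow the recommended areas above. The prompt text is a
# placeholder for the researcher to replace (hypothetical wording).
SECTIONS = [
    ("Research Overview", "subject, purpose, who, where, when, funding sources"),
    ("Methods", "how data were collected, processed, and analyzed"),
    ("File Inventory", None),  # filled in from the dataset folder
    ("Definition of Terms and Variables", "terms, variables, units"),
    ("Use and Access", "how to open, run, or make use of the files"),
    ("Suggested Citation", "how others should cite this dataset"),
]


def build_readme(dataset_dir, title):
    """Return README skeleton text for the files under dataset_dir."""
    root = Path(dataset_dir)
    lines = [title, "=" * len(title), ""]
    for heading, prompt in SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))
        if heading == "File Inventory":
            # List every file, relative to the dataset folder.
            files = sorted(p for p in root.rglob("*") if p.is_file())
            for path in files:
                name = path.relative_to(root).as_posix()
                lines.append(f"{name}: [TODO: describe contents and purpose]")
            if not files:
                lines.append("[TODO: list files]")
        else:
            lines.append(f"[TODO: {prompt}]")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

A call such as build_readme("my_dataset", "Example Dataset") returns text that can be saved as README.txt alongside the data and then completed by hand; the skeleton only provides structure, and the substantive descriptions still need to come from the researcher.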
Have Questions? Need Help?
Please contact Research Data Services - email@example.com
This document was last updated on May 15, 2018