Deep Blue Data - Glossary of Terms

This page provides explanatory definitions of terms that are commonly used on Deep Blue Data. Rather than capturing an objective understanding of the terms, these definitions specify how the terms are used in the context of Deep Blue Data and its services.

If you come across an unfamiliar term while working with Deep Blue Data, please email us and we will consider adding it to the glossary.

Collection: A grouping of Works in the Deep Blue Data repository. Researchers or Research Data Services staff can use these groupings to bring together related Works on a single page. For example, a researcher who has deposited multiple datasets on a similar topic in Deep Blue Data may want to add the Works containing those datasets to a Collection, so that they can more conveniently locate those datasets and share them with others. If you are interested in creating a Collection, please contact Research Data Services.

Dataset: The complete unit of work consisting of research data that a researcher deposits into Deep Blue Data. A dataset includes all files or documents that contain research data, as well as any additional components providing information on how to access or understand those files’ contents, such as metadata and documentation.

Documentation: The information needed for people to understand, trust, and use a dataset. Documentation expands upon the information provided in a dataset’s metadata, providing further details about how the data were collected, processed, and analyzed.

Documentation can come in many forms. Some researchers create README documents (often a plain text file or PDF) or codebooks as separate files, which then become a part of the dataset. In other cases, documentation may be embedded within data files themselves; examples include a separate sheet in a Microsoft Excel file defining unfamiliar terms and variables or commented lines in a code file that explain decision points or algorithms. Every dataset can benefit from some form of documentation. For further explanation, recommendations, and resources, read the Documentation Guidelines provided in Deep Blue Data’s Help pages.

DOI: A unique, permanent digital code assigned to an object. DOI stands for Digital Object Identifier. DOIs are commonly used in academia to identify scholarly works and can be connected to metadata and URLs that point to an object’s online location or representation. As part of its mission to encourage data sharing and elevate research data as valuable scholarly output, Deep Blue Data works with DataCite, a DOI Registration Agency, to mint DOIs for datasets deposited to Deep Blue Data. For a more detailed discussion of DOIs for datasets, visit DataCite’s website.
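
As a rough illustration of how a DOI connects to an object's online location, the sketch below turns a DOI into a URL using the public https://doi.org/ resolver. The function name is hypothetical and the DOI shown is a made-up placeholder, not one minted for a Deep Blue Data dataset.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI, accepting a few common prefixes."""
    doi = doi.strip()
    # Strip an optional "doi:" label or an existing resolver address.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI begins with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Citing a dataset by its resolver URL rather than by the address of its repository page means the link can keep working even if the page later moves.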

Metadata: Descriptive information that defines key attributes of a dataset. When researchers create metadata for their datasets, others in their community of practice can more easily discover the data and assess their relevance to their own work. As metadata are designed to be discoverable through online search engines, researchers preparing metadata should consider what details others will need to find and connect with the dataset.

When depositing to Deep Blue Data, researchers will first encounter metadata in the submission form, where they will be asked to enter information in fields such as Creator, Title, Description, Methods, and Date Coverage. The metadata elements used for the repository are based on the Dublin Core metadata standard. For detailed instructions on completing the submission form (and creating metadata for your dataset), read the Metadata Guidelines provided in Deep Blue Data’s Help pages.
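
Researchers who draft their metadata before opening the submission form might find a small checklist useful. The sketch below is only an illustration: it treats the five fields named above as a checklist (the actual form may include other fields, and which ones are required is not stated here), and the record contents and function name are hypothetical.

```python
# Fields named in this glossary entry; the actual form may include others.
FORM_FIELDS = ["Creator", "Title", "Description", "Methods", "Date Coverage"]


def missing_fields(record: dict) -> list:
    """Return the checklist fields that are absent or blank in a draft record."""
    return [f for f in FORM_FIELDS if not str(record.get(f, "")).strip()]


# Hypothetical draft record, for illustration only.
draft = {
    "Creator": "Doe, Jane",
    "Title": "Stream temperature readings",
    "Description": "Hourly water temperature from three sensors.",
    "Methods": "",
}

print(missing_fields(draft))  # ['Methods', 'Date Coverage']
```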

Open access: A publishing model that seeks to make scholarly output, which is often indirectly paid for by tax dollars, freely and readily available to the public. As an Open Access repository, Deep Blue Data allows anyone with access to the Internet to view Works online and download files. Allowed uses of those files are specified by a Creative Commons license selected by the depositor.

Open format: A file format with publicly available documentation or specifications. These formats can be opened with a variety of software and hardware systems and are often associated with widely accepted professional standards. Quintessential examples are Plain Text (.txt) and Comma Separated Values (.csv) files. Deep Blue Data can provide superior preservation services for open formats as compared with proprietary formats.
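
For tabular data, saving a copy in an open format can be as simple as writing it out as CSV. The following is a minimal sketch using Python's standard csv module; the column names and values are invented for illustration, and the function name is hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and data rows as Comma Separated Values text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example values, for illustration only.
text = rows_to_csv(["site", "temp_c"], [["A", 12.5], ["B", 11.0]])
print(text)
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet program, a statistics package, or a plain text editor.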

Proprietary format: A file format that is commercially developed, which often means that its specifications are not publicly available and/or files in that format can only be used with specialized software or hardware. Though Deep Blue Data accepts files in all formats, the repository’s ability to preserve proprietary formats is limited; therefore, files in open formats are preferred. For more details about Deep Blue Data’s Preservation Policy, visit the “Policy and Terms of Use” page.

Research data: Representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. Functionally, this means any data produced during the process of research in any discipline.

While operating under a broad definition of the term, Deep Blue Data only accepts research data when certain conditions related to the depositor’s university affiliation and the contents of the deposit are met. For more details about these conditions, visit Deep Blue Data’s “About” page or the “Policy and Terms of Use” page.

Sensitive data: Data that, if published, could result in harm to research subjects through the disclosure of information that is considered confidential or private. Deep Blue Data does not accept sensitive data. However, in some cases, data can be de-identified or otherwise modified to allow for their publication. Researchers with questions or concerns about the sensitivity of their data should reach out to Research Ethics & Compliance, a unit of the University of Michigan’s Office of Research.

Work: The main organizational unit in the Deep Blue Data repository. When you deposit a dataset, you are creating a Work, which will contain all of the data and documentation files in your dataset and display the metadata you provided in the submission form.

Have Questions? Need Help?

Please contact Research Data Services.

This document was last updated on May 15, 2018