RLG
 Feature Article 2  

A Digital Decade: Where Have We Been and Where Are We Going in Digital Preservation?

Author: Nancy Y. McGovern - ICPSR (nancymcg@umich.edu)

There has been measurable progress in the digital preservation community since the seminal report Preserving Digital Information: Final Report and Recommendations was published by the Task Force on Archiving of Digital Information, commissioned by the Commission on Preservation and Access and RLG, more than a decade ago. Those concerned about digital preservation in 1996 did not have the Open Archival Information System (OAIS) standard to frame the development and discussion of digital preservation; a set of attributes of trusted digital repositories to delineate the organizational context for digital preservation; a data dictionary for preservation metadata; or the concept of institutional repositories made real by a range of software options. All of these developments have emerged within the past decade. Today, we have conferences entirely devoted to digital preservation (e.g., the iPres conference) and peer-reviewed journals for digital preservation (e.g., The International Journal of Digital Curation). One can follow the maturation of the digital preservation community through a decade of RLG DigiNews articles.

RLG DigiNews originally focused on “the converging fields of preservation and digitization”; the first article to specifically address digital preservation appeared in 1998. In 2000, the RLG DigiNews editorial staff significantly expanded the coverage of digital preservation, highlighting articles with the now familiar digital preservation symbol, which added “digital” to the established infinity notation from print preservation. The cumulative contribution of RLG DigiNews to the digital preservation literature over the past decade includes more than fifty feature articles plus a sequence of highlighted websites and FAQs. These articles and other features stressed practical steps in digital preservation, with an emphasis on the development and evaluation of relevant strategies, applications of research results, the integration and use of tools, and national and community-level agendas.

This tenth anniversary review of digital preservation developments takes an informal gap analysis approach, measuring where we are (the “as is”) against where we might like to be (the “to be”). This gap analysis has three components reflecting the core aspects of digital preservation: organizational infrastructure, technological infrastructure, and requisite resources.

Figure 1. Three-Legged Stool for Digital Preservation.

These three components make up the three-legged stool for digital preservation (Figure 1), a concept developed at Cornell for the Digital Preservation Management (DPM) Workshop series, which was funded by the National Endowment for the Humanities from 2003 to 2006. The workshop curriculum uses the three-legged stool as a means for an organization to assess its development within a maturity model consisting of five sequential stages: acknowledge, act, consolidate, institutionalize, and externalize.[1] This review takes a more basic step by considering the status of the three legs of the stool within the community from the “as is” and “to be” perspectives.

The Organizational Leg

The organizational leg determines the “what” of digital preservation: the mandate, scope, objectives, and staffing with which an organization engages in digital preservation. Ten years ago, the organizational leg was arguably the weakest leg, as evidenced by the general absence of explicit mission statements that referenced digital preservation, of policies that specifically addressed the preservation of digital assets, and of sustained digital preservation programs within organizations.

The “As Is”

There have been several important developments for the organizational leg over the past ten years, including the development and promulgation of the RLG/OCLC report on the Attributes of a Trusted Digital Repository (TDR), an increase in the development of digital preservation policies by organizations, and an acknowledgement of the central role of procedural accountability for audit and certification.

Trusted digital repositories
TDR represents the best expression of the organizational leg for digital preservation and has become a de facto standard for the digital preservation community since its release in 2002. Prior to the development of TDR, the community had no formal expression of the organizational context for digital preservation.

Figure 2. The Cornell Model for Trusted Digital Repository Attributes.

The Trusted Digital Repositories document defines seven attributes of a conformant organization: OAIS compliance, administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The relationships between the TDR attributes are portrayed in the Cornell model (Figure 2), developed to support the DPM workshop series. OAIS compliance is implicit in the diagram. TDR stresses the importance of the organizational context and places technology within that context. This placement recognizes that technology should be suited to the scope and requirements of each digital preservation program. The Cornell model for TDR added a “digital archives border” to the TDR attributes because an organization might maintain more than one repository instance, in which case the outer layers might be coordinated across the organization, or a group of organizations might come together to manage a single repository (e.g., in a consortial effort).

Digital preservation policy development
Policies and other documentation of decisions and actions are among the best indicators of the development of the organizational leg. At the 2006 Best Practices Exchange in North Carolina, “participants stressed again and again that a successful digital preservation program requires a strong foundation… Participants identified four essential elements for building a strong foundation for a digital preservation program: support and buy-in from stakeholders; ‘good enough’ practices implemented now; collaborations and partnerships; and documentation for policies, procedures, and standards.”[2]

The following brief list of digital preservation policies suggests the increase in policy development within the digital preservation community worldwide.

1996 Digital Library SunSITE
2000 Columbia University Libraries (rev. 2006)
2000 The National Archives UK
2001 National Library of Australia
2002 British Library
2003 National Library of Wales
2004 Arts and Humanities Data Service
2004 Cornell University Library
2005 Library and Archives Canada
2005 North Carolina Department of Cultural Resources
2005 UK Data Archive
2007 Inter-university Consortium for Political and Social Research (draft)

The World Wide Web, which was also in its nascent stage in 1996, has made possible a more effective and global exchange of information about policies and practices. More work on developing policies is underway. For example, the nestor policy project in Germany is working on a profile for a national long-term preservation policy.

Providing the evidence for audit and certification
“A well-written policy should serve as historical proof of an institution’s commitment to digital preservation now and long into the future.” This conclusion from the 2006 Best Practices Exchange reflects an implicit principle underlying the evidence requirements for the audit and certification of digital archives. The October 2005 issue of RLG DigiNews featured articles on the major digital archive audit initiatives in the US, the UK, and Germany. The Center for Research Libraries (CRL) conducted a series of test audits of digital archives, with funding from The Andrew W. Mellon Foundation, and hosted a meeting with the UK and German audit projects that produced a set of common audit principles. CRL released the “Trustworthy Repositories Audit & Certification: Criteria and Checklist” (TRAC) in March 2007 and should be releasing the principles and the test audit report soon. TRAC is a revised version of the RLG/NARA document Audit Checklist for Certifying Digital Repositories, which was released for public comment in January 2006. An ISO standard development effort is underway that will build on the work of these initiatives and integrate the relevant requirements from the information technology and security domains. In considering the basis and means for digital archive certification, these initiatives have shifted their focus toward the benefits of, and tools needed for, self-assessment and third-party audits. The results so far have demonstrated that self-assessment and audit effectively identify the strengths and weaknesses of digital preservation programs and define a development plan that lets organizations incrementally address the full set of criteria for trusted digital repositories.

The “To Be”

Though the “as is” perspective on the organizational leg has improved substantively over the past decade, there are at least two notable areas of development for the “to be” view: the need to integrate the organizational policies for digital preservation into technological implementations and the need to develop and evolve digital preservation skills.

Integrating policies into action
The organizational leg (the “what”) and the technological leg (the “how”) of the digital preservation stool need to be coordinated to develop compliant and feasible digital preservation strategies. The theory is in place. The OAIS Reference Model, for example, identifies specific documents that are needed, including submission agreements, format standards, documentation standards, physical access control, database administration, storage management, disaster recovery, system evolution, migration standards, and procedures regarding most of these areas. In practice, the organizational leg, represented by policies, and the technological leg, represented by digital repositories, may develop separately and not always in parallel. There are ongoing developments to watch in bridging this gap between the organizational and technological legs. The EU-funded project, PLANETS, promises technology-based preservation planning and tools that reflect organizational policies. The PLEDGE project, a collaborative initiative by the Massachusetts Institute of Technology and University of California at San Diego Libraries and the San Diego Supercomputer Center, has developed a promising policy engine prototype. Integrating the organizational and technological legs represents a tangible intersection of theory (what should be done) and practice (what is done).

Developing requisite skills
As technology evolves, digital preservation skills need to evolve with it. Preservation metadata provides an illustrative example of this often unmet requirement. In 2005, the OCLC-RLG Preservation Metadata Implementation Strategies (PREMIS) Working Group released the first version of the preservation metadata data dictionary and is continuing to revise and enhance its results. RLG DigiNews featured PREMIS updates in the October 2004 and December 2004 issues. PREMIS has become a de facto standard that may become a formal standard for preservation metadata in the future. Yet practitioners continue to struggle with implementing preservation metadata, as participants at Cornell’s DPM workshops confirmed. One aspect of this struggle is that there are digital preservation specialists who are able to devise digital preservation policies and strategies, and there are metadata specialists who are versed in metadata standards, schemas, and tools, but the two skillsets are often held by different people. A useful hybrid role would be a digital preservation metadata specialist who is able to bring the best of both together and to apply the policies and requirements at high and low levels of granularity. As digital preservation strategies emerge and evolve, similar hybrid roles that combine organizational and technical skillsets may be needed for specific functions or types of digital content, such as digital preservation workflow management and archival storage management. In the long term, the digital preservation community will have developed a comprehensive set of specialized roles and skills for digital curators. The Digital Curation Centre in the UK and the Digital Curation Curriculum project at UNC Chapel Hill are examples of initiatives to watch in this area.

The Technological Leg

The technology leg addresses the “how” of digital preservation: the specific digital preservation strategies, staff, tools, equipment, and other means for achieving digital preservation objectives. The technology leg combines hardware, software, formats, storage media, networks, security measures, workflows, procedures, protocols, documentation, and skills, both technical and archival. A decade ago, the hope of a “silver bullet” for digital preservation, typically in the form of a technology-only solution, was still strong and inhibited the development of organizational responsibility for digital preservation.

The “As Is”

Arguably, technology has been viewed as both the problem and the solution for digital preservation. The lessons from the past decade have demonstrated to the community that a balanced three-legged stool with a sturdy technology leg will be more effective in establishing a sustainable digital preservation program than a technology pogo stick. Certainly, there have been notable developments in the technology leg, including the OAIS Reference Model and open source repository software and tools.

OAIS Reference Model
The OAIS Reference Model, whose development began more than a decade ago, reflects the work of an international group of experts. It is intended for use in any context in which digital preservation occurs, and it represents the most formal and comprehensive expression of the archival process available to the community. The stages of development for OAIS can be traced on the OAIS website.

Figure 3. The high-level diagram for the OAIS Reference Model.

The high-level OAIS diagram depicted in Figure 3 has become ubiquitous in digital preservation presentations. OAIS provides a common language and a set of functions for use in community-wide discussions and in mapping organizational developments. Cal Lee at UNC Chapel Hill wrote an evaluation of the OAIS development for his dissertation, “Defining Digital Preservation Work: A Case Study of the Development of the Reference Model for an Open Archival Information System (OAIS).”

Repository software and tools
Examples of repository software developed over the past ten years include DSpace, the Flexible Extensible Digital Object and Repository Architecture (Fedora), Greenstone digital library software, the Berkeley Electronic Press (bepress), and the Dark Archive In The Sunshine State (DAITSS). Even with these options available, organizations need to select an appropriate repository by considering the capabilities and limitations of each and the extent to which the software meets archival requirements and suits the digital content to be preserved. Organizations may opt to build their own repository, as the National Library of the Netherlands has done, or to subscribe to a digital preservation service provider, such as bepress or the OCLC Digital Archive. None of these options was available to organizations a decade ago.

Repository software may integrate digital preservation tools (or equivalent functionality), or an organization may define its own digital preservation workflow that integrates tools at appropriate points in the process. Recent examples of tools used for digital preservation include those that identify and evaluate file formats (e.g., JHOVE, DROID), that normalize files to preservable formats (e.g., XENA), that generate and capture metadata (e.g., the NLNZ Metadata Extractor), and that produce a digital fingerprint of a file and aid in detecting changes to it (e.g., checksums). The October 2006 RLG DigiNews FAQ reviewed the NLNZ Metadata Extractor and several other tools. These developments represent progress, but the community has some way to go before digital preservation is fully automated and fully compliant digital preservation systems are available.
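Checksums are the simplest of these tools to illustrate. The following is a minimal sketch in Python, assuming files live on a local filesystem and that SHA-256 digests recorded at ingest are kept in a simple path-to-digest manifest. The function names `compute_checksum` and `verify_fixity` are hypothetical and are not taken from any of the tools named above.

```python
import hashlib
from pathlib import Path


def compute_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so large files
    do not have to fit in memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest, algorithm="sha256"):
    """Compare the checksums recorded at ingest with freshly computed ones.

    `manifest` maps file paths to the digest recorded when each file was
    accepted. Returns the paths that are missing or whose content changed.
    """
    failures = []
    for path, recorded in manifest.items():
        if not Path(path).exists():
            failures.append(path)  # file has disappeared from storage
        elif compute_checksum(path, algorithm) != recorded:
            failures.append(path)  # content no longer matches the baseline
    return failures


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "report.txt")
        Path(sample).write_text("original content")
        manifest = {sample: compute_checksum(sample)}  # recorded at ingest
        print(verify_fixity(manifest))  # → []
        Path(sample).write_text("altered content")
        print(verify_fixity(manifest))  # lists the altered file
```

A checksum cannot say what changed or why, only that the bits differ from the recorded baseline; that is why it is usually paired with regular, scheduled checks and with backup copies from which a damaged file can be restored.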

The “To Be”

There is significant research and development work underway targeting the development, enhancement, and scalability of tools and repository software. RLG DigiNews highlighted ten promising digital preservation research programs in August 2005. The “to be” goals for the technology leg can be summarized as making it possible to do more through automation and providing the means to integrate audit requirements and measures into digital preservation management.

Scalable capabilities
Scaling repository software to the increasing size of digital content containers (e.g., digital video files) and the extent of digital content to be preserved is a capacity and capability issue that largely remains in the “to be” category. The past decade has also seen the publication of recommendations from the National Science Foundation (NSF) in the US and the Community Research & Development Information Service (CORDIS) in the EU about the infrastructure that will harness the potential of technology developments to support and enable research. These programs provide a framework for development.

Workflows and suites of tools
Still on the wish list for the digital preservation community is the capability to easily define, customize, change, and extend a digital preservation workflow that is modular to allow for the easy integration of tools. There have been developments in generating or extracting metadata for submissions, but this work is still in its infancy. It is also not always possible to easily incorporate tools into a workflow. Moving from individual tools to suites of tools and workflows that can be shared and exchanged between organizations seems to be a natural path for development.
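One way to picture such a modular workflow is as an ordered list of interchangeable steps, each of which receives and returns a description of the submission. The sketch below is hypothetical throughout: the `Workflow` class, the step functions, and the dictionary-based package are illustrative assumptions, and the three-entry signature table is only a toy stand-in for the far larger format registries (such as PRONOM) that tools like DROID consult.

```python
import hashlib
from typing import Callable, Dict, List, Optional

# Hypothetical package structure: a plain dict describing one submission.
Package = Dict[str, object]
Step = Callable[[Package], Package]

# Toy signature table (leading "magic" bytes -> MIME type), for illustration only.
MAGIC_NUMBERS = {
    b"%PDF": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def identify_format(package: Package) -> Package:
    """Guess the format by matching the content's leading bytes."""
    data = package["content"]
    package["format"] = next(
        (mime for sig, mime in MAGIC_NUMBERS.items() if data.startswith(sig)),
        "unknown",
    )
    return package


def record_checksum(package: Package) -> Package:
    """Attach a SHA-256 digest as a baseline for later fixity checks."""
    package["sha256"] = hashlib.sha256(package["content"]).hexdigest()
    return package


def flag_unknown_format(package: Package) -> Package:
    """Example policy hook: mark unidentified content for manual review.

    Treats a package with no recorded format as unknown.
    """
    package["needs_review"] = package.get("format", "unknown") == "unknown"
    return package


class Workflow:
    """An ordered, editable list of steps applied to a package."""

    def __init__(self, steps: Optional[List[Step]] = None):
        self.steps: List[Step] = list(steps or [])

    def add_step(self, step: Step) -> "Workflow":
        self.steps.append(step)
        return self  # returning self allows chained calls

    def run(self, package: Package) -> Package:
        history = []
        for step in self.steps:
            package = step(package)
            history.append(step.__name__)  # simple record of which steps ran
        package["history"] = history
        return package


if __name__ == "__main__":
    ingest = Workflow([identify_format, record_checksum]).add_step(flag_unknown_format)
    result = ingest.run({"name": "scan.pdf", "content": b"%PDF-1.4 sample"})
    print(result["format"], result["needs_review"])  # → application/pdf False
```

Because each step is just a function with the same input and output shape, a new tool can be wrapped and inserted, or an existing one swapped out, without touching the rest of the workflow; a workflow shared between organizations would then amount to an agreed list of steps.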

Figure 5. The Integrated Digital Preservation Matrix.

As more and more organizations develop trusted digital repositories that are based upon sound and continuous workflows, the potential exists for leveraging the capacity and capabilities across repositories and across the community to realize cost-savings, more effective results through collaboration, and community-wide action, as envisioned in the integrated digital preservation matrix (Figure 5, developed for the Cornell DPM workshop series).

Audit capabilities
As institutions begin to rely upon each other, they need to develop trust through verification. It is not enough to provide assurances about performance and reliability in digital preservation; it is necessary to demonstrate effective and sustained action. With the development of audit and certification for digital preservation, organizations will require the means to conduct self-assessments and participate in external audits. Incorporating these tools into digital preservation repositories would lighten the burden of preparing for audits and make it easier, and less costly, for organizations to meet audit requirements. The audit and certification initiatives have provided tools for self-assessment and are increasingly providing examples for audit; organizations need to step up and contribute local examples and lessons learned.
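As a minimal sketch of the self-assessment idea, assuming evidence is tracked only against the seven top-level TDR attributes rather than the far more detailed TRAC criteria, an organization could record which documents support each attribute and list the attributes that still lack evidence. The `SelfAssessment` class and the sample document titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven TDR attributes, as listed in the RLG/OCLC report.
TDR_ATTRIBUTES = [
    "OAIS compliance",
    "administrative responsibility",
    "organizational viability",
    "financial sustainability",
    "technological and procedural suitability",
    "system security",
    "procedural accountability",
]


@dataclass
class SelfAssessment:
    """Hypothetical record of documented evidence per TDR attribute."""

    evidence: Dict[str, List[str]] = field(default_factory=dict)

    def add_evidence(self, attribute: str, document: str) -> None:
        # Reject attributes outside the TDR list to keep the record consistent.
        if attribute not in TDR_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute}")
        self.evidence.setdefault(attribute, []).append(document)

    def gaps(self) -> List[str]:
        """Attributes with no supporting documentation yet, in TDR order."""
        return [a for a in TDR_ATTRIBUTES if not self.evidence.get(a)]


if __name__ == "__main__":
    assessment = SelfAssessment()
    assessment.add_evidence("financial sustainability", "Five-year budget plan")
    assessment.add_evidence("procedural accountability", "Digital preservation policy v1.0")
    print(len(assessment.gaps()))  # → 5
```

The list of gaps is, in effect, the “to be” side of a gap analysis for one organization; a real audit would weigh the quality of each piece of evidence, which this sketch does not attempt.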

The Resources Leg

The resources leg addresses the “how much” of digital preservation: the human, technological, and financial resources needed to produce desired digital preservation outcomes. A decade ago, the question “How much does digital preservation cost?” was enough to bring a digital preservation discussion to a shuddering stop. At that time, the resources component of digital preservation had not been explicitly separated from the organizational component. As a distinct component, the resources required for a digital preservation program can be identified, quantified, and measured comprehensively and objectively, although for the most part this potential has yet to be realized.

The “As Is”

Unlike the organizational leg, which is embodied in the TDR document, and the technological leg, which is defined in the OAIS Reference Model, the resources leg of digital preservation has no community document that expresses its scope and requirements. The inclusion of financial sustainability as an attribute of a TDR signifies an important development for digital preservation because it was the first time that addressing the cost of digital preservation was explicitly acknowledged as an organizational requirement. Additional indicators of progress towards a sound resources leg include the designation of digital preservation funding by organizations (e.g., DSpace at MIT); digital preservation programs that are lasting longer than digital preservation projects, as evidenced by organizations such as those that have developed digital preservation policies; and research funding for digital preservation that is ongoing if not permanent (e.g., JISC, NEH, NSF, NHPRC programs). In addition to these indicators, the digital preservation community has a growing body of literature that addresses digital preservation costs, including Brian Lavoie’s proposed economic models for digital preservation; Shelby Sanett’s research on cost models and cost frameworks; and the approach developed in the Netherlands (Oltmans and Kol) that provides a tool to compare the costs of migration and emulation over time. The most comprehensive cost formula for digital preservation was proposed by the LIFE project in 2006. These examples have contributed to a deeper understanding of digital preservation costs within the community, but they do not equate to a comprehensive community document for the resources leg. Nor are organizations systematically collecting and sharing resource information.

Figure 4. Integrating the organizational and technological legs of digital preservation.

The resources perspective considers the “what” and the “how” of digital preservation to determine the “how much” (represented by financial sustainability in Figure 4, developed for Cornell’s DPM workshop series). The resources leg is informed by the organizational context and tied to the technological implementation for an organization’s digital preservation program. Figure 4 illustrates the technological implementation expressed by OAIS within the organizational context expressed by TDR and the separation of financial sustainability within the organizational context for digital preservation.

The “To Be”

There has been progress in developing the resources leg, though two areas seem ripe for further development: the designation of funding by organizations for digital preservation and the definition of a community document that addresses resources.

Designating digital preservation funding
Organizations are still struggling to secure resources for digital preservation. One of the research library directors interviewed for the recent Metes and Bounds report on e-journal archiving observed that digital preservation is a “just-in-case scenario, and this is very much a just-in-time operation.” (p. 11) Respondents to Cornell’s DPM workshop institutional readiness survey identified insufficient resources for digital preservation as the second-highest threat to digital content, after insufficient policies or plans. Survey respondents also identified a complicating factor in designating resources for digital preservation. It has been common practice for an organization to establish a digital preservation initiative by assigning a percentage of the digital preservation responsibility to several staff members, often located across the organization, making it difficult to consolidate or coordinate resources. The digital preservation community also needs a means of being transparent about resources, recognizing that specific details may include confidential or internal-only information.

Defining a community document for resources
The “as is” examples of resource-related writings and developments for digital preservation (e.g., the Lavoie, Sanett, Oltmans and Kol, and LIFE examples presented above) provide a starting point for defining a community document for resources. Common elements in TDR and OAIS include the definition of core concepts, the definition of roles and responsibilities, descriptions of the components and attributes, and the discussion of implementation issues with examples and/or recommendations. A productive first step for the community might be to consolidate and rationalize the resource issues and elements presented in the resource examples, then apply a gap analysis process to fill in missing elements. So far, the community has offered few responses to these contributions toward strengthening the resources leg of digital preservation.

Stabilizing the Three-legged Stool

Taking the three legs of the stool together, there are a number of indicators that the digital preservation community is coalescing and maturing. Communities by nature share common interests and objectives. Indicators of the development of the digital preservation community include accepted standards and practice and an increasingly effective communication network.

Standards and practice
A decade ago there were no formal shared standards or practice for digital preservation. Today, we have OAIS, TDR, and PREMIS, for example. The sustainable formats website at the Library of Congress and PRONOM are contributing to the development of preservation strategies for classes of digital content. RLG DigiNews featured articles about PRONOM developments in the October 2003 and April 2005 issues. These examples reflect community practice as defined by representatives of archives, libraries, museums, and other cultural heritage institutions. Domain-specific developments, such as the Canadian Heritage Information Network (CHIN) report on digital preservation for museums, have also contributed to the development of community-wide practice. In addition, the standards of our community are regularly supplemented by standards developments in other communities, including information technology, information security, telecommunications, and the Internet. We are moving towards more comprehensive codification of accepted practice, the promulgation of standards and practice through community channels, and the means to develop and maintain policies and procedures as needed.

Communication network
A challenge for organizations engaged in digital preservation is to balance the time and resources devoted to developing the repository internally against monitoring the external environment for relevant developments, updates, standards, and warnings. The difficulty of keeping up with digital preservation developments is exemplified by a quick review of the RLG DigiNews August 2005 list of ten “watch this space” digital preservation research projects. Three of the project websites had updates and current information about the project that were fairly easy to locate. The current status of three of the project websites was unclear; judging from the obvious places to look for updates and news, those projects seemed stalled or abandoned. Three of the project websites had few or no updates since August 2005. It was possible to find results or presentations about the projects by searching, but it was difficult to determine their current status with confidence. The URLs for two of the projects had changed and could not be easily found by searching. Of course, there are several possible explanations for that, and the projects could be alive and thriving somewhere. One project website required logging in. Requiring a log-in is not a bad thing, but logging in takes time and a bit more effort. If an organization is trying to track and follow a number of digital preservation developments, these examples represent potential barriers. The PADI website has provided an excellent information service to the digital preservation community for the past decade, and other services contribute as well, but there is currently no “one stop shopping” for keeping up with digital preservation research and development. Keeping up takes effort, but it is worthwhile. The digital preservation community is active and offers many opportunities for organizations to participate, contribute, and learn.

“One participant [in the 2006 Best Practices Exchange] characterized a ‘community of practice’ as a flock of birds. Each bird may ultimately have a different end destination, but since they are flying in the same general direction, it is more efficient to fly together as a flock.” It is a fitting image with which to close this anniversary review of the migration patterns of a community over the past decade. How far will we have gotten towards the “to be” by 2012 or 2017? Stay tuned…

Author's Addendum (7 May 2007):
An alert reader contacted me about my list of digital preservation policy examples, questioning the dates of some and the inclusion of another. I am submitting this brief response to correct and clarify my list. The reader wondered if I should have cited earlier dates for the National Library of Australia (NLA), the UK Data Archive (UKDA), and the Arts and Humanities Data Service (AHDS). After checking, I can report that 2001 is the correct date for the NLA digital preservation policy and 2004 is the date for version 1.0 of the AHDS digital preservation policy. Both of these institutions have been major contributors to digital preservation progress. An important caveat for the AHDS is that 2004 was the date of their first policy to address the preservation of the digital collections within their care; however, the AHDS developed an early strategic policy framework document (http://ahds.ac.uk/strategic.doc) in 1997 that reported the results of a study they conducted, including recommendations to the community on developing digital preservation policies. I should have cited the date for version 1.0 of the UKDA policy as 2003 and the date for the British Library policy as 2001. I included the Digital Library SunSITE policy because it is both a collection development and a preservation policy. It is an important early example of the definition of preservation levels for digital content and of a preservation policy that addresses Web content. An interesting thing about digital preservation policies is that even institutions that have been early adopters and pioneers in digital preservation often took a while to develop formal digital preservation policies. We should have many more policy examples readily available, though we should also be pleased with the progress we have made and continue to make. Thank you to the diligent reader, and my apologies to the British Library and the UKDA for misdating their policies.

Notes
[1] Anne R. Kenney and Nancy Y. McGovern, “The Five Organizational Stages of Digital Preservation,” in Digital Libraries: A Vision for the Twenty First Century, a festschrift to honor Wendy Lougee, 2003.

[2] Christy E. Allen, “Foundations for a Successful Digital Preservation Program: Discussions from Digital Preservation in State Government: Best Practices Exchange 2006,” RLG DigiNews, June 2006, Vol. 10, No. 3.


Copyright 2004 RLG.