There has been measurable progress in the digital preservation community since the seminal work Preserving Digital Information: Final Report and Recommendations
was published by the Commission on Preservation and
Access and RLG more than a decade ago. Those concerned about digital
preservation in 1996 did not have the Open Archival Information System (OAIS) standard to frame the development and discussion of digital preservation work; or a set of attributes of a trusted digital repository
to delineate the organizational context for digital preservation; or a
data dictionary for preservation metadata; or the concept of
institutional repositories made real by a range of software options.
All of these developments have emerged within the past decade. Today,
we have conferences that are entirely devoted to digital preservation
(e.g., the International Preservation (iPres) conference) and peer-reviewed journals for digital preservation (e.g., The International Journal of Digital Curation). One can follow the maturation of the digital preservation community in a decade of RLG DigiNews articles.
RLG DigiNews originally focused on “the converging fields of preservation and digitization”; its first article to specifically address digital preservation appeared in 1998. In 2000, the RLG DigiNews editorial staff significantly expanded the coverage of digital preservation, highlighting articles with the now familiar symbol, which added digital to the established infinity notation from print preservation. The cumulative contribution by RLG DigiNews
to the digital preservation literature over the past decade includes
more than fifty feature articles plus a sequence of highlighted
websites and FAQs. These articles and other features stressed practical
steps in digital preservation with an emphasis on the development and
evaluation of relevant strategies, applications of research results,
the integration and use of tools, and national and community-level
agendas.
This tenth anniversary review of digital preservation developments
takes an informal gap analysis approach, measuring where we are (the
“as is”) against where we might like to be (the “to be”). This gap
analysis has three components reflecting the core aspects of digital
preservation: organizational infrastructure, technological
infrastructure, and requisite resources.
Figure 1. Three-Legged Stool for Digital Preservation.
These three components make up the three-legged stool for digital
preservation (Figure 1), a concept developed at Cornell for the Digital
Preservation Management (DPM)
Workshop series, which was funded by the National Endowment for the
Humanities from 2003 to 2006. The workshop curriculum uses the
three-legged stool as a means for an organization to assess its
development within the context of a maturity model composed of five
sequential stages: acknowledge, act, consolidate, institutionalize, and
externalize.[1]
This review takes a more basic step by considering the status of the
three legs of the stool within the community from the “as is” and “to
be” perspectives.
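For readers who want to try a workshop-style self-assessment, the following is a minimal sketch in Python of how the five stages and the three legs could be combined into a simple gap analysis. The stage names come from the maturity model above; scoring each leg separately, the names `Stage`, `gap_analysis`, and `weakest_legs`, and the sample values are assumptions made for illustration, not part of the DPM curriculum.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five sequential stages named in the maturity model."""
    ACKNOWLEDGE = 1
    ACT = 2
    CONSOLIDATE = 3
    INSTITUTIONALIZE = 4
    EXTERNALIZE = 5


# The three legs of the stool (labels are illustrative).
LEGS = ("organizational", "technological", "resources")


def gap_analysis(as_is, to_be):
    """Return, per leg, how many stages separate the "as is" from the "to be"."""
    return {leg: max(0, to_be[leg] - as_is[leg]) for leg in LEGS}


def weakest_legs(as_is):
    """Return the leg or legs currently at the lowest stage."""
    lowest = min(as_is[leg] for leg in LEGS)
    return [leg for leg in LEGS if as_is[leg] == lowest]


# Hypothetical organization: the values below are invented for illustration.
as_is = {
    "organizational": Stage.CONSOLIDATE,
    "technological": Stage.ACT,
    "resources": Stage.ACKNOWLEDGE,
}
to_be = {leg: Stage.INSTITUTIONALIZE for leg in LEGS}

gaps = gap_analysis(as_is, to_be)
shortest = weakest_legs(as_is)
```

In this invented example the resources leg shows the largest gap, which is the kind of imbalance the stool metaphor is meant to expose.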
The Organizational Leg
The organizational leg determines the “what” of digital
preservation—the mandate, the scope, the objectives, the staffing of an
organization—for engaging in digital preservation. Ten years ago, the
organizational leg was arguably the weakest leg as evidenced by the
general absence of explicit mission statements that referenced digital
preservation, policies that specifically addressed the preservation of
digital assets, and sustained digital preservation programs within
organizations.
The “As Is”
There have been several important developments for the
organizational leg over the past ten years, including the development
and promulgation of the RLG/OCLC report on the Attributes of a Trusted
Digital Repository (TDR), an increase in the development of digital
preservation policies by organizations, and an acknowledgement of the
central role of procedural accountability for audit and certification.
Trusted digital repositories
TDR represents the best
expression of the organizational leg for digital preservation and has
become a de facto standard for the digital preservation community since
its release in 2002. Prior to the development of TDR, the community had
no formal expression of the organizational context for digital
preservation.
Figure 2. The Cornell Model for Trusted Digital Repository Attributes.
The Trusted Digital Repositories document defines seven attributes
of a conformant organization: OAIS compliance, administrative
responsibility, organizational viability, financial sustainability,
technological and procedural suitability, system security, and
procedural accountability. The relationships between the TDR attributes
are portrayed in the Cornell model (Figure 2), developed to support the
DPM workshop series. OAIS compliance is implicit in the diagram. TDR
stresses the importance of the organizational context and places
technology within that context. This placement recognizes that
technology should be suited to the scope and requirements of each
digital preservation program. The Cornell model for TDR added a
“digital archives border” to the TDR attributes because one
organization might maintain more than one repository instance, in which
case the outer layers might be coordinated across the organization, and
a group of organizations might come together to manage one repository
(e.g., in a consortial effort).
Digital preservation policy development
Policies and
other documentation of decisions and actions represent one of the best
indicators of the development of the organizational leg. At the 2006
Best Practices Exchange in North Carolina, “participants stressed again
and again that a successful digital preservation program requires a
strong foundation…Participants identified four essential elements for
building a strong foundation for a digital preservation program:
support and buy-in from stakeholders; “good enough” practices
implemented now; collaborations and partnerships; and documentation for
policies, procedures, and standards.”[2]
This brief list of digital preservation policies is suggestive of
the increase in policy development within the digital preservation
community worldwide.
The advent of the World Wide Web, which was also in its nascent
stage in 1996, has made possible more effective and global exchange of
information about policies and practices. More work is underway on
developing policies. For example, the nestor policy project in Germany is working on a profile for a national long-term preservation policy.
Providing the evidence for audit and certification
“A
well-written policy should serve as historical proof of an
institution’s commitment to digital preservation now and long into the
future.” This conclusion from the 2006 Best Practices Exchange reflects
an implicit principle that underlies the evidence requirements for the
audit and certification of digital archives. The October 2005 issue of RLG DigiNews
featured articles on the major digital archive audit initiatives in the
US, the UK, and Germany. The Center for Research Libraries (CRL)
conducted a series of test audits of digital archives, with funding
from The Andrew W. Mellon Foundation, and hosted a meeting with the
UK and German audit projects that produced a set of common audit
principles. CRL released the “Trustworthy Repositories Audit &
Certification: Criteria and Checklist (TRAC)”
in March 2007 and should be releasing the principles and test audit
report soon. TRAC is a revised version of the RLG/NARA document, Audit Checklist for Certifying Digital Repositories, that was released for public comment in January 2006. An ISO standard development effort
is underway that will build on the work of these initiatives and
integrate the relevant requirements from the information technology and
security domains. In considering the basis and means for digital
archive certification, these initiatives have shifted their focus
towards the benefits and tools needed for self-assessment and
third-party audits. The results so far have demonstrated that
self-assessments and audits effectively identify the strengths and
weaknesses of digital preservation programs and define a development
plan for organizations to incrementally address the full set of
criteria defined for trusted digital repositories.
The “To Be”
Though the “as is” perspective on the organizational leg has
improved substantively over the past decade, there are at least two
notable areas of development for the “to be” view: the need to
integrate the organizational policies for digital preservation into
technological implementations and the need to develop and evolve
digital preservation skills.
Integrating policies into action
The organizational leg
(the “what”) and the technological leg (the “how”) of the digital
preservation stool need to be coordinated to develop compliant and
feasible digital preservation strategies. The theory is in place. The
OAIS Reference Model, for example, identifies specific documents that
are needed, including submission agreements, format standards,
documentation standards, physical access control, database
administration, storage management, disaster recovery, system
evolution, migration standards, and procedures regarding most of these
areas. In practice, the organizational leg, represented by policies,
and the technological leg, represented by digital repositories, may
develop separately and not always in parallel. There are ongoing
developments to watch in bridging this gap between the organizational
and technological legs. The EU-funded project, PLANETS, promises technology-based preservation planning and tools that reflect organizational policies. The PLEDGE
project, a collaborative initiative by the Massachusetts Institute of
Technology and University of California at San Diego Libraries and the
San Diego Supercomputer Center, has developed a promising policy engine
prototype. Integrating the organizational and technological legs
represents a tangible intersection of theory (what should be done) and
practice (what is done).
Developing requisite skills
As technology evolves,
digital preservation skills need to evolve. Preservation metadata
provides an illustrative example of this often unmet requirement. In
2005, the OCLC-RLG Preservation Metadata Implementation Strategies (PREMIS)
Working Group released the first version of the preservation metadata
data dictionary and is continuing to revise and enhance their results. RLG DigiNews featured PREMIS updates in the October 2004 and December 2004
issues. PREMIS has become a de facto standard that may transform into a
formal standard for preservation metadata in the future. Yet
practitioners continue to struggle with implementing preservation
metadata, as participants at Cornell’s DPM workshops confirmed. One
aspect of this struggle is that there are digital preservation
specialists who are able to devise digital preservation policies and
strategies, and there are metadata specialists who are versed in
metadata standards, schemas, and tools. A useful hybrid skillset would
be a digital preservation metadata specialist who is able to bring the
best of both together and to apply the policies and requirements at
high and low levels of granularity. As digital preservation strategies
emerge and evolve, similar hybrid roles that combine organizational and
technical skillsets may be needed for specific types of digital
content, such as digital preservation workflow management and archival
storage management. In the long term, the digital preservation
community will have developed a comprehensive set of specialized roles
and skills for digital curators. The Digital Curation Centre in the UK and the Digital Curation Curriculum project at UNC Chapel Hill are examples of initiatives to watch in this area.
The Technological Leg
The technology leg addresses the “how” of digital preservation – the
specific digital preservation strategies, staff, tools, equipment, and
other means for achieving digital preservation objectives. The
technology leg combines hardware, software, formats, storage media,
networks, security measures, workflows, procedures, protocols,
documentation, and skills, both technical and archival. A decade ago,
the hope of a “silver bullet” for digital preservation, typically in
the form of a technology-only solution, was still strong and served as
an inhibitor to the development of organizational responsibility for
digital preservation.
The “As Is”
Arguably, technology has been viewed as both the problem and
solution for digital preservation. The lessons from the past decade
have demonstrated to the community that a balanced three-legged stool
with a sturdy technology leg will be more effective in establishing a
sustainable digital preservation program than a technology pogo stick.
Certainly, there have been notable technology leg developments,
including the OAIS Reference Model and open source repository software
and tools.
OAIS Reference Model
The development of the OAIS
Reference Model, begun more than a decade ago, reflects the work of an
international group of experts. The model is intended for use in any
context in which digital preservation occurs, and it represents the most
formal and comprehensive expression of the archival process that is
available to the community. The stages of development for OAIS can be traced on the OAIS website.
Figure 3. The high-level diagram for the OAIS Reference Model.
The high-level OAIS diagram depicted in Figure 3 has become
ubiquitous in digital preservation presentations. OAIS provides a
common language and a set of functions for use in community-wide
discussions and in mapping organizational developments. Cal Lee at UNC
Chapel Hill wrote an evaluation of the OAIS development for his
dissertation, “Defining
Digital Preservation Work: A Case Study of the Development of the
Reference Model for an Open Archival Information System (OAIS).”
Repository software and tools
Examples of repository software developed over the past 10 years include: DSpace, the Flexible Extensible Digital Object and Repository Architecture (Fedora), Greenstone digital library software, the Berkeley Electronic Press (bepress) and the Dark Archive In The Sunshine State (DAITSS).
Even with these examples of available repository software,
organizations need to decide how to select an appropriate repository
option by considering the capabilities and limitations of each and the
extent to which the repository software meets archival requirements and
suits the digital content to be preserved. Organizations may opt to
build their own repository, such as the National Library of the Netherlands, or to subscribe to a digital preservation service provider, such as bepress or the OCLC Digital Archive. None of these options was available to organizations a decade ago.
Repository software may integrate digital preservation tools (or
equivalent functionality) or an organization may define for itself a
digital preservation workflow that integrates tools at appropriate
points in the process. Recent examples of tools used for digital
preservation include those that identify and evaluate file formats
(e.g., JHOVE, DROID), that normalize files to preservable formats (e.g., XENA), that generate and capture metadata (e.g., the NLNZ metadata extractor), and that produce a unique identifier and aid in detecting changes to files (e.g., checksums). The October 2006 RLG DigiNews FAQ
reviewed the NLNZ Metadata Extractor and several other tools. These
developments represent progress, but the community has some way to go
before digital preservation is fully automated and fully compliant
digital preservation systems are available.
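The checksum technique mentioned above can be illustrated with a minimal sketch in Python. It records a checksum for each file and later reports any file whose content no longer matches. The choice of SHA-256 and the function names `sha256_of`, `record_fixity`, and `changed_files` are assumptions made for illustration; they are not drawn from any of the tools named in this article.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest mapping each file path (as a string) to its checksum."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(manifest):
    """Return, sorted, the paths that are missing or whose checksum has changed."""
    changed = []
    for name, recorded in manifest.items():
        if not Path(name).is_file() or sha256_of(name) != recorded:
            changed.append(name)
    return sorted(changed)
```

A repository workflow might call `record_fixity` at ingest, store the manifest with the preservation metadata, and run `changed_files` on a schedule. A non-empty result signals that a file needs investigation or repair from another copy.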
The “To Be”
There is significant research and development work underway that is
targeting the development, enhancement, and scalability of tools and
repository software. RLG DigiNews highlighted 10 promising digital preservation research programs
in August 2005. The “to be” category for the technology leg could be
characterized as making it possible to do more through automation and to
provide the means to integrate audit requirements and measures into
digital preservation management.
Scalable capabilities
Scaling repository software to the
increasing size of digital content containers (e.g., digital video
files) and the extent of digital content to be preserved is a capacity
and capability issue that largely remains in the “to be” category. The
past decade has also seen the publication of recommendations from the National Science Foundation (NSF) in the US and the Community Research & Development Information Service (CORDIS)
in the EU about the infrastructure that will harness the potential of
technology developments to support and enable research. These programs
provide a framework for development.
Workflows and suites of tools
Still on the wish list for
the digital preservation community is the capability to easily define,
customize, change, and extend a digital preservation workflow that is
modular to allow for the easy integration of tools. There have been
developments in generating or extracting metadata for submissions, but
this work is still in its infancy. It is also not always possible to
easily incorporate tools into a workflow. Moving from individual tools
to suites of tools and workflows that can be shared and exchanged
between organizations seems to be a natural path for development.
Figure 5. The Integrated Digital Preservation Matrix.
As more and more organizations develop trusted digital repositories
that are based upon sound and continuous workflows, the potential
exists for leveraging the capacity and capabilities across repositories
and across the community to realize cost-savings, more effective
results through collaboration, and community-wide action, as envisioned
in the integrated digital preservation matrix (Figure 5, developed for
the Cornell DPM workshop series).
Audit capabilities
As institutions begin to rely upon
each other, there is the need to develop trust through verification. It
is not enough to provide assurances about performance and reliability
in digital preservation; it is necessary to demonstrate effective and
sustained action. With the development of audit and certification for
digital preservation, organizations will require the means to conduct
self-assessments and participate in external audits. Incorporating
these tools into digital preservation repositories would lighten the
burden of preparing for audits and make it easier – and less costly –
for organizations to meet audit requirements. The audit and
certification initiatives have provided tools for self-assessment and
are increasingly providing examples for audit; organizations need to
step up to contribute local examples and lessons learned.
The Resources Leg
The resources leg addresses the “how much”: the human, technological,
and financial resources needed to produce desired digital
preservation outcomes. A decade ago, the question “How much does
digital preservation cost?” was enough to bring a digital preservation
discussion to a shuddering stop. At that time, the resources component
of digital preservation had not been explicitly separated from the
organizational component. As a distinct component, the resources
required for a digital preservation program can be identified,
quantified, and measured comprehensively and objectively – although for
the most part this potential has yet to be achieved.
The “As Is”
Unlike the organizational leg that is embodied in the TDR document
and the technological leg that is defined in the OAIS Reference Model,
the resources leg of digital preservation has no community document
that expresses its scope and requirements. The inclusion of financial
sustainability as an attribute of a TDR signifies an important
development for digital preservation because it was the first time that
addressing the cost of digital preservation was explicitly acknowledged
as an organizational requirement. Additional indicators of progress
towards the development of a sound resources leg include the
designation of digital preservation funding by organizations (e.g., DSpace at MIT);
digital preservation programs that are lasting longer than digital
preservation projects, as evidenced by organizations such as those that
have developed digital preservation policies; and research funding for
digital preservation that is ongoing if not permanent (e.g., JISC, NEH,
NSF, NHPRC programs). In addition to these indicators, the digital
preservation community has a growing base of literature that addresses
digital preservation costs, including Brian Lavoie’s proposed economic models for digital preservation; Shelby Sannett’s research on cost models and cost frameworks; and the approach developed in the Netherlands (Oltmans and Kol)
that provides a tool to compare the costs of migration and emulation
over time. The most comprehensive cost formula for digital preservation
was proposed by the LIFE project
in 2006. These examples have contributed to a deeper understanding of
digital preservation costs within the community, but do not equate to a
comprehensive community document for the resources leg. Nor are
organizations systematically collecting and sharing resource
information.
Figure 4. Integrating the organizational and technological legs of digital preservation.
The resources perspective considers the “what” and the “how” of
digital preservation to determine the “how much” (represented by
financial sustainability in Figure 4, developed for Cornell’s DPM
workshop series). The resources leg is informed by the organizational
context and tied to the technological implementation for an
organization’s digital preservation program. Figure 4 illustrates the
technological implementation expressed by OAIS within the
organizational context expressed by TDR and the separation of financial
sustainability within the organizational context for digital
preservation.
The “To Be”
There has been progress in developing the resources leg, though two
areas seem ripe for further development: the designation of funding by
organizations for digital preservation and the definition of a
community document that addresses resources.
Designating digital preservation funding
Organizations
are still struggling to secure resources for digital preservation. One
of the research library directors interviewed for the recent Metes and Bounds
report on e-journal archiving observed that digital preservation is a
“just-in-case scenario, and this is very much a just-in-time
operation.” (p. 11) Respondents to Cornell’s DPM workshop institutional
readiness survey identified insufficient resources
for digital preservation as the second highest threat to digital
content after insufficient policies or plans. Survey respondents also
identified a complicating factor in designating resources for digital
preservation. It has been common practice for an organization to
establish a digital preservation initiative by assigning a percentage
of the digital preservation responsibility to several staff often
located across an organization, making it difficult to consolidate or
coordinate resources. The digital preservation community also needs a
means for being transparent about resources, recognizing that specific
details may include confidential or internal-only information.
Defining a community document for resources
The “as is”
examples of resource-related writings and developments for digital
preservation (e.g., Lavoie, Sannett, Oltmans and Kol, and LIFE examples
presented above) provide a starting point for defining a community
document for resources. Common elements in TDR and OAIS include the
definition of core concepts, the definition of roles and
responsibilities, descriptions of the components and attributes, and
the discussion of implementation issues with examples and/or
recommendations. A productive first step for the community might be to
consolidate and rationalize the resource issues and elements presented
in the resource examples, then apply a gap analysis process to fill in
missing elements. So far, the community has produced few responses to
these contributions toward strengthening the resources leg of digital
preservation.
Stabilizing the Three-legged Stool
Taking the three legs of the stool together, there are a number of
indicators that the digital preservation community is coalescing and
maturing. Communities by nature share common interests and objectives.
Indicators of the development of the digital preservation community
include accepted standards and practice and an increasingly effective
communication network.
Standards and practice
A decade ago there were no formal
shared standards or practice for digital preservation. Today, we have
OAIS, TDR, and PREMIS, for example. The sustainable formats website at the Library of Congress and PRONOM are contributing to the development of preservation strategies for classes of digital content. RLG DigiNews featured articles about PRONOM developments in the October 2003 and April 2005
issues. These examples reflect community practice as defined by
representatives of archives, libraries, museums, and other cultural
heritage institutions. Domain-specific developments, such as the
Canadian Heritage Information Network (CHIN)
report on digital preservation for museums, have also contributed to
the development of community-wide practice. In addition, the standards
of our community are regularly supplemented by standards developments
in other communities, including information technology, information
security, telecommunications, and the Internet. We are moving towards
more comprehensive codification of accepted practice, the promulgation
of standards and practice through community channels, and the means to
develop and maintain policies and procedures as needed.
Communication network
A challenge for organizations
that are engaged in digital preservation is to balance the time and
resources devoted to developing the repository internally against
monitoring the external environment for relevant developments, updates,
standards, and warnings. The difficulty in keeping up with digital
preservation developments is exemplified by a quick review of the RLG DigiNews
August 2005 list of ten “watch this space” digital preservation
research projects. Three of the project websites had updates and
current information about the project that were fairly easy to locate.
The current status of three of the project websites was unclear and the
projects seemed to be stalled or abandoned based on obvious locations
for updates and news on the websites. Three of the project websites had
few or no updates since August 2005. It was possible to find results or
presentations about the projects by searching, but it was difficult to
confidently determine the current status. The URLs for two of the
projects have changed and could not be easily found by searching. Of
course, there are several possible explanations for that and the
projects could be alive and thriving somewhere. One project website
required logging in. Requiring a log-in is not a bad thing, but logging
in requires time and a bit more effort. If an organization is trying to
track and follow a number of digital preservation developments, these
examples represent potential barriers. The PADI website
has provided an excellent information service to the digital
preservation community for the past decade and other services
contribute as well, but there is currently no “one stop shopping” for
keeping up with digital preservation research and development. Keeping
up takes effort, but it is worthwhile. The digital preservation
community is active and offers many opportunities for organizations to
participate, contribute, and learn.
“One participant [in the 2006 Best Practices Exchange] characterized a ‘community of practice’ as a flock of birds. Each bird may ultimately have a different
end destination, but since they are flying in the same general
direction, it is more efficient to fly together as a flock.” A fitting
close to this anniversary review of the migration patterns of a
community over the past decade. How far will we have gotten towards the
“to be” by 2012 or 2017? Stay tuned…
Author's Addendum (7 May 2007):
An alert
reader contacted me about my list of digital preservation policy
examples, questioning the dates of some and the inclusion of another. I
am submitting this brief response to correct and clarify my list. The
reader wondered if I should have cited earlier dates for the National
Library of Australia (NLA), the UK Data Archive (UKDA), and the Arts
and Humanities Data Service (AHDS). After checking, I can report that
2001 is the correct date for the NLA digital preservation policy and
2004 is the date for version 1.0 of the AHDS digital preservation
policy. Both of these institutions have been major contributors to
digital preservation progress. An important caveat for the AHDS is that
2004 was the date of their first policy to address the preservation of
the digital collections within their care; however, the AHDS developed
an early strategic policy framework document
(http://ahds.ac.uk/strategic.doc) in 1997 that reported the results of
a study they conducted, including recommendations to the community on
developing digital preservation policies. I should have cited the date
for version 1.0 of the UKDA policy as 2003 and the date for the British
Library policy as 2001. I included the Digital Library Sunsite policy
because it is both a collection development and a preservation policy.
It is an important early example of the definition of preservation
levels for digital content and of a preservation policy that addresses
Web content. An interesting thing about digital preservation policies
is that even institutions that have been early adopters and pioneers in
digital preservation often took a while to develop formal digital
preservation policies. We should have many more policy examples that
are readily available, though we should also be pleased with the
progress we have made and continue to make. Thank you to the diligent
reader and my apologies to the British Library and the UKDA for
misdating their policies.
Notes
[1] Anne
R. Kenney and Nancy Y. McGovern, “The Five Organizational Stages of
Digital Preservation,” in Digital Libraries: A Vision for the Twenty
First Century, a festschrift to honor Wendy Lougee, 2003.
[2] Christy E. Allen, “Foundations for a Successful Digital Preservation Program: Discussions from Digital Preservation in State Government: Best Practices Exchange 2006,” RLG DigiNews, June 2006, Vol 10, No 3.