Context, Structure and Content:
Postscript: Since this paper was
delivered at the Annual Conference of the Association of Canadian Archivists in
June, the final report of the InterPARES
Authenticity Task Force, Authenticity
Task Force Final Report, has become available. Readers who find my
paper interesting may wish to read this one as well, particularly as it relates
to the verification of authenticity (see section 4.2.4.2) and preservation
baseline requirement B.2 "Documentation of Reproduction Process and its
Effects" (see section 3.2 of the Appendix).
Introduction
Electronic
records have unique characteristics.
This paper explores some of those in terms of appraisal, and proposes
four criteria specific to electronic records.
It also discusses at what level of appraisal these criteria would best
be applied.
The
introduction of the macro-appraisal model clearly established appraisal as a
hierarchical exercise. That model is
designed to isolate key areas where the best records are likely to be located
by examining the structure and functions of the creating agency. The appraisal of the actual record series follows
the identification of those key areas.
Terry Cook cautions that
It
is important to recall that there are several factors at this later stage which
can refine or reverse a positive decision made by using the macro-appraisal
model.[2]
At
this lower level archivists look at the different series to determine which of
them best fulfil the values sought or identified at the macro-appraisal
level. As well, various criteria are
applied, such as completeness, uniqueness, barriers to accessibility, relationship
with other archival holdings, etc.
While these criteria remain relevant and applicable to the appraisal of
electronic series, the separability of content, structure and context (the
three components of a record), along with other characteristics unique to
electronic records, suggests that these "traditional" criteria are not
sufficient in themselves to fully appraise electronic record series. The four new criteria proposed in this paper
apply at one level lower yet than what I have referred to as series level
appraisal, thus two levels below macro/functional appraisal. Paraphrasing Cook, one might say that there
are several factors at this later stage that can refine or reverse a positive
decision made at the series level.
There
is a need to appraise at this lower level because the component facets of a
record - context, content and structure - are independent of each other as well
as of their medium in the electronic environment. With every migration, whether
it occurs while the records are still operational, at point of transfer to an
archives, or while in the custody of the archives, some aspects of the context,
content and structure are affected. The
National Archives of Australia acknowledges this fact as follows:
In
the electronic environment we consider that the ‘original’ means the content,
structure and context of the original transaction but not all the attributes
present in the original software or hardware platform. It is inevitable that
some losses will occur at the point of migration from one version of the
software to the next or one platform to the next but this is acceptable as long
as the aspects of the record required for evidence are preserved.[3]
It is therefore critical for archivists to determine what the essence of the record is in order to ensure that "the aspects of the record required for evidence are preserved."
This
paper uses the definitions contained in the National Archives of Australia's
publication Keeping Electronic Records. Content is defined as "That which
conveys information e.g., the text, data, symbols, numerals, images, sound and
vision." Structure is "The
appearance and arrangement of the content e.g., the relationships between
fields, entities, language, style, fonts, page and paragraph breaks, links and
other editorial devices." And
context is "The background information which enhances understanding of
technical and business environments to which the records relate e.g., metadata,
application software, logical business models, and the provenance (i.e.,
address, title, link to function or activity, agency, program or
section)."[4]
In addition to the independence of the
content, structure and context, records in the electronic environment have
unique characteristics. Harold Naugler
noted several of these and I have summarized them as follows:
• durability
• lifespan
• maintenance
• ease of editing, copying, erasure and reformatting (manipulability)
• ease of manipulation, including the difficulty of tracing manipulation
• need for supporting documentation to describe the contents, arrangement, codes, and technical characteristics
• need for specialized personnel for the processing and maintenance of the records, introducing a new player in the normal clique of archivist, creator and user.[5]
From these unique characteristics I propose four appraisal
criteria:
1. Durability,
2. Presentation/Rendering,
3. Manipulability, and
4. Technical Context.
Durability
Naugler refers to this characteristic
primarily in terms of storage media. I
define it here in terms of the durability of the native application's
contribution to the record. Durability
is an important consideration because changing the format of the record may
have a fundamental impact on it, as will become clear through the illustrations
used to clarify these criteria.
Migrating records or data to an open format simply means a double
conversion, as they will presumably have to be opened again in another
application.
Illustration 1 provides a view of a record
called Jim's Calendar. Much of the
structure is established by Microsoft Outlook 98 - as testified by the buttons
visible along the top. The data is
ordered chronologically within the Calendar folder.
Illustration 1: Jim's Calendar, "normal" view
Illustration 2: A sample entry from a paper-based calendar.
Contrast this illustration with illustration 2, which is a scanned
image from a paper journal or calendar.
The paper calendar, to my eyes, shows more directly the impact of its
contributors. It has an area for
listing tasks and, at the bottom left, a place for recording expenses and
reimbursements. Different colors of ink
and styles of writing suggest that different individuals contributed to the
calendar. Not all of these
characteristics are apparent in the illustration of the electronic
calendar. There is a task list, but the
information is presented in a uniform fashion. Unlike the paper calendar, MS
Outlook contributes the ability to change the structure of the calendar from a
presentation of the daily information to a presentation of information for the
whole month (see illustration 3), an option not possible in the paper calendar.
Illustration 3: Jim's Calendar presented by month.
Having pointed out some of the differences
between the same type of record, how is durability applied as an appraisal
criterion? In a test undertaken with
this Calendar record, exporting the data from Outlook to an open format and
reloading it back into Outlook worked very well. Would it do so well if the data were reloaded into MS Outlook
2000? Or 2005? As soon as the structure of the data in the
application is changed the option of reloading older data in its native format
or even from open formats becomes less sure.
Emulating applications has been proposed as a means of overcoming this
problem, but as the ICA's Committee on Electronic Records has warned, there are
significant issues that affect emulation as a long-term strategy.[6]
By contrast, a simple text file created in
WordPerfect 5.1 opens reasonably well in Microsoft Word 97. This suggests to me that records created
using WP5.1 have a higher durability than those created in Outlook. The WordPerfect documents enjoyed the
moderate lifespan of their native software (WP5.1), which is extended by the
accessibility of the format using other, more recent applications. Using a viewer application to look at the
same text file may confer a high durability to records in this format. In this
instance I conclude that the Outlook Calendar record has low durability because
the application contributes a great deal to the structure of the record and
that contribution cannot be carried out of the application with the content.
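The distinction between content that survives an export and structure that does not can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record model, the field names (`subject`, `start`, `category`) and the `structure` settings are hypothetical and do not reflect how Outlook actually stores or exports calendar data.

```python
import csv
import io

# Hypothetical model of a calendar record: content fields plus
# application-contributed structure (view settings). All names are assumptions.
CONTENT_FIELDS = ["subject", "start", "category"]

record = {
    "entries": [
        {"subject": "Staff meeting", "start": "2001-03-05 09:00", "category": "Business"},
        {"subject": "Dentist", "start": "2001-03-06 14:30", "category": "Personal"},
    ],
    "structure": {"view": "month", "group_by": "category"},
}


def export_open_format(rec):
    """Write only the content fields to CSV text; structure is not carried."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CONTENT_FIELDS)
    writer.writeheader()
    writer.writerows(rec["entries"])
    return buf.getvalue()


def reload_open_format(text):
    """Rebuild a record from CSV text; structure comes back empty."""
    entries = list(csv.DictReader(io.StringIO(text)))
    return {"entries": entries, "structure": {}}


def lost_aspects(original, reloaded):
    """List the top-level aspects of the record that did not survive."""
    return [key for key in original if original[key] != reloaded[key]]


round_tripped = reload_open_format(export_open_format(record))
print(lost_aspects(record, round_tripped))
```

In this toy model the content round-trips intact while the structure is reported as lost, which is the situation described above as low durability.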
Why is durability important? Is it not simply a technical matter
concerning preservation? I would argue that
it is an appraisal concern because as a record's durability declines over the
years, so will the contribution made by the original application to the
record's structure. As the record
changes when it is accessed through another application so will the way the
viewer understands the record.
If records have been preserved primarily for
their informational value, durability might recede into insignificance.[7] But if the Calendar was preserved in part to
convey some evidential aspect of how the creator went about his/her business,
then loss of that original functionality may render the record no longer worth
preserving.[8] Which is to say that durability becomes a
significant appraisal criterion where values other than informational value
predominate.[9]
Presentation/Rendering
The manner in which records are presented or
rendered is closely linked to durability.
This criterion, however, goes beyond durability to address which visual
attributes give value to the records.
In our Calendar example, if it is decided that the essential attributes
are simply the date and time, event details and knowing whose calendar it is,
then there are no visual attributes, other than those provided by the Gregorian
calendar, that give value to the record.
The essential data, in this instance being very simple, could easily be
sorted and presented in different ways, i.e., the presentation or structuring
provided by the native format is not deemed critical. Other data, such as
expenses might be deemed secondary and discarded. (Expenses are visible only in
the paper calendar so far, but each appointment in the electronic calendar can
also be "opened" to show this additional information.) But what if
one knew that the data was normally viewed by category, i.e., where appointments
from other categories are not interspersed, (see illustration 4) rather than
simply by chronology? As illustration 4
shows, the structure of presentation is considerably different. In this case the presentational structure of
the native format may become a value or aspect that needs to be preserved.
Illustration 4: Jim's Calendar, categories view.
Similarly, the content of Jim's Calendar is
sound but the rendering is very different when it is extracted as tab separated
values and viewed using Microsoft Notepad (see illustration 5).
Illustration 5: Jim's Calendar (portion) as tab separated values and presented in MS Notepad.
This particular format does not allow flexible rendering; this is the
only presentation possible. If this content
is loaded into an application that can address the tab delimited data elements
individually, such as a spreadsheet, then the data not only will look more
comprehensible, but its data elements can be manipulated as well to allow for
alternative renderings.
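A minimal Python sketch of this point follows. It assumes a small, invented tab separated extract; the column names (`Subject`, `Start Date`, `Categories`) and the rows are hypothetical and are not taken from the actual export shown in illustration 5. Once the flat text is split into named data elements, an alternative rendering such as a view by category becomes possible.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated extract; column names and rows are invented.
TSV_TEXT = (
    "Subject\tStart Date\tCategories\n"
    "Staff meeting\t2001-03-05\tBusiness\n"
    "Dentist\t2001-03-06\tPersonal\n"
    "Budget review\t2001-03-07\tBusiness\n"
)


def parse_tsv(text):
    """Split the flat text into rows of named data elements."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def render_by_category(rows):
    """Group subjects under their category, one alternative rendering."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Categories"]].append(row["Subject"])
    return dict(grouped)


rows = parse_tsv(TSV_TEXT)
print(render_by_category(rows))
```

The same parsed rows could equally be sorted by date or filtered, which a plain text viewer cannot do with the identical content.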
It can be argued that how the value is
preserved is a technical consideration.
It may be preserved by keeping the records in their native format, if
that has a high durability.
Alternatively, it may be preserved in the description of the record,
leaving the secondary user (i.e., the researcher) to restore any particular
means of presentation. There will
likely be other ways as well. What I am
suggesting is that presentation should be an appraisal criterion simply because
it would establish certain requirements for the long-term preservation of the
record. Any archival migration strategy
would need to ensure that the visual characteristics of the records in their
native format that were appraised as
giving value to the record were accommodated in whatever new format or
environment to which the records were migrated. Failure to preserve these values might make the record not worth
preserving. The Final Report of the Victorian Electronic Records Strategy
emphasizes this as follows:
From an archival
perspective, it is important that both the content and structure are accurately
captured. The captured record should be
identical in appearance to the original document as it was viewed by the
creator of the record.[10]
Thus, Jim's Calendar as it appears in illustration 5 does not meet the
requirement articulated in the VERS Final
Report, although this format has much to commend it purely in terms of
preservation.
Like durability, this criterion does not have
an analogy in the paper environment.
Record characteristics and functionalities are not, I think, as variable
in paper form as they are in electronic form because in paper technology
content and structure are normally closely linked.
Manipulability
Manipulability is greatest
when the record resides in its native format and in its operational
environment. It is a curiously
paradoxical value from an archival standpoint.
One might think that archivists would reject manipulability root and
branch as a desirable value in records.
Yet this very value has been consistently identified as desirable at least
since 1984, when Naugler wrote:
For those machine-readable
records containing information duplicated by textual records, the
machine-readable records will, in the majority of cases, be appraised as having
the better arrangement because they have greater manipulability.[11]
Manipulability can apply to
all three record components either singly or collectively. In appraisal therefore, the archivist must
decide for which components manipulability is archivally valuable. In Illustration 5 the potential for
manipulation is very different from what it is in its native format. For example, I can replace all the R's with
Q's, but cannot manipulate the data elements in the way that is possible in
Outlook, its native format. Thus if
manipulability of the content in terms of discrete data elements is archivally
valuable, then preserving the content as illustrated in Illustration 5 means
the researcher must be alerted to this value, and presumably is expected to
bring a resource to the record to restore that manipulability.
Manipulability, like the
presentation/rendering criterion, is linked to durability. Durability will be irrelevant where the
manipulable characteristics of the native environment are not desired, i.e.,
when a very low manipulability helps preserve the value of the record.[12]
Technical Context
I address the technical context criterion
last because it seems already near to adoption in Canada. The most recent RAD draft chapter for the description
of electronic records that I have seen (April 2000) contains a requirement for
a description of the system in cases where it is deemed significant to an
understanding of the unit being described.[13] The specific descriptive elements listed in the
chapter are not important here, but the recognition that system information can
be significant to understanding records is.
This recognition helps define the
criterion. For example, how is one to
determine whether the technological context is indeed significant to
understanding the records? And if it
is, what aspects of that context are significant? Knowing that Jim's Calendar was created in a networked
environment would be important because it is possible that contributors other
than the creator may have affected the content by entering, changing, or
deleting information. Knowing that MS
Outlook was specifically designed to support that capability would alert an
appraiser to learn who had such privileges, or to account for the impact of
that fact on the record's value. It
might also be useful for the appraiser to know that Outlook allows appointments
to be made "in the past" or deleted without trace.
Illustration 6 provides yet another view of
Jim's Calendar, this time as comma separated values loaded into a Microsoft
Excel spreadsheet.
Illustration 6: Jim's Calendar, comma separated values loaded into MS Excel.
An aspect of the technical context that it illustrates is that the data is in a different order than
it appeared in the native format - still chronological, but by date of entry
rather than by date of event. Date of
entry information can be found within Outlook, but the aspect of the technical
context illustrated here is that the date of entry information was not exported
with the content of the record, except in the way the data was ordered during
the extraction process.
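The following Python sketch illustrates how row order can be the only surviving trace of date of entry. It rests on assumptions: the CSV text, the column names (`Subject`, `Start Date`) and the added `Entry Rank` field are all hypothetical, chosen only to mirror the situation described. Because re-sorting by date of event would otherwise discard the implicit ordering, the sketch first records each row's position explicitly.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV extract: rows appear in the order they were entered,
# and no date-of-entry column was exported. Column names are assumptions.
CSV_TEXT = (
    "Subject,Start Date\n"
    "Budget review,2001-03-07\n"
    "Staff meeting,2001-03-05\n"
    "Dentist,2001-03-06\n"
)


def load_with_entry_rank(text):
    """Read rows and record each row's position as an explicit entry rank."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for rank, row in enumerate(rows, start=1):
        row["Entry Rank"] = rank
    return rows


def sort_by_event_date(rows):
    """Return a new list ordered by date of event, as in the native view."""
    return sorted(rows, key=lambda r: datetime.strptime(r["Start Date"], "%Y-%m-%d"))


rows = load_with_entry_rank(CSV_TEXT)
by_event = sort_by_event_date(rows)
print([(r["Subject"], r["Entry Rank"]) for r in by_event])
```

The rank preserves only the relative order of entry, not the actual dates, which is all that the extract in this scenario can offer.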
Conclusion
It is on the basis of
the unique characteristics of electronic records and the independence of
content, context and structure, that this paper proposes new appraisal
criteria. Existing criteria remain
useful and relevant, and for this reason it has been my goal to propose
criteria where the analogy to existing criteria is weak or absent. In the case
of durability, while it is true that paper records are reformatted onto
microfilm or digital formats, this situation is more the exception than the
rule. Whereas in the electronic
environment reformatting is rightly assumed to be a fact not only of a record's
archival life, but of its operational life as well. Thus durability is a temporal criterion, directly addressing the
changing nature of electronic records over time.
Presentation or
rendering in the paper environment follows a culture centuries old and uses a
technology that is equated with the term "fixed". That there is no comparable tradition in the
electronic environment is clear at least to Dick Brass, Vice President of
Microsoft's eMerging Technologies group. He said in an interview that
… his group is not "anti-paper," that they love it, they
venerate it. 'We respect it, and we think the tragedy of computing to date is
that we didn't sufficiently imitate it.'[14]
The
manipulability criterion is a counterpoint to the presentation/rendering
one. Manipulability provides for the
greater or lesser manipulation of content, structure and context. As the contrast between a paper calendar and
an electronic one makes clear, "archiving" the paper calendar is an
all or nothing exercise, whereas with the electronic calendar content,
structure and technical context can be preserved wholly or only in part. For example, in Jim's Calendar, content can
be wholly preserved, while at the same time preserving no structure and only a
part of the technical context.
Of
the four proposed criteria the technical context criterion is perhaps the one
most closely linked to existing criteria for paper records as it reflects some
of the processes involved in creating the record.[15]
Another goal of mine has been to propose
criteria that address one or more of the facets of a record, i.e., the criteria
may not deal with the record as a whole, but merely one of its components. Archivists routinely make decisions about
the essential context to preserve records in the paper environment. This decision-making process is reflected in
the functional appraisal model where functions are appraised and only those
that are deemed significant in some way are documented. Thus, housekeeping records are routinely
destroyed because their importance to the essential context of the more valuable
records is considered to be low.
Because archivists
have always approached context in terms of essence I believe this practice to
be valid. Archivists seek to determine the
essential contextual elements to preserve meaning and authenticity through
arrangement, description and the custodial chain of ownership. Archivists
now need to address with equal confidence appraising not only the record
as a whole, but its individual components as well. What technical structure or structures are needed to preserve the
values ascribed to a series of records?
Not addressing this matter results in either preserving all
structure(s), something which I have tried to illustrate as being difficult in
the extreme, or a more random preservation, based on preservation priorities or
convenience rather than appraisal considerations.
Angelika Menne-Haritz
writes:
Appraisal
is a body of methods and techniques to destroy in order to preserve. Preservation as a professional task results
from this concept. By destroying
consciously and in a responsible way the remains are saved and can be
appreciated in all dimensions of their value.[16]
If this is true, then archivists have to
determine what elements of a record can be destroyed that will still leave the
essence of the record intact. It is essential to do this because we know that
there are changes in content, structure and context even during the operational
life of an electronic record. We know
that converting records or data to new platforms, even open ones, has an impact
on structure and context, and even on content.
As
well, new appraisal criteria must be sensitive to the limitations of the
technology contemporary with the records under review. Recordkeeping initiatives are underway to
ensure that better electronic records will be created in the future. This too is an appraisal exercise, and it is
valuable to recall the observation made by Lily Koltun with reference to
electronic records:
These are not in fact
archives whose value is derived from their office of origin, but from the
theorizing and selection principles of archivists who identify their source and
scope, judge their value, select and preserve them prior to their creation and
then “appraise” them once again post-creation.[17]
So it is clear that
archivists are defining what the archival record is even now. In her article "Are We Collecting the
Right Stuff" Carolyn Heald wrote, "I fundamentally disagree with the notion
that archives store information; we store artifacts in which information
inheres."[18] Archival appraisal must have criteria that
help illuminate the right virtual stuff.
So
far as possible I have adhered to the words of the text; but Icelandic is a
highly idiomatic language, and Icelandic idiom is not English idiom. I have not
hesitated therefore, in departing from the verbal idiom in order to preserve
the sense.
G.H. Hight in his 1913 Translator's Introduction to
The Saga of Grettir the Strong
Version
1.02, 2 November 2001
[1] Jim Suderman is the
Coordinator of the Electronic Records Program at the Archives of Ontario in Toronto,
Canada. Jim will be discussing Archives
of Ontario electronic records implementation
issues in his upcoming paper "Implementing Custody of Electronic Records
at the AO" for presentation at the Association of Canadian Archivists
annual conference in Vancouver, British Columbia, 20-25 May 2002.
[2] Terry Cook.
"Mind over Matter: Towards a New Theory of Archival Appraisal" in
Barbara L. Craig, ed. The Archival
Imagination. Essays in Honour of Hugh A. Taylor (Ottawa: Association of
Canadian Archivists, 1992), 58.
[3] National
Archives of Australia. Management of
Electronic Records, Appendix 3 “Preserving
Electronic Records through Migration.”
[4] National
Archives of Australia. Keeping
Electronic Records, chapter 4 "Records - Their Creation and Management."
[5] Harold
Naugler. The Archival appraisal of
machine-readable records: a RAMP study with guidelines (Paris: United Nations Educational, Scientific and
Cultural Organization (UNESCO), 1984), para. 1.46, p. 14.
[6] See ICA Committee on Electronic Records. Guide for Managing Electronic Records from an Archival Perspective (Paris: ICA, 1997), 48. Emulation is defined as "One system is said to emulate another when it performs in exactly the same way, though perhaps not at the same speed." (Free On-Line Dictionary Of Computing). Difficulties listed in the Guide include 1) an unwarranted assumption that it will be possible to run any operating system under an emulator indefinitely into the future, 2) the need to emulate only part of the native application to prevent creation or manipulation of the preserved records, and 3) an ever expanding requirement for in-depth expertise in obsolete software.
[7] Informational Value: The value of records/archives for reference and research deriving from the information they contain as distinct from their evidential value. Definition taken from Peter Walne, ed. Dictionary of Archival Terminology 2nd revised ed., ICA Handbooks Series Volume 7 (Munich: K.G. Saur, 1988).
[8] Evidential
value: The value of records/archives of
an institution or organization in providing evidence of its origins, structure,
functions, procedures and significant transactions as distinct from
informational value. Walne, Peter, ed. Dictionary
of Archival Terminology 2nd revised ed., ICA Handbooks Series
Volume 7 (Munich: K.G. Saur, 1988).
[9] Durability
also affects the archival functions of acquisition, description and
preservation. These functions are
presumably simplest in cases where durability is either high, because the
native format still endures (as is the case with simple text records preserved
in their native format), or not important because the context and meaning
provided by the native software is not deemed significant to understanding the
record.
[10] Victorian Electronic Records Strategy Final Report (1998) section on Record Capture, sub-section entitled "Capture of Content and Structure", p. 18.
[11] Naugler,
paragraph 4.12, p. 59. See also the Kansas State Historical Society. Kansas Electronic Records
Management Guidelines [2000], section 7.2 deals with appraisal
criteria, with sub-section 7.2.2 dealing specifically with manipulability.
[12] In this
context it is interesting to note that the Public Record Office in the United
Kingdom identifies three format types appropriate for electronic records in its
publication Management, Appraisal and
Preservation of Electronic Records:
1) transfer formats, 2) preservation formats, and 3) presentation
formats. Public
Record Office. Management,
Appraisal and Preservation of Electronic Records. Section 4.16.
[13] RAD chapter 9 "Records in
Electronic Form" revised version - draft for comment - April 2000, rule
9.7D2. Some specific elements include
system name and developer, hardware, operating system, and network
configuration.
[14] Kim
Honey. "Beyond Paper, Part
I", The Globe and Mail, Monday,
March 5, 2001, p. R3. Dick Brass is the
Vice President of Microsoft's eMerging Technologies group.
[15] Michael
Wettengel, Senior Archivist, Electronic Records Division of the Federal
Archives in Germany identifies three contexts:
structural, functional and technical.
Michael Wettengel. "Old
Traditions and New Uncertainties. The German Archival Concept of a Record and
Electronic Environments" in The
Concept of Record. Report from the Second Stockholm Conference on Archival
Science and the Concept of Records, 30-31 May 1996 (Lund: Riksarkivet,
1998), pp. 139-40. Note that I
have not addressed cost as an appraisal criterion. This is due to the structure of resource allocation for archives,
which can be generalized to say that it is optimized for the support of paper
records. Supporting archival records in
electronic formats is seen by contrast to be costly. At some point, and certainly in some institutions the process is
already well underway, the structure of resource allocation will be modified to
better support electronic records. At
this point then, it is impossible to set a meaningful cost criterion.
[16] Angelika
Menne-Haritz. "What Can be Achieved with Archives?" The Concept of Record. Report from the
Second Stockholm Conference on Archival Science and the Concept of Record,
30-31 May 1996, p. 15.
[17] Lily
Koltun. “The Promise and Threat of Digital Options in an Archival Age” Archivaria 47 (Spring 1999), 123. Koltun’s article goes far beyond the scope
of this paper as is illustrated by the sentence immediately preceding the one
quoted: “So now we have the full and
staggering implication: that digital data represent the first medium collected
by archives which can be totally dependent on the ‘archiving function’ for its
birth, its definition of value, and its continued life.”
[18] Carolyn
Heald. “Are We Collecting The ‘Right Stuff’?” Archivaria 40 (Fall, 1995), 182-188.