Content Management
& Preservation
The OpenReader™ Consortium Project
CIOs, IT Directors, Electronic Records and Digital Library Managers Take Note!
"OpenReader™ is a cooperative project to create next-generation software for reading digital publications. The software and accompanying format are for books, periodicals, newspapers, business documents, and other similar types of publications — most any type of content best presented in a page-based manner. The OpenReader System will be open source, built upon XML and related open standards....The OpenReader format, now in the initial stages of development, will be a single, portable, compressed archive file which will internally contain (“encapsulate”) a recognized XML/CSS-based framework representing one or more publications...."
This project has enormous potential for long-term access to electronic records and other digital objects. Originally conceived as a multi-media reader for e-books, newspapers and other publications, it will also be developed for digital documents including electronic records. Principal founders of the OpenReader™ Consortium include Jon Noring, Rick Barry and David Rothman. Further information may be found here. The founders are seeking input from the archives and records management community on ways to enhance OpenReader so that it more fully accommodates electronic records access needs in all major native formats, including multi-media records. Please contact Rick Barry at <rickbarry at aol.com>.
Dave Austin, Corporate
Blogger, Intraware, Inc., "A
Very Brief Look at Blogging for the Uninitiated Executive," posted
to Global PR Blog Week, July 13, 2004.
This crisp mini-paper on corporate web logs (blogs) offers some excellent
insights into corporate blogging (often used as a substitute for, or corollary
to, traditional corporate Web sites) for business communications. It also
provides a model example of what a well done blog looks like. Most blogs are not
time limited. This one, like a traditional professional conference, happens to
have been limited to a specific one-week period of time to address a specific
topic. It offers an early warning signal to CIOs, archivists, records managers
and other information management professionals: when blogging is under
consideration in their own organizations, appropriate measures should be taken
to include such applications in the enterprise information architecture and to
address recordkeeping implications. (See also Blogging in a Crisis, by Jim Horton, below.)
Richard E.
Barry, Barry Associates,
"Factoring
Web Technologies into the Knowledge Management Equation...for the Record,"
keynote presentation to the Records Management Association of Australia, March
1999. (Requires PDF reader.)
Richard E. Barry, Barry
Associates. Catching
Up with the Last Technology Train at the Next Station. This
paper is an update of one that originally appeared in the September 1996 issue
of The Record, a publication of the U.S. National Archives and Records
Administration. It reflects
significant changes in technology and in the use of technology since it was
first written in the summer of 1996.
F. Boudrez, "<XML/>
and electronic recordkeeping"
F. Boudrez and S. Van
den Eynde, "Archiving
websites"
Timo
Burkard, "Herodotus:
A Peer-to-Peer Web Archival System" submitted to the Department of
Electrical Engineering and Computer Science in partial fulfillment of the
requirements for the degree of Master of Engineering in Electrical Engineering
and Computer Science at the Massachusetts Institute of Technology, May 2002, ©
Timo Burkard, MMII. All rights reserved. Like
the Wayback Machine web archive, Herodotus periodically crawls the world wide
web and stores copies of all downloaded web content. However, Herodotus does not
rely on a centralized server farm. Rather, many individual nodes across the
Internet collaboratively perform the task of crawling and storing the content,
allowing a large group of contributors' idle computer resources to jointly
achieve the goal of creating an Internet archive. Herodotus uses replication to
ensure the persistence of data as nodes join and leave.
Chandra
Chekuri, Michael
H. Goldwasser, Computer Science
Department, Stanford University, Prabhakar
Raghavan, Eli Upfal,
IBM Almaden
Research Center, "Web
Search Using Automatic Classification," Currently available search
tools suffer either from poor precision (i.e., too many irrelevant
documents) or from poor recall (i.e., too little of the Web is covered
by well-categorized directories). We address this by developing a search
interface that relies on the automatic classification of Web pages. Our
classification builds on the Yahoo! taxonomy,
but differs in that it is automatic and thus capable of covering the whole Web
substantially faster than the (human-generated) Yahoo! taxonomy.
Chief Information Officers Council,
"Securing Electronic
Government," the report of the Council's Security, Privacy, and Critical Infrastructure Committee, January 19, 2001.
Cornell University, "Digital Preservation Tutorial".
This tutorial is the proud recipient of the 2004 SAA
Preservation Publication Award.
Patricia Daukantas, What
on Web merits saving? Webmasters agree
that not everything is archive-worthy. Exactly which parts of an
agency’s Web site constitute federal records, subject to rules governing
retention and disposition, depends on the agency in question.
Mark
Giguere, “Overview
of Major Concepts in the Proposed NARA Web Guidance,” a presentation
synopsis in the “Bimonthly Records and Information Discussion Group (BRIDG)
Meeting Summary,”
"Guidelines
for State Government websites," Government of Western
Australia Department of Industry and Technology (DoIT). Governments around the
world have recognised the need for a consistent approach to online service
delivery. DoIT has also recognised this need and has released a set of Guidelines
for State Government Web Sites, (July 2002), which were approved by State
Cabinet in June 2002.
Amarnath
Gupta, “Preserving
Presidential Library Websites, A Case Study with the Franklin D. Roosevelt
Library, Museum and Digital Archives” San Diego Supercomputer Center,
SDSC TR-2001-3, January 18, 2001.
John B.
Horrigan, “How
Americans Get in Touch with Government,” report of a survey of the Pew
Internet & American Life Project, Washington, D.C., May
2004. "Internet users are increasingly turning to e-government sites to
carry out their business with government. But Internet users and non-users alike
value having more than one way to get in touch with government."
Jim Horton, "Blogging in a Crisis", posted to Global PR Blog Week, July 13, 2004. Like Dave Austin's mini-paper on corporate web logs (blogs) above, this posting offers some excellent insights into corporate blogging (often used as a substitute for, or corollary to, traditional corporate Web sites) for business communications. In this case, the author discusses how web log technology could have helped in a major crisis in which he was involved, one marked by rapidly changing events, Congressional hearings, rumors, facts, demonstrations, hate mail, etc. The author says, "How would blogging fit into a situation like this? Blogging, as some define it -- a place to record opinion and insights -- does not fit. However, blogging as a continuous record of facts and corrections of errors in near real time would have been valuable. Regrettably, the client did not use the blogging tool but did make use of its Web page. A key difference between a Web page and blogging was critical. The corporate communications director relied on the Webmaster to upload information to the Web page. With a blog, the director could have created a content stream directly. Speed was critical." The recordkeeping implications of this kind of blog application -- or Web application for that matter -- are fairly obvious. See also Dave Austin's paper above.
Jonathan Lazar,
Dr. Charles R. McClure and Dr. J. Timothy Sprehe. "Solving Electronic Records Management
(ERM) Issues for Government
Websites: Policies, Practices, and Strategies: Conference Report on
Questionnaire and Participant Discussion, April 22, 1998."
William
LeFurgy, "Records
and Archival Management of World Wide Web Sites," April 2001. By
now, virtually all organizations have set up web sites to provide information
and conduct business. As web sites grow, so does dependency on them for
accountability, evidence, and other purposes that require recorded
documentation. Organizations must take steps to manage content on web sites as
information resources and, in some cases, as records. This is an enormous
challenge.
Susan
S. Lukesh, "E-mail
and Potential Loss to Future Archives and Scholarship or The Dog that Didn't Bark,"
First Monday, Peer-Reviewed Journal on
the Internet, Volume 4, Number 9, September 6, 1999. "A pattern
has emerged in starting presentations on the preservation of electronic
materials: Disaster! In 1975, the U.S. Census Bureau discovered that only two
computers on earth can still read the 1960 census. The computerized index to a
million Vietnam War records was entered on a hybrid motion picture film carrier
that cannot be read. The bulk of the National Aeronautics and Space
Administration's research since 1958 is threatened because of poor storage.
These tales are akin to Jorge Luis Borges's short story in which the knowledge
of the world is concentrated in one mammoth computer - and the key is lost. The essential question for the Information Age may well
be how to save the electronic memory (Stielow: 333)."
Charles R. McClure,
and J. Timothy Sprehe, consultants to NHPRC. Final Report developed as
part of an NHPRC grant project. "Analysis and Development of Model Quality Guidelines for Electronic
Records Management on State and Federal Websites: Final Report."
Charles R. McClure, and J. Timothy Sprehe, consultants to NHPRC. This accompanies the Final
Report above, developed as part of an NHPRC grant project. "Guidelines for Electronic Records Management on State and Federal Agency
Websites."
NARA
Guidance on Managing Web Records,
Introduction
Nigel McFarlane, "XML simply the best," an excellent current status, forward
and backward review of XML, Sydney Morning Herald, December 10, 2002.
Sarah
Mitchell, LOC
to save data 'born digital',
February 21, 2003, on the Library of Congress plan for preserving Web
sites, CDs, electronic journals and other digital information as part of the National
Digital Information Infrastructure and Preservation Program and how archivists face the daunting task of
figuring out just how to save information that was 'born digital.'
Glyn Moody, "A New
Dawn" in New Scientist, 30 May 1998, an "oldie"
but still good lay article on
HTML, XML, XSL, RDF and other emerging metalanguage standards.
National Archives of Australia, as
part of its Australasian Digital Recordkeeping Initiative (ADRI),
has published the "AGLS
Metadata Element Set". This standard sets out a 19-element metadata
model based on the Dublin Core metadata set, including five mandatory elements:
Creator, Title, Date, Subject or Function, Identifier or Availability.
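For records managers who want to see what the mandatory-element rule means in practice, the following is a minimal sketch, not part of the AGLS standard itself. It assumes a record is held as a simple Python dictionary; the lowercase key names and the function name missing_mandatory are hypothetical, chosen only for illustration. It reports which of the five mandatory groups listed above are absent from a record.

```python
# Mandatory AGLS element groups as listed in the entry above.
# Each tuple is an "either/or" group: one non-empty member satisfies it.
# The lowercase key names are hypothetical, not official AGLS identifiers.
MANDATORY_GROUPS = [
    ("creator",),
    ("title",),
    ("date",),
    ("subject", "function"),
    ("identifier", "availability"),
]


def missing_mandatory(record):
    """Return a list naming each mandatory group with no non-empty value."""
    missing = []
    for group in MANDATORY_GROUPS:
        # A group is satisfied if any of its elements has a non-blank value.
        satisfied = any(str(record.get(name) or "").strip() for name in group)
        if not satisfied:
            missing.append(" or ".join(group))
    return missing


# Usage: a record that supplies Function instead of Subject still passes.
example = {
    "creator": "Records Unit",
    "title": "Annual report",
    "date": "2004-07-13",
    "function": "Reporting",
    "identifier": "doc-001",
}
print(missing_mandatory(example))
```

A real implementation would also need the standard's element qualifiers and encoding rules, which this sketch leaves out.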
Catherine Nicholls, Web
Archiving Strategy Project (WASP), University
of Melbourne, Australia, "Creating
Road Signs and Encouraging Safe Driving on the Information Superhighway:
Accountability and Compliance in the Web Archiving Environment" (PDF),
presented at the Association of Canadian Archivists 29th Annual Conference,
Montreal, May 2004. "So when it comes to developing the Web Archiving
Strategy, what should the focus be on? Are we only concerned with managing web
pages like those containing University statutes and regulations, or should we
also be concerned with the copies of the council minutes published on the
University website? There is also the concern about publishing false or
misleading information and what it means to the University in terms of its
compliance with the Trade Practices Act. When it comes down to it, what
is the University really accountable for in terms of managing its web page
information and web records over time and space?" See
other related papers produced by WASP.
Catherine Nicholls
and Jon-Paul Williams, Web
Archiving Strategy Project (WASP),
University
of Melbourne, Australia. "Identifying
Roadkill on the Information Superhighway: A Website Appraisal Case Study."
Public
Records Office, UK. "Management
of electronic records on websites and intranets: an ERM toolkit" (pdf)
"Website management has often been
seen as the preserve of IT specialists, press/communications functions and
librarians. In government, it also needs rigorous records management input. This
is a point that has often been overlooked. The primary intended audience for
this toolkit is records managers in government, web project managers or IT and
information managers with information and records management responsibilities.
Some aspects may be of assistance to business managers. It assumes a reasonable
level of general IT and information literacy but is not written from a technical
IT perspective."
"A voice from the near future,"
March 18, 2002, on recently developed VoiceXML specifications. "VoiceXML
is a variation of Extensible Markup Language, which serves as something of a
universal translator, tagging data so that different computers know how to
process or present it. In the case of VoiceXML, it's a matter of translating
dialogue between humans and computers."
David
Rothman, author (see Guest Authors), one of
the pioneers of e-Books and founder of the TeleRead
Project. See his TeleRead blog
and the blog discussion of the integration of Canada's
National Library and National Archives and the potential for integration
of cultural heritage information resources.
Thomas J. Ruller, "Open All Night: Using the Internet to Improve Access to Archives: A Case Study of the New York State Archives and Records Administration." This is an excellent commentary on how online access to archival assets (not only indices) can be of enormous value not only to the client community but also to the archival organization making use of this technology.
MacKenzie
Smith, Associate Director for Technology,
MIT Libraries, "DSpace:
An Open Source Dynamic Digital Repository."
In March 2000, Hewlett-Packard Company (HP) awarded $1.8 million to the
MIT Libraries for an 18-month collaboration to build a Durable Digital
Depository, DSpace™, a dynamic repository for the intellectual output in
digital formats of multi-disciplinary research organizations. As an open source
system, DSpace is now freely available to other institutions to run as-is, or to
modify and extend as they require to meet local needs. From the outset, HP and
MIT designed the system to be run by institutions other than MIT, and to support
federation among its adopters, in both the technical and the social sense. Links
for downloading the free open source are located in the Special
Resources section.
Smithsonian Institution Archives
Website Archives Project. Below are three reports on the Smithsonian Institution
Archives Website Archives Pilot Project. While designed to meet the dynamic
needs of the Smithsonian Institution, the project has considerable relevance to
most organizations facing the challenge of archiving enterprise websites.
1) Charles Dollar, Dollar Consulting, "Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices," July 20, 2001. This paper has been substituted for Dollar's slide presentation on this subject to the 2001 Society of American Archivists Annual Conference, previously located here. It summarizes Dollar's white paper for the Smithsonian Institution on issues and options relating to the management of websites for both regulated and unregulated organizations in an emerging content management environment.
2) Charles Dollar, Dollar Consulting, "Archival Preservation of Web Resources: HTML to XHTML Migration Test Considerations, Evaluation, and Recommendations," July 1, 2002. "This report presents the results of a study undertaken by Dollar Consulting for the Smithsonian Institution Archives (SIA) as part of a larger effort to test and evaluate the feasibility of preserving Web sites and HTML pages in an accessible, usable and trustworthy form for as far into the future as is necessary."
3) Smithsonian Institution Archives Records Management Team, “Archiving Smithsonian Websites: An Evaluation and Recommendation for a Smithsonian Institution Archives Pilot Project.”
Peter Viechnicki, Vredenburg, an AMS Company,
"Using Link Analysis to Leverage Enterprise Data" December,
2003, a discussion of link analysis strategies with focus on fraud detection and
intelligence applications.
Adam
K. Watts, "XML
Briefing for Managers"
in Government Technology, August
2001, on how XML differs from HTML: is it something new that will sweep away all
the hard work on your portal, or can XML and HTML co-exist?
WEB ARCHIVES ON LINE: Visit three web archive sites to get a feel for
how this dynamic information is being captured on the World Wide Web. See Special
Resources section.