Content Management & Preservation

    The OpenReader™ Consortium Project: CIOs, IT Directors, Electronic Records and Digital Library Managers, Take Note!

    "OpenReader™ is a cooperative project to create next-generation software for reading digital publications. The software and accompanying format are for books, periodicals, newspapers, business documents, and other similar types of publications — most any type of content best presented in a page-based manner. The OpenReader System will be open source, built upon XML and related open standards....The OpenReader format, now in the initial stages of development, will be a single, portable, compressed archive file which will internally contain (“encapsulate”) a recognized XML/CSS-based framework representing one or more publications...." 

    This project has enormous potential for long-term access to electronic records and other digital objects. Originally conceived as a multi-media reader for e-books, newspapers and other publications, it will also be developed for digital documents, including electronic records. The principal founders of the OpenReader™ Consortium are Jon Noring, Rick Barry and David Rothman. Further information may be found here. The founders are seeking input from the archives and records management community on ways to enhance OpenReader so that it more fully accommodates access to electronic records in all major native formats, including multi-media records. Please contact Rick Barry at <rickbarry at>.

    Dave Austin, Corporate Blogger, Intraware, Inc., "A Very Brief Look at Blogging for the Uninitiated Executive," posted to Global PR Blog Week, July 13, 2004. This crisp mini-paper on corporate web logs (blogs) offers some excellent insights into corporate blogging (often used as a substitute for, or corollary to, traditional corporate Web sites) for business communications. It also provides a model example of what a well-done blog looks like. Most blogs are not time-limited. This one, like a traditional professional conference, was limited to a one-week period to address a specific topic. It offers an early warning to CIOs, archivists, records managers and other information management professionals: when blogging is under consideration in their organizations, they should ensure that such applications are included in the enterprise information architecture and that their recordkeeping implications are addressed. (See also Blogging in a Crisis, by Jim Horton, below.)

    Richard E. Barry, Barry Associates, "Factoring Web Technologies into the Knowledge Management Equation...for the Record," keynote presentation to the Records Management Association of Australia, March 1999. (Requires PDF reader.)


    Richard E. Barry, Barry Associates, "Catching Up with the Last Technology Train at the Next Station." This paper is an update of one that originally appeared in the September 1996 issue of The Record, a publication of the U.S. National Archives and Records Administration. It reflects significant changes in technology, and in the use of technology, since it was first written in the summer of 1996.

    F. Boudrez, "<XML/> and electronic recordkeeping."

    F. Boudrez and S. Van den Eynde, "Archiving websites" 

    Timo Burkard, "Herodotus: A Peer-to-Peer Web Archival System," thesis submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, May 2002, © Timo Burkard, MMII. All rights reserved. Like the Wayback Machine web archive, Herodotus periodically crawls the World Wide Web and stores copies of all downloaded web content. Unlike the Wayback Machine, however, Herodotus does not rely on a centralized server farm. Instead, many individual nodes across the Internet collaboratively crawl and store the content, so that the idle computer resources of a large group of contributors jointly build an Internet archive. Herodotus uses replication to ensure that data persists as nodes join and leave.
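    The entry says only that Herodotus "uses replication to ensure the persistence of data as nodes join and leave." It does not describe the scheme. As an illustration only, here is one common way peer-to-peer stores place replicas: consistent hashing, where each URL is stored on the k nodes that follow its hash on an identifier ring. The class name ReplicatedRing, the SHA-1 ring and the default of three replicas are all assumptions for this sketch. They are not Herodotus's documented design.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string (node name or URL) onto a 160-bit identifier ring via SHA-1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class ReplicatedRing:
    """Hypothetical sketch: each URL is stored on the `replicas` nodes that
    follow its hash clockwise on the ring (consistent hashing)."""

    def __init__(self, replicas: int = 3) -> None:
        self.replicas = replicas
        self._positions: list[int] = []   # sorted ring positions of live nodes
        self._nodes: dict[int, str] = {}  # ring position -> node name

    def join(self, node: str) -> None:
        pos = ring_position(node)
        if pos not in self._nodes:
            bisect.insort(self._positions, pos)
            self._nodes[pos] = node

    def leave(self, node: str) -> None:
        pos = ring_position(node)
        if pos in self._nodes:
            self._positions.remove(pos)
            del self._nodes[pos]

    def replica_nodes(self, url: str) -> list[str]:
        """Return the distinct nodes currently responsible for storing `url`."""
        n = len(self._positions)
        if n == 0:
            return []
        # First node whose ring position lies strictly after the URL's hash.
        start = bisect.bisect_right(self._positions, ring_position(url))
        return [self._nodes[self._positions[(start + i) % n]]
                for i in range(min(self.replicas, n))]


if __name__ == "__main__":
    ring = ReplicatedRing(replicas=3)
    for name in ["node-a", "node-b", "node-c", "node-d", "node-e"]:
        ring.join(name)
    before = ring.replica_nodes("http://example.org/")
    ring.leave(before[0])  # simulate the first replica holder going offline
    after = ring.replica_nodes("http://example.org/")
    print(before, after)
```

    When the first replica holder leaves, the other two remain responsible for the URL. The next node on the ring becomes the third replica, so only one new copy has to be made. That locality is the usual reason for choosing consistent hashing. Whether Herodotus works exactly this way should be checked against the thesis itself.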

    Chandra Chekuri and Michael H. Goldwasser, Computer Science Department, Stanford University; Prabhakar Raghavan and Eli Upfal, IBM Almaden Research Center; "Web Search Using Automatic Classification." "Currently available search tools suffer either from poor precision (i.e., too many irrelevant documents) or from poor recall (i.e., too little of the Web is covered by well-categorized directories). We address this by developing a search interface that relies on the automatic classification of Web pages. Our classification builds on the Yahoo! taxonomy, but differs in that it is automatic and thus capable of covering the whole Web substantially faster than the (human-generated) Yahoo! taxonomy."

    Chief Information Officers Council, "Securing Electronic Government," the report of the Council's Security, Privacy, and Critical Infrastructure Committee, January 19, 2001.

    Cornell University, "Digital Preservation Tutorial." This tutorial received the 2004 Society of American Archivists (SAA) Preservation Publication Award.

    Patricia Daukantas, "What on Web merits saving?" Webmasters agree that not everything is archive-worthy. Exactly which parts of an agency’s Web site constitute federal records, subject to rules governing retention and disposition, depends on the agency in question.

    Mark Giguere, “Overview of Major Concepts in the Proposed NARA Web Guidance,” a presentation synopsis in the “Bimonthly Records and Information Discussion Group (BRIDG) Meeting Summary,” April 21, 2001, National Archives and Records Administration, Washington, D.C.

    "Guidelines for State Government websites," Government of Western Australia Department of Industry and Technology (DoIT). Governments around the world have recognised the need for a consistent approach to online service delivery. DoIT has also recognised this need and has released a set of Guidelines for State Government Web Sites (July 2002), which were approved by State Cabinet in June 2002.

    Amarnath Gupta, “Preserving Presidential Library Websites, A Case Study with the Franklin D. Roosevelt Library, Museum and Digital Archives,” San Diego Supercomputer Center, SDSC TR-2001-3, January 18, 2001.

    John B. Horrigan, “How Americans Get in Touch with Government,” report of a survey of the Pew Internet & American Life Project, Washington, D.C., May 2004. "Internet users are increasingly turning to e-government sites to carry out their business with government. But Internet users and non-users alike value having more than one way to get in touch with government."

    Jim Horton, "Blogging in a Crisis," posted to Global PR Blog Week, July 13, 2004. Like Dave Austin's mini-paper on corporate web logs (blogs) above, this posting offers some excellent insights into corporate blogging (often used as a substitute for, or corollary to, traditional corporate Web sites) for business communications. Here the author discusses how web log technology could have helped in a major crisis in which he was involved. The crisis was marked by rapidly changing events, Congressional hearings, rumors, facts, demonstrations, hate mail, etc. The author says, "How would blogging fit into a situation like this? Blogging, as some define it -- a place to record opinion and insights -- does not fit. However, blogging as a continuous record of facts and corrections of errors in near real time would have been valuable. Regrettably, the client did not use the blogging tool but did make use of its Web page. A key difference between a Web page and blogging was critical. The corporate communications director relied on the Webmaster to upload information to the Web page. With a blog, the director could have created a content stream directly. Speed was critical." The recordkeeping implications of this kind of blog application (or Web application, for that matter) are fairly obvious. See also Dave Austin's paper above.

    Jonathan Lazar, Dr. Charles R. McClure and Dr. J. Timothy Sprehe, "Solving Electronic Records Management (ERM) Issues for Government Websites: Policies, Practices, and Strategies: Conference Report on Questionnaire and Participant Discussion," April 22, 1998.

    William LeFurgy, "Records and Archival Management of World Wide Web Sites," April 2001. By now, virtually all organizations have set up web sites to provide information and conduct business. As web sites grow, so does dependency on them for accountability, evidence, and other purposes that require recorded documentation. Organizations must take steps to manage content on web sites as information resources and, in some cases, as records. This is an enormous challenge. 

    Susan S. Lukesh, "E-mail and Potential Loss to Future Archives and Scholarship or The Dog that Didn't Bark," First Monday, Peer-Reviewed Journal on the Internet, Volume 4, Number 9, September 6, 1999. "A pattern has emerged in starting presentations on the preservation of electronic materials: Disaster! In 1975, the U.S. Census Bureau discovered that only two computers on earth can still read the 1960 census. The computerized index to a million Vietnam War records was entered on a hybrid motion picture film carrier that cannot be read. The bulk of the National Aeronautics and Space Administration's research since 1958 is threatened because of poor storage. These tales are akin to Jorge Luis Borges's short story in which the knowledge of the world is concentrated in one mammoth computer - and the key is lost. The essential question for the Information Age may well be how to save the electronic memory (Stielow: 333)."

    Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC, "Analysis and Development of Model Quality Guidelines for Electronic Records Management on State and Federal Websites: Final Report." Final report developed as part of an NHPRC grant project.

    Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC, "Guidelines for Electronic Records Management on State and Federal Agency Websites." This accompanies the Final Report above and was developed as part of the same NHPRC grant project.

    NARA Guidance on Managing Web Records 


    1. How are agencies currently using the web?
    2. Who has records management responsibilities for agency web sites?
    3. What statutory and regulatory requirements apply to agency web operations?
    4. What are Federal web site-related records?
    5. What web site records must be managed?
    6. Does managing agency web sites as Federal records mean that I must keep all page changes for a long time?

    Nigel McFarlane, "XML simply the best," Sydney Morning Herald, December 10, 2002. An excellent review of XML's current status, looking both back at its origins and forward to its future.

    Sarah Mitchell, "LOC to save data 'born digital'," February 21, 2003. The article covers the Library of Congress plan for preserving Web sites, CDs, electronic journals and other digital information as part of the National Digital Information Infrastructure and Preservation Program. It also describes the daunting task archivists face in figuring out just how to save information that was "born digital."

    Glyn Moody, "A New Dawn," New Scientist, 30 May 1998. An "oldie" but still good lay article on HTML, XML, XSL, RDF and other emerging metalanguage standards.


    National Archives of Australia, as part of its Australasian Digital Recordkeeping Initiative (ADRI), has published the "AGLS Metadata Element Set." This standard sets out a 19-element metadata model based on the Dublin Core metadata set. Five of the elements are mandatory: Creator, Title, Date, Subject or Function, and Identifier or Availability.
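    The mandatory-element rule in the AGLS entry above lends itself to a simple automated check. This is a minimal sketch under stated assumptions. A metadata record is modelled as a plain Python dict mapping element name to value, which is a hypothetical representation chosen for illustration. Element qualifiers and refinements are ignored, and "Subject or Function" and "Identifier or Availability" are each treated as satisfied by either element of the pair.

```python
# Hypothetical sketch: a metadata record is modelled as a dict of
# element name -> value; qualifiers/refinements are ignored.
MANDATORY = [
    ("Creator",),
    ("Title",),
    ("Date",),
    ("Subject", "Function"),         # either element satisfies the requirement
    ("Identifier", "Availability"),  # either element satisfies the requirement
]


def missing_mandatory(record: dict[str, str]) -> list[str]:
    """Return each mandatory requirement that has no non-empty value in `record`."""
    missing = []
    for alternatives in MANDATORY:
        if not any(record.get(name, "").strip() for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


if __name__ == "__main__":
    record = {
        "Creator": "Example Agency",   # hypothetical values
        "Title": "Example record",
        "Date": "2002-07-01",
        "Function": "Records management",
    }
    print(missing_mandatory(record))  # → ['Identifier or Availability']
```

    A check like this could run when records are captured, flagging incomplete metadata before the record enters the archive.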

    Catherine Nicholls, Web Archiving Strategy Project (WASP), University of Melbourne, Australia, "Creating Road Signs and Encouraging Safe Driving on the Information Superhighway: Accountability and Compliance in the Web Archiving Environment" (PDF), presented at the Association of Canadian Archivists 29th Annual Conference, Montreal, May 2004. "So when it comes to developing the Web Archiving Strategy, what should the focus be on? Are we only concerned with managing web pages like those containing University statutes and regulations, or should we also be concerned with the copies of the council minutes published on the University website? There is also the concern about publishing false or misleading information and what it means to the University in terms of its compliance with the Trades Practices Act. When it comes down to it, what is the University really accountable for in terms of managing its web page information and web records over time and space?" See other related papers produced by WASP.


    Catherine Nicholls and Jon-Paul Williams, Web Archiving Strategy Project (WASP), University of Melbourne, Australia, "Identifying Roadkill on the Information Superhighway: A Website Appraisal Case Study," originally published in Archives and Manuscripts, Vol. 30, Number 2, November 2002, pp. 96-111. "As we accelerate down the Information Superhighway, there is still a danger that many of the webpages and records created now are not going to be available for accountability and historical purposes in the future....The purpose of this article is to present a case study on the work of the University of Melbourne Web Archiving Working Group (WAWG). WAWG is the current driving force behind the University’s web archiving strategy. This article will begin by introducing WAWG and outlining its role and purpose. The article will then focus on the process Records Management Program (RMP) staff have been through in developing web archiving selection criteria." See other related papers produced by WASP.

    Public Record Office, UK, "Management of electronic records on websites and intranets: an ERM toolkit" (PDF). "Website management has often been seen as the preserve of IT specialists, press/communications functions and librarians. In government, it also needs rigorous records management input. This is a point that has often been overlooked. The primary intended audience for this toolkit is records managers in government, web project managers or IT and information managers with information and records management responsibilities. Some aspects may be of assistance to business managers. It assumes a reasonable level of general IT and information literacy but is not written from a technical IT perspective."

    Brian Robinson, "A voice from the near future," FCW, March 18, 2002, on the recently developed VoiceXML specifications. "VoiceXML is a variation of Extensible Markup Language, which serves as something of a universal translator, tagging data so that different computers know how to process or present it. In the case of VoiceXML, it's a matter of translating dialogue between humans and computers."

    David Rothman, author (see Guest Authors), one of the pioneers of e-books and founder of the TeleRead Project. See his TeleRead blog, including its discussion of the integration of Canada's National Library and National Archives and the potential for integrating cultural heritage information resources.


    Thomas J. Ruller, "Open All Night: Using the Internet to Improve Access to Archives: A Case Study of the New York State Archives and Records Administration." This is an excellent commentary on how online access to archival assets (not only indices) can be of enormous value, both to the client community and to the archival organization making use of this technology.

    MacKenzie Smith, Associate Director for Technology, MIT Libraries, "DSpace: An Open Source Dynamic Digital Repository." In March 2000, Hewlett-Packard Company (HP) awarded $1.8 million to the MIT Libraries for an 18-month collaboration to build a Durable Digital Depository, DSpace™. DSpace is a dynamic repository for the intellectual output, in digital formats, of multi-disciplinary research organizations. As an open source system, DSpace is now freely available to other institutions to run as-is, or to modify and extend as they require to meet local needs. From the outset, HP and MIT designed the system to be run by institutions other than MIT, and to support federation among its adopters, in both the technical and the social sense. Links for downloading the free open source software are located in the Special Resources section.

    Smithsonian Institution Archives Website Archives Project. Below are three reports on the Smithsonian Institution Archives Website Archives Pilot Project. While designed to meet the dynamic needs of the Smithsonian Institution, the project has considerable relevance to most organizations facing the challenge of archiving enterprise websites.

    1) Charles Dollar, Dollar Consulting, "Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices," July 20, 2001. This paper has been substituted for Dollar's slide presentation on this subject to the 2001 Society of American Archivists Annual Conference, previously located here. It summarizes Dollar's white paper for the Smithsonian Institution on issues and options relating to the management of websites for both regulated and unregulated organizations in an emerging content management environment.

    2) Charles Dollar, Dollar Consulting, "Archival Preservation of Web Resources: HTML to XHTML Migration Test Considerations, Evaluation, and Recommendations," July 1, 2002. "This report presents the results of a study undertaken by Dollar Consulting for the Smithsonian Institution Archives (SIA) as part of a larger effort to test and evaluate the feasibility of preserving Web sites and HTML pages in an accessible, usable and trustworthy form for as far into the future as is necessary."


    3) Smithsonian Institution Archives Records Management Team, “Archiving Smithsonian Websites: An Evaluation and Recommendation for a Smithsonian Institution Archives Pilot Project,” May 20, 2003. "Over the past eight years, the Smithsonian has greatly expanded its presence on the web, using its website to display virtual exhibits, expeditions, and field trips; provide primary and secondary research information and educational tools; and promote involvement in Smithsonian programs and commerce through business ventures, development, and museum shop sales. The presentation of this information and much of the information itself is often unique to the Smithsonian website...[It] is now apparent that historical documentation of such information will be lost if not captured electronically....The SIA Records Management (RM) Team reviewed the requirements outlined in Dollar Consulting's reports, and sought advice from Thomas J. Ruller (Independent Consultant for archivists and records managers working with records in electronic form), to evaluate the feasibility and requirements needed for implementing a project incorporating those recommendations." See Dollar's reports of July 20, 2001 and July 1, 2002, above.

    Peter Viechnicki, Vredenburg, an AMS Company, "Using Link Analysis to Leverage Enterprise Data," December 2003, a discussion of link analysis strategies focusing on fraud detection and intelligence applications.

    Adam K. Watts, "XML Briefing for Managers," Government Technology, August 2001. How does XML differ from HTML? Is it something new that will sweep away all the hard work on your portal, or can XML and HTML co-exist?

    WEB ARCHIVES ONLINE: Visit three web archive sites to get a feel for how this dynamic information is being captured on the World Wide Web. See Special Resources section.
