Archivists enjoy good satire.
Click here to access the full story. It is not far-fetched to assume that The Onion has a team of information professionals on staff. Ridiculousness aside, there is intelligent thought (i.e. archival theory) behind this veiled critique of narcissistic Facebook users. One such theory is that of appraisal:
…the process of determining whether records and other materials have permanent (archival) value. Appraisal may be done at the collection, creator, series, file, or item level. Appraisal can take place prior to donation and prior to physical transfer, at or after accessioning. The basis of appraisal decisions may include a number of factors, including the records’ provenance and content, their authenticity and reliability, their order and completeness, their condition and costs to preserve them, and their intrinsic value. Appraisal often takes place within a larger institutional collecting policy and mission statement. (courtesy of the Society of American Archivists’ Glossary of Archival and Records Terminology)
Simply, appraisal is the process whereby archivists determine what should be retained in a new collection/accession with consideration of an item’s administrative, legal, and/or historic value. An excellent web resource from the National Archives and Records Administration’s (NARA) records management unit details this process as it applies to federal records. While archivists strive to be as objective and neutral as possible with collections in their custody, appraisal is often criticized as a subjective process that is the result of one person’s (or a small number of people’s) inherent bias(es) when surveying a collection. However, some relatively easy items can be removed from a collection during the appraisal process. These can include:
- Redundant/duplicative material
- Photographs that are clearly blurry and/or of questionable quality
- Material that is the physical and/or intellectual property of another agency or individual
- Sensitive items that contain social security numbers or other information (health, educational) that is protected through federal legislation designed to safeguard the privacy of individuals
Archives and archivists worldwide are currently facing difficult realities with the handling of born-digital materials. Digital photographs, films, documents, databases, spreadsheets, and e-mails–among many other types of electronic records–are arriving at institutional doorsteps in ever greater quantity, and many institutions are minimally equipped to preserve them, let alone facilitate their access and retrieval. Rather than spend this entire post discussing all facets of electronic records, we will briefly focus your attention on one specific aspect of their management: appraisal.
The woman in the above-linked satirical piece would have benefited from a thorough appraisal of her images prior to uploading them to Facebook. Nobody wants to scroll through 12 million images from somebody’s vacation. It can only be assumed that the woman in this fictionalized story was far more concerned with access to the images than she was with their long-term preservation. At Special Collections and Archives (SCA), we would focus our efforts–at least initially–on preserving the images for as long as possible. We would remove all digital files from outdated storage devices (floppy disks, CDs, DVDs, portable drives, etc.) and move them onto a centralized server with measures in place to ensure that data is not lost to viruses or to corruption of the physical storage medium. After that, we would appraise all files and make an appropriate selection for long-term retention. With items preserved and a selection made, access to the material is more readily facilitated.
The digital world affords archivists the opportunity to automate portions of the appraisal process. For example, keyword and pattern searches across multiple files and folders can quickly isolate anything containing a social security number or other sorts of private information for subsequent redaction or deletion. A hash value (a fixed-length hexadecimal string computed from a file’s contents, so that identical files produce identical values) can be generated for every item in a series of folders, and the values can then be compared to identify exact duplicates. However, digital photography–especially images taken by commercial photographers–can potentially yield hundreds of files taken over a brief period of time that are nearly identical to one another but not exact duplicates, so their hash values will differ. What is an archives to do?
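The two automated checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than a description of SCA's actual workflow: the function names are hypothetical, SHA-256 is simply one common choice of hash, and the pattern only catches numbers written as NNN-NN-NNNN in files that can be read as plain text.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Assumption: only the NNN-NN-NNNN form is matched; real screening needs more care.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def file_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file's contents as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents produce the same hash value."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_hash(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def files_with_ssn_like_text(root: Path) -> list[Path]:
    """List files whose text contains something shaped like an SSN."""
    flagged = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if SSN_PATTERN.search(text):
                flagged.append(path)
    return flagged
```

Both functions only report candidates. An archivist would still review each flagged file or duplicate group before redacting or deleting anything.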
Some strategies are clear and involve little effort. If a photographer shoots in a proprietary raw file format (Canon’s is .cr2 and Nikon’s is .nef), their favorite images (selections) will often have been modified in editing software, such as Adobe Photoshop. These changes will not have affected the original file; instead, a sidecar (.xmp) file will have been automatically created that captures any/all modifications made to the original. Simply seeing that a .cr2 or .nef file has an associated sidecar (.xmp) file would suggest that this specific image was of importance to its creator, especially if one can verify that the creator was the only person to handle the files after the image was captured. All other items without these sidecar files could potentially be considered for deletion. If the opportunity is available, working with the original photographer on making a selection for permanent retention would be a very useful appraisal exercise, but most institutions will not have this luxury.
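Checking for sidecar files is easy to script. The sketch below is illustrative only, with hypothetical function names. It assumes that a sidecar sits in the same folder as the raw file and shares its base name (IMG_0001.cr2 alongside IMG_0001.xmp); some editing software names sidecars differently, so that assumption should be verified against the collection at hand.

```python
from pathlib import Path

# The two raw formats mentioned above; extend as needed for other cameras.
RAW_EXTENSIONS = {".cr2", ".nef"}


def has_sidecar(raw_file: Path) -> bool:
    """Check for an .xmp file with the same base name as the raw file.

    Assumption: sidecars follow the <base name>.xmp convention.
    """
    return any(raw_file.with_suffix(ext).is_file() for ext in (".xmp", ".XMP"))


def split_by_sidecar(folder: Path) -> tuple[list[Path], list[Path]]:
    """Return (raw files with a sidecar, raw files without one)."""
    with_sidecar, without_sidecar = [], []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in RAW_EXTENSIONS:
            if has_sidecar(path):
                with_sidecar.append(path)
            else:
                without_sidecar.append(path)
    return with_sidecar, without_sidecar
```

The second list is only a set of candidates for review. As noted above, a missing sidecar is meaningful only if the files have not been handled by anyone other than their creator.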
Neither of the above scenarios is always available to archivists. Tools that help automate the identification of nearly identical items continue to improve and could also be explored for appraisal. For example, ssdeep is a free program that computes and compares context triggered piecewise hashes (CTPH), a technique also referred to as ‘fuzzy hashing.’ Rather than requiring two files to match exactly, ssdeep identifies sequences of identical bytes that appear in the same order across files, even when the content or length between those sequences differs, and reports how closely the files match. These comparisons could potentially help in the identification and deletion of near-identical digital photographs in a folder.
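To give a feel for what a near-identical comparison looks like, here is a small Python sketch. It does not use ssdeep and does not implement CTPH. As a stand-in, it compares raw bytes directly with the standard library's `difflib.SequenceMatcher`, which is far too slow for real image files but shows the same idea of scoring how much two files share. The function names and the 0.9 threshold are our own assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def similarity(a: bytes, b: bytes) -> float:
    """Score from 0.0 (nothing shared) to 1.0 (identical) for two byte strings."""
    # autojunk=False stops difflib from ignoring frequently repeated byte values.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def near_identical_pairs(folder: Path, threshold: float = 0.9):
    """Return (file, file, score) for every pair scoring at or above threshold."""
    contents = {
        path: path.read_bytes()
        for path in sorted(folder.iterdir())
        if path.is_file()
    }
    pairs = []
    for (path_a, data_a), (path_b, data_b) in combinations(contents.items(), 2):
        score = similarity(data_a, data_b)
        if score >= threshold:
            pairs.append((path_a, path_b, score))
    return pairs
```

A dedicated tool would compare compact hashes instead of full file contents, which is what makes this kind of comparison practical across thousands of files.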
No solution will be perfect, but building a solid set of tools and applying archival theory to the appraisal process will help an institution appropriately manage its digital assets. Also important are experimentation and the sharing of results. Albert Einstein is often credited with saying, “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.” SCA looks to the broader archival profession for answers and solutions, but often its most successful efforts have been the result of experimentation. We look forward to sharing our challenges and successes with practical digital appraisal–among many other topics–with our archival colleagues in the near future.