Doing a million little tasks at once these days -- a kind of academic death by a thousand cuts that we're all familiar with -- one of which is securing image permissions for an essay I co-wrote with Martin Foys on the re-, dis- and unmediations that frame, shape and even softly determine our readings of two literary classics, Beowulf (Martin's portion of the essay) and Samuel Pepys' Diary (of course, mine).
Part of my argument rests on a re-reading of the now legendary story of John Smith, a poor beleaguered student who (the tale goes) spent years decoding the Diary's shorthand, never knowing that the "crack" to the "code" was sitting on the library's shelf all along, "just steps away from where he worked." The scare quotes do intend to scare: as I found in my research, a number of buried references to the Diary crop up throughout the eighteenth century, indicating the work was not entirely unknown until Smith's transcription. In fact, a now all-but-forgotten biography of William Weller Pepys, A Later Pepys by Alice Gaussen (1904), includes facsimile scans of a plan to transcribe and publish the Diary that presumably pre-dates Smith's work, thereby (I argue) exploding the typical origin story. The need for textual provenance is retroactive, applied only after a seventeenth-century manuscript is circumscribed by print scholarship -- "edition-ized" for academic consumption, as it were.
How did I find these traces of an eighteenth-century Pepys which have puzzled scholars for a century -- including the two editors who devoted a large chunk of their lives to studying this text? It has nothing to do with intelligence, and I (sadly) have no tales from the archival crypts. You can chalk it all up to Google Books.
In the introduction to their edition of the Diary (the only unbowdlerized edition ever published, and therefore for the purposes of contemporary scholarship, the only edition), Latham and Matthews note finding a single reference to Pepys' Diary pre-transcription: a puzzling fact, they say, since (the narrative goes) the Diary mouldered on the stacks for 150 years before finally being "discovered." How could this individual have had 1) access to the Diary to quote it, and 2) knowledge of Pepys' shorthand to read the text?
I googled the quote and unearthed a few more references in early nineteenth-century periodicals, indicating that a particular entry on "tea" was somewhat known among the late-Enlightenment literati. Tracking the beast as far as Google Books would let me, I finally stumbled over the biography mentioned above, I think through the word "transcription," and found the two facsimiles of a pre-Smith plan to transcribe the Diary.*
This research -- an exciting alternative history of a canonical story -- would not have been possible without Google Books or a comparable search engine and database of OCRed texts. So is Google good for history? Uh, hell yeah. That should be a given by now, folks.
Here's where the story gets sticky, though. Thinking my work was done, I finished up the essay without ever consulting the physical book (don't judge me, we all do it), even took a screenshot of the facsimiles from the biography, now out of print, and dropped them in as figures for the essay. The time for permissions rolls around, and we realize the scans are too low resolution for publication. So I order the dusty 1904 tome be dragged up from Duke's storage facilities; open it up to scan the figures myself; and find this:
What I thought were scratches from the scanner, or -- honestly, I don't know what I thought they were; my intuitive curiosity as a literary historian and digital humanist failed me -- turned out to be full pages. The dunce that scanned the text for Google Books didn't bother to unfold the paper; and, since Google Books doesn't have any mechanism for indicating moving parts and fold-outs on their flattened scans, whatever was tucked between the folds was lost to the database.
I've talked about interactivity in the digital archive here before; this incident brought the issue home for me. Like all media, tools like Google Books inevitably (re-)frame our research, opening exciting new possibilities; but in doing so, other potentials are foreclosed. Beyond the dampening effect on research into the codex as a form, the digital archive's absences produce an image of "print culture" that slides frustratingly toward the very reductive models that many book historians have challenged in recent years. We need to start thinking seriously about what aspects of the book are elided by the screen; how a text's materiality is mediated by scans; and how the structure of databases keeps us from documenting these bookish anomalies.
Databases are themselves media structures, and our readings of the historical artifacts we encounter in and through them have to take this into account. Perhaps more importantly (and I'm saying this to myself, as much as anyone else), we need to learn to be better skeptics of our own resources, finding new methods for verifying our research when using digital scans. While ultimately this incident didn't put a dent in my argument, it will make me think twice next time I see an odd little scratch on Google Books.
*If you want the whole argument, you're going to have to read the book when it comes out this summer.