Wednesday, 28 September 2016

follow-up to week 3

This week we began our deep dive into the nuts and bolts of XML and TEI encoding, while still keeping our eyes on the big questions of markup theory. As Sperberg-McQueen and others have argued (myself included, in one of our recommended readings), there are some big questions about markup and texts that can only be approached through the nuts and bolts. As we think about the big and the small together, our eyes will start to notice details in books and other artifacts that we didn't see before, and we'll begin to develop an encoder's-eye view of texts and technologies alike. (Remember that William Blake poem Dean Duff quoted in our Orientation assembly!) For those who'd like a better look at our test case, the 1609 version of Shakespeare's Sonnet 129, here's a link to a good digital facsimile available from the Folger Shakespeare Library:

We also considered punctuation, capitalization, spacing, and other taken-for-granted features of writing as a kind of markup we use every day (or everyday?). As we think about the future of the book, we'll consider how books and texts of all kinds depend (or don't) upon details like these. Literary texts are especially useful places to test the power of markup, given how often small details can make big differences in their interpretation, but the same is true of legal and policy documents. I alluded to the recent story of the "million-dollar comma" in a Canadian contract dispute, which you can read about here. I also referenced the variants in the text of the Second Amendment in the Bill of Rights, which was passed by Congress and then ratified by the states in slightly different versions. Wikipedia has a good quick summary of the textual situation of the Second Amendment, with an image of one of the versions (courtesy of the National Archives and Records Administration) shown below:

For a good illustration of the power of markup, and the effect that a few millimetres of ink can have upon the world, see this New York Times piece about a recent U.S. Supreme Court decision that hinged upon a comma. With a U.S. presidential election underway, it will be interesting to keep these questions in mind when we hear arguments made about the Second Amendment and the intentions of the framers of the U.S. constitution. Do those arguments take into account not just the framers but also the scribes, who were the original text encoders of these documents? Are we reading "whatever the author wants," as Steve Jobs said of ebooks in the iPad rollout video? To what extent are the intentions of the creators of a collectively made text or artifact recoverable through the details of its construction? RBG and Antonin Scalia must have had some fascinating conversations about questions like these.

(Incidentally, by an interesting coincidence the Supreme Court and the Folger Shakespeare Library are kitty-corner to each other on Capitol Hill in Washington. I've always liked the idea of researchers in both buildings deliberating upon markup details like commas, both working in different realms but only a few hundred yards apart, and maybe eating lunch in the same food court.)

Lecture slides are available here and in the usual place and formats on BB. Please note that I'll be using the same sequence of slides for this week and next, and may post an updated version again next week.

We covered a fair amount of technical ground Monday, and the specific files we examined in class can be downloaded from BB. I should also mention a couple of pieces of software that may be useful as you work on the encoding challenge.

An XML file, like an HTML file, is simply a text file that you can open in any text editor, but which can also be recognized by a web browser or other XML-aware software. (If you are trying to open an XML file in something other than a browser, you may need to use your operating system's "Open with" command, usually found by right-clicking on the file, rather than by double-clicking the file icon itself.) For working with XML and other kinds of web documents, I find it helps to have an XML-aware text editor. A good simple freeware editor for the Mac is TextWrangler, and a good PC counterpart (though not freeware) is EditPlus. But there are lots of others out there, and some are reviewed in this LifeHacker post:
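To see for yourself that an XML file really is just a text file, you can create one with nothing but ordinary file I/O and read it back like any other text document. Here's a minimal sketch in Python (the filename, tags, and sonnet excerpt are my own illustrative choices, not a real TEI encoding):

```python
# An XML file is just a text file: write one with ordinary file I/O,
# then read it back like any other text document.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<poem>
  <line n="1">Th'expence of Spirit in a waste of shame</line>
  <line n="2">Is lust in action</line>
</poem>
"""

with open("sonnet129.xml", "w", encoding="utf-8") as f:
    f.write(xml_text)

with open("sonnet129.xml", encoding="utf-8") as f:
    print(f.read())
```

Any text editor, browser, or XML-aware tool will open the resulting file; there's nothing binary or proprietary about it.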

Another piece of software that's a step more advanced than these is the oXygen XML Editor, which is made specifically for working with XML and offers features such as well-formedness checks (which you'll need for your assignment) and validation (which you won't, but is worth knowing about anyway). oXygen is the most widely used XML editor in the digital humanities and has good cross-platform support. It takes some getting used to -- hint: to check well-formedness, look in the "validation" button's sub-menu -- but it's a great place to learn to write and edit XML (and it has a 30-day free trial period). In any case, whatever you do, don't work with XML in a word processor -- and definitely don't use Microsoft Word's "save as XML" function. Not to get too Yoda-esque, but text encoding (like hand-press printing) requires us to unlearn much of what we've learned from word processing.
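If you're curious about what a well-formedness check actually does, here's a minimal sketch using Python's built-in XML parser (the tag names and sonnet line are my own illustrative choices). A parser accepts any snippet whose tags all open and close properly, and rejects one that doesn't -- which is essentially what oXygen's well-formedness check reports:

```python
import xml.etree.ElementTree as ET

# A well-formed snippet: every opening tag has a matching closing tag.
good = "<line n='1'>Th'expence of Spirit in a waste of shame</line>"
# Not well-formed: the closing </line> tag is missing.
bad = "<line n='1'>Th'expence of Spirit in a waste of shame"

for label, snippet in [("good", good), ("bad", bad)]:
    try:
        ET.fromstring(snippet)
        print(label, "-> well-formed")
    except ET.ParseError as err:
        print(label, "-> not well-formed:", err)
```

Note that well-formedness is a purely syntactic test: the parser doesn't care whether `<line>` is a meaningful TEI element, only that the angle brackets balance. That stricter question -- does the document follow a particular schema? -- is what validation answers.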

Thursday, 22 September 2016

blogging question #1: on representation

Our upcoming sequence of classes on XML and TEI will lead us into the topic of using digital technologies to create representations of existing artifacts (like digitized books), as distinct from born-digital artifacts like video games and hypertext novels. This week's blogging question is designed to get us thinking about representation, digital technologies, and what's at stake in their relationship.

At the beginning of one of our readings for this coming week, Michael Sperberg-McQueen makes a counter-intuitive claim: "Texts cannot be put into computers. Neither can numbers. ... What computers process are representations of data" (p. 34). This helpful reminder serves to point out the paradox of the term digitization: when we say we're digitizing a book, we're not actually doing anything to the original book (usually; there are exceptions). What we're really doing when we digitize is to create a new, digital representation of the original. Yet the English word digitization, with its grammatical form of an action (making digital) performed on an object (something not digital), can lead us to forget the act of representation that underlies all digitization.

Why is this important? Well, Sperberg-McQueen's answer is that "representations are inevitably partial, never disinterested; inevitably they reveal their authors' conscious and unconscious judgments and biases. Representations obscure what they do not reveal, and without them nothing can be revealed at all" (p. 34). This line of argument leads to a deceptively simple consequence for everyone involved in digitization: "In designing representations of texts inside computers, one must seek to reveal what is relevant, and obscure only what one thinks is negligible" (p. 34). All digitizations, being representations, are choices -- so we'd better learn how to make good ones. That's why Mats Dahlström and his co-authors make a distinction between mass digitization and critical digitization in one of our upcoming recommended readings.

This week's blogging question starts by asking you to find an example that helps us think critically about digitization. Can you think of some specific instance of digitization -- it could be anything: an image, an ebook, digital music, you name it -- where an originally non-digital object or artifact, broadly defined, has been digitized in ways that reveal interesting (or controversial, or funny, or illuminating) representational choices? I'm not asking for examples simply of digitization getting something wrong, as fun as those may be. Rather, I'm asking you to unpack examples where a choice made in digital representation illuminates some quality of the original thing that we might otherwise take for granted, or some revealing aspect of digitization itself -- or possibly both. Your example might arise from digitization gone wrong somehow, but I'd like us to look beyond basic error-identification for this question.

The next question, then, is this: what does the error -- or simply the choice -- in representation teach us about the original, or about the act of representation itself?

Digitized books are good places to explore this question, but you could draw on other kinds of media and other kinds of texts (in D.F. McKenzie's broad sense of the word text; see our recommended reading from last week, "The Broken Phiall: Non-Book Texts"). For example, if you bought the Beatles record Sgt. Pepper's Lonely Hearts Club Band on vinyl LP when it was first released in 1967, you'd experience it in at least a couple of different ways than if you bought it on iTunes today, or on CD in 1995. For one, an LP listener would need to flip the record over partway through, which may or may not give the impression of the whole album being divided into a two-part thematic structure: some bands exploited this imposed division of records into Sides 1 & 2, but not all did. More to the point, an LP listener reaching the very end of the record would hear the song "A Day in the Life" end on a long E-major chord that would just keep on resonating in a continuous loop until one lifted the needle from the record's run-out groove. A CD track or MP3 file can't (or simply doesn't) do this. What is the representational choice here, and why does it matter? I'd offer the answer that the original design of the Sgt. Pepper LP involves the listener bodily in the music, in that "A Day in the Life" only ends when you choose to lean over and stop the record. That effect is lost in the digitized version of the album -- or is it replaced by a different effect that influences how we'd interpret the song? (I like to imagine that somewhere in the great beyond David Bowie and Prince are having this conversation with John Lennon and George Harrison, while Jimi Hendrix and Lemmy are playing air-hockey nearby...)

This might not seem to have much to do with books, but being able to unpack this kind of representational choice, in which form and meaning become intertwined, is exactly what bibliographers and other textual scholars do -- not to mention text encoders who are concerned with critical digitization, not just mass digitization. Your example need not be as involved as the one I've spun out above: the point is to get us thinking about how representation works, and what's at stake.


Dahlström, Mats, Joacim Hansson, and Ulrika Kjellman. "'As We May Digitize' -- Institutions and Documents Reconfigured." Liber Quarterly 21.3-4 (2012): 455-74.

McKenzie, D.F. "The Broken Phiall: Non-Book Texts." In Bibliography and the Sociology of Texts, 31-54. Cambridge: Cambridge University Press, 1999.

Sperberg-McQueen, C.M. "Text in the Electronic Age: Textual Study and Text Encoding, with Examples from Medieval Texts." Literary and Linguistic Computing 6.1 (1991): 34-46.

Wednesday, 21 September 2016

follow-up to weeks 1 and 2

Normally I'll post a follow-up each week, but this week's post will be a bit of an omnibus to get caught up. In our first class we considered Ramelli's book wheel, which you can read more about in the supplementary article I posted to the week 1 readings, titled "Reading the Book of Mozilla," which includes images of two versions of the device made after Ramelli's, one being a really interesting Chinese adaptation from just a few decades later. There's also a film version of The Three Musketeers in which the book wheel makes an appearance (Michael York's character obviously doesn't know what it is, but finds out how it works in a pretty funny pratfall). I also alluded to another image of a futuristic reading technology, as it was imagined in 1935 (which those of you in my Research Methods class will recognize):

This image came from an issue of Everyday Science and Mechanics, and was recently popularized in a story in Smithsonian Magazine. The U.S. patent filed for the device can be found here. A tip of the hat to Matthew Wells for finding this.

In last week's class we discussed the domestication of new media, so to speak, including the 19th-century stereoscope (kind of like a Victorian Oculus Rift... kind of). Here's an advertisement for a stereoscope from 1856, which could make for an interesting comparison with the image above, and the Ramelli book wheel, in light of the themes many of you raised in class discussions.

I found this ad in a serially published version of Charles Dickens's novel Little Dorrit, held in the Thomas Fisher Rare Book Library. Actually, to give credit where it's due, one of the Fisher librarians pointed it out to me, which shows why it's important to talk with the librarians when doing research in places like the Fisher. I was doing research on the use of Shakespeare in the introduction of new media (notice the Hamlet reference above the image), and ended up writing about it in the 4th chapter of this book, which deals with the photographic prehistory of digitization.

Downloadable lecture slides for weeks 1-2 are now posted on Blackboard, and you can view an embedded version here:

I'm just polishing up our first blogging question, which will be posted here today or tomorrow.