Wednesday 28 September 2016

follow-up to week 3

This week we began our deep dive into the nuts and bolts of XML and TEI encoding, while still keeping our eyes on the big questions of markup theory. As Sperberg-McQueen and others have argued (myself included, in one of our recommended readings), there are some big questions about markup and texts that can only be approached through the nuts and bolts. As we think about the big and the small together, our eyes will start to notice details in books and other artifacts that we didn't see before, and we will begin to develop an encoder's-eye view of texts and technologies alike. (Remember that William Blake poem Dean Duff quoted in our Orientation assembly!) For those who'd like a better look at our test case, the 1609 version of Shakespeare's Sonnet 129, here's a link to a good digital facsimile available from the Folger Shakespeare Library: http://luna.folger.edu/luna/servlet/s/weo114.

We also considered punctuation, capitalization, spacing, and other taken-for-granted features of writing as a kind of markup we use every day (or everyday?). As we consider the future of the book, we'll consider how books and texts of all kinds depend (or don't) upon details like these. Literary texts are especially useful places to test the power of markup, given how small details can often make big differences in their interpretation, but the same is true of legal and policy documents. I alluded to the recent story of the "million-dollar comma" in a Canadian contract dispute, which you can read about here. I also referenced the variants in the text of the Second Amendment in the Bill of Rights, which was passed by Congress and then ratified by the states in slightly different versions. Wikipedia has a good quick summary of the textual situation of the Second Amendment, with an image of one of the versions (courtesy of the National Archives and Records Administration) shown below:



https://upload.wikimedia.org/wikipedia/commons/1/18/SecondAmendentoftheUnitedStatesConstitution.jpg

For a good illustration of the power of markup, and the effect that a few millimetres of ink can have upon the world, see this New York Times piece about a recent U.S. Supreme Court decision that hinged upon a comma. With a U.S. presidential election underway, it will be interesting to keep these questions in mind when we hear arguments made about the Second Amendment and the intentions of the framers of the U.S. Constitution. Do those arguments take into account not just the framers but also the scribes, who were the original text encoders of these documents? Are we reading "whatever the author wants," as Steve Jobs said of ebooks in the iPad rollout video? To what extent are the intentions of the creators of a collectively made text or artifact recoverable through the details of its construction? RBG and Antonin Scalia must have had some fascinating conversations about questions like these.

(Incidentally, the Supreme Court and the Folger Shakespeare Library happen to be kitty-corner to each other on Capitol Hill in Washington. I've always liked the idea of researchers in both buildings deliberating upon markup details like commas, working in different realms but only a few hundred yards apart, and maybe eating lunch in the same food court.)

Lecture slides are available here and in the usual place and formats on BB. Please note that I'll be using the same sequence of slides for this week and next, and may post an updated version again next week.



We covered a fair amount of technical ground Monday, and the specific files we examined in class can be downloaded from BB. I should also mention a couple of pieces of software that may be useful as you work on the encoding challenge.

An XML file, like an HTML file, is simply a text file that you can open in any text editor, but which can also be recognized by a web browser or other XML-aware software. (If you are trying to open an XML file in something other than a browser, you may need to use your operating system's "Open with" command, usually found by right-clicking on the file, rather than by double-clicking the file icon itself.) For working with XML and other kinds of web documents, I find it helps to have an XML-aware text editor. A good simple freeware editor for the Mac is TextWrangler, and a good PC counterpart (though not freeware) is EditPlus. But there are lots of others out there, and some are reviewed in this LifeHacker post: http://lifehacker.com/five-best-text-editors-1564907215.

Another piece of software that's a step more advanced than these is the oXygen XML Editor, which is made specifically for working with XML and offers features such as well-formedness checks (which you'll need for your assignment) and validation (which you won't, but which is worth knowing about anyway). oXygen is the most widely used XML editor in the digital humanities, and has good cross-platform support. It takes some getting used to -- hint: to check well-formedness, look in the "validation" button's sub-menu -- but it's a great place to learn to write and edit XML (and it has a 30-day free trial period). In any case, whatever you do, don't work with XML in a word processor -- and definitely don't use Microsoft Word's "save as XML" function. Not to get too Yoda-esque, but text encoding (like hand-press printing) requires us to unlearn much of what we've learned from word processing.
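
For anyone curious about what a well-formedness check actually involves, here is a rough sketch in Python using the XML parser in its standard library. It is only an illustration of the idea, not a description of how oXygen does it: the function name check_well_formed and the two sample strings are made up for this example, and the placeholder verse lines stand in for real text. The <lg> and <l> elements are TEI's tags for line groups and verse lines.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    This tests well-formedness only (tags properly nested and closed, one
    root element, attributes quoted). It does NOT validate against TEI
    or any other schema.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical sample: every opening tag has a matching closing tag.
good = '<lg type="sonnet"><l n="1">placeholder line one</l></lg>'

# Hypothetical sample: the <l> element is never closed.
bad = '<lg type="sonnet"><l n="1">placeholder line one</lg>'

for label, sample in (("good", good), ("bad", bad)):
    ok, message = check_well_formed(sample)
    if ok:
        print(label, "is well-formed")
    else:
        print(label, "is not well-formed:", message)
```

The second sample fails because the parser reaches </lg> while <l> is still open, which is exactly the kind of slip that is easy to make by hand. A file can pass this check and still be invalid TEI, which is the difference between well-formedness and validation.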