Distant Reading: An Exam List

As a resource for future graduate students, I am sharing the reading list I compiled for my qualifying exam on Distant Reading. Below the list, you will find a user manual of sorts that explains the rationale for each of the selections.

My goal in posting is by no means to assert an authoritative list, but rather to offer a provisional set of principles and touchpoints for ongoing conversations in the field. I hope that many more such lists will eventually be posted, as the field matures and as new students and perspectives contribute to the project of Distant Reading.

Good luck on your exam!

Qualifying Exam List in Distant Reading

1. Intellectual & Institutional History

A. Literary Study
James Turner, Philology
John Guillory, Cultural Capital
René Wellek, The Rise of English Literary History
William K. Wimsatt & Cleanth Brooks, Literary Criticism: A Short History
Gerald Graff, Professing Literature
Gauri Viswanathan, Masks of Conquest

B. Statistics
Alain Desrosières, The Politics of Large Numbers
Ian Hacking, The Emergence of Probability
Stephen Stigler, The History of Statistics
Theodore M. Porter, Trust in Numbers
Margo J. Anderson, The American Census
Stephen Jay Gould, The Mismeasure of Man

C. Computer Science
Martin Campbell-Kelly et al., Computer
William Kneale & Martha Kneale, The Development of Logic
Michael R. Williams, A History of Computing Technology
Paul N. Edwards, A Vast Machine
JoAnne Yates, Control through Communication
Janet Abbate, Recoding Gender

2. Quantitative Literary Study

A. Human Sciences
L. A. Sherman, Analytics of Literature
Caroline Spurgeon, Shakespeare’s Imagery and What It Tells Us
Josephine Miles, Wordsworth and the Vocabulary of Emotion
Janice Radway, Reading the Romance
Franco Moretti*, Atlas of the European Novel, 1800-1900
—, “Conjectures on World Literature”

B. Stylometry
T. C. Mendenhall, “The Characteristic Curves of Composition”
G. K. Zipf, “Selected Studies of the Principle of Relative Frequency in Language”
G. Udny Yule, The Statistical Study of Literary Vocabulary (Ch 1-3)
Frederick Mosteller & David L. Wallace, “Inference and Disputed Authorship: The Federalist”
Stanley Fish, “What is Stylistics, and Why Are They Saying Such Terrible Things about It?”
Mark Olsen, “Signs, Symbols, and Discourses”
J. F. Burrows, “Not Unless You Ask Nicely”

C. Humanities Computing
Roberto Busa, Varia Specimina Concordantiarum
Jacob Leed (ed.), The Computer & Literary Style
Henry Kučera & W. Nelson Francis, Brown Corpus
Susan Hockey, Oxford Concordance Program
Michael Sperberg-McQueen & Lou Burnard, TEI P3
Jerome McGann, The Rossetti Archive
Susan Schreibman et al. (eds.), A Companion to Digital Humanities (2004)
Andrew McCallum, MALLET
Roy Rosenzweig & Tom Scheinfeldt, Omeka
NEH, Office of Digital Humanities
CIC & UC Libraries, HathiTrust Digital Library
Matthew K. Gold (ed.), Debates in the Digital Humanities (2012; online)

* Note
As a community, we are reckoning with Moretti’s influential role in Distant Reading alongside the revelation of sexual assault against a graduate student. Personally, I am uneasy about including his work here. Lauren F. Klein has called on us to imagine “Distant Reading after Moretti,” and if this reading list makes any contribution to our collective imagination, it is to show that the field of Distant Reading has a history that long precedes him as well. To be up front, there are several other problematic texts and figures in the reading list, but I have also found that they are met at every turn by scholars whose goal is justice. We must commit ourselves to the same.

A Resource

As mentioned above, I’m sharing this list as a resource for graduate students preparing to do research in Distant Reading. Broadly speaking, Distant Reading is a body of scholarship that shares the general goal of producing new interpretive knowledge about literature and culture through measurement and computation. It is necessarily interdisciplinary. In that sense, I see Distant Reading as a branch of the Data Science movement in the academy, and I hope that this resource will be useful to students in a variety of departments.

If there is an argument or polemic in the list, it is this: for Distant Reading to succeed as a research program, it is not enough to simply use statistics and computing to answer conventional literary questions. There must be a reciprocal move, in which literary study shows that it too is an essential part of Data Science, just as much as statistical modeling and computer engineering. The structure of this reading list is designed to make both moves possible.

[Inside Baseball: Although I am not explicitly laying out the relationship between Distant Reading and Digital Humanities, suffice it to say that I think it is a close one. The polemical stance taken above is deeply informed by discussions about Humanities Computing and New Media Studies in the 2000s, and I understand Digital Humanities to have directly contributed to Distant Reading scholarship. These ideas have implicitly guided my discussion below, as well as the selections in the reading list.]

The Classics

The list is broken into two parts. The second part is easier to explain since it contains the “classics” of quantitative literary study. They are the common touchpoints across a range of conversations that enable newcomers to participate and contribute. Having been a graduate student for a while, I can say that the inverse is also true; being unfamiliar with these texts makes it hard to participate in current conversations.

To my eye, there are three major branches of quantitative literary study, as it has been practiced historically: stylometry, humanities computing, and scholarship that approaches literature as a human science. In brief, stylometry produces statistical measurements from literary and linguistic texts; humanities computing works on the logical formalisms that organize language and text; literature-as-human-science tries to systematize knowledge about an entire discourse, such as an author’s oeuvre, a genre, or a period.

The three branches are not entirely separate; they overlap at various points in their histories. For example, Josephine Miles makes important contributions to two or all three branches, depending on how you count them. However, I have found it useful to trace each conversation individually, and afterwards to identify points of contact.

Disciplinary History

The first part of the exam list is intended to historicize quantitative literary study. How did literary study take its current shape? What made quantitative methods feel timely at a few key historical moments, including the present? What are the shared intellectual roots between aesthetics and computation? (I’m looking at you, Aristotle.) Answering these questions means tracing the intellectual and institutional histories of literary study, as well as computer science and statistics.

Each discipline is considered individually but in a way that should draw out their parallels. There are six texts listed for each discipline, in the following order:

an historical overview of the discipline,
three aspects of its domain knowledge,
an instance of its institutionalization, and
a critique of power in its institutions.

Again, I have attempted to select “classics” in each field, to facilitate participation in their respective conversations.

For example, the standard text Computer, whose subtitle reads A History of the Information Machine, offers a general account of just that. Getting into the weeds, some of the animating tensions in Computer Science come from its roots in several prior disciplines: mathematics, engineering, and physical science. Each of their contributions is considered in turn in the texts The Development of Logic, A History of Computing Technology, and A Vast Machine. As an historical formation, computers were institutionalized first as part of business practices, described in Control through Communication. However, sexism is a well-known problem in computing culture. The process by which computing became “coded” as masculine, thereby putting up barriers to women’s participation, is recounted in Recoding Gender.

Data Science

Putting both halves of the list together shows my understanding of Distant Reading as part of the larger program of Data Science. In the literature, Data Science is generally framed as the application of data management (Computer Science) and analysis (Statistics) to problems in a given domain (in this case, Literary Study). This framing more-or-less corresponds to the scholarship found in the second half of the list.

But the relationship of method and problem domain can be reversed to useful effect as well. Telling histories of Computer Science and Statistics recasts their central problems as ones in the written record. They become domain problems to which the methods of the Humanities are applied. This reversal of perspective corresponds to the readings in the first half of the exam list.

The virtues of the “bi-directional” approach manifest at two levels. Institutionally: if Literary Study is to participate in Data Science as a full partner, then we will need to express our concerns in the language of Computer Science and Statistics, and vice versa. Mutual intelligibility is a minimum requirement. Intellectually: if Distant Reading is to draw from the full resources of both Humanities and Data Science, then it must be articulated from both sides of the divide; each approach supplements the other.

Caveats & Parameters

To be sure, the list in its current form is organized around my own research needs. For one, it emphasizes American cultural study. If the exam were focused on British computing culture, it would be appropriate to switch out JoAnne Yates’s Control through Communication with Jon Agar’s The Government Machine. The other obvious priority for the list is literary study. A student of Art History, for example, will hopefully find it easy to swap out the history of literary study wherever it appears. The list is designed as a series of modules to facilitate this kind of replacement.

Why is the exam list so short? This is the “theory” section of my full list, which includes American literature and criticism. If you would like additional resources for your own exam, I suggest starting with the UCSB English Department’s exam list in Literature and Theory of Technology. The department requires two rounds of exams, and I took that list during my first round. It offers a theoretical complement to this largely historical list.

It is also worth being explicit about the historical orientation. In both parts, I have constrained readings to those published no later than 2012 (with the exception of James Turner’s Philology). There are a few motives for this, the most important being to emphasize lines of historical development. There has been an explosion of scholarship in Distant Reading, and in the Digital Humanities generally, since 2012, so it is valuable to have a sense of its historicity. The restriction also guarantees that the list’s expiration date will not come too soon. As time passes, that restriction may appear less congenial.

A Request

I too dislike canons. It is a truism that Distant Reading needs to identify some common discursive touchpoints in order to sustain its conversation, and I have used my exam list to try to name some of them. But I am not certain that they will be useful for everybody. And, to sustain a conversation, more important than texts we agree about are the texts we disagree about! I hope that this list will be the beginning of a conversation rather than its premature closure.

To any student planning your own qualifying exam: please take the useful parts and throw away the rest!

To students who have already taken your exams, I have a request: please share your lists. I’d love to see what we think Distant Reading is.

Teddy Roland