The DNA of Shakespeare’s Works

Shakespeare died 400 years ago this week, but we’re still getting to know him. And, thanks to UCLA’s Center for Digital Humanities, I think we can now read the DNA of his plays in a way that reveals something fundamental about how his authorial mind worked.

This started with a tiny question: which secondary character speaks a little two-line speech which no one onstage seems to pay much attention to, in the middle of Romeo and Juliet? After another fatal street brawl between the feuding clans, Verona’s Prince asks who, if anyone, should be punished. Seizing the opportunity to protect his son, Romeo Montague’s father answers, ‘Not Romeo, Prince, he was Mercutio’s friend. / His fault concludes but what the law should end, / The life of Tybalt’ (3.1.184-8). Or at least that’s who seizes the opportunity unless you happen to be reading the 1963 Signet paperback of the play, which – unlike pretty much every other edition I can find, and I’ve scoured centuries of them, and every film version of the play – gives that answer to Juliet’s father – Lord Capulet – instead.

The only major figure who steadfastly disagreed with this consensus was apparently William Shakespeare, because in all the early editions of the play (except the controversial First Quarto, which omits this speech entirely), including the famous First Folio, the speech-heading gives Capulet the lines:

Shakespeare’s “Romeo and Juliet”, First Folio. Courtesy of Folger Shakespeare Library.

The current leading scholarly editions pause only long enough to dismiss the original speech-heading as ‘an obvious mistake’ or ‘an obvious error.’ And so it may seem. With the play so focused on the binary family feud, and that feud at a fatal height at this instant, one would hardly expect the head of the Capulet clan to plead on behalf of the Montague lad who has bereaved them.

But Shakespeare’s play is hardly on the side of the feud, and I suspect that Shakespeare – characteristically – invites us to attach ourselves to simplistic, formulaic assumptions, and then exposes their costs and limitations. It offers us neat binary oppositions, only to teach us that a seemingly opposed two can converge toward one, and thereby liberate an almost infinite set of possibilities. In other words, it becomes a love story.

Why have editors and directors been so quick to conclude that Shakespeare and/or his printers had this wrong? That Lord Capulet would defend Romeo against Tybalt is otherwise hardly implausible. Capulet had earlier praised Romeo while trying to restrain hot-headed Tybalt from attacking him. My theory is that Shakespeare gambled by signaling so compelling a binary logic on so many levels of the play that cognitive-dissonance mechanisms block out the alternative he wanted us to glimpse. The trick for me was how to detect and describe those signals.

The key question the play keeps raising, in many areas and at many levels of scale, is whether symmetry can be converted into synthesis. Do the pairs merge in some expansive way, or do they instead remain separate (and hence double), or reflexively antithetical (and hence cancelling each other into a zero)? The most persistent and pervasive theme of the play could be described as an effort – almost alchemical – to use erotic heat to reconcile bluntly opposed and thus mutually exclusive elements into a mysterious compound that converts two into a one that amounts to more than two ones could be, with a goal of immortality. That this is also the usual structure of sexual reproduction is probably more than a mere coincidence.

A happier Verona could easily have happened (and a better world as well, according to deconstructionist readings of race, class, gender, etc.), if people stopped granting dominion to the binaries. The witness for Romeo’s defense should of course be his father, Lord Montague: symmetry demands it. But symmetry demands lots of things that creative artists – and humane societies – are rightly reluctant to grant.

The cellular structure of Romeo and Juliet – the fibrous material with which it is built – keeps telling us that the world consists of things that are either echo or chiasmus, but never merger. It seemed to me there was more doubling of words and phrases in Romeo and Juliet than in any other Shakespeare play I could think of. As a quick sample, I studied the use of ‘Alack,’ and discovered that it is doubled to “Alack, alack” in 3 of 8 uses in this tragedy, but only 6 of 73 uses elsewhere in Shakespeare’s works. Fortunately, it’s become feasible to evaluate that kind of impression more extensively, so I sought the help of Dave Shepherd at HumTech, who developed a Python-based program to calculate how many doubles – words or phrases spoken twice, and no more than twice, in immediate succession – each play contained.
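For readers curious how such a count might work, here is a minimal sketch of a doubles counter. It is not Dave Shepherd's program, which is not reproduced in this article. The function names, the decision to ignore punctuation between the repeated words, and the choice to divide the number of doubles by the total word-count are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words (ASCII apostrophes are kept)."""
    return re.findall(r"[a-z']+", text.lower())


def count_doubles(tokens, n=1):
    """Count units of n words spoken exactly twice in immediate succession.

    A unit repeated three or more times in a row is skipped entirely,
    following the "twice, and no more than twice" rule described above.
    """
    count = 0
    i = 0
    while i + 2 * n <= len(tokens):
        unit = tokens[i:i + n]
        if tokens[i + n:i + 2 * n] == unit:
            # Measure the full run of repetitions that starts here.
            reps = 2
            while tokens[i + reps * n:i + (reps + 1) * n] == unit:
                reps += 1
            if reps == 2:
                count += 1
            i += reps * n  # jump past the run so it is not counted again
        else:
            i += 1
    return count


def doubles_rate(text, n=1):
    """Doubles per word of text (assumed definition of the percentage)."""
    tokens = tokenize(text)
    return count_doubles(tokens, n) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "O Romeo, Romeo! wherefore art thou Romeo? Never, never, never."
    print(count_doubles(tokenize(sample)), doubles_rate(sample))
```

In the sample, "Romeo, Romeo" counts as a double while the triple "never" does not. Passing n=2 looks for two-word phrases repeated in the same way. The same arithmetic applies to the ‘Alack’ sample above: 3 of 8 is 37.5%, against roughly 8.2% for 6 of 73.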

The results gratifyingly verified my impression. Romeo and Juliet is the only Shakespeare play where such pairs constitute more than 1% of the word-count (it also led all the rest when we re-tested counting doubles separated by no more than one word, where it rises to nearly 2%, and also across any two lines). The peak is the brief scene of preparations for Juliet’s forced wedding to Paris, with over 3% simple pairings; no other scene reaches even 2%. By useful contrast, the final scene of the play virtually erases this mode of unreconciled pairs. Every other scene in the play runs over 0.1%, but that scene plummets to 0.04%. What we hear – aptly if subliminally, and most clearly in the language of the lovers themselves – is the demise of the play’s burdensome two-ness.

The other statistical trials I attempted suggest that the pattern is pervasive and artfully configured. In lines containing antonyms, Romeo and Juliet outpaced all thirty-five other Shakespeare plays tested (415, with Cymbeline a distant second at 364).

[Chart 1: lines containing antonyms, by play]

A similar dominance emerges in lines containing immediate doubles followed by an antonymic line: Romeo and Juliet again leads, with 12 such sequences; Troilus and Cressida, another play about politically awkward pair-bonding, is next, with 9:

[Chart 2: immediate doubles followed by an antonymic line, by play]

If we cease to demand that the doublings be immediate, the results are no less compelling. Romeo and Juliet has the most repeated words within single lines and within pairs of lines, and the most decisive lead in any of my half-dozen trials is in doubles with antonyms in the same line: Romeo and Juliet has 59, more than three times the median, and no other Shakespeare play has more than 34:

[Chart 3: doubles with antonyms in the same line, by play]

These trials excluded ‘shall’, ‘thou’, ‘thee’, ‘thine’, ‘thy’, ‘hast’, and ‘’tis’, as well as the standard Natural Language Toolkit (NLTK) list of the 127 most common words in modern English. We searched for antonyms using the NLTK interface with the WordNet corpus produced by Princeton University, which tags words with semantic data, including their antonyms. Those lists are based on modern rather than Elizabethan usage, but enough similarities remain to produce a meaningful estimate of Shakespeare’s use of oppositional forms.
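The filtering and antonym search described here can be sketched in a few lines of Python. To keep the example runnable without downloading any corpora, it replaces the NLTK stopword list and the WordNet antonym data with two tiny hand-written tables. Those tables, the function names, and the reading of "doubles with antonyms in the same line" as "a line containing both a repeated content word and an antonym pair" are assumptions for illustration, not the code used in these trials.

```python
import re

# Stand-in for the NLTK stopword list plus the archaic forms named above.
STOPWORDS = {
    "shall", "thou", "thee", "thine", "thy", "hast", "'tis",
    "the", "a", "and", "my", "from", "too", "is", "of", "to", "in",
}

# Stand-in for WordNet antonym lookups (hypothetical, deliberately tiny).
ANTONYMS = {
    "love": {"hate"}, "hate": {"love"},
    "early": {"late"}, "late": {"early"},
    "known": {"unknown"}, "unknown": {"known"},
}


def content_words(line):
    """Return the words of a line with stopwords removed."""
    words = re.findall(r"[a-z']+", line.lower())
    return [w for w in words if w not in STOPWORDS]


def has_antonym_pair(line):
    """True if some content word's antonym also appears in the line."""
    words = content_words(line)
    present = set(words)
    return any(ANTONYMS.get(w, set()) & present for w in words)


def has_repeated_word(line):
    """True if any content word occurs more than once in the line."""
    words = content_words(line)
    return len(set(words)) < len(words)


def has_double_with_antonym(line):
    """True if the line has both a repeated word and an antonym pair."""
    return has_repeated_word(line) and has_antonym_pair(line)


if __name__ == "__main__":
    lines = [
        "My only love sprung from my only hate!",
        "Too early seen unknown, and known too late!",
        "But, soft! what light through yonder window breaks?",
    ]
    for line in lines:
        print(has_antonym_pair(line), has_double_with_antonym(line), line)
```

The second sample line shows why the exclusions matter: its only repeated word is "too," which is a stopword, so the line counts as antonymic but not as a double with antonyms.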

Romeo and Juliet keeps tantalizing us, in large ways and small, with near misses: with anticlimaxes and failed mergers. Hidden in the line stolen from Lord Capulet is another reality that could have been. But editors and directors have taken the words right out of his mouth. This shows how the textual history of a play can limit interpretation, and how studying that history can liberate interpretation. This example also offers a metacritical refutation of the binary that sets old-fashioned close reading and textual editing against new-fangled distant reading, thematic analysis, and critique of socio-political dysfunctions.

So it’s a small point, but with big implications. And if my recent publications on this can convince the world to fix one error in a major Shakespeare play, that will probably be the most important and lasting thing I’ll ever accomplish as a scholar.

Now I’m working on Shakespeare’s Coriolanus, using Python-based analysis performed for me by Craig Messner (with whom I had worked, while he was on staff at HumTech, on a guide for students about how to use data-mining functions in literary criticism). That analysis demonstrates that the tragedy’s fundamental tension between individualist and communitarian ideals is subliminally reinforced by an unmatched rate of prefixes based on the Latin “cum-” (meaning “with” or “together,” as in “common,” “community,” “cooperate,” etc.). Such evidence seems especially admissible in a play where Menenius puns on the suffixes of the names of the Tribunes, Sicinius Velutus and Junius Brutus: “I find the ass in compound with the major part of your syllables” (2.1.56-57).

We derived a list of words from Shakespeare concordances and trimmed it to include only those whose prefixes were plausibly derived from the Latin “cum-” root. We used the texts at, which are already marked up in XML and thus make it easy to omit stage directions, character lists, speech headings, and other metafeatures. We then ran the text of plays believed to be entirely or almost entirely by Shakespeare through a Python algorithm Craig developed for this purpose, and finally converted the results to bar graphs using the Python plotting library called matplotlib.
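The pipeline just described can be approximated with Python's standard library. The sketch below is not Craig Messner's algorithm. The tag names (line, stagedir, speaker), the made-up sample text, and the short word list are all hypothetical, since the actual markup and the concordance-derived list are not given here. It shows only the general idea: read spoken lines, skip everything else, and count words from a curated list rather than matching prefixes blindly.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample with hypothetical markup; these lines are not quotations.
SAMPLE_XML = """
<play>
  <stagedir>Enter a company of citizens</stagedir>
  <speech>
    <speaker>COMINIUS</speaker>
    <line>the common people conspire and combine</line>
    <line>corn for Coriolanus and his companions</line>
  </speech>
</play>
"""

# Hypothetical stand-in for the curated list of cum- derived words.
CUM_WORDS = {"common", "conspire", "combine", "company", "companions", "compound"}


def spoken_words(xml_text):
    """Return words found inside <line> elements only.

    Stage directions and speech headings live in other elements,
    so they never enter the count.
    """
    root = ET.fromstring(xml_text.strip())
    words = []
    for line in root.iter("line"):
        text = "".join(line.itertext()).lower()
        words.extend(re.findall(r"[a-z']+", text))
    return words


def count_cum_words(xml_text, word_list=CUM_WORDS):
    """Return (number of listed words spoken, total words spoken)."""
    words = spoken_words(xml_text)
    hits = sum(1 for w in words if w in word_list)
    return hits, len(words)


if __name__ == "__main__":
    hits, total = count_cum_words(SAMPLE_XML)
    print(hits, total, hits / total)
```

Because the count relies on a curated list, words such as "corn" and "Coriolanus" are left out even though they begin with the same letters, and the "company" in the stage direction is ignored because it is not spoken. Per-play totals produced this way could then be passed to a plotting library to draw the bar graphs.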

Coriolanus contains far more of these co-/com-/col-/cor- words than any other Shakespeare play. When the sample is reduced to only those words that actually refer to mixtures or human interactions, Coriolanus stands out starkly atop the list, with 71 instances where the other plays average only 30, and the highest percentage as well (see the chart below).

[Chart 4: words with cum- prefixes, by play]

Furthermore, the cumulative subliminal effect of these instances would have been strongly augmented by the thirty-four namings of Coriolanus, seventeen of Corioles, and eighteen of Cominius, as well as ten of Shakespeare’s thirty-five uses of “corn,” the sharing of which is the initial main topic of the plot. These are not included in my statistics because they don’t share the cum- etymology, but – via a process neuroscientists call “priming” – they would still have contributed to an audience’s sense of relentless pressure toward mingling.

I just presented the Coriolanus analysis in New Orleans at the annual meeting of Shakespeareans in this country, and have been invited to do so again at the quadrennial meeting of the World Shakespeare Congress, in London and Stratford this summer. It will also be appearing in Shakespeare Survey, which is the chief British journal for the study of their greatest writer.

So, with the crucial assistance of HumTech staff, we’re discovering that the artistry of that supreme writer is even more elaborate and pervasive than could be shown in the previous 399 years since his death. Long live Shakespeare!

Images of Shakespeare texts courtesy of Folger Shakespeare Library. Licensed under CC BY-SA 4.0 terms.

Robert N. Watson is Distinguished Professor of English at UCLA, and recently finished his term as Neikirk Chair for Educational Innovation. In addition to miscellaneous projects like the ones described here, he is working on books about the interplay of human and other life in the Renaissance, and about the role of the arts and humanities in cultural evolution.