Sleuthing for Shakespeare by computer

By WARD E. Y. ELLIOTT and ROBERT J. VALENZA  |  June 25, 2002 at 4:55 PM

In 1995, Vassar Shakespeare scholar Donald Foster made an electrifying announcement. Using new, computer-aided analysis, he conclusively identified a dreary, pious, 1612 "Funeral Elegy" as the work of William Shakespeare.

His discovery was front-page news. Celebratory conferences, speeches, articles and books followed. American Shakespeare scholars praised his "flawless methodology." They swallowed doubts that Shakespeare could have written such a bad poem. They hailed the new addition to the Canon. All three 1997 "American Complete Works of Shakespeare" editions included the elegy as at least "possibly Shakespeare."

There were a few skeptics. Most British and European scholars believed their ears, not Foster's computer. Two of them, Brian Vickers and Gilles Monsarrat, thought the elegy sounded much more like John Ford (c. 1586-1640) than like Shakespeare. A team of Claremont College students, advised by us, also used computers, but found that ours disagreed with Foster's. Ours rejected the elegy in 16 of 33 Shakespeare tests, but in only one Ford test. The odds of such an outcome are trillions of times worse for Shakespeare than for Ford.
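The gap between those two outcomes can be made concrete with a toy binomial calculation. Assuming, purely for illustration, that a true author fails any single style test about 5 percent of the time and that the tests are independent (both assumptions ours, not the original study's), the chance of failing 16 of 33 tests is astronomically smaller than the chance of failing just one:

```python
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance a true author
    fails at least k of n independent style tests."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical per-test false-rejection rate for the true author.
p_reject = 0.05

p_16_of_33 = tail_prob(33, 16, p_reject)  # the elegy against Shakespeare
p_1_of_33 = tail_prob(33, 1, p_reject)    # roughly the Ford-side outcome

print(f"P(>=16 rejections of 33) = {p_16_of_33:.2e}")  # vanishingly small
print(f"P(>= 1 rejection  of 33) = {p_1_of_33:.2e}")   # quite likely
print(f"ratio = {p_1_of_33 / p_16_of_33:.2e}")         # trillions-to-one
```

Under these illustrative numbers the ratio lands in the trillions, which is the scale of disparity the student team reported.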

Foster, however, was seldom indulgent of those who disagreed with him. The European critics' arguments, he declared, were "an entertainment," "mere fiction." The American ones were "idiocy" and "foul vapor." Publishing their "methodological madness" would give a black eye to all computer-aided analysis.

He went on to become the most celebrated of literary sleuths. He made further successful, high-profile attributions, among them identifying Joe Klein as the author of "Primary Colors." He appeared on talk shows, wrote a book, offered his services in the Unabomber and JonBenét Ramsey cases. He became the only forensic linguist in the world regularly consulted by the FBI and the only one regularly referred to as "The Sherlock Holmes of literary attribution."

At one point he boasted to his literary agent, "In the 10 years I have done textual analysis, I've never made a mistaken attribution." Many were convinced that this was so.

Then, on June 12 (joined by Richard Abrams, his Watson), the master sleuth shocked his public with an announcement as startling as his first one. Writing to the leading Shakespeare Web discussion group, Foster conceded that he had changed his mind about the elegy. The European critics were right after all. He had read Monsarrat's article. He had not read Vickers' forthcoming book, "Counterfeiting Shakespeare," to be published by Cambridge University Press in August -- the same Vickers, mind you, whose earlier arguments he had scorned as "mere fiction" -- but he had heard it was persuasive, and this time it was enough. No further discussion was needed. The elegy was plainly written by Ford, not Shakespeare.

He graciously saluted Monsarrat -- "I know good evidence when I see it." He voiced his pleasure at being corrected -- "No one who cannot rejoice in the discovery of his own mistakes deserves to be called a scholar." And, while maintaining that the other previous criticisms were still "mistakes," he did wonder "where I went wrong with the statistical evidence."

Many have already addressed this last question. Some say that the master sleuth himself has given statistical analysis a black eye and that everyone should trust computers less, and their ears more. "Who was the Bard?" asked a New York Times headline. "Don't Ask a Computer." Shouldn't he have counted less and read more?

Or could it be that he didn't count enough? Some say he should have analyzed more than the 80,000 lines of verse on which his first book was based, given the millions of words of computer-accessible texts now available. Others blame the American professors, editors and publishers for following him too blindly on a wild goose chase. Yet others say the whole authorship question is a delusion and should be thrown in the trash, along with the computers.

Certainly, Foster's methods were not flawless. His favorite Shakespeare stylistic "thumbprints" -- incongruous who's ("book who"); redundant comparatives and superlatives ("more better," "most unkindest"); noun-plus-noun doublets ("grace and strength"); and hendiadys ("cups and gold") -- have all turned out to be equally or more abundant in Ford. Hence, they are useless or worse for distinguishing him from Shakespeare.

Foster did not control consistently for stylistic changes over time. He never controlled at all for sample size, a serious problem because large samples average out more variance than small ones. Smaller samples need wider profiles. He counted common prepositions that supported his Shakespeare ascription but didn't count those that did not. He often treated his own repunctuation of the elegy as if it had created a reliable Shakespeare identifier. It hadn't.
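The sample-size point is easy to demonstrate. In the simulation below (with illustrative numbers of our own devising, not Foster's data), a marker word that truly occurs 2 percent of the time looks far more variable across short, poem-sized blocks than across play-sized ones, so a short sample needs a wider profile band before a mismatch means anything:

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

def rate_spread(block_words, n_blocks=500, p_word=0.02):
    """Standard deviation of a marker word's observed rate across
    text blocks of a given size, with a fixed true rate per word."""
    rates = [
        sum(random.random() < p_word for _ in range(block_words)) / block_words
        for _ in range(n_blocks)
    ]
    return statistics.stdev(rates)

small = rate_spread(500)    # roughly a short poem
large = rate_spread(5000)   # roughly a full play
print(f"spread across  500-word blocks: {small:.4f}")
print(f"spread across 5000-word blocks: {large:.4f}")
```

Tenfold more text shrinks the spread by about the square root of ten, which is why a profile calibrated on whole plays will wrongly reject short samples by the true author.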

Foster also relied too heavily on positive-evidence "smoking guns" and too lightly on negative-evidence "silver bullets." Fitting the size 4 slipper doesn't prove you are Cinderella, only that you could be. You could also be Little Miss Muffet or Tiny Tim. But if you are a size 8, it is strong evidence that you could not be Cinderella.

True "smoking guns" and "thumbprints" are rare in literary-evidence cases. Most such indicators are less like thumbprints than like shoe sizes, which can easily disprove your identity but not easily prove it.
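The shoe-size logic can be sketched in a few lines. With hypothetical marker ranges (the numbers below are invented for illustration), a rate inside an author's observed range proves nothing when the candidates' ranges overlap, while a rate outside the range is genuine counter-evidence:

```python
# Hypothetical marker rates per 1,000 words; ranges are illustrative, not real data.
profiles = {
    "Shakespeare": (4.0, 7.0),   # invented min/max across core plays
    "Ford":        (6.0, 10.0),
}

def consistent(rate, lo, hi):
    """True if the sample's rate falls inside an author's observed range."""
    return lo <= rate <= hi

sample_rate = 6.5  # inside BOTH ranges: a "smoking gun" that proves nothing
print({author: consistent(sample_rate, *rng) for author, rng in profiles.items()})

sample_rate = 9.5  # outside Shakespeare's range: a "silver bullet" against him
print({author: consistent(sample_rate, *rng) for author, rng in profiles.items()})
```

The first sample fits both slippers; only the second, by falling outside one range entirely, actually eliminates a candidate.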

Does this mean that authorship itself is a delusion or that we should stop using computers to test for it? Hardly. Authorship does matter if it's Shakespeare. Years of dogged research like Foster's are not a waste of time, even if you make a few mistakes.

If authorship matters, then computers matter. You can see things with computers that you cannot see with the naked eye. They can do dozens of necessary tasks hundreds of times faster and more accurately than you could do them by hand; they can tease out otherwise obscure patterns. You should no more search a big stack of texts for common authorship without a computer than you should dine without a fork.

But it's certainly true that the fork won't protect you from mistakes, any more than the knives and spoons of more traditional literary analysis will. New-style analysis or old, finding and correcting your mistakes is still the name of the game. Computers should not be used to the exclusion of reading, but neither should reading be used to the exclusion of computing.

What went wrong with the elegy ascription? It's not that the master sleuth cared whether it was Shakespeare, nor that he tried to test it with computers. What went wrong is that he tried to test it in many wrong ways and badly overclaimed his results. And, yes, his followers probably were too credulous. The false proof produced a false thrill of discovery and a long, contentious debate -- which, however, was often instructive. We would not have had Monsarrat's, Vickers', or our own later findings without it.

Foster's retraction, we hope, will make a timely, fitting, and welcome conclusion to all of these excursions. What it should not do is end literary sleuthing, whether old-style or new. Who was the Bard? It's still OK to ask, and it's still OK to ask the computer.

(Ward E. Y. Elliott and Robert J. Valenza are professors of government and mathematics, respectively, at Claremont McKenna College in Claremont, Calif.)
