When the Director of Research for Google compares one of the most highly regarded linguists of all time to Bill O’Reilly, you know it is on. Recently, Peter Norvig, Google’s Director of Research and co-author of the most popular artificial intelligence textbook in the world, wrote a webpage extensively criticizing Noam Chomsky, arguably the most influential linguist in the world. Their disagreement points to a revolution in artificial intelligence that, like many revolutions, threatens to destroy as much as it improves. Chomsky, one of the old guard, wishes for an elegant theory of intelligence and language that looks past human fallibility to try to see simple structure underneath. Norvig, meanwhile, represents the new philosophy: truth by statistics, and simplicity be damned. Disillusioned with simple models, or even Chomsky’s relatively complex models, Norvig has of late been arguing that with enough data, attempting to fit any simple model at all is pointless. The disagreement between the two men points to how the rise of the Internet poses the same challenge to artificial intelligence that it has to human intelligence: why learn anything when you can look it up?
Chomsky started the current argument with some remarks made at a symposium commemorating MIT’s 150th birthday. According to MIT’s Technology Review,
Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. “That’s a notion of [scientific] success that’s very novel. I don’t know of anything like it in the history of science,” said Chomsky.
To frame Chomsky’s position as scientific elegance versus complexity is not quite fair, because Chomsky’s theories have themselves become more and more complex over the years to account for all the variations in human language. Chomsky hypothesized that humans are born knowing how to use language, apart from just a few parameters that need to be set. But the number of parameters in his theory continued to multiply, never quite catching up to the number of exceptions, until it was no longer clear that Chomsky’s theories were elegant anymore. In fact, one could argue that the state of Chomskyan linguistics is like the state of astronomy circa Copernicus: it wasn’t that the geocentric model didn’t work, but the theory required so many additional orbits-within-orbits that people were finally willing to accept a different way of doing things. AI endeavored for a long time to work with elegant logical representations of language, and it just proved impossible to enumerate all the rules, or to pretend that humans consistently followed them. Norvig points out that essentially all successful language-related AI programs now use statistical reasoning (including IBM’s Watson, which I wrote about here previously).
But Norvig is now arguing for an extreme pendulum swing in the other direction, one which is in some ways simpler, and in others, ridiculously more complex. Current speech recognition, machine translation, and other modern AI technologies typically use a model of language that would make Chomskyan linguists cry: for any sequence of words, there is some probability that it will occur in the English language, which we can measure by counting how often its parts appear on the internet. Forget nouns and verbs, rules of conjugation, and so on: deep parsing and logic are the failed techs of yesteryear. In their place is the assumption that, with enough data from the internet, you can reason statistically about what the next word in a sentence will be, right down to its conjugation, without necessarily knowing any grammatical rules or word meanings at all. The limited understanding employed in this approach is why machine translation occasionally delivers amusingly bad results. But the Google approach to this problem is not to develop a more sophisticated understanding of language; it is to try to get more data, and build bigger lookup tables. Perhaps somewhere on the internet, somebody has said exactly what you are saying right now, and all we need to do is go find it. AIs attempting to use language in this way are like elementary school children googling the answers to their math homework: they might find the answer, but one can’t help but feel it doesn’t serve them well in the long term.
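The word-counting approach described above can be sketched in a few lines. The following is a toy bigram model, a minimal illustration of the idea and nothing like a production system: it estimates the probability of the next word purely from counts of adjacent word pairs in a corpus, with no grammar or word meanings anywhere in sight. The tiny corpus here is invented for the example; a real system would count n-grams over billions of web sentences.

```python
from collections import Counter, defaultdict

# Invented toy corpus standing in for web-scale text.
corpus = (
    "the tide goes in . the tide goes out . "
    "the ideas sleep . green ideas sleep furiously ."
).split()

# Count how often each word follows each preceding word (bigrams).
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def predict_next(word):
    """Return the most frequent next word and its estimated probability."""
    counts = bigram_counts[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("tide"))   # ('goes', 1.0): "goes" always follows "tide" here
print(predict_next("ideas"))  # ('sleep', 1.0): "sleep" always follows "ideas" here
```

Nothing in this sketch knows that "tide" is a noun or "goes" a verb; it simply looks up what came next most often, which is exactly the bargain Chomsky objects to.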
In his essay, Norvig argues that there are ways of doing statistical reasoning that are more sophisticated than looking at just the previous one or two words, even if they aren’t applied as often in practice. But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. “Tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” O’Reilly once said, apparently unsatisfied with physics as an explanation for anything. But is Chomsky’s dismissal of statistical approaches really as bad as O’Reilly’s dismissal of physics in general?
I’ve been a Peter Norvig fan ever since I saw a talk he gave to the Singularity Institute patiently explaining why the Singularity is bunk, a position that most AI researchers hold but somehow haven’t effectively communicated to the popular media. So I found similar joy in Norvig’s dissection of Chomsky’s famous “colorless green ideas sleep furiously” sentence, providing citations to counter Chomsky’s claim that its parts had never been spoken before. But I can’t help feeling that an indifference to elegance and understanding is a shift in the scientific enterprise, as Chomsky claims.
“Everything should be as simple as possible, but no simpler,” Einstein reputedly said, echoing William of Ockham’s centuries-old advice to scientists that entities should not be multiplied beyond necessity. The history of science is full of oversimplifications that turn out to be wrong: Kepler was right on the money with his laws of planetary motion, but completely off-base in positing that the planets were nested in Platonic solids. Both models were motivated by Kepler’s desire to find harmony and simplicity hidden in complexity and chaos; in that sense, even his false steps were progress. In an age where petabytes of information can be stored cheaply, is an emphasis on brevity and simplicity an anachronism? If the solar system’s structure were open for debate today, AI algorithms could successfully predict the planets’ motion without ever discovering Kepler’s laws, and Google could just store all the recorded positions of the stars and planets in a giant database. But science seems to be about more than the accumulation of facts and the production of predictions.
What seems to be a debate about linguistics and AI is actually a debate about the future of knowledge and science. Is human understanding necessary for making successful predictions? If the answer is “no,” and the best way to make predictions is by churning mountains of data through powerful algorithms, the role of the scientist may fundamentally change forever. But I suspect that the faith of Kepler and Einstein in the elegance of the universe will be vindicated in language and intelligence as well; and if not, we at least have to try.
Kevin Gold is an Assistant Professor in the Department of Interactive Games and Media at RIT. He received his Ph.D. in Computer Science from Yale University in 2008, and his B.A. from Harvard in 2001. When he is not thinking up new ideas for his research, he enjoys reading really good novels, playing geeky games, listening to funny, clever music, and reading the webcomics xkcd and Dresden Codak.