Tue
Jun 21 2011 4:05pm
Norvig vs. Chomsky and the Fight for the Future of AI

When the Director of Research for Google compares one of the most highly regarded linguists of all time to Bill O’Reilly, you know it is on. Recently, Peter Norvig, Google’s Director of Research and co-author of the most popular artificial intelligence textbook in the world, wrote a webpage extensively criticizing Noam Chomsky, arguably the most influential linguist in the world. Their disagreement points to a revolution in artificial intelligence that, like many revolutions, threatens to destroy as much as it improves. Chomsky, one of the old guard, wishes for an elegant theory of intelligence and language that looks past human fallibility to try to see simple structure underneath. Norvig, meanwhile, represents the new philosophy: truth by statistics, and simplicity be damned. Disillusioned with simple models, or even Chomsky’s relatively complex models, Norvig has of late been arguing that with enough data, attempting to fit any simple model at all is pointless. The disagreement between the two men points to how the rise of the Internet poses the same challenge to artificial intelligence that it has to human intelligence: why learn anything when you can look it up?

Chomsky started the current argument with some remarks made at a symposium commemorating MIT’s 150th birthday. According to MIT’s Technology Review,

Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. “That’s a notion of [scientific] success that’s very novel. I don’t know of anything like it in the history of science,” said Chomsky.

To frame Chomsky’s position as scientific elegance versus complexity is not quite fair, because Chomsky’s theories have themselves become more and more complex over the years to account for all the variations in human language. Chomsky hypothesized that humans are born knowing how to use language, apart from a few parameters that need to be set. But the number of parameters in his theory continued to multiply, never quite catching up to the number of exceptions, until it was no longer clear that Chomsky’s theories were elegant anymore. In fact, one could argue that the state of Chomskyan linguistics is like the state of astronomy circa Copernicus: it wasn’t that the geocentric model didn’t work, but the theory required so many additional orbits-within-orbits that people were finally willing to accept a different way of doing things. AI endeavored for a long time to work with elegant logical representations of language, and it just proved impossible to enumerate all the rules, or pretend that humans consistently followed them. Norvig points out that basically all successful language-related AI programs now use statistical reasoning (including IBM’s Watson, which I wrote about here previously).

But Norvig is now arguing for an extreme pendulum swing in the other direction, one which is in some ways simpler, and in others, ridiculously more complex. Current speech recognition, machine translation, and other modern AI technologies typically use a model of language that would make Chomskyan linguists cry: for any sequence of words, there is some probability that it will occur in the English language, which we can measure by counting how often its parts appear on the internet. Forget nouns and verbs, rules of conjugation, and so on: deep parsing and logic are the failed techs of yesteryear. In their place is the assumption that, with enough data from the internet, you can reason statistically about what the next word in a sentence will be, right down to its conjugation, without necessarily knowing any grammatical rules or word meanings at all. The limited understanding employed in this approach is why machine translation occasionally delivers amusingly bad results. But the Google approach to this problem is not to develop a more sophisticated understanding of language; it is to try to get more data, and build bigger lookup tables. Perhaps somewhere on the internet, somebody has said exactly what you are saying right now, and all we need to do is go find it. AIs attempting to use language in this way are like elementary school children googling the answers to their math homework: they might find the answer, but one can’t help but feel it doesn’t serve them well in the long term.
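To make the counting idea concrete, here is a minimal sketch in Python of the simplest version of such a model, a bigram model that looks only at the previous word. The three-sentence corpus and all function names are my own illustrative assumptions; real systems count over vastly more text and add smoothing for unseen word pairs, which this sketch omits.

```python
from collections import Counter

# Toy stand-in for "the internet" (an assumption for illustration only).
TOY_CORPUS = [
    "the tide goes in",
    "the tide goes out",
    "the bee goes out",
]

def train_bigram_counts(corpus):
    """Count how often each word, and each adjacent word pair, occurs."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every word that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return unigrams, bigrams

def sequence_probability(sentence, unigrams, bigrams):
    """Multiply P(word | previous word) over the sentence; no smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # previous word never seen, so nothing to count
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

def most_likely_next(word, bigrams):
    """Return the word that most often followed `word`, or None if unseen."""
    candidates = {nxt: n for (prev, nxt), n in bigrams.items() if prev == word}
    return max(candidates, key=candidates.get) if candidates else None

unigrams, bigrams = train_bigram_counts(TOY_CORPUS)
print(sequence_probability("the tide goes out", unigrams, bigrams))
print(sequence_probability("the bee goes in", unigrams, bigrams))
print(most_likely_next("goes", bigrams))
```

Note that "the bee goes in" never appears in the corpus, yet it receives a nonzero probability because each of its adjacent word pairs does appear. At no point does the code know what a noun or a verb is.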

In his essay, Norvig argues that there are ways of doing statistical reasoning that are more sophisticated than looking at just the previous one or two words, even if they aren’t applied as often in practice. But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. “Tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” O’Reilly once said, apparently unsatisfied with physics as an explanation for anything. But is Chomsky’s dismissal of statistical approaches really as bad as O’Reilly’s dismissal of physics in general?

I’ve been a Peter Norvig fan ever since I saw the talk he gave to the Singularity Institute patiently explaining why the Singularity is bunk, a position that most AI researchers hold but somehow haven’t effectively communicated to the popular media. So I found similar joy in Norvig’s dissection of Chomsky’s famous “colorless green ideas sleep furiously” sentence, providing citations to counter Chomsky’s claim that its parts had never been spoken before. But I can’t help but feel that an indifference to elegance and understanding is a shift in the scientific enterprise, as Chomsky claims.

“Everything should be made as simple as possible, but no simpler,” Einstein once said, echoing William of Ockham’s centuries-old advice to scientists that entities should not be multiplied beyond necessity. The history of science is full of oversimplifications that turn out to be wrong: Kepler was right on the money with his laws of planetary motion, but completely off-base in positing that the planets were nested in Platonic solids. Both models were motivated by Kepler’s desire to find harmony and simplicity hidden in complexity and chaos; in that sense, even his false steps were progress. In an age where petabytes of information can be stored cheaply, is an emphasis on brevity and simplicity an anachronism? If the solar system’s structure were open for debate today, AI algorithms could successfully predict the planets’ motion without ever discovering Kepler’s laws, and Google could just store all the recorded positions of the stars and planets in a giant database. But science seems to be about more than the accumulation of facts and the production of predictions.
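The contrast between storing observations and stating a law can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table holds rounded textbook values for three planets, the function names are mine, and the body at 4 AU is hypothetical. Kepler's third law, with the period T in years and the semi-major axis a in astronomical units, is

T^2 = a^3

```python
# Rounded textbook values: (semi-major axis in AU, orbital period in years).
RECORDED = {
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
}

def period_from_table(planet):
    """The database approach: return a stored period, or None if never recorded."""
    entry = RECORDED.get(planet)
    return entry[1] if entry else None

def period_from_law(semi_major_axis_au):
    """Kepler's third law in these units: T**2 == a**3, so T == a**1.5."""
    return semi_major_axis_au ** 1.5

# The table and the law agree closely on what has already been observed.
for name, (a, recorded_period) in RECORDED.items():
    print(name, recorded_period, round(period_from_law(a), 3))

# Only the law says anything about a hypothetical body orbiting at 4 AU.
print(period_from_table("Hypothetical"))
print(period_from_law(4.0))
```

The table answers only the questions it has already seen; the one-line law covers any orbit around the Sun, which is the kind of brevity the paragraph above is asking about.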

What seems to be a debate about linguistics and AI is actually a debate about the future of knowledge and science. Is human understanding necessary for making successful predictions? If the answer is “no,” and the best way to make predictions is by churning mountains of data through powerful algorithms, the role of the scientist may fundamentally change forever. But I suspect that the faith of Kepler and Einstein in the elegance of the universe will be vindicated in language and intelligence as well; and if not, we at least have to try.

Noam Chomsky photo by Duncan Rawlinson and his Online Photography School. Peter Norvig photo by Peter Norvig.


Kevin Gold is an Assistant Professor in the Department of Interactive Games and Media at RIT. He received his Ph.D. in Computer Science from Yale University in 2008, and his B.A. from Harvard in 2001. When he is not thinking up new ideas for his research, he enjoys reading really good novels, playing geeky games, listening to funny, clever music, and reading the webcomics xkcd and Dresden Codak.

20 comments
Marc Gioglio
2. Fuzzix
I don't think Chomsky's "derision" is adequately portrayed here (based on the information here and in the link). It seems as though Chomsky declared the database structure that is a successful mimicking tool non-scientific (which IMO is a fair assessment). It does not mean it is useless. Take the Farmer's Almanac. Does anyone really consider the Farmer's Almanac (just the database part of it) a (successful) scientific approach? Or more importantly, does Norvig consider the compilers of the Farmer's Almanac to be (successful) scientists?

I by no means believe that a large database is inconsequential to scientific pursuits. In fact, I believe the opposite. I think vast quantities of technical data that can successfully mimic real situations are a supremely valuable tool in advancing scientific understanding. However, as Chomsky so "derisively" pointed out, they are not and should not be considered a substitute.
Sean Arthur
3. wsean
I have no idea why this is on tor.com, but it's fascinating and I hope to see more like it. :)
John Adams
4. JohnArkansawyer
Not to put too fine a point on it, but Chomsky is an academic primarily interested in understanding how language works. Norvig is a corporate employee primarily interested in turning a profit. It's no surprise they're in conflict.
DarrenJL
5. DarrenJL
Wow. Biased. Chomsky is not "arguably" the most influential linguist. He is the most influential linguist in the world. Love him, mild crush on him, or hate the man, that's simply a truth. There isn't a linguistics student in the world who doesn't wind up studying Chomsky from year one.
Steven Halter
6. shalter
It looks to me that both sides are building various types of strawmen. The science/math underlying probabilistic models is actually quite well understood. So the whole argument that they aren't scientific collapses.
On the other hand Norvig is interpreting Chomsky's statements, so it is unclear if Chomsky is actually expounding all of the points that Norvig is claiming.
DarrenJL
7. Mark Pontin
What Shalter said about probabilistic models and strawmen.

I have dialogued with both these men, incidentally -- Norvig face to face, Chomsky briefly by email -- and both are very smart men who are straightforward and uninterested in making sure you are impressed with their reputations.

That said, for better or worse, we are going to see much more of Norvig's kind of science.

The author here, Kevin Gold, writes: "What seems to be a debate about linguistics and AI is actually a debate about the future of knowledge and science. Is human understanding necessary for making successful predictions? If the answer is “no,” and the best way to make predictions is by churning mountains of data through powerful algorithms, the role of the scientist may fundamentally change forever."

This is correct and is exactly what neural nets -- including Google -- do. Often these technologies are black boxes in terms of the results that they generate. That is the point of them, since if human minds could learn to recognize the patterns that machine learning can, we wouldn't need the machine learning.

Nor is it just about churning vast amounts of data. Gold writes: "Is human understanding necessary for making successful predictions?" In many cases, human understanding is not only not necessary for making successful predictions of certain kinds, but is unequipped to think in the necessary ways to make such predictions. Whereas machine learning can get there sometimes.

So this is indeed what much 21st century science will look like, though the traditional kind won't go away.
Tim Maughan
8. TimMaughan
An utterly fascinating post, thanks so much Kevin.
DarrenJL
9. Scotoma
Isn't this what Watts's Blindsight was partly about? The main character's job was to explain something to other people without really understanding it himself. The aliens processed information without being self-aware at all. Understanding wasn't important, getting the job done was.
Chin Bawambi
10. bawambi
Very interested to see if the strawman arguments go away and the substantive differences on this topic get discussed later on (other than here of course). This could be yet another sign of the slow decline of western civilization as we know it. The death of facts seems to be a hallmark of the times. I would hope that we don't continue toward Asimov's vision in the Foundation series. But as I get older I tend to get more pessimistic towards society in general while more optimistic about my individual prospects. I guess to sum up my unfocused post - get off of my lawn!
Emmet O'Brien
11. EmmetAOBrien
bawambi@10: I don't know that "we are now exploring realms of knowledge in which statistical methods outperform elegant models" is necessarily a pessimistic statement; if anything, having statistical methods good enough to figure things out that are outside the ambit of readily deduced elegant models seems to me to only be depressing if one has a strong attachment to readily deduced elegant models as the way to do science. In my own professional field of bioinformatics, at least, there's an awful lot that's not been possible to do meaningfully without statistical approaches for decades; as conceptual tools they are complementary, and either alone is less use.
Chin Bawambi
12. bawambi
Actually Emmet I was complaining more about the rhetoric in a supposed scientific discussion than the science itself. On a side note I don't want Asimov's world mainly because of the autocracy it seems to imply. Unfortunately, I think our technological prowess has far exceeded our societal abilities. When the scientific community continues down the road of spin control then we become further lost.
DarrenJL
13. Arthur T. Murray
The article above states that "Norvig points out that basically all successful language-related AI programs now use statistical reasoning". My own successful (it thinks) AI program at http://www.scn.org/~mentifex/AiMind.html uses Chomskyan reasoning, not statistical reasoning. The AI Mind program recently began using neural inhibition to give multiple valid answers to the same question proposed repeatedly. Even more recently, the AI Mind program began retroactively adjusting its knowledge base (KB) to reflect terse yes-or-no answers from human beings to the AI trying to add to its own knowledge. Chomsky is the most cited scientist in human history and is way ahead of the nevertheless very accomplished Peter Norvig.
DarrenJL
15. Tom Bellinson
IBM's Watson answered questions. To do this, Google's fancy algorithms and mountains of data will suffice. What Watson can't do is figure out WHICH questions to ask. Chomsky's version of AI will be required for that. The key to achieving true general AI is the development of a root program that gets at finding and pursuing a purpose. Chomsky's approach is much more likely to yield such a result. We use a variety of tools to achieve intelligence. Some are mimicked by Norvig's AI strategy, but Chomsky is trying to get at the essence of being human and THAT'S the Holy Grail of AI.
DarrenJL
16. Bill la Forge
Why not use genetic software to develop a model from the data? That would likely work best and better model what people actually do.
DarrenJL
17. CA
And what about politics? In what way could artificial intelligences alter political institutions and could they lead to the establishment of a new kind of political regime?
There is not much reflection on the subject (apart from a few exploratory attempts: http://www.inter-disciplinary.net/wp-content/uploads/2011/06/rumpalaepaper.pdf ).
DarrenJL
19. David Saintloth
"Norvig points out that basically all successful language-related AI programs now use statistical reasoning (including IBM’s Watson, which I wrote about here previously)." -- Peter is correct, the funny thing is I think this argument is one mostly of semantics. Norvig just doesn't think it is important to model the minutia of syntactic structure...Chomsky thinks it is...both are kind of right...

Why?

Because each is looking at the problem with different eyes. Chomsky is a linguist, he's analyzing the structures of sentences, grammar, how meaning is relayed using these linguistic atoms, but those atoms are encoded into the human brain via collections of neurons which set connections based on experience and new incoming data. Thus, Chomsky's view is true, the brain does create relational maps of the components and subcomponents of language. BUT the statistical view is precisely what these neuronal bunches are modeling out through the strengthening and weakening of axonic and dendritic connections. Norvig is looking at the problem from the perspective of just all the weightings between the stored entities (what is being stored by each neuron is irrelevant) and then creating new weights based on new data coming in, it is precisely modeling the neural networks of the brain but without building any networks...it's pretty smart. This way the effective intelligence is produced without having to build either a physical model of memory storage elements or to care what is being modeled. The same process would work for processing vision data, audio data, olfactory data, gustatory data...in fact, the neuroscientists are looking into the brain with fMRI studies and what do they see, a brain that really does chop up the sensor analysis problem into regions that are remarkably similar. Yes, there are differences in how data is hierarchically arranged getting down to the neuronal elements that store it between different processing regions, but those are organizational and not structural changes, ultimately weights between memory storage units are the ultimate processing action. So Chomsky's linguistics slowly grades into Norvig's statistical analysis of weights when viewed from the biological gradient of a processing brain from input sensory to low level neuron (memory atom, if you will).
Michael Burke
20. Ludon
Mokusatsu.

Anyone not familiar with that word should look it up in relation to events near the end of World War II. Trying to understand meanings in translations can be difficult - even for professionals. If humans can have such problems, any AI using either of the discussed approaches or a combination of the two would be likely to make similar mistakes. The answer may include ideas that we have yet to find.
DarrenJL
21. Joe Repka
Perhaps we should be wary of conflating the technique (model) that results from the process (science) with the process itself.
Probabilistic models (the result of some scientific process) are still mathematical models. Formulae are perhaps more elegant predictive models, but perhaps we should remain open to the idea that neither the data nor the reality necessarily supports a formal, non-probabilistic model in the end. Probability distributions may simply be the best that science can at times, or even ever, do, if the scale of time or space is made sufficiently small or large.
DarrenJL
22. Robopsychologist
Very interesting... However, we should bear in mind that many serious attempts to predict how science is going to evolve seemed laughable in retrospect.
