Jun 21 2011 5:05pm

Norvig vs. Chomsky and the Fight for the Future of AI

When the Director of Research for Google compares one of the most highly regarded linguists of all time to Bill O’Reilly, you know it is on. Recently, Peter Norvig, Google’s Director of Research and co-author of the most popular artificial intelligence textbook in the world, wrote a webpage extensively criticizing Noam Chomsky, arguably the most influential linguist in the world. Their disagreement points to a revolution in artificial intelligence that, like many revolutions, threatens to destroy as much as it improves. Chomsky, one of the old guard, wishes for an elegant theory of intelligence and language that looks past human fallibility to try to see simple structure underneath. Norvig, meanwhile, represents the new philosophy: truth by statistics, and simplicity be damned. Disillusioned with simple models, or even Chomsky’s relatively complex models, Norvig has of late been arguing that with enough data, attempting to fit any simple model at all is pointless. The disagreement between the two men points to how the rise of the Internet poses the same challenge to artificial intelligence that it has to human intelligence: why learn anything when you can look it up?

Chomsky started the current argument with some remarks made at a symposium commemorating MIT’s 150th birthday. According to MIT’s Technology Review,

Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. “That’s a notion of [scientific] success that’s very novel. I don’t know of anything like it in the history of science,” said Chomsky.

To frame Chomsky’s position as scientific elegance versus complexity is not quite fair, because Chomsky’s theories have themselves become more and more complex over the years to account for all the variations in human language. Chomsky hypothesized that humans biologically know how to use language, apart from a few parameters that need to be set. But the number of parameters in his theory continued to multiply, never quite catching up to the number of exceptions, until it was no longer clear that Chomsky’s theories were elegant. In fact, one could argue that the state of Chomskyan linguistics is like the state of astronomy circa Copernicus: it wasn’t that the geocentric model didn’t work, but the theory required so many additional orbits-within-orbits that people were finally willing to accept a different way of doing things. AI endeavored for a long time to work with elegant logical representations of language, and it just proved impossible to enumerate all the rules, or pretend that humans consistently followed them. Norvig points out that basically all successful language-related AI programs now use statistical reasoning (including IBM’s Watson, which I wrote about here previously).

But Norvig is now arguing for an extreme pendulum swing in the other direction, one which is in some ways simpler, and in others, ridiculously more complex. Current speech recognition, machine translation, and other modern AI technologies typically use a model of language that would make Chomskyan linguists cry: for any sequence of words, there is some probability that it will occur in the English language, which we can measure by counting how often its parts appear on the internet. Forget nouns and verbs, rules of conjugation, and so on: deep parsing and logic are the failed technologies of yesteryear. In their place is the assumption that, with enough data from the internet, you can reason statistically about what the next word in a sentence will be, right down to its conjugation, without necessarily knowing any grammatical rules or word meanings at all. The limited understanding employed in this approach is why machine translation occasionally delivers amusingly bad results. But the Google approach to this problem is not to develop a more sophisticated understanding of language; it is to try to get more data, and build bigger lookup tables. Perhaps somewhere on the internet, somebody has said exactly what you are saying right now, and all we need to do is go find it. AIs attempting to use language in this way are like elementary school children googling the answers to their math homework: they might find the answer, but one can’t help but feel it doesn’t serve them well in the long term.
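As a rough illustration of the counting idea, here is a minimal bigram sketch: it predicts the next word purely from how often each word followed the previous one, with no grammar at all. The toy corpus and the function names (train_bigrams, predict_next, bigram_prob) are hypothetical; real systems count over billions of words and add smoothing, which this sketch omits.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word follows each other word in a toy corpus."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, prev):
    """Return the most frequent successor of `prev`, or None if never seen."""
    counts = following.get(prev.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def bigram_prob(following, prev, nxt):
    """Estimate P(nxt | prev) as a simple relative frequency."""
    counts = following.get(prev.lower())
    if not counts:
        return 0.0
    return counts[nxt.lower()] / sum(counts.values())

# A hypothetical toy corpus standing in for "the internet".
corpus = [
    "the dog runs home",
    "the dog runs fast",
    "the cat runs home",
    "she runs every day",
]
model = train_bigrams(corpus)
print(predict_next(model, "dog"), predict_next(model, "runs"))
print(bigram_prob(model, "runs", "home"))
```

Note that the model answers "home" after "runs" only because that pairing was counted most often; it has no idea what a verb or a destination is, which is exactly the property the paragraph above describes.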

In his essay, Norvig argues that there are ways of doing statistical reasoning that are more sophisticated than looking at just the previous one or two words, even if they aren’t applied as often in practice. But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. “Tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” O’Reilly once said, apparently unsatisfied with physics as an explanation for anything. But is Chomsky’s dismissal of statistical approaches really as bad as O’Reilly’s dismissal of physics in general?

I’ve been a Peter Norvig fan ever since I saw the talk he gave to the Singularity Institute patiently explaining why the Singularity is bunk, a position that most AI researchers believe but somehow haven’t effectively communicated to the popular media. So I found similar joy in Norvig’s dissection of Chomsky’s famous “colorless green ideas sleep furiously” sentence, in which he provides citations to counter Chomsky’s claim that its parts had never been spoken before. But I can’t help but feel that an indifference to elegance and understanding is a real shift in the scientific enterprise, as Chomsky claims.
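Chomsky’s original 1957 argument was that neither the sentence nor its reversal (nor any part of them) had ever occurred, so a statistical model would rule both out as equally remote from English. A smoothed model never assigns zero probability to an unseen sequence, and it can still prefer one word order over another. The sketch below assumes a tiny hypothetical corpus in which each adjacent pair of the famous sentence occurs somewhere, echoing the citations Norvig found; it uses add-one (Laplace) smoothing, which is far cruder than the models Norvig discusses, and the function names are ours.

```python
import math
from collections import Counter

def train(corpus):
    """Count bigrams and how often each word appears as a bigram's first word."""
    bigrams, history = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        words = sentence.lower().split()
        vocab.update(words)
        bigrams.update(zip(words, words[1:]))
        history.update(words[:-1])
    return bigrams, history, vocab

def log_prob(sentence, bigrams, history, vocab):
    """Add-one (Laplace) smoothed bigram log-probability of a sentence.

    Simplification: the first word's own probability is ignored, which is
    harmless when comparing two orderings of the same words.
    """
    words = sentence.lower().split()
    V = len(vocab | set(words))
    return sum(
        math.log((bigrams[(p, n)] + 1) / (history[p] + V))
        for p, n in zip(words, words[1:])
    )

# Hypothetical corpus: every adjacent pair of the famous sentence occurs
# somewhere, but never the whole sentence.
corpus = [
    "colorless green paint",
    "green ideas spread",
    "new ideas sleep",
    "children sleep furiously",
]
model = train(corpus)
forward = log_prob("colorless green ideas sleep furiously", *model)
backward = log_prob("furiously sleep ideas green colorless", *model)
# How many times likelier the forward order is than the reversed one.
print(math.exp(forward - backward))
```

On this toy corpus, neither ordering gets probability zero, but the forward ordering comes out about 14 times more probable, because its word pairs have been "spoken before" and the reversed pairs have not.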

“Everything should be as simple as possible, but no simpler,” Einstein once said, echoing William of Ockham’s centuries-old advice to scientists that entities should not be multiplied beyond necessity. The history of science is full of oversimplifications that turn out to be wrong: Kepler was right on the money with his laws of planetary motion, but completely off-base in positing that the planets were nested in Platonic solids. Both models were motivated by Kepler’s desire to find harmony and simplicity hidden in complexity and chaos; in that sense, even his false steps were progress. In an age where petabytes of information can be stored cheaply, is an emphasis on brevity and simplicity an anachronism? If the solar system’s structure were open for debate today, AI algorithms could successfully predict the planets’ motion without ever discovering Kepler’s laws, and Google could just store all the recorded positions of the stars and planets in a giant database. But science seems to be about more than the accumulation of facts and the production of predictions.
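To make the contrast between a database and a law concrete: the sketch below fits a straight line to log orbital period versus log semi-major axis for six planets (approximate textbook values, in AU and years). The slope comes out very close to 1.5, which is Kepler’s third law, T^2 = a^3 in these units. The six rows of the table compress into one exponent. The function name fit_power_law is ours, for illustration.

```python
import math

# Approximate semi-major axes (AU) and orbital periods (years).
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

def fit_power_law(pairs):
    """Least-squares fit of log T = k * log a + c; returns (k, c)."""
    pairs = list(pairs)
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    return k, mean_y - k * mean_x

k, c = fit_power_law(planets.values())
print(f"T is proportional to a^{k:.3f}")
```

A lookup table of positions would predict just as well for these six planets; what the fitted exponent adds is a compact statement that also covers bodies not in the table.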

What seems to be a debate about linguistics and AI is actually a debate about the future of knowledge and science. Is human understanding necessary for making successful predictions? If the answer is “no,” and the best way to make predictions is by churning mountains of data through powerful algorithms, the role of the scientist may fundamentally change forever. But I suspect that the faith of Kepler and Einstein in the elegance of the universe will be vindicated in language and intelligence as well; and if not, we at least have to try.

Noam Chomsky photo by Duncan Rawlinson and his Online Photography School. Peter Norvig photo by Peter Norvig.

Kevin Gold is an Assistant Professor in the Department of Interactive Games and Media at RIT. He received his Ph.D. in Computer Science from Yale University in 2008, and his B.A. from Harvard in 2001. When he is not thinking up new ideas for his research, he enjoys reading really good novels, playing geeky games, listening to funny, clever music, and reading the webcomics xkcd and Dresden Codak.

Marc Gioglio
2. Fuzzix
I don't think Chomsky's "derision" is adequately portrayed here (based on the information here and in the link). It seems as though Chomsky declared the database structure that is a successful mimicking tool non-scientific (which IMO is a fair assessment). It does not mean it is useless. Take the Farmer's Almanac. Does anyone really consider the Farmer's Almanac (just the database part of it) a (successful) scientific approach? Or more importantly, does Norvig consider the compilers of the Farmer's Almanac to be (successful) scientists?

I by no means believe that a large database is inconsequential to scientific pursuits. In fact, I believe the opposite. I think vast quantities of technical data that can successfully mimic real situations are a supremely valuable tool in advancing scientific understanding. However, as Chomsky so "derisively" pointed out, they are not and should not be considered a substitute.
Sean Arthur
3. wsean
I have no idea why this is on tor.com, but it's fascinating and I hope to see more like it. :)
John Adams
4. JohnArkansawyer
Not to put too fine a point on it, but Chomsky is an academic primarily interested in understanding how language works. Norvig is a corporate employee primarily interested in turning a profit. It's no surprise they're in conflict.
5. DarrenJL
Wow. Biased. Chomsky is not "arguably" the most influential linguist. He is the most influential linguist in the world. Love him, mild crush on him, or hate the man, that's simply a truth. There isn't a linguistics student in the world who doesn't wind up studying Chomsky from year one.
Steven Halter
6. stevenhalter
It looks to me that both sides are building various types of strawmen. The science/math underlying probabilistic models is actually quite well understood. So the whole argument that they aren't scientific collapses.
On the other hand Norvig is interpreting Chomsky's statements, so it is unclear if Chomsky is actually expounding all of the points that Norvig is claiming.
7. Mark Pontin
What Shalter said about probabilistic models and strawmen.

I have dialogued with both these men, incidentally -- Norvig face to face, Chomsky briefly by email -- and both are very smart men who are straightforward and uninterested in making sure you are impressed with their reputations.

That said, for better or worse, we are going to see much more of Norvig's kind of science.

The author here, Kevin Gold, writes: "What seems to be a debate about linguistics and AI is actually a debate about the future of knowledge and science. Is human understanding necessary for making successful predictions? If the answer is 'no,' and the best way to make predictions is by churning mountains of data through powerful algorithms, the role of the scientist may fundamentally change forever."

This is correct and is exactly what neural nets -- including Google -- do. Often these technologies are black boxes in terms of the results that they generate. That is the point of them, since if human minds could learn to recognize the patterns that machine learning can, we wouldn't need the machine learning.

Nor is it just about churning vast amounts of data. Gold writes: "Is human understanding necessary for making successful predictions?" In many cases, human understanding is not only not necessary for making successful predictions of certain kinds, but is unequipped to think in the necessary ways to make such predictions. Whereas machine learning can get there sometimes.

So this is indeed what much 21st century science will look like, though the traditional kind won't go away.
Tim Maughan
8. TimMaughan
An utterly fascinating post, thanks so much Kevin.
9. Scotoma
Isn't this partly what Watts's Blindsight was about? The main character's job was to explain something to other people without really understanding it himself. The aliens processed information without being self-aware at all. Understanding wasn't important; getting the job done was.
Chin Bawambi
10. bawambi
Very interested to see if the strawman arguments go away and the substantive differences on this topic get discussed later on (other than here of course). This could be yet another sign of the slow decline of western civilization as we know it. The death of facts seems to be a hallmark of the times. I would hope that we don't continue toward Asimov's vision in the Foundation series. But as I get older I tend to get more pessimistic towards society in general while more optimistic about my individual prospects. I guess to sum up my unfocused post - get off of my lawn!
Emmet O'Brien
11. EmmetAOBrien
bawambi@10: I don't know that "we are now exploring realms of knowledge in which statistical methods outperform elegant models" is necessarily a pessimistic statement; if anything, having statistical methods good enough to figure things out that are outside the ambit of readily deduced elegant models seems to me to only be depressing if one has a strong attachment to readily deduced elegant models as the way to do science. In my own professional field of bioinformatics, at least, there's an awful lot that's not been possible to do meaningfully without statistical approaches for decades; as conceptual tools they are complementary, and either alone is less use.
Chin Bawambi
12. bawambi
Actually Emmet I was complaining more about the rhetoric in a supposed scientific discussion than the science itself. On a side note I don't want Asimov's world mainly because of the autocracy it seems to imply. Unfortunately, I think our technological prowess has far exceeded our societal abilities. When the scientific community continues down the road of spin control then we become further lost.
13. Arthur T. Murray
The article above states that "Norvig points out that basically all successful language-related AI programs now use statistical reasoning". My own successful (it thinks) AI program at http://www.scn.org/~mentifex/AiMind.html uses Chomskyan reasoning, not statistical reasoning. The AI Mind program recently began using neural inhibition to give multiple valid answers to the same question proposed repeatedly. Even more recently, the AI Mind program began retroactively adjusting its knowledge base (KB) to reflect terse yes-or-no answers from human beings to the AI trying to add to its own knowledge. Chomsky is the most cited scientist in human history and is way ahead of the nevertheless very accomplished Peter Norvig.
15. Tom Bellinson
IBM's Watson answered questions. To do this, Google's fancy algorithms and mountains of data will suffice. What Watson can't do is figure out WHICH questions to ask. Chomsky's version of AI will be required for that. The key to achieving true general AI is the development of a root program that gets at finding and pursuing a purpose. Chomsky's approach is much more likely to yield such a result. We use a variety of tools to achieve intelligence. Some are mimicked by Norvig's AI strategy, but Chomsky is trying to get at the essence of being human and THAT'S the Holy Grail of AI.
16. Bill la Forge
Why not use genetic software to develop a model from the data? That would likely work best and better model what people actually do.
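Here is a minimal sketch of what this suggestion could look like, assuming "genetic software" means a genetic algorithm: a population of candidate models is scored against the data, the fittest survive, and mutated offspring of the survivors replace the rest. The data (generated from y = 3x + 2, which the algorithm is not told) and all names are hypothetical, and nothing here claims to model how people actually learn.

```python
import random

# Hypothetical data generated by y = 3x + 2; the GA does not know this.
DATA = [(x, 3 * x + 2) for x in range(-5, 6)]

def error(genome):
    """Sum of squared errors of the line y = w*x + b on DATA."""
    w, b = genome
    return sum((w * x + b - y) ** 2 for x, y in DATA)

def evolve(generations=200, pop_size=50, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        parents = population[: pop_size // 5]  # keep the fittest 20%
        children = []
        while len(parents) + len(children) < pop_size:
            (w1, b1), (w2, b2) = rng.sample(parents, 2)
            # Crossover: take each gene from a random parent, then mutate it.
            w = rng.choice((w1, w2)) + rng.gauss(0, 0.1)
            b = rng.choice((b1, b2)) + rng.gauss(0, 0.1)
            children.append((w, b))
        population = parents + children
    return min(population, key=error)

w, b = evolve()
print(f"evolved model: y = {w:.2f}x + {b:.2f}")
```

Keeping the best parents each generation (elitism) means the best error never gets worse, so the population drifts toward the line that fits the data without anyone writing the rule down.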
17. CA
And what about politics? In what way could artificial intelligences alter political institutions and could they lead to the establishment of a new kind of political regime?
There is not much reflection on the subject (apart from a few exploratory attempts: http://www.inter-disciplinary.net/wp-content/uploads/2011/06/rumpalaepaper.pdf ).
19. David Saintloth
"Norvig points out that basically all successful language-related AI programs now use statistical reasoning (including IBM’s Watson, which I wrote about here previously)." -- Peter is correct; the funny thing is I think this argument is mostly one of semantics. Norvig just doesn't think it is important to model the minutiae of syntactic structure...Chomsky thinks it is...both are kind of right...


Because each is looking at the problem with different eyes. Chomsky is a linguist; he's analyzing the structures of sentences, grammar, and how meaning is relayed using these linguistic atoms. But those atoms are encoded into the human brain via collections of neurons which set connections based on experience and new incoming data. Thus Chomsky's view is true: the brain does create relational maps of the components and subcomponents of language. BUT the statistical view is precisely what these neuronal bunches are modeling through the strengthening and weakening of axonal and dendritic connections. Norvig is looking at the problem from the perspective of just the weightings between the stored entities (what is being stored by each neuron is irrelevant) and then creating new weights based on new data coming in. It is precisely modeling the neural networks of the brain without building any networks... it's pretty smart. This way the effective intelligence is produced without having to build a physical model of memory storage elements or to care what is being modeled. The same process would work for processing vision data, audio data, olfactory data, gustatory data... in fact, neuroscientists are looking into the brain with fMRI studies, and what do they see? A brain that really does chop up the sensory analysis problem into regions that are remarkably similar. Yes, there are differences between processing regions in how data is hierarchically arranged, down to the neuronal elements that store it, but those are organizational and not structural changes; ultimately, adjusting the weights between memory storage units is the fundamental processing action. So Chomsky's linguistics slowly grades into Norvig's statistical analysis of weights when viewed along the biological gradient of a processing brain, from sensory input to low-level neuron (memory atom, if you will).
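The idea of connection weights that strengthen and weaken as new data arrives can be sketched with the classic perceptron learning rule. This is a deliberately crude stand-in, not a model of real neurons and not the method Norvig's systems use; the AND-gate training examples and the function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, rate=1.0):
    n_inputs = len(examples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Strengthen (error > 0) or weaken (error < 0) each active
            # connection; inactive inputs (x == 0) are left unchanged.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training data: the logical AND of two inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print(weights, bias)
```

Nothing in the learned weights names the concept "AND"; the behavior lives entirely in the numbers, which is the sense in which this commenter says the content of each stored unit is irrelevant.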
Michael Burke
20. Ludon

Anyone not familiar with that word should look it up in relation to events near the end of World War II. Trying to understand meanings in translations can be difficult, even for professionals. If humans can have such problems, any AI using either of the discussed approaches or a combination of the two would be likely to make similar mistakes. The answer may include ideas that we have yet to find.
21. Joe Repka
Perhaps we should be wary of conflating the technique (model) that results from the process (science) with the process itself.
Probabilistic models (the result of some scientific process) are still mathematical models. Formulae are perhaps more elegant predictive models, but perhaps we should remain open to the idea that neither the data nor the reality necessarily supports a formal, non-probabilistic model in the end. Probability distributions may simply be the best that science can at times, or even ever, do, if the scale of time or space is made sufficiently small or large.
22. Robopsychologist
Very interesting... However, we should bear in mind that many serious attempts to predict how science is going to evolve seemed laughable in retrospect.
23. JimJim
Your link about Norvig saying the Singularity is bunk, is bunk. Page not found.

So I googled "peter norvig singularity" and the top link was this interview: http://ignoranceisfutile.wordpress.com/2008/08/01/googles-peter-norvig-discusses-the-technological-singularity-ai/

...in which he talks a lot about the Singularity and takes a fairly nuanced view, but hardly describes it as bunk.
24. vemv
As my favorite quote goes, no furious activity is substitute for understanding. Obviously, the Norvig side is the 'furious' one: just collect more petabytes of data, add more processing power, develop more aggressive algorithms.

Regardless of the achievements of this approach, it is obvious that this way we'll never get to create anything nearly comparable to our own cognition. Only a deep research -meditation, if you will- on the meaning of meaning will.
25. Sla
Norvig is mixing up simulation and creation.
We are going to come full circle with such ideas, back to the basics of understanding what AI means.
And then he will say: oh well, I call it a different name, SimulatingAI, and I (Norvig) am still right!
(Normal, regular, non-C-suite devs/researchers can just shrug and continue their work.)
26. Eric K
Norvig comes out of the old MIT AI Lab culture. You know: The LISP hackers, Symbolics, expert systems, the 80s AI boom. I've worked with some of these people, and they're enormously bright—some of the most talented programmers I've ever seen, competent and devious mathematicians, and in some cases, people who've spent their life obsessed with linguistics. These are not ordinary smart people; they're the people who leave ordinary smart people a little dazzled.

DARPA, the DOD and corporate America poured a fortune into AI systems in the 80s, and most of it went to people like Norvig. Each project, each startup tried to find an elegant, rule-based way of modeling intelligence. They had the brainpower and the money, and every last one of them failed. Norvig was there. He still knows many of the major players.

But after the famous AI winter, there were a few shoots of green. In the 90s, people started making progress on previously unsolvable problems. Face recognition went from 20% to 95% accuracy. Computers started passing vocabulary tests written for humans. Paul Graham figured out how to filter spam with 99% accuracy. Robots got a lot smarter. A decade of miserable failure was replaced by slow and steady progress.
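For readers curious about the statistics behind spam filtering, here is a minimal naive Bayes sketch. It is in the spirit of Graham's approach but is not his exact method, which differs in its details; the training messages and function names are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, is_spam). Returns word counts per class and doc counts."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        docs[is_spam] += 1
    return counts, docs

def spam_probability(text, counts, docs):
    """Estimate P(spam | words), assuming words are independent given the class."""
    vocab = set(counts[True]) | set(counts[False])
    log_score = {}
    for label in (True, False):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))  # class prior
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out a class.
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Turn the two log scores back into a normalized probability.
    top = max(log_score.values())
    p_spam = math.exp(log_score[True] - top)
    p_ham = math.exp(log_score[False] - top)
    return p_spam / (p_spam + p_ham)

MESSAGES = [
    ("win money now", True),
    ("claim your free money", True),
    ("free prize inside", True),
    ("meeting at noon", False),
    ("lunch with the team", False),
    ("notes from the meeting", False),
]
counts, docs = train(MESSAGES)
print(spam_probability("free money", counts, docs))
print(spam_probability("team meeting notes", counts, docs))
```

The whole filter is word counting plus Bayes' rule, which fits the commenter's point that a one-page statistical technique can go a long way.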

And if you dig into any of the big successes of the 90s, every one of them has statistics hiding at the bottom: singular value decomposition, Bayes' rule, particle systems, Google's PageRank itself. And as often as not, you can build a system that does 90% as well as a human being using a technique that you can explain in a single page. Statistics are powerful.
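PageRank, one of the examples in that list, does indeed fit on a page. Below is a sketch of the textbook damped "random surfer" formulation computed by power iteration; the four-page link graph is hypothetical, 0.85 is the damping factor commonly quoted from the original PageRank paper, and this is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank

# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ranks highest because three pages point to it, and A ranks second because the popular page C passes all of its rank to A.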

(Of course, 90% as good as a human being is often useless, especially if you can brutally exploit real humans to do the work for less.)

Now, I think that Norvig's position in this article is hyperbolic. He only needs more data because he's still lacking crucial insights. He doesn't need the entire Internet to infer the structure of English. A mere billion words would suffice if he really knew what he was doing. He should be doing statistics on structure, and not just on lexical co-locations. And what's more, I guarantee you that none of this is news to him. But for now, it's still terra incognita. Nobody knows the next step.

Or let me take a more humanistic slant. Nobody ever became a better writer by studying Chomskian linguistics. First and foremost, writers learn by reading and listening, by observing language in the wild. If you read a thousand good books, marinate in them, and think about them, your brain will become filled with words, and patterns of words. And eventually, you'll be able to shape those words, and learn to make them work for you.

Chomsky's vision, in the end, is brutally reductionist. He thinks that language and intelligence can be reduced to simple, logical rules. Norvig spent much of his life trying this, and failed. So Norvig decided that it was time for the computers to (metaphorically speaking) download a big stack of books, and start reading. It's one of life's great cure-alls, after all.

Intelligence grows out of a rich and varied experience. When we call somebody "wise", we don't mean they've memorized a bunch of rules, but that they've lived a full life and they've started noticing the patterns. And patterns are slippery, statistical sorts of things: "If your date is rude to the waitress, there's a good chance that he's a jerk." "This reminds me of that thing with your aunt Martha." "What are the odds that a military widow in Nigeria wants to give me $17.2 million?"

So Norvig's wrong, but he's interestingly wrong. The pendulum will swing again, and we'll develop better statistical methods, ones with more insight and structure. Chomsky's strategy, on the other hand, has probably taken us about as far as it can go, if only because people like Norvig have already tried 100 variations on the theme and (warning: statistical conclusion ahead) they never really got anywhere.
27. splencha
It strikes me that language is both linear (Chomsky) and stochastic (Norvig) in a non-reducible way, much like other not-very-predictable/computable phenomena like weather, evolution, and music. Which means that it can be modeled realistically by various combinations of random and functionally generated numbers, and the models statistically compared for ability to successfully generate rules and exceptions that duplicate a specific language in hindsight. This process only generates language in a computing machine that allows the generated language to alter the rules whilst generating the particles from which the language is, itself, made. Neither Chomsky nor Norvig does this, though they both work very hard at what they do, with some results.

In short, I think they are both wrong, in ways that cannot be overcome by applying more effort to the processes that have gotten each of them as far as they have, thus far, come.

There is a third strategy that I have not heard, recently, in this debate, of modeling an entity that can intersperse catastrophic error with the opposing deterministic linear and stochastic models in such a way that it actually learns English... .
28. alistair cockburn
Very strange ending to a great article: "and if not, we at least have to try." Why? Why do we have to try? Nothing in the article indicates that this sentence belongs in this article from this writer.
29. Griff
The worry I have about the "petabytes of data & statistics" vs "understanding things", is that the latter empowers me (the student of the theory) to go and do something with it, whereas the former simply makes me a consumer of another google product, as I'll never have the data or computing power available myself. It makes us slaves to the supercomputer owners.
Sort of like losing the ability to use a map and relying on GPS all the time.

The obvious application for Norvig's approach is better weather predictions, surely. We need this if we're going to survive what appears to be coming at us. Speech recognition is a nice-to-have in contrast.
30. len bullard
The irony is to click on the link to Norvig's speech on the singularity and get:

"This is somewhat embarrassing, isn’t it? It seems we can’t find what you’re looking for. Perhaps searching, or one of the links below, can help."

Then have to post a real word and a stochastically generated sequence multiple times because the captcha is so obscured it can't be read accurately to post this comment.

And so it goes.
31. Steve Naidamast
I don't agree with everything that Professor Chomsky writes but in this case I am with him 100%. Calling him and others like him the "old guard" is like calling those parts of Europe that did not agree with the invasion of Iraq, "old Europe". We all know what happened in Iraq...

There is a very good reason why the Chomskys of the world still promote a more involved understanding of the events that make up their chosen areas of expertise. In this case, creating artificial intelligence that can simply mimic Human reactions based solely on statistical models will create the same misunderstandings that politicians who were gung-ho over the Iraq invasion promoted in that event's heyday.

To be certain, there are artificial models that simply do not need to mimic Human reactions in order to be viable co-agents in an ongoing situation such as in a computerized war-game. In this case, only the appearance of Human decision making must be created. However, true artificial intelligence that is to promote Human reactions and in a very real sense be Human must also incorporate the understanding of why or how a certain reaction is developed in the space of nanoseconds. It is this part of the intelligence foundation that will maybe create a soul for an artificial entity with a sense of morality; something Humanity never really had.

To grasp such an understanding of this concept I would recommend that a reader obtain the movie by Rod Serling, "Requiem For a Country Doctor", which predicted back in the 1970s what would happen to the US medical profession if simple statistical models or merely clinical understandings of events were used for its evolution. And what we have today is exactly what Serling had feared...
32. Alan T. Balkany
Using a vast database to simulate intelligence can be a powerful approach (e.g. IBM's Watson's superhuman performance on Jeopardy).

But it fails miserably at tasks that require understanding and induction, like coming up with a theory to explain observed phenomena, and predict phenomena that haven't been seen yet.

Data without intelligence is brittle. Tiny disruptions, like the use of an unexpected synonym preventing a match, make the stored information useless, producing failure.
33. hectavex
This reminds me of the Heideggerian AI debate. Here is a great essay about it that reflects some of this argument between Chomsky and Norvig, except this time it's Heidegger versus Minsky and others from MIT working in AI research.

Here's a quote from the essay:

"Minsky, unaware of Heidegger’s critique, was convinced that representing a few million facts about objects including their functions, would solve what had come to be called the commonsense knowledge problem. It seemed to me, however, that the real problem wasn’t storing millions of facts; it was knowing which facts were relevant in any given situation. One version of this relevance problem is called the frame problem. If the computer is running a representation of the current state of the world and something in the world changes, how does the program determine which of its represented facts can be assumed to have stayed the same, and which might have to be updated?"

Moving on to my opinion...

I do not think Norvig's AI would be considered intelligent in a conscious (cognitive) way. That's not to belittle his research, I do think that his AI will be very useful to humanity and have a solid market for years to come. Intelligence in the eyes of Norvig seems to mean "data" and using algorithms to "crunch" that data into an "appropriate" response. An AI like this should prove useful forever, an invaluable "lookup" tool for Humans and cognitive robots. It's like Google 2.0.

Though I am merely a programmer and hobby philosopher in regards to AI, the field interests me immensely so I take notes and critically analyze them. Here are a few excerpts from my notes which relate to the topic, maybe it will inspire someone or some discussion:

"We see a huge difference in the sheer number crunching capability of a computer though. Wikipedia quite possibly contains more information than any one brain could easily store and recall; this data accumulates every day. Wikipedia is the information from thousands of brains combined. If we assume that AI will have similar capabilities, when does the signal-to-noise come into play? How does the AI choose which piece of knowledge to express or relate to? The inherited data and accumulation of new data will only take longer and longer to search through and compute a good response for. The AI would use an increasing amount of data to cache, defragment, organize, and optimize compression patterns. When does this artificial “thinking” ever stop? Humans don’t chase the wild goose forever; they come to peace at times."

"Hand a monkey a hard copy of Wikipedia and he’ll probably just become fascinated with the mechanics of how it unfolds, tear out some pages, then toss the book. The monkey has thoroughly enjoyed the hard copy, within the context of the monkey’s understanding of how to achieve satisfaction with an object of similar shape/size/color/abundance. To a monkey, the hard copy is no different than a fancy pile of leaves wedged between two pieces of bark. One out of a hundred monkeys might find those traits special and hold onto the hard copy for a while longer than usual. The monkey would only appreciate the hard copy within a human context if it understood English and the concepts supposedly meant by the author of those words."

"I believe that precursors (inherent messaging systems such as dopamine and RNA making Conway's Game of Life - natural choices - possible in different contexts throughout the vessel) and master/parental guidance, in that order, are the factors that drive intelligent (cognitive) life to fruition."

"We must give AI an environment to evolve ITSELF. It must ask questions about the environment and attempt to retain or solve them, but first, it must learn how to ask questions! Similarly, a mother teaches her baby happiness, but the baby's brain encodes what that means WITHOUT the baby's awareness. But the baby provides feedback for the mother in the form of smiling or crying, likely also unaware of this. Smiling and crying at this stage seems to be communication without cognition. It's a precursor thing!"

"An AI should maintain an effective (human-like) signal-to-noise ratio within a given context. Put the AI in a grassy field and the AI should literally start thinking, 'I'm here in this grassy field, having thoughts of nature. Birds, trees, water. I am ready to relate some experiences, from the vastness of my database, to this very situation right here and now. I will not be speaking fondly of my time spent patrolling Walmart in the suburbs as a novice robot, since that experience is completely out of context. I will, however, continue folding these proteins with my free cycles, since that is what I find important in a broader sense.' In this case, the environment implies the necessary signal-to-noise ratio, yet the AI's character implies a secondary signal-to-noise ratio; the AI should adjust to both so as not to behave inappropriately."
34. Fred McGalliard
I have been wondering when we would get an AI that would not embed the programmer's reasoning and knowledge, but would learn to read by listening to its father read to it and looking at the pictures in the book; learn the difference between a ball and a balloon by touching them; learn first-order motion of the ball well enough to play catch, then learn the equations of motion by reading a physics text; and have an adjustable curiosity, perhaps pleasure and pain, to cause it to set its own goal to read the physics. Currently we do not have anything I would describe as an AI. All are simple machine implementations of the very complex logic of the programmer, trying to do something in copper and silicon that he learned at his father's knee, and had fun at, but has no idea how he himself does it.
35. Nick Nussbaum
"Statistical Modeling: The Two Cultures" by Leo Breiman is well worth reading for an early discussion (2001) of the differences between formal theory and pragmatic machine learning.
36. Dan Sutton
Stupid. They're so entrenched in their own arguments that they can't see the simple fact of it: they're both right (and, by extension, wrong): each idea requires the other in order to work. Their ideas are not mutually exclusive: adjust one's thinking a little and see that they're symbiotic in nature, if combined correctly. I mean, how hard is that?
37. hectavex
Not sure how to edit my comment now; I noticed I put Novak instead of Norvig...mistake on my part (not even sure who Novak is). Anyhow, one more thing I have noticed: it has been "mentioned" that Norvig is a best-selling book author and is now operating under the influence and culture of a corporation named Google. This does not give the man any more qualifications than Chomsky in the philosophy or the debate; rather, it gives him an edge from an implementation standpoint. Norvig needs Google's infrastructure to build his AI. "More and more" data tables and relationships are already a main part of Google's infrastructure. Unsurprisingly, this is where Norvig sees potential for his AI, as this was where his research seemed to be leading him in the past (?).

This AI approach assumes "free" information will keep flowing into the hands of Google, where it can then be manipulated (hint: by algorithms and soon AI) into a product or service that isn't free. When more people wise up to this, they may stop handing everything over to Google. I'm on the fence myself.

A gold rush is underway, but you won't hear about it. That's because Google owns all the rivers and is building its ultimate sluices to extract the profits behind closed doors. It's quite obvious where we're headed: a new-age monopoly on information and connectivity.

I will avoid being hypocritical: I use Google products daily. That doesn't squelch my criticisms. We really should maintain a contrast between what Google does for society and what Google does for profit, then do the same with Chomsky. It's all about motives.

For example, does the link in the original article for "The Unreasonable Effectiveness of Data" REALLY require a purchase of $19 to read? What a joke, and nice try.
38. Bradley Allen
Wow, who takes Noam Chomsky seriously? Wasn't he kicked out of the United States for his anti-American actions? He is an anarchist and a fool, with all due respect. The guy from Google? I'd tend to believe him instead.
39. Emre Colak
When Einstein heard about Quantum Mechanics and the idea that everything is a probability, he said: "God doesn't roll dice". He meant that even though Quantum Mechanics does give us many answers about the world of the tiny, it doesn't truly explain it. I believe that a similar analogy can be made to this case.
40. Dave Stevens
Every day, engineers use the laws of thermodynamics to build and design things. These laws are statistical in nature and, much like Norvig's approach to language, produce very useful results. But, as James Clerk Maxwell showed some 150 years ago, those statistics arise as a consequence of very real, very exact laws of physics operating at a very small scale. By modeling the actions of individual particles in large numbers, he demonstrated how the statistical behaviour of large systems derives from the non-statistical behaviour of individual elements. The human brain has a number of elements that is hard to really imagine, and certainly enough to start showing statistical behaviour when looked at from a macro perspective. But these statistics derive from very large numbers of neurons interacting according to set and understandable rules. Just because statistics can be used to predict system behaviour does not mean that there are no real rules underlying the observed behaviour. While Norvig's approach has many uses, Chomsky's will in the end produce an understanding of the rules that produced that statistical behaviour.

Statistics are the effect, not the cause.
41. The Philosopher Stone
If AI is just something that can eternally pass the Turing Test, then my money is on Norvig's approach. IMO this IS the process that governs the majority of the neural net we call a brain.

But I think that the true Singularity will come from somewhere else. The beast that will become conscious is the Internet itself. The "AI" of Google is just a maximized neuronal speed for the higher brain formed from the electronic signals we ourselves create by using it.
42. Amnon Meyers
Chomsky has been ignored for decades by the vast majority of people working in natural language processing (NLP) and Artificial Intelligence.

Statistical NLP itself has flourished for a couple of decades, but is generally -- outside Google at least -- proving insufficient for many practical language processing applications and is often augmented (or replaced) with manual rules, grammars, and methods.

So what new ground is being covered here? The mismatch between Linguistics and AI/NLP is ancient history. The same for any notion of novelty associated with statistical NLP. That said, we always appreciate the buzz. ;-)
43. memosk
There is a big difference between intelligence and will.

Intelligence is of two kinds: 1. intelligence of free will; 2. static intelligence.

Intelligence of free will improves static intelligence.

A third kind of intelligence is memory. This is the most used kind of intelligence, though in effect it involves no intelligence at all.

These "intelligent" systems are based on brute-force memory or on speculation.

In my opinion, the best way is memes. http://en.wikipedia.org/wiki/Meme

This is also a flawed theory, but it has the longest-range perspective, and it is dismissed by politics and universities.

But I know that memes are used in secret. It is a secret technology of the CIA, "against the world", a "ministry of deception", truly used in Eastern Europe for privatization and neocolonialism.
44. hectavex
I felt I was too critical in my previous comment about Google's impending monopoly on information and connectivity, but that was until I got the message about Google Fiber.

So now they want me to get my information from Google Search, with a good chance the information is on a website hosted by Google, across my Google Fiber (or Google Android), using my Google Chrome, while watching some Google TV? Not to mention the information is littered with Google Ads/Products/Services and Google's paid advertiser content.

Does anyone see the problem? What's next, Microsoft- and Apple-brand Internet? Google has gone from "don't be evil" to "do everything".
45. hectavex
I realize I'm having a conversation with myself at this point, but it should be noted that Google Fiber WILL drive up competition for Time Warner Cable, Cox and AT&T, who currently hold regional monopolies over Internet access across the USA. Definitely a good thing for consumers in the short run!
46. The Master
Your "singularity is bunk" link is broken, not found. I can only conclude this whole page is bunk.
47. Bad Boy Scientist
I think the missing bit is the understanding of *how* it works (how language works, etc.), and that's why it isn't science; it's sort of engineering. And not the elegant sort, but the trial-and-error sort that Edison used to 'invent' the light bulb - by trying thousands of materials, he eventually stumbled across one that worked.

After materials scientists - and engineers - started studying materials to understand how their properties are manifest and what affects them, we could start making materials with the properties we need rather than blindly groping around for them.

The world needs both - we need to actually understand what we are doing (to understand the consequences of what we are doing), and sometimes we can't wait for a mature theory before we do stuff. It is not a perfect system, but what is?
48. Peter Hanley
"...why learn anything when you can look it up?"
Because often the most important questions are the unanswered ones.

(other people have said this in different ways, but I felt the quoted rhetorical question deserved a similarly straightforward rebuttal)
49. JaiGuru
Whenever you find yourself disagreeing with Noam Chomsky, you need to use that as a warning sign to have a look at what you think and reconsider it. Maybe you're right, but you're facing as close to a supreme intelligence as mankind has produced in this generation. I trust a man whose life has been spent in pursuit of knowledge for knowledge's sake far more than a man who's spent his life pursuing knowledge for money's sake.
50. Walid Saba
It is really beyond any comprehension that the argument in favor of statistical or machine learning approaches to language understanding is given any serious consideration. It is really ludicrous, scientifically, logically and practically. I can’t even believe people with reasonable scientific training can contemplate this for a second.

Statistical NLP has had over 25 years. By 1990 no serious language understanding theories and models were being developed; it was all machine learning and statistical approaches. After 25 years of statistical NLP there is nothing to speak of, and there will be nothing to speak of, even if Dr. Norvig processes the internet data of 5050. Even if all that has ever been written can be processed (and they could do that, why don’t they?), no machine learning or statistical NLP approach can learn that in (1) there is a possibility that Sam is now dead, but the same is not true in (2):

1. Sam was pregnant
2. Sam is pregnant

I recall sitting next to a famous computational linguist at ACL conference in 1995 (it was held at MIT), and I asked him: the vast majority of the papers in the current proceedings are based on some experiment, a corpus analysis and learning some patterns and displaying a table at the end showing, of course, positive results. These papers are follow-ups to exact same papers I saw before but the numbers in the table are now better!!! What happened to good old NLU. What happened to linguistic models and integrating world knowledge with linguistic models for pronoun resolution, word-sense disambiguation, what is this??? He said, “my friend, the problem was so difficult, we could not get anywhere, and so we all gave up! Now we crunch data, publish papers, and justify our positions and our research grants.”

Fine with me, if you had to publish and could not get anywhere with genuine NLP, then run some machine learning experiments, get some interesting data and go on. But to start believing a lie you started?!? This is beyond my comprehension.

How could brilliant scientists not even intuitively realize the absurdity of the notion of statistical NLP? Language is the mirror of the mind. Understanding how we process language is understanding how we learn and how we process and reason with knowledge; it is understanding how our cognitive capacities work; it is understanding a huge part of our mind! In short, it is one of the few remaining truly challenging problems of our time! And this can be solved by crunching data and applying a couple of statistical formulae? Please? Can someone with some sanity challenge these guys! So what if they are MIT grads, I worked with lots of incompetent Ivy League grads!
51. Andrew Cameron Morris
The mechanism which underlies human language and gives words meaning is pre-linguistic human perception. Each species perceives the world in its own way, but within each species the core perceptual apparatus is near universal. Hence human "common sense", which our machines presently have very little of. It is this universality which gives rise to the similarities which run through all human languages and gives rise to the sense of underlying deep structure.

Some of our universal common sense ideas relating to our subjective experience of logic and quantity have already been captured in rules (maths). Using these, together with further rules discovered about the objective or observable world (physics), we can already truly communicate many technical concepts, as well as some ideas about light and sound perception, with computers (computer language). However, before we can expect machines to understand unconstrained human language we will have to come up with ways of accurately describing all the rest of human common sense.

Rules and statistical models are both useful ways of modeling reality. Unsupervised statistical language modeling discovers production rules, many of which can be identified with traditional grammar rules from pre-computational linguistics. Google doesn't yet use this approach to language modeling because it can already make enough money out of simple word sequence occurrence statistics. However, as production rules are undoubtedly a part of our common sense knowledge of natural language, it will pay in the long run to make use of these.

Fully automatic natural language understanding requires an objective description of the structure of human perception that underlies human common sense. Computers will not perceive meaning in the same way as humans because their qualia will not resemble ours. However, the qualia themselves do not carry meaning, so that will not be relevant from the point of view of communication. It may be some time before we can model all of the human sensations and emotions, but it may not be so difficult to model the percepts and concepts that underlie the natural language description of technical information.
52. Tony Browne
Perhaps we can use statistical/machine learning techniques to model the data, and then extract comprehensible symbolic rules that explain what the machine learning systems have learnt. We as mere humans could then use this knowledge to build/extend our theories.

I myself was involved in a research project some years ago to extract comprehensible rules from neural networks (they are no longer 'black boxes').

53. Victor Nimagine
Behind the Chomsky position, there is the idea that we can master language structures with the help of computer logic (this is what happens from SGML to the Semantic Web and the use of ontologies: RDF, OWL, etc.). On the contrary, behind Norvig's, there is the idea that computers and the Internet now NEED human brains (individually and collectively networked, so statistically...) to work. Google is Goolem...
54. Lev Goldfarb
@Walid Saba said:
"It is really beyond any comprehension that the argument in favor of statistical or machine learning approaches to language understanding is given any serious consideration. It is really ludicrous, scientifically, logically and practically. I can’t even believe people with reasonable scientific training can contemplate this for a second."

Walid, I agree, but after watching in disbelief for the last several decades what's going on in AI in general, one should not be surprised at all.

We still have not opened the door to AI, including NLP. At the same time, neither the major funding agencies nor the scientific community as a whole have realized the fundamental truism that the development of AI cannot proceed in the usual incremental manner based on the development of science as we have known it so far. And this is despite the known fact that the fathers of the Scientific Revolution (Descartes, Newton, Leibnitz and many others) new better. Since they realized that mind is a non-spatial entity, and they wanted to develop science on the basis of various spatial consideration, including movement of objects in space, they have explicitly excluded mind from their (and hence our) science.

Under these circumstances, given that we haven't touched yet real AI, including NLP, it is not surprising at all that statistical considerations allow one to produce useful (but not at all AI) programs.

However, the said point is that all this time we have been producing "AI specialists" who have been spoiled to such an extent that they would not recognize a relevant AI development even if it is staring one in the face. The whole present debate is a particular manifestation of this "AI training": a strong case of confusing a useful program with AI science.

Finally, it is easy to anticipate the usual objection: How do you know that this is not an AI (or NLP) program? I know because when we do get to see a real AI model---more accurately, I would bet, a fundamentally new representational formalism---we will be able to understand almost immediately what has been missing from the big scientific picture as we know it today. Intelligent information processing is not a CS trick but a pervasive reality in the Universe. And there is absolutely no way a scientifically well-educated (I don't mean MIT or Stanford) person would believe that---in contrast to our best current scientific models---such a reality can be modeled by a CS or statistical 'trick'.
55. Walid Saba
@Lev Goldfarb said:
"Walid, I agree, but after watching in disbelief for the last several decades what's going on in AI in general, one should not be surprised at all. We still have not opened the door to AI, including NLP. At the same time, neither the major funding agencies nor the scientific community as a whole have realized the fundamental truism that the development of AI cannot proceed in the usual incremental manner based on the development of science as we have known it so far. And this is despite the known fact that the fathers of the Scientific Revolution (Descartes, Newton, Leibnitz and many others) new better. Since they realized that mind is a non-spatial entity, and they wanted to develop science on the basis of various spatial consideration, including movement of objects in space, they have explicitly excluded mind from their (and hence our) science. Under these circumstances, given that we haven't touched yet real AI, including NLP, it is not surprising at all that statistical considerations allow one to produce useful (but not at all AI) programs."

Lev, I wholeheartedly agree with you, but don't you think there is a serious danger of blurring the line between science and hacking, and of sacrificing real foundational work in the science of AI and NLP for immediate results? This is what has been happening. I talk to very bright people and they think Siri is AI; they think a ranking algorithm that could be designed by a bright high school kid is AI.... This is what I am concerned about: if the larger community believes this IS it, this is AI, then why invest in the real work... I am afraid we will not go back to the real foundational work, which is hard and long (after all, this is perhaps one of the most challenging problems in science!), unless we expose what is being termed AI and NLP as NOT!
56. Lev Goldfarb
Walid, I agree.
Now everything is AI. We *must* stop using the AI name for all such developments. The problem is that from the very beginning of 'AI', besides assuring continued funding, people wanted to soothe their ego by calling it 'AI', and now we have a very, very, . . . big problem. ;--)
57. Lev Goldfarb
Two misprints in my long message:

1." (Descartes, Newton, Leibnitz and many others) new better." --->
"(Descartes, Newton, Leibnitz and many others) knew better."

2. "However, the said point" ---> "However, the sad point"
58. Bradford W. Miller
@Andrew Cameron Morris
"I recall sitting next to a famous computational linguist at ACL conference in 1995 (it was held at MIT), and I asked him: the vast majority of the papers in the current proceedings are based on some experiment, a corpus analysis and learning some patterns and displaying a table at the end showing, of course, positive results. These papers are follow-ups to exact same papers I saw before but the numbers in the table are now better!!! What happened to good old NLU. What happened to linguistic models and integrating world knowledge with linguistic models for pronoun resolution, word-sense disambiguation, what is this???"
I recall co-authoring a paper at the 1996 ACL that described a good old NLU approach to discourse-based dialogue processing. So it's not that the approach died, it's that the field bisected. The statistical guys were much more successful on their corpora, but generally couldn't do interactive problem solving....
59. Bradford W. Miller
@Walid Saba

Sorry, the quote was from you. I misattributed it to Mr. Morris.
60. JackParsons
"Elephants don't play chess" - Rodney Brooks

The emergence of language is as accidental as the emergence of sciatica. It is an utter and total accident. The concept that some range of rules can be created for it is insane. A useful project, yes, but still doomed.
61. stevenhalter
A hammer is not a house--but it is useful to have when building one. Throwing out any tool at this point is not a particularly useful course of action.
Statistical methods of learning and formal theories of the mind are both tools to use as we progress in understanding the mechanisms underlying human intelligence.
62. Oleg Skoptsev
Hi Lev!
It is nice to see your posts here!
“The problem is that from the very beginning of 'AI', besides assuring continued funding, people wanted to soothe their ego by calling it 'AI', and now we have a very, very, . . . big problem. ;--)”

I’d like to attempt to solve it for you! :=)))

I define “Intelligence” as the property of any social being or society that brings a new quality of survival to the society, as well as to an individual member, due to “the ability to collect, convert and disseminate members’ personal experience among other members of society by using LANGUAGE”.
In this case, any member who is able to understand language may use the experience of others without learning the consequences in the real world…

So, from this perspective Chomsky is much closer to AI than anybody else…
And, in my view, he is on the right track, trying to discover the internal laws of nature built into language, but he uses the wrong tool – human interpretations of it – grammars and rules (which are mostly based on stats :=).

(I would be glad to share with him some ideas)

I don’t think Norvig's statistical approach to AI is valid at all, and (IMHO) it would be much more productive and closer to AI if he used conceptual search instead of keyword search; even more intelligent would be motivational (necessity-based) contextual search, which humans use to understand each other.
And when he tries to implement the last one, he will come to Chomsky for help :=)))
63. Lev Goldfarb
"I’d like to attempt to solve it for you!"

Oleg, unfortunately, no one can solve this 'AI education problem' now, and, as I mentioned above, the prominence of the present, Norvig vs. Chomsky, debate is one of its consequences.

Chomsky, of course, has probably done more for the future AI than any of its known practitioners. But, unfortunately his stance against induction has considerably weakened his generative proposal.
64. Richard Wojcik
As someone who has worked in both theoretical linguistics and natural language processing, I would say that both gentlemen are talking past each other. Chomsky's theory attempts to explain well-formed linguistic structure in human languages, not linguistic behavior. AI research is all about language use, not grammaticality. Text and speech processing programs need to be able to process ungrammatical or unexpected linguistic input, whereas theoretical linguists (at least, in Chomsky's school) tend to see that as less of a core issue for linguistic theory. Computational linguists are interested in extracting meaning from ungrammatical or "noisy" linguistic signals. So statistical methods may seem more effective to them, albeit mainly on the analytical side. Language production requires more attention to the details of grammatical well-formedness, so Chomsky's sophisticated linguistic analyses become more important for language generation tasks.
65. Paul Gowan
An interesting spin on Einstein/Schrödinger vs. Heisenberg/Bohr.
I agree that Mr. Norvig and Mr. Chomsky are talking past each other.
Mr. Norvig places too great a stress on the use of statistics, and AIMA scarcely mentions other approaches to NLP. The work done under Dr. Roger Schank, for example, is virtually unknown as a result, and papers are appearing that reinvent ideas from the Yale school of A.I. under Dr. Schank. WATSON clearly has no understanding of the answers retrieved and lacks the world knowledge that the Yale researchers found was needed and CYC sought to acquire.
If you read the writings of Edwin T. Jaynes on Information theory, Bayesian statistics, probability and expert systems and of Alan Turing on connectionism, there clearly is a prominent role to be played by the statistical approach in learning and reasoning but you know what they say about people with hammers!
67. veryPhil
Copernicus didn't believe in a "geocentric" center of the universe.
69. Lue Ball
Firstly, I'm not a scientist. Surely language is a tool for communicating: communicating meaning. When we speak, what we are trying to say is the main predictor of what words we will use. There will be other variables that might alter the exact words we use and the order of use. Words that are currently in common use will tend to come to mind before words that are out of date. Ideas in vogue will affect the predictability of what words come in what order. I doubt that the word pixel was used very often 200 years ago, or the phrase 'state of the art', which became popular around 25 years ago and is now in decline.
The idea of chucking a lot of words into a box, averaging them out, and expecting it to come up with the next most likely word is a bit silly, isn't it? I can see it might be a fun thing to try, but the results are surely not going to be sensible; more likely, extremely funny. I wonder: if we chucked a lot of Shakespeare's plays into a box and churned them about a bit, would we get his next great masterpiece? Let me see, I can't wait to read 'The Merry Merchants of Windsor', or how about 'A Midsummer's Tempest'?
From his picture, Mr Norvig looks a bit of a joker, or maybe he sees some merit in reducing language to cypher; maybe he's teasing us. The point is that cypher doesn't have meaning, and I don't think a computer could write a book without a person to direct it. But music might be a different matter; Bach used the mathematical approach. But not language, no, ridiculously silly idea! Really though, it's quite clear that people can be taken in by their desperation to look intelligent.
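To make the "next most likely word" idea above concrete, here is a minimal sketch of a toy bigram model in Python. It assumes naive whitespace tokenization, counts only which word follows which, and always picks the single most frequent follower (no smoothing, no longer contexts, no sampling). The function names are hypothetical, and this illustrates the general technique only; it is not a description of the models Google or Norvig actually use.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if nothing ever followed it."""
    counts = follows.get(word.lower())  # .get does not create empty entries
    if not counts:
        return None
    # Ties go to the follower that appeared first in the training text.
    return counts.most_common(1)[0][0]


def generate(follows, start, length):
    """Greedily chain up to `length` most-likely next words after `start`."""
    out = [start.lower()]
    for _ in range(length):
        nxt = most_likely_next(follows, out[-1])
        if nxt is None:  # dead end: no word ever followed this one
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    model = train_bigrams("to be or not to be that is the question")
    print(most_likely_next(model, "to"))  # prints "be"
    print(generate(model, "to", 4))       # prints "to be or not to"
```

Greedy chaining like this quickly loops back on itself ("to be or not to ..."), which is roughly the kind of odd output the comment anticipates; practical statistical systems typically use longer contexts, smoothing, and far more data.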
70. Wave
Perhaps Mr Norvig's conclusion is unsettling because it suggests cognition/self-awareness and intelligence are actually an illusion. Our minds are just probability generators.
71. hans k
Not sure if this thread is still active. I just finished reading a great treatise for its time by Carl Sagan, The Dragons of Eden: Speculations on the Evolution of Human Intelligence, which led me to this thread. Great thread.

I am somewhat surprised by the lack of sympathy with Norvig's general thrust of massive banal statistical methods for generating AI. Most modern thought assumes human intelligence evolved from natural selection which is a random statistical type of brute force unintelligent processing. Natural evolution does not care about the structure of the brain or language syntax. The ability to understand a sentence had obvious evolutionary advantage. Taken further, the ability to understand a sentence with the least amount of computing power would also have been selected for.

I am not sure if statistical methods of mining massive data sets and arbitrarily searching for linguistic patterns will lead to AI, but I can't help but think that this is how evolution produced "natural intelligence" in humans.

I suspect that the use of genetic algorithms with the use of ARTIFICIAL selection would be the best course for the evolution of true natural language recognition by computers. The parameters for the artificial selection would have to be minimal processing effort for some quantifiable threshold of language recognition. I would be willing to bet that you would begin to observe structural language components artificially "evolving" pretty quickly. This could illuminate the more holistic "Chomskian" understanding of the micro and macro units and structural components of intelligence and natural language processing.

Any thoughts?
