FROM SEARCH ENGINES TO QUESTION-ANSWERING SYSTEMS – THE PROBLEMS OF WORLD KNOWLEDGE, RELEVANCE, DEDUCTION AND PRECISIATION

Authors

  • Lotfi ZADEH University of California Berkeley

Abstract

Lotfi A. Zadeh*
Existing search engines, with Google at the top, have many truly remarkable capabilities. Furthermore, constant progress is being made in improving their performance. But what is not widely recognized is that there is a basic capability which existing search engines do not have: deduction capability – the capability to synthesize an answer to a query by drawing on bodies of information which reside in various parts of the knowledge base. By definition, a question-answering system, or a Q/A system for short, is a system which has deduction capability. Can a search engine be upgraded to a question-answering system through the use of existing tools – tools which are based on bivalent logic and probability theory? A view which is articulated in the following is that the answer is: No. The first obstacle is world knowledge – the knowledge which humans acquire through experience, communication and education. Simple examples are: “Icy roads are slippery,” “Princeton usually means Princeton University,” “Paris is the capital of France,” and “There are no honest politicians.” World knowledge plays a central role in search, assessment of relevance and deduction.

*Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR N00014-02-1-0294, BT Grant CT1080028046, Omron Grant, Tekes Grant and the BISC Program of UC Berkeley.

The problem with world knowledge is that it is, for the most part, perception-based. Perceptions – and especially perceptions of probabilities – are intrinsically imprecise, reflecting the fact that human sensory organs, and ultimately the brain, have a bounded ability to resolve detail and store information. Imprecision of perceptions stands in the way of using conventional techniques – techniques which are based on bivalent logic and probability theory – to deal with perception-based information. A further complication is that much of world knowledge is negative knowledge in the sense that it relates to what is impossible and/or non-existent. For example, “A person cannot have two fathers,” and “The Netherlands has no mountains.”

The second obstacle centers on the concept of relevance. There is an extensive literature on relevance, and every search engine deals with relevance in its own way, some at a high level of sophistication. But what is quite obvious is that the problem of assessment of relevance is complex and far from being solved.

There are two kinds of relevance: (a) question relevance and (b) topic relevance. Both are matters of degree. For example, on a very basic level, if the question is q: “Number of cars in California?” and the available information is p: “Population of California is 37,000,000,” then what is the degree of relevance of p to q? Another example: to what degree is a paper entitled “A New Approach to Natural Language Understanding” of relevance to the topic of machine translation?
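To make the link between p and q a little more concrete, a back-of-the-envelope sketch is given below; the cars-per-capita ratio is an assumed, illustrative piece of world knowledge, not a figure from the paper.

```python
# Illustrative sketch only: the cars-per-capita ratio is an assumed, rough
# piece of world knowledge, not a figure taken from the paper.
population_of_california = 37_000_000   # the given proposition p
cars_per_capita = 0.75                  # hypothetical world-knowledge value

# It is this implicit link through world knowledge that makes p relevant to q.
estimated_cars = population_of_california * cars_per_capita
print(f"Estimated number of cars in California: {estimated_cars:,.0f}")
```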

Basically, there are two ways of approaching assessment of relevance: (a) semantic and (b) statistical. To illustrate, in the number-of-cars example, relevance of p to q is a matter of semantics and world knowledge. In existing search engines, relevance is largely a matter of statistics, involving counts of links and words, with little if any consideration of semantics. Assessment of semantic relevance presents difficult problems whose solutions lie beyond the reach of bivalent logic and probability theory. What should be noted is that assessment of topic relevance is more amenable to the use of statistical techniques, which explains why existing search engines are much better at assessment of topic relevance than question relevance.
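As a rough illustration of the statistical route, topic relevance can be approximated by counting shared terms, for example via the cosine similarity of word-count vectors; the texts and scoring below are a minimal, hypothetical sketch rather than how any particular search engine works, and they show why a purely statistical score says nothing about semantics.

```python
import math
from collections import Counter

def cosine_relevance(text_a: str, text_b: str) -> float:
    """Cosine similarity of word-count vectors: a purely statistical
    relevance score with no consideration of semantics."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

topic = "machine translation of natural language"
paper = "a new approach to natural language understanding"
print(cosine_relevance(topic, paper))   # a degree of topic relevance in [0, 1]
```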

The third obstacle is deduction from perception-based information. As a basic example, assume that the question is q: “What is the average height of Swedes?” and the available information is p: “Most adult Swedes are tall.” Another example: “Usually Robert returns from work at about 6 pm. What is the probability that Robert is at home at 6:15 pm?” Neither bivalent logic nor probability theory provides effective tools for dealing with problems of this type. The difficulty is centered on deduction from premises which are both uncertain and imprecise.
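A minimal sketch of how fuzzy tools might approach the Robert example is given below, assuming a triangular membership function for “about 6 pm” and a simple numerical reading of “usually”; the functions, numbers and the final combination rule are illustrative assumptions, not the paper's formal machinery.

```python
# Illustrative sketch: "about 6 pm" as a triangular fuzzy set on the time axis,
# "usually" read as a numerical lower bound on probability. These choices are
# assumptions for illustration, not the paper's formal machinery.

def about_6pm(t_minutes: float, center: float = 18 * 60, spread: float = 30) -> float:
    """Membership of time t (minutes since midnight) in the fuzzy set 'about 6 pm'."""
    return max(0.0, 1.0 - abs(t_minutes - center) / spread)

usually = 0.8  # assumed numerical precisiation of "usually"

# Robert is home at 6:15 pm if he returned at any time up to 18:15.
# The possibility that such a return time counts as "about 6 pm" is the
# supremum of the membership function over that interval.
possibility_returned_by_615 = max(about_6pm(t) for t in range(17 * 60, 18 * 60 + 16))

# A crude lower estimate of the probability that Robert is at home at 6:15 pm:
prob_home_at_615 = usually * possibility_returned_by_615
print(prob_home_at_615)
```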

Underlying the problems of world knowledge, relevance and deduction is a very basic problem – the problem of natural language understanding. Much of world knowledge and web knowledge is expressed in a natural language. A natural language is basically a system for describing perceptions. Since perceptions are intrinsically imprecise, so are natural languages.

A prerequisite to mechanization of question-answering is mechanization of natural language understanding, and a prerequisite to mechanization of natural language understanding is precisiation of the meaning of concepts and propositions drawn from a natural language. To deal effectively with world knowledge, relevance, deduction and precisiation, new tools are needed. The principal new tools are: Precisiated Natural Language (PNL); Protoform Theory (PFT); and the Generalized Theory of Uncertainty (GTU). These tools are drawn from fuzzy logic – a logic in which everything is, or is allowed to be, a matter of degree.
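As a small illustration of precisiation, the proposition “Most adult Swedes are tall” can be read as a fuzzy constraint on the proportion of tall Swedes, with “tall” and “most” modeled as membership functions; the particular functions and the sample heights below are assumed for illustration only.

```python
def tall(height_cm: float) -> float:
    """Assumed membership function for 'tall' (illustrative, piecewise linear)."""
    if height_cm <= 170:
        return 0.0
    if height_cm >= 185:
        return 1.0
    return (height_cm - 170) / 15

def most(proportion: float) -> float:
    """Assumed membership function for the fuzzy quantifier 'most'."""
    if proportion <= 0.5:
        return 0.0
    if proportion >= 0.9:
        return 1.0
    return (proportion - 0.5) / 0.4

# Precisiation of "Most adult Swedes are tall": the (fuzzy) proportion of
# tall individuals in the population is constrained by the quantifier 'most'.
heights = [168, 174, 181, 186, 190, 178, 183]   # hypothetical sample
proportion_tall = sum(tall(h) for h in heights) / len(heights)
degree_of_truth = most(proportion_tall)
print(round(proportion_tall, 2), round(degree_of_truth, 2))
```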

The centerpiece of the new tools is the concept of a generalized constraint. The importance of the concept of a generalized constraint derives from the fact that in PNL and GTU it serves as a basis for generalizing the universally accepted view that information is statistical in nature. More specifically, the point of departure in PNL and GTU is the fundamental premise that, in general, information is representable as a system of generalized constraints, with statistical information constituting a special case. This, much more general, view of information is needed to deal effectively with world knowledge, relevance, deduction, precisiation and related problems.
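To make the idea slightly more tangible, the sketch below renders a generalized constraint in the schematic form “X isr R”, where the modality tag r indicates how R constrains X (possibilistic, probabilistic, veristic, usuality, and so on); the class, field names and modality labels are an illustrative rendering, not a normative representation.

```python
from dataclasses import dataclass
from typing import Any

# Illustrative rendering of the schematic form "X isr R": the modality tag r
# says in what sense R constrains X. Names and structure here are assumptions.

@dataclass
class GeneralizedConstraint:
    variable: str      # X: the constrained variable
    modality: str      # r: e.g. "possibilistic", "probabilistic", "veristic", "usuality"
    relation: Any      # R: the constraining relation (fuzzy set, distribution, value, ...)

kb = [
    # "Most adult Swedes are tall" -> a usuality constraint on a proportion.
    GeneralizedConstraint("Proportion(tall | adult Swedes)", "usuality", "most"),
    # A probabilistic constraint: statistical information as a special case.
    GeneralizedConstraint("Height(adult Swede)", "probabilistic", "height distribution"),
    # A crisp measurement as a degenerate (singleton) possibilistic constraint.
    GeneralizedConstraint("Population(California)", "possibilistic", {37_000_000}),
]
for c in kb:
    print(f"{c.variable} is[{c.modality}] {c.relation}")
```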

In summary, the principal objectives of this paper are: (a) to make a case for the view that a quantum jump in search engine IQ cannot be achieved through the use of methods based on bivalent logic and probability theory; and (b) to introduce and outline a collection of non-standard concepts, ideas and tools which are needed to achieve a quantum jump in search engine IQ. 

Author Biography

Lotfi ZADEH, University of California Berkeley

Computer Science Division, Department of Electrical Engineering and Computer Sciences

Published

2015-02-24