**"Representation,
Reasoning, Learning"**

Daphne Koller

Stanford University

My collaborators on some of this work: Nir Friedman, Lise Getoor, Avi Pfeffer, Eran Segal, Ben Taskar

The many people whose work influenced mine

My friends and colleagues, and especially the DAGS group, for their help and support

ONR, DARPA, NSF, Sloan

Stuart Russell, Joe Halpern, IJCAI Trustees

My family, for putting up with me while I prepared this talk

Building Intelligence: 2001

Representation

Reasoning

Learning

Where Next

Scheduling & planning | |

Navigating in uncertain environment | |

Chess | |

Diagnosis & maintenance | |

Speech recognition | |

Parsing natural language | |

Recognizing faces | |

Many more … | |


**Representation: Design Decisions**

Possible worlds are indivisible “atoms”

e.g., states in a search algorithm

Algorithms must treat worlds as black boxes

e.g., access only via goal test, successor function

Enumeration is all you can do!

More transparent representation:

World = assignment of values to attributes / truth values to propositional symbols

Knowledge base relates these propositions

Worlds are relational interpretations:

Objects in the domain

Properties of these objects

Relations (links) between objects

Relational languages allow universals:

General patterns that apply to all objects


All universals are false

Smart students don’t always get A’s…

True universals are rarely useful

Smart students get either A, B, C, D, or F

Probabilistic semantics:

A set of possible worlds

Each world associated with a probability

Propositional representation:

Avoid enumeration of worlds!

Represent distribution compactly

Bayesian networks use locality
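To make locality concrete, here is a minimal sketch (the variables Smart, HardCourse, Grade and all numbers are hypothetical, not from the talk) of how a Bayesian network’s local conditional tables define a full joint distribution without ever storing it explicitly:

```python
from itertools import product

# A tiny Bayesian network over three binary variables:
#   Smart -> Grade <- HardCourse
# Each node stores only a local conditional table, so the model needs
# 1 + 1 + 4 = 6 numbers instead of the 2^3 - 1 = 7 of an explicit joint
# table; the gap grows exponentially with the number of variables.
p_smart = 0.3                 # P(Smart = true)
p_hard = 0.5                  # P(HardCourse = true)
p_a = {                       # P(Grade = A | Smart, HardCourse)
    (True, True): 0.6, (True, False): 0.9,
    (False, True): 0.1, (False, False): 0.4,
}

def joint(smart, hard, grade_a):
    """Probability of one possible world = product of local factors."""
    p = p_smart if smart else 1 - p_smart
    p *= p_hard if hard else 1 - p_hard
    pa = p_a[(smart, hard)]
    return p * (pa if grade_a else 1 - pa)

# The local tables define a proper distribution over all 8 possible worlds.
total = sum(joint(*w) for w in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12
```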

Semantics:

Probability distribution over possible worlds

Worlds = first-order interpretations

Representation (syntax):

Probabilistic relational models (PRMs)

Specifies types of objects in domain, attributes of each type of object, & types of links between objects

Many possible worlds for a given university

All possible assignments of all attributes of all objects

Infinitely many potential universities

Each associated with a very different set of worlds

**Probabilistic Relational Models**

Universals: Probabilistic patterns hold for all objects in class

Locality: Represent direct probabilistic dependencies

Links give us potential interactions!

PRM can be applied to any university

Any set of students, courses, registrations, faculty, …

Objects in the resulting model are not independent

Links between objects induce correlations between them

The result is a “web of influence” that spans many objects of different types
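As a hedged illustration of this unrolling (the Student/Course schema and all names below are hypothetical), a PRM states each dependency once at the class level; instantiating the template over concrete objects and links produces one ground network, and shared parents are what create the web of influence:

```python
# Hypothetical toy schema: the PRM states *once*, at the class level, that a
# registration's Grade depends on the student's Intelligence and the course's
# Difficulty. Grounding that template over a concrete set of objects and
# links yields one ground network.
students = ["alice", "bob"]
courses = ["cs221", "cs229"]
registrations = [("alice", "cs221"), ("bob", "cs221"), ("alice", "cs229")]

ground_network = {}                       # node -> list of parent nodes
for s in students:
    ground_network[f"Intelligence({s})"] = []
for c in courses:
    ground_network[f"Difficulty({c})"] = []
for s, c in registrations:
    # One template rule, applied to every Registration object (a universal).
    ground_network[f"Grade({s},{c})"] = [f"Intelligence({s})", f"Difficulty({c})"]

for node, parents in ground_network.items():
    print(node, "<-", parents)
```

Here Grade(alice,cs221) and Grade(bob,cs221) share the parent Difficulty(cs221), so evidence about Bob’s grade changes our beliefs about Alice’s: a small web of influence.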

Objects & links now first-class citizens in the model | ||

Þ Can introduce probabilistic model over them | ||

Different worlds can have different objects & links! | ||

Example reasoning pattern: | ||

CS221 is easy — everyone gets an A | ||

No, it’s not easy — it’s a graduate class, and only smart students take it |

Also need to represent:

Time

Events

Actions

Space

Mental states of agents

…

There is much work on representing these concepts

Need to integrate these with probabilistic language

New questions are bound to arise…


**Reasoning: The Fight Continues**

The Complexity Monster: First Blood [Kautz, 1989]

The Complexity Monster Strikes Back [Russell, 1995]

The Return of the Complexity Monster [Koller, 2001]

Corollary: Use less expressive representations

Problem:

“Poly(size)” is great, but not if size is exponential!

Expressive language can capture structure, which can be exploited for faster inference

**Complexity in Probabilistic Models**

Our language exploits locality:

All information is presented in terms of local influences

Locality also allows divide & conquer inference

In Bayesian networks

In constraint satisfaction

In logical reasoning

Divide & conquer inference extends to hierarchical structures, if only a few long-range interactions

Luckily, much of the world is local & hierarchical

Departments in a university

Geographical proximity

Interactions between objects are often limited
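A minimal sketch of what divide & conquer buys, assuming a chain-structured model with hypothetical numbers: eliminating one variable at a time replaces a sum over 2^50 possible worlds with 49 small matrix products:

```python
import numpy as np

# Divide & conquer on a chain X1 -> X2 -> ... -> Xn (binary variables).
# Brute force sums over 2^n worlds; summing out one variable at a time
# (variable elimination) costs O(n) small table operations instead.
n = 50
prior = np.array([0.6, 0.4])              # P(X1)
trans = np.array([[0.9, 0.1],             # P(X_{t+1} | X_t = 0)
                  [0.2, 0.8]])            # P(X_{t+1} | X_t = 1)

message = prior
for _ in range(n - 1):
    # Sum out the current variable: message(x') = sum_x message(x) P(x'|x)
    message = message @ trans

print("P(Xn) =", message)                 # exact marginal, no 2^50 enumeration
```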

**Exploiting Hierarchical Structure**

Problem: At inference time, many algorithms convert objects to propositions, recreating the repetition we wanted to avoid

Robinson [1965]: Lifted theorem proving

Do the proof using variables x

Only make distinctions about x that matter to the proof

**First-Order Probabilistic Inference**

Question: Can we use lifting in the probabilistic case?

Do probabilistic inference about “generic” objects x

Only make distinctions that affect relevant probabilities
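A toy illustration of the counting idea behind lifting (the numbers are made up, and this is not a full lifted algorithm): when neither the query nor the evidence distinguishes the N students, one computation about a generic student x replaces enumeration of 2^N propositional worlds:

```python
from itertools import product

# Hypothetical example: N exchangeable students each get an A independently
# with probability q. A propositional model enumerates 2^N worlds; a lifted
# computation reasons about a generic student once, via a counting argument.
q, N = 0.3, 10

# Propositional: enumerate every assignment of A / not-A to every student.
def prob_world(world):
    p = 1.0
    for got_a in world:
        p *= q if got_a else 1 - q
    return p

p_some_a_ground = sum(prob_world(w) for w in product([True, False], repeat=N)
                      if any(w))

# Lifted: no individual student is distinguished by the query or evidence,
# so one formula over the generic object suffices.
p_some_a_lifted = 1 - (1 - q) ** N

assert abs(p_some_a_ground - p_some_a_lifted) < 1e-9
```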

Question 1: What does this mean?

In a deterministic setting, unclear

Probabilities provide a formal framework:

Weak influence = small effect on probabilities

Question 2: Does it really help?

If the world is stochastic, influence decays

Over “distance”: long chains of influence

Over “time”: as the system evolves

Over “quantity”: as many influences combine

Recently: many successful algorithms that exploit this idea
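A small sketch of decay over “time”, with hypothetical transition probabilities: in a stochastic Markov chain, the marginals of two runs started in different states converge geometrically, so the distant past has a vanishing effect on current probabilities:

```python
import numpy as np

# Influence decaying over "time": the marginal at step t depends less and
# less on the starting state (transition probabilities are made up).
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])            # P(X_{t+1} | X_t)

a = np.array([1.0, 0.0])                  # chain started in state 0
b = np.array([0.0, 1.0])                  # chain started in state 1
for t in range(1, 11):
    a, b = a @ trans, b @ trans
    print(t, abs(a - b).sum())            # gap shrinks geometrically
# Approximate-inference algorithms can exploit this: weakly coupled or
# distant variables can be summarized without changing relevant
# probabilities much.
```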

**Learning**

Many possible knowledge bases:

Different dependency structures

Different parameters

Which knowledge base is “good”? Use a probabilistic performance metric

How do we find a good one?

Many learning algorithms have been developed
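For a fixed structure and fully observed data, the textbook maximum-likelihood answer is simple counting, with held-out log-likelihood serving as the probabilistic performance metric; a minimal sketch over made-up (smart, got_a) pairs:

```python
from collections import Counter

# With complete data and a fixed structure, the maximum-likelihood
# parameters of a Bayesian network are conditional frequencies.
# Hypothetical data: pairs of (smart, got_a).
data = [(True, True), (True, True), (True, False),
        (False, False), (False, True), (False, False)]

counts = Counter(data)
for smart in (True, False):
    n_a = counts[(smart, True)]
    n = n_a + counts[(smart, False)]
    print(f"P(A | smart={smart}) ~= {n_a / n:.2f}")
```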

Use reasoning to bootstrap learning

Learn even when we never find out everything

Substantial improvement in learned models

“Lifelong” learning, from any data obtained

Learn as a natural byproduct of reasoning

Discover new concepts in data …
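One standard way to use reasoning to bootstrap learning is expectation-maximization: inference fills in what we never observe, and the completed data re-estimates the parameters. A minimal sketch on made-up data, where two coins have unknown biases and we never see which coin produced each row of flips:

```python
import numpy as np

# EM: alternate reasoning (E-step) and learning (M-step). The data is
# hypothetical: each row is 10 flips of one of two coins, coin unobserved.
heads = np.array([9, 8, 2, 1, 8, 2])      # heads out of 10 flips per row
n = 10
p = np.array([0.6, 0.4])                  # initial guesses for the two biases

for _ in range(50):
    # E-step (reasoning): posterior responsibility of each coin for each row.
    lik = (p[None, :] ** heads[:, None]) * ((1 - p[None, :]) ** (n - heads[:, None]))
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step (learning): re-estimate each coin's bias from the soft counts.
    p = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * n)

print("estimated biases:", p.round(3))    # converges near 0.83 and 0.17
```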

**Where Next**

Perception and understanding

Planning and decision making

Rich language for actions that represents uncertainty about the future

Efficient algorithms for decision making under uncertainty that exploit representation structure

Techniques for learning action models from interaction with the real world

Professors often meet with students

Jane is probably a student

Professors like to explain

“She” is probably Prof. Sarah

Statistical NLP reveals semantic patterns in words, but the semantics is implicit

We want to go from words to concepts: represent, learn, & use patterns at the semantic level

Divide the problem into well-defined pieces

Make progress on each one

Build bridges to create a unified whole