**"Representation,
Reasoning, Learning"**

Daphne Koller

Stanford University

My collaborators on some of this work: Nir Friedman, Lise Getoor, Avi Pfeffer, Eran Segal, Ben Taskar

The many people whose work influenced mine

My friends and colleagues, and especially the DAGS group, for their help and support

ONR, DARPA, NSF, Sloan

Stuart Russell, Joe Halpern, IJCAI Trustees

My family, for putting up with me while I prepared this talk

Building Intelligence: 2001

Representation

Reasoning

Learning

Where Next

Scheduling & planning | |

Navigating in uncertain environment | |

Chess | |

Diagnosis & maintenance | |

Speech recognition | |

Parsing natural language | |

Recognizing faces | |

Many more … | |


**Representation: Design Decisions**

Possible worlds are indivisible “atoms”

e.g., states in a search algorithm

Algorithms must treat worlds as black boxes

e.g., access only via goal test, successor function

Enumeration is all you can do!

More transparent representation:

World = assignment of values to attributes / truth values to propositional symbols

Knowledge base relates these propositions

Worlds are relational interpretations:

Objects in the domain

Properties of these objects

Relations (links) between objects

Relational languages allow universals:

General patterns that apply to all objects


All universals are false

Smart students don’t always get A’s…

True universals are rarely useful

Smart students get either A, B, C, D, or F

Probabilistic semantics:

A set of possible worlds

Each world associated with a probability

Propositional representation:

Avoid enumeration of worlds!

Represent distribution compactly

Bayesian networks use locality
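To make locality concrete, here is a minimal sketch (the variables Smart, HardCourse, Grade and all numbers are hypothetical, not from the talk) of how a Bayesian network’s local conditional tables define a full joint distribution without ever storing it explicitly:

```python
from itertools import product

# A tiny Bayesian network over three binary variables:
#   Smart -> Grade <- HardCourse
# Each node stores only a local conditional table, so the model needs
# 1 + 1 + 4 = 6 numbers instead of the 2^3 - 1 = 7 of an explicit joint
# table; the gap grows exponentially with the number of variables.
p_smart = 0.3                 # P(Smart = true)
p_hard = 0.5                  # P(HardCourse = true)
p_a = {                       # P(Grade = A | Smart, HardCourse)
    (True, True): 0.6, (True, False): 0.9,
    (False, True): 0.1, (False, False): 0.4,
}

def joint(smart, hard, grade_a):
    """Probability of one possible world = product of local factors."""
    p = p_smart if smart else 1 - p_smart
    p *= p_hard if hard else 1 - p_hard
    pa = p_a[(smart, hard)]
    return p * (pa if grade_a else 1 - pa)

# The local tables define a proper distribution over all 8 possible worlds.
total = sum(joint(*w) for w in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12
```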

Semantics:

Probability distribution over possible worlds

Worlds = first-order interpretations

Representation (syntax):

Probabilistic relational models (PRMs)

Specifies types of objects in domain, attributes of each type of object, & types of links between objects

Many possible worlds for a given university

All possible assignments of all attributes of all objects

Infinitely many potential universities

Each associated with a very different set of worlds

**Probabilistic Relational Models**

Universals: Probabilistic patterns hold for all objects in class

Locality: Represent direct probabilistic dependencies

Links give us potential interactions!

PRM can be applied to any university

Any set of students, courses, registrations, faculty, …

Objects in the resulting model are not independent

Links between objects induce correlations between them

The result is a “web of influence” that spans many objects of different types
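As a hedged illustration of this unrolling (the Student/Course schema and all names below are hypothetical), a PRM states each dependency once at the class level; instantiating the template over concrete objects and links produces one ground network, and shared parents are what create the web of influence:

```python
# Hypothetical toy schema: the PRM states *once*, at the class level, that a
# registration's Grade depends on the student's Intelligence and the course's
# Difficulty. Grounding that template over a concrete set of objects and
# links yields one ground network.
students = ["alice", "bob"]
courses = ["cs221", "cs229"]
registrations = [("alice", "cs221"), ("bob", "cs221"), ("alice", "cs229")]

ground_network = {}                       # node -> list of parent nodes
for s in students:
    ground_network[f"Intelligence({s})"] = []
for c in courses:
    ground_network[f"Difficulty({c})"] = []
for s, c in registrations:
    # One template rule, applied to every Registration object (a universal).
    ground_network[f"Grade({s},{c})"] = [f"Intelligence({s})", f"Difficulty({c})"]

for node, parents in ground_network.items():
    print(node, "<-", parents)
```

Here Grade(alice,cs221) and Grade(bob,cs221) share the parent Difficulty(cs221), so evidence about Bob’s grade changes our beliefs about Alice’s: a small web of influence.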

Objects & links now first-class citizens in the model | ||

Þ Can introduce probabilistic model over them | ||

Different worlds can have different objects & links! | ||

Example reasoning pattern: | ||

CS221 is easy — everyone gets an A | ||

No, it’s not easy — it’s a graduate class, and only smart students take it |

Also need to represent:

Time

Events

Actions

Space

Mental states of agents

…

There is much work on representing these concepts

Need to integrate these with probabilistic language

New questions are bound to arise…


**Reasoning: The Fight Continues**

The Complexity Monster: First Blood [Kautz, 1989]

The Complexity Monster Strikes Back [Russell, 1995]

The Return of the Complexity Monster [Koller, 2001]

Corollary: Use less expressive representations

Problem:

“Poly(size)” is great, but not if size is exponential!

Expressive language can capture structure, which can be exploited for faster inference

**Complexity in Probabilistic Models**

Our language exploits locality:

All information is presented in terms of local influences

Locality also allows divide & conquer inference

In Bayesian networks

In constraint satisfaction

In logical reasoning

Divide & conquer inference extends to hierarchical structures, if only a few long-range interactions

Luckily, much of the world is local & hierarchical

Departments in a university

Geographical proximity

Interactions between objects are often limited
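A minimal sketch of what divide & conquer buys, assuming a chain-structured model with hypothetical numbers: eliminating one variable at a time replaces a sum over 2^50 possible worlds with 49 small matrix products:

```python
import numpy as np

# Divide & conquer on a chain X1 -> X2 -> ... -> Xn (binary variables).
# Brute force sums over 2^n worlds; summing out one variable at a time
# (variable elimination) costs O(n) small table operations instead.
n = 50
prior = np.array([0.6, 0.4])              # P(X1)
trans = np.array([[0.9, 0.1],             # P(X_{t+1} | X_t = 0)
                  [0.2, 0.8]])            # P(X_{t+1} | X_t = 1)

message = prior
for _ in range(n - 1):
    # Sum out the current variable: message(x') = sum_x message(x) P(x'|x)
    message = message @ trans

print("P(Xn) =", message)                 # exact marginal, no 2^50 enumeration
```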

**Exploiting Hierarchical Structure**

Problem: At inference time, many algorithms convert objects to propositions, recreating the repetition we wanted to avoid

Robinson [1965]: Lifted theorem proving

Do the proof using variables x

Only make distinctions about x that matter to the proof

**First-Order Probabilistic Inference**

Question: Can we use lifting in the probabilistic case?

Do probabilistic inference about “generic” objects x

Only make distinctions that affect relevant probabilities
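A toy illustration of the counting idea behind lifting (the numbers are made up, and this is not a full lifted algorithm): when neither the query nor the evidence distinguishes the N students, one computation about a generic student x replaces enumeration of 2^N propositional worlds:

```python
from itertools import product

# Hypothetical example: N exchangeable students each get an A independently
# with probability q. A propositional model enumerates 2^N worlds; a lifted
# computation reasons about a generic student once, via a counting argument.
q, N = 0.3, 10

# Propositional: enumerate every assignment of A / not-A to every student.
def prob_world(world):
    p = 1.0
    for got_a in world:
        p *= q if got_a else 1 - q
    return p

p_some_a_ground = sum(prob_world(w) for w in product([True, False], repeat=N)
                      if any(w))

# Lifted: no individual student is distinguished by the query or evidence,
# so one formula over the generic object suffices.
p_some_a_lifted = 1 - (1 - q) ** N

assert abs(p_some_a_ground - p_some_a_lifted) < 1e-9
```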

Question 1: What does this mean?

In a deterministic setting, unclear

Probabilities provide a formal framework:

Weak influence = small effect on probabilities

Question 2: Does it really help?

If the world is stochastic, influence decays

Over “distance”: long chains of influence

Over “time”: as the system evolves

Over “quantity”: as many influences combine

Recently: many successful algorithms that exploit this idea
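A small sketch of decay over “time”, with hypothetical transition probabilities: in a stochastic Markov chain, the marginals of two runs started in different states converge geometrically, so the distant past has a vanishing effect on current probabilities:

```python
import numpy as np

# Influence decaying over "time": the marginal at step t depends less and
# less on the starting state (transition probabilities are made up).
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])            # P(X_{t+1} | X_t)

a = np.array([1.0, 0.0])                  # chain started in state 0
b = np.array([0.0, 1.0])                  # chain started in state 1
for t in range(1, 11):
    a, b = a @ trans, b @ trans
    print(t, abs(a - b).sum())            # gap shrinks geometrically
# Approximate-inference algorithms can exploit this: weakly coupled or
# distant variables can be summarized without changing relevant
# probabilities much.
```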

**Learning**

Many possible knowledge bases:

Different dependency structures

Different parameters

Which knowledge base is “good”? Use a probabilistic performance metric

How do we find a good one?

Many learning algorithms have been developed
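For a fixed structure and fully observed data, the textbook maximum-likelihood answer is simple counting, with held-out log-likelihood serving as the probabilistic performance metric; a minimal sketch over made-up (smart, got_a) pairs:

```python
from collections import Counter

# With complete data and a fixed structure, the maximum-likelihood
# parameters of a Bayesian network are conditional frequencies.
# Hypothetical data: pairs of (smart, got_a).
data = [(True, True), (True, True), (True, False),
        (False, False), (False, True), (False, False)]

counts = Counter(data)
for smart in (True, False):
    n_a = counts[(smart, True)]
    n = n_a + counts[(smart, False)]
    print(f"P(A | smart={smart}) ~= {n_a / n:.2f}")
```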

Use reasoning to bootstrap learning

Learn even when we never find out everything

Substantial improvement in learned models

“Lifelong” learning, from any data obtained

Learn as a natural byproduct of reasoning

Discover new concepts in data …
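One standard way to use reasoning to bootstrap learning is expectation-maximization: inference fills in what we never observe, and the completed data re-estimates the parameters. A minimal sketch on made-up data, where two coins have unknown biases and we never see which coin produced each row of flips:

```python
import numpy as np

# EM: alternate reasoning (E-step) and learning (M-step). The data is
# hypothetical: each row is 10 flips of one of two coins, coin unobserved.
heads = np.array([9, 8, 2, 1, 8, 2])      # heads out of 10 flips per row
n = 10
p = np.array([0.6, 0.4])                  # initial guesses for the two biases

for _ in range(50):
    # E-step (reasoning): posterior responsibility of each coin for each row.
    lik = (p[None, :] ** heads[:, None]) * ((1 - p[None, :]) ** (n - heads[:, None]))
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step (learning): re-estimate each coin's bias from the soft counts.
    p = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * n)

print("estimated biases:", p.round(3))    # converges near 0.83 and 0.17
```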

**Where Next**

Perception and understanding

Planning and decision making

Rich language for actions that represents uncertainty about the future

Efficient algorithms for decision making under uncertainty that exploit representation structure

Techniques for learning action models from interaction with the real world

Professors often meet with students

Jane is probably a student

Professors like to explain

“She” is probably Prof. Sarah

Statistical NLP reveals semantic patterns in words, but the semantics is implicit

We want to go from words to concepts: represent, learn, & use patterns at the semantic level

Divide the problem into well-defined pieces

Make progress on each one

Build bridges to create a unified whole