"Representation,
Reasoning, Learning"
|
|
|
Daphne Koller |
|
Stanford University |
Acknowledgements
My collaborators on some of this work: Nir Friedman, Lise Getoor, Avi Pfeffer, Eran Segal, Ben Taskar
The many people whose work influenced mine
My friends and colleagues, and especially the DAGS group, for their help and support
ONR, DARPA, NSF, Sloan
Stuart Russell, Joe Halpern, IJCAI Trustees
My family for putting up with me while I prepared it
Roadmap
Building Intelligence: 2001
Representation
Reasoning
Learning
Where Next
We’ve Come a Long Way
Scheduling & planning
Navigating in uncertain environments
Chess
Diagnosis & maintenance
Speech recognition
Parsing natural language
Recognizing faces
Many more …
This is 2001 … But Where is HAL?
Roadmap
Building Intelligence: 2001
Representation
Reasoning
Learning
Where Next
Representation: Design Decisions
Atomic Worlds
Possible worlds are indivisible “atoms”
  e.g., states in a search algorithm
Algorithms must treat worlds as black boxes
  e.g., access only via goal test, successor function
Enumeration is all you can do!
Propositional Worlds
More transparent representation:
  World = assignment of values to attributes / truth values to propositional symbols
  Knowledge base relates these propositions
Object-Relational Worlds
Worlds are relational interpretations:
  Objects in the domain
  Properties of these objects
  Relations (links) between objects
Relational languages allow universals:
  General patterns that apply to all objects
Representation: Design Decisions
Universal Truths
All universals are false
  Smart students don’t always get A’s…
True universals are rarely useful
  Smart students get either A, B, C, D, or F
Probable Worlds
Probabilistic semantics (see sketch below):
  A set of possible worlds
  Each world associated with a probability
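A minimal sketch of these semantics, using a toy domain with two propositions; the worlds and their probabilities are illustrative inventions, not from the talk:

```python
# Possible-worlds semantics: each world is a truth assignment to
# (Smart, GetsA), and each world carries a probability.
distribution = {
    (True, True): 0.4,   # smart, gets an A
    (True, False): 0.1,  # smart, no A
    (False, True): 0.2,
    (False, False): 0.3,
}

def prob(event):
    """P(event) = sum of the probabilities of the worlds where it holds."""
    return sum(p for world, p in distribution.items() if event(world))

# Queries answered by enumeration over worlds:
p_a = prob(lambda w: w[1])                                              # P(GetsA) = 0.6
p_a_given_smart = prob(lambda w: w[0] and w[1]) / prob(lambda w: w[0])  # 0.8
```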
Bayesian Networks
Propositional representation:
  Avoid enumeration of worlds!
  Represent distribution compactly
Bayesian networks use locality (see sketch below)
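A minimal sketch of that locality, assuming a toy two-node network Smart → Grade; the conditional probability table entries are invented for illustration:

```python
p_smart = {True: 0.3, False: 0.7}              # local table P(Smart)

p_grade_given_smart = {                        # local table P(Grade | Smart)
    True:  {"A": 0.6, "B": 0.3, "C": 0.1},
    False: {"A": 0.2, "B": 0.4, "C": 0.4},
}

def joint(smart, grade):
    """Chain rule with locality: P(s, g) = P(s) * P(g | s)."""
    return p_smart[smart] * p_grade_given_smart[smart][grade]

# The full joint over all 2 * 3 worlds is never stored; any entry is
# recovered on demand from the two local tables.
print(joint(True, "A"))  # 0.18
```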
Taxonomy of Methods
A Bridge
Semantics:
  Probability distribution over possible worlds
  Worlds = first-order interpretations
Representation (syntax):
  Probabilistic relational models (PRMs)
St. Nordaf University
Relational Schema
Specifies the types of objects in the domain, the attributes of each type of object, and the types of links between objects (see sketch below)
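A minimal sketch of such a schema in Python dataclasses, assuming the usual university example; the attribute names and domains are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Professor:
    teaching_ability: str                     # e.g., "low" / "high"

@dataclass
class Student:
    intelligence: str                         # e.g., "weak" / "smart"

@dataclass
class Course:
    difficulty: str                           # e.g., "easy" / "hard"
    instructor: Optional[Professor] = None    # link type: Course -> Professor

@dataclass
class Registration:
    grade: str                                # e.g., "A" / "B" / "C"
    student: Optional[Student] = None         # link type: Registration -> Student
    course: Optional[Course] = None           # link type: Registration -> Course
```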
Possible Worlds
Representing the Distribution
Many possible worlds for a given university
  All possible assignments of all attributes of all objects
Infinitely many potential universities
  Each associated with a very different set of worlds
Probabilistic Relational Models
Universals: Probabilistic patterns hold for all objects in class
Locality: Represent direct probabilistic dependencies
  Links give us potential interactions! (see sketch below)
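A minimal sketch of a class-level PRM dependency, reusing the schema sketched above: one conditional table is stated once for the class Registration and applies to every registration object. Numbers are invented:

```python
# Class-level CPD:
# P(Reg.grade | Reg.student.intelligence, Reg.course.difficulty)
grade_cpd = {
    ("smart", "easy"): {"A": 0.8, "B": 0.2},
    ("smart", "hard"): {"A": 0.5, "B": 0.5},
    ("weak",  "easy"): {"A": 0.4, "B": 0.6},
    ("weak",  "hard"): {"A": 0.1, "B": 0.9},
}

def grade_dist(reg):
    """The same universal pattern for every registration: the grade depends
    only on attributes of locally linked objects (student, course)."""
    return grade_cpd[(reg.student.intelligence, reg.course.difficulty)]
```

The links of a particular registration (its student, its course) determine which objects interact; the probabilistic pattern itself is stated only once.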
PRM Semantics
PRMs are First-Order
PRM can be applied to any university
  Any set of students, courses, registrations, faculty, …
The Ties that Bind
Objects in the resulting model are not independent
  Links between objects induce correlations between them
The result is a “web of influence” that spans many objects of different types
The Web of Influence
We’re Not in Kansas Anymore
Objects & links now first-class citizens in the model
  ⇒ Can introduce probabilistic model over them
  Different worlds can have different objects & links!
Example reasoning pattern:
  CS221 is easy — everyone gets an A
  No, it’s not easy — it’s a graduate class, and only smart students take it
Summary: Representation
Are We Done?
Also need to represent:
  Time
  Events
  Actions
  Space
  Mental states of agents
  …
There is much work on representing these concepts
Need to integrate these with probabilistic language
New questions are bound to arise…
Roadmap
Building Intelligence: 2001
Representation
Reasoning
Learning
Where Next
Reasoning: The Fight Continues
The Complexity Monster: First Blood [Kautz, 1989]
The Complexity Monster Strikes Back [Russell, 1995]
The Return of the Complexity Monster [Koller, 2001]
Expressive Languages
Corollary: Use less expressive representations
Problem:
  “Poly(size)” is great, but not if size is exponential!
  Expressive language can capture structure, which can be exploited for faster inference
Complexity in Probabilistic Models
Weapon I: Locality
Our language exploits locality:
  All information is presented in terms of local influences
Locality also allows divide & conquer inference (see sketch below):
  In Bayesian networks
  In constraint satisfaction
  In logical reasoning
Divide & conquer inference extends to hierarchical structures, if there are only a few long-range interactions
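A minimal sketch of divide & conquer inference via variable elimination, on a toy chain A → B → C; all factor entries are illustrative:

```python
pA = {0: 0.6, 1: 0.4}                                        # P(A)
pB_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # P(B | A)
pC_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # P(C | B)

# Eliminate A: tau1(b) = sum_a P(a) P(b|a), touching only local factors.
tau1 = {b: sum(pA[a] * pB_A[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: P(c) = sum_b tau1(b) P(c|b).
pC = {c: sum(tau1[b] * pC_B[(b, c)] for b in (0, 1)) for c in (0, 1)}

print(pC)  # {0: 0.7, 1: 0.3}; cost scales with the largest local factor,
           # not with the 2**3 jointly enumerated worlds
```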
Hierarchies
Luckily, much of the world is local & hierarchical
  Departments in a university
  Geographical proximity
Interactions between objects are often limited
Exploiting Hierarchical Structure
Weapon II: Universal Patterns
Problem: At inference time, many algorithms
  convert objects to propositions
  recreate the repetition we wanted to avoid
Robinson [1965]: Lifted theorem proving
  Do the proof using variables x
  Only make distinctions about x that matter to the proof
First-Order Probabilistic Inference
Question: Can we use lifting in the probabilistic case? (see sketch below)
  Do probabilistic inference about “generic” objects x
  Only make distinctions that affect relevant probabilities
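A minimal sketch of the lifting idea in the probabilistic case: when n students are interchangeable given the evidence, reason about one generic student and reuse the result rather than grounding out n identical propositions. Numbers are illustrative:

```python
import math

n_students = 1000
p_a = 0.8                       # class-level P(grade = A | course is easy)

# Grounded: one identical computation per object.
p_all_a_grounded = 1.0
for _ in range(n_students):
    p_all_a_grounded *= p_a     # n multiplications

# Lifted: one computation for the whole generic class.
p_all_a_lifted = p_a ** n_students

# No distinction between the students mattered, so none was made.
assert math.isclose(p_all_a_grounded, p_all_a_lifted, rel_tol=1e-9)
```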
Exploiting Universals
Weapon III: Approximation
Stochasticity is Your Friend
Question 1: What does this mean?
  In a deterministic setting, unclear
  Probabilities provide a formal framework:
    Weak influence = small effect on probabilities
Question 2: Does it really help?
  If the world is stochastic, influence decays (see sketch below):
    Over “distance” — long chains of influence
    Over “time” — as the system evolves
    Over “quantity” — as many influences combine
Recently: many successful algorithms that exploit this idea
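A minimal sketch of influence decaying “over time”: in a stochastic two-state Markov chain, predictions started from two very different beliefs converge geometrically. The transition numbers are illustrative:

```python
T = [[0.9, 0.1],        # T[i][j] = P(next = j | current = i)
     [0.2, 0.8]]

def step(dist):
    """Propagate a belief one step: new_j = sum_i dist_i * T[i][j]."""
    return [sum(dist[i] * T[i][j] for i in (0, 1)) for j in (0, 1)]

d0, d1 = [1.0, 0.0], [0.0, 1.0]     # started in state 0 vs. state 1
for t in range(1, 31):
    d0, d1 = step(d0), step(d1)
    if t % 10 == 0:
        # for this chain the gap shrinks by a factor of 0.7 per step
        print(t, abs(d0[0] - d1[0]))
```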
Summary: Inference
Roadmap
Building Intelligence: 2001
Representation
Reasoning
Learning
Where Next
Learning for Reasoning
Many possible knowledge bases:
  Different dependency structures
  Different parameters
Which knowledge base is “good”?
  Probabilistic performance metric (see sketch below)
How do we find a good one?
  Many learning algorithms have been developed
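The slide’s metric itself is elided; a minimal sketch assuming the standard choice, data log-likelihood, scoring two candidate parameterizations on the same (invented) data:

```python
import math

data = [1, 1, 0, 1, 0, 1, 1, 1]                # illustrative observations

def log_likelihood(p_one, data):
    """log P(data | model) = sum of log P(x) over the samples."""
    return sum(math.log(p_one if x == 1 else 1.0 - p_one) for x in data)

for p in (0.5, 0.75):
    print(p, round(log_likelihood(p, data), 3))
# 0.5 -5.545 ; 0.75 -4.499 : the candidate nearer the empirical
# frequency (6/8) scores higher, so it is the "better" knowledge base
```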
TB Patients in San Francisco
A Web of Data
Standard Approach
What’s in a Link
What’s in an Anchor
Reasoning for Learning
Use reasoning to bootstrap learning (see sketch below)
  Learn even when we never find out everything
  Substantial improvement in learned models
“Lifelong” learning, from any data obtained
Learn as a natural byproduct of reasoning
Discover new concepts in data …
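One reading of “use reasoning to bootstrap learning” is the EM pattern: infer expected values for the entries we never find out, then re-learn from the completed data. A minimal single-parameter sketch with invented data:

```python
data = [1, 1, 0, None, 1, None, 0, 1]   # None = value we never find out
p = 0.5                                  # initial guess for P(X = 1)

for _ in range(20):
    # E-step: reason about each missing entry under the current model.
    expected_ones = sum(p if x is None else x for x in data)
    # M-step: learn from the "completed" data.
    p = expected_ones / len(data)

print(round(p, 4))  # converges to 2/3, the frequency among observed values
```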
Discovering Hidden Concepts
Web of Influence, Yet Again
Summary: Learning
Another Bridge
Roadmap
Building Intelligence: 2001
Representation
Reasoning
Learning
Where Next
Some Important Next Steps
Perception and understanding
Planning and decision making:
  Rich language for actions that represents uncertainty about the future
  Efficient algorithms for decision making under uncertainty that exploit representation structure
  Techniques for learning action models from interaction with the real world
Perception
Probabilistic Perception
Hypothetical Bridge
Understanding Language
Resolving Ambiguity
Professors often meet with students
  Jane is probably a student
Professors like to explain
  “She” is probably Prof. Sarah (see sketch below)
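A minimal sketch of probabilistic ambiguity resolution for the first reading: is Jane a student or a professor? The priors and likelihoods are invented for illustration:

```python
prior = {"student": 0.5, "professor": 0.5}

# P(the observed pattern "professor meets with X" | role of X):
# professors often meet with students, so the evidence favours "student".
likelihood = {"student": 0.8, "professor": 0.2}

# Bayes rule over the two readings.
unnorm = {r: prior[r] * likelihood[r] for r in prior}
z = sum(unnorm.values())
posterior = {r: v / z for r, v in unnorm.items()}

print(posterior)  # {'student': 0.8, 'professor': 0.2}: probably a student
```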
Learning Semantic Models
Statistical NLP reveals semantic patterns in words (see sketch below)
Semantics is implicit
We want to go from words to concepts
  Represent, learn, & use patterns at the semantic level
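A minimal sketch of the kind of statistic behind such patterns: words with similar co-occurrence profiles tend to play similar semantic roles, yet the role itself stays implicit. The tiny corpus is invented:

```python
from collections import Counter

corpus = [
    "professor teaches course",
    "student takes course",
    "professor advises student",
    "student attends course",
]

def context_profile(word):
    """Count the words that co-occur with `word` in the same sentence."""
    ctx = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

# Similar profiles hint at an implicit class (roughly, "person at a
# university"), but that concept is never represented explicitly.
print(context_profile("professor"))
print(context_profile("student"))
```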
A Tale of Three Bridges
Divide the problem into well-defined pieces
Make progress on each one
Build bridges to create a unified whole