**"Representation,
Reasoning, Learning"**

Daphne Koller

Stanford University

My collaborators on some of this work: Nir Friedman, Lise Getoor, Avi Pfeffer, Eran Segal, Ben Taskar

The many people whose work influenced mine

My friends and colleagues, and especially the DAGS group, for their help and support

ONR, DARPA, NSF, Sloan

Stuart Russell, Joe Halpern, IJCAI Trustees

My family, for putting up with me while I prepared this talk

Building Intelligence: 2001

Representation

Reasoning

Learning

Where Next

Scheduling & planning | |

Navigating in uncertain environment | |

Chess | |

Diagnosis & maintenance | |

Speech recognition | |

Parsing natural language | |

Recognizing faces | |

Many more … | |


**Representation: Design Decisions**

Possible worlds are indivisible “atoms”

e.g., states in a search algorithm

Algorithms must treat worlds as black boxes

e.g., access only via goal test, successor function

Enumeration is all you can do!

More transparent representation:

World = assignment of values to attributes / truth values to propositional symbols

Knowledge base relates these propositions

Worlds are relational interpretations:

Objects in the domain

Properties of these objects

Relations (links) between objects

Relational languages allow universals:

General patterns that apply to all objects


All universals are false

Smart students don’t always get A’s…

True universals are rarely useful

Smart students get either A, B, C, D, or F

Probabilistic semantics:

A set of possible worlds

Each world associated with a probability

Propositional representation:

Avoid enumeration of worlds!

Represent distribution compactly

Bayesian networks use locality
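To make locality concrete, here is a minimal sketch (the variables Smart, HardCourse, Grade and all numbers are hypothetical, not from the talk) of how a Bayesian network’s local conditional tables define a full joint distribution without ever storing it explicitly:

```python
from itertools import product

# A tiny Bayesian network over three binary variables:
#   Smart -> Grade <- HardCourse
# Each node stores only a local conditional table, so the model needs
# 1 + 1 + 4 = 6 numbers instead of the 2^3 - 1 = 7 of an explicit joint
# table; the gap grows exponentially with the number of variables.
p_smart = 0.3                 # P(Smart = true)
p_hard = 0.5                  # P(HardCourse = true)
p_a = {                       # P(Grade = A | Smart, HardCourse)
    (True, True): 0.6, (True, False): 0.9,
    (False, True): 0.1, (False, False): 0.4,
}

def joint(smart, hard, grade_a):
    """Probability of one possible world = product of local factors."""
    p = p_smart if smart else 1 - p_smart
    p *= p_hard if hard else 1 - p_hard
    pa = p_a[(smart, hard)]
    return p * (pa if grade_a else 1 - pa)

# The local tables define a proper distribution over all 8 possible worlds.
total = sum(joint(*w) for w in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12
```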

Semantics:

Probability distribution over possible worlds

Worlds = first-order interpretations

Representation (syntax):

Probabilistic relational models (PRMs)

Specifies types of objects in domain, attributes of each type of object, & types of links between objects

Many possible worlds for a given university

All possible assignments of all attributes of all objects

Infinitely many potential universities

Each associated with a very different set of worlds

**Probabilistic Relational Models**

Universals: Probabilistic patterns hold for all objects in class

Locality: Represent direct probabilistic dependencies

Links give us potential interactions!

PRM can be applied to any university

Any set of students, courses, registrations, faculty, …

Objects in the resulting model are not independent

Links between objects induce correlations between them

The result is a “web of influence” that spans many objects of different types
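As a hedged illustration of this unrolling (the Student/Course schema and all names below are hypothetical), a PRM states each dependency once at the class level; instantiating the template over concrete objects and links produces one ground network, and shared parents are what create the web of influence:

```python
# Hypothetical toy schema: the PRM states *once*, at the class level, that a
# registration's Grade depends on the student's Intelligence and the course's
# Difficulty. Grounding that template over a concrete set of objects and
# links yields one ground network.
students = ["alice", "bob"]
courses = ["cs221", "cs229"]
registrations = [("alice", "cs221"), ("bob", "cs221"), ("alice", "cs229")]

ground_network = {}                       # node -> list of parent nodes
for s in students:
    ground_network[f"Intelligence({s})"] = []
for c in courses:
    ground_network[f"Difficulty({c})"] = []
for s, c in registrations:
    # One template rule, applied to every Registration object (a universal).
    ground_network[f"Grade({s},{c})"] = [f"Intelligence({s})", f"Difficulty({c})"]

for node, parents in ground_network.items():
    print(node, "<-", parents)
```

Here Grade(alice,cs221) and Grade(bob,cs221) share the parent Difficulty(cs221), so evidence about Bob’s grade changes our beliefs about Alice’s: a small web of influence.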

Objects & links now first-class citizens in the model | ||

Þ Can introduce probabilistic model over them | ||

Different worlds can have different objects & links! | ||

Example reasoning pattern: | ||

CS221 is easy — everyone gets an A | ||

No, it’s not easy — it’s a graduate class, and only smart students take it |

Also need to represent:

Time

Events

Actions

Space

Mental states of agents

…

There is much work on representing these concepts

Need to integrate these with probabilistic language

New questions are bound to arise…


**Reasoning: The Fight Continues**

The Complexity Monster: First Blood [Kautz, 1989]

The Complexity Monster Strikes Back [Russell, 1995]

The Return of the Complexity Monster [Koller, 2001]

Corollary: Use less expressive representations

Problem:

“Poly(size)” is great, but not if size is exponential!

Expressive language can capture structure, which can be exploited for faster inference

**Complexity in Probabilistic Models**

Our language exploits locality:

All information is presented in terms of local influences

Locality also allows divide & conquer inference

In Bayesian networks

In constraint satisfaction

In logical reasoning

Divide & conquer inference extends to hierarchical structures, if only a few long-range interactions

Luckily, much of the world is local & hierarchical

Departments in a university

Geographical proximity

Interactions between objects are often limited
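A minimal sketch of what divide & conquer buys, assuming a chain-structured model with hypothetical numbers: eliminating one variable at a time replaces a sum over 2^50 possible worlds with 49 small matrix products:

```python
import numpy as np

# Divide & conquer on a chain X1 -> X2 -> ... -> Xn (binary variables).
# Brute force sums over 2^n worlds; summing out one variable at a time
# (variable elimination) costs O(n) small table operations instead.
n = 50
prior = np.array([0.6, 0.4])              # P(X1)
trans = np.array([[0.9, 0.1],             # P(X_{t+1} | X_t = 0)
                  [0.2, 0.8]])            # P(X_{t+1} | X_t = 1)

message = prior
for _ in range(n - 1):
    # Sum out the current variable: message(x') = sum_x message(x) P(x'|x)
    message = message @ trans

print("P(Xn) =", message)                 # exact marginal, no 2^50 enumeration
```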

**Exploiting Hierarchical Structure**

Problem: At inference time, many algorithms convert objects to propositions, recreating the repetition we wanted to avoid

Robinson [1965]: Lifted theorem proving

Do the proof using variables x

Only make distinctions about x that matter to the proof

**First-Order Probabilistic Inference**

Question: Can we use lifting in the probabilistic case?

Do probabilistic inference about “generic” objects x

Only make distinctions that affect relevant probabilities
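A toy illustration of the counting idea behind lifting (the numbers are made up, and this is not a full lifted algorithm): when neither the query nor the evidence distinguishes the N students, one computation about a generic student x replaces enumeration of 2^N propositional worlds:

```python
from itertools import product

# Hypothetical example: N exchangeable students each get an A independently
# with probability q. A propositional model enumerates 2^N worlds; a lifted
# computation reasons about a generic student once, via a counting argument.
q, N = 0.3, 10

# Propositional: enumerate every assignment of A / not-A to every student.
def prob_world(world):
    p = 1.0
    for got_a in world:
        p *= q if got_a else 1 - q
    return p

p_some_a_ground = sum(prob_world(w) for w in product([True, False], repeat=N)
                      if any(w))

# Lifted: no individual student is distinguished by the query or evidence,
# so one formula over the generic object suffices.
p_some_a_lifted = 1 - (1 - q) ** N

assert abs(p_some_a_ground - p_some_a_lifted) < 1e-9
```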

Question 1: What does this mean?

In a deterministic setting, unclear

Probabilities provide a formal framework:

Weak influence = small effect on probabilities

Question 2: Does it really help?

If the world is stochastic, influence decays

Over “distance”: long chains of influence

Over “time”: as the system evolves

Over “quantity”: as many influences combine

Recently: many successful algorithms that exploit this idea
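A small sketch of decay over “time”, with hypothetical transition probabilities: in a stochastic Markov chain, the marginals of two runs started in different states converge geometrically, so the distant past has a vanishing effect on current probabilities:

```python
import numpy as np

# Influence decaying over "time": the marginal at step t depends less and
# less on the starting state (transition probabilities are made up).
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])            # P(X_{t+1} | X_t)

a = np.array([1.0, 0.0])                  # chain started in state 0
b = np.array([0.0, 1.0])                  # chain started in state 1
for t in range(1, 11):
    a, b = a @ trans, b @ trans
    print(t, abs(a - b).sum())            # gap shrinks geometrically
# Approximate-inference algorithms can exploit this: weakly coupled or
# distant variables can be summarized without changing relevant
# probabilities much.
```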

**Learning**

Many possible knowledge bases:

Different dependency structures

Different parameters

Which knowledge base is “good”? Use a probabilistic performance metric

How do we find a good one?

Many learning algorithms have been developed
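For a fixed structure and fully observed data, the textbook maximum-likelihood answer is simple counting, with held-out log-likelihood serving as the probabilistic performance metric; a minimal sketch over made-up (smart, got_a) pairs:

```python
from collections import Counter

# With complete data and a fixed structure, the maximum-likelihood
# parameters of a Bayesian network are conditional frequencies.
# Hypothetical data: pairs of (smart, got_a).
data = [(True, True), (True, True), (True, False),
        (False, False), (False, True), (False, False)]

counts = Counter(data)
for smart in (True, False):
    n_a = counts[(smart, True)]
    n = n_a + counts[(smart, False)]
    print(f"P(A | smart={smart}) ~= {n_a / n:.2f}")
```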

Use reasoning to bootstrap learning

Learn even when we never find out everything

Substantial improvement in learned models

“Lifelong” learning, from any data obtained

Learn as a natural byproduct of reasoning

Discover new concepts in data …
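One standard way to use reasoning to bootstrap learning is expectation-maximization: inference fills in what we never observe, and the completed data re-estimates the parameters. A minimal sketch on made-up data, where two coins have unknown biases and we never see which coin produced each row of flips:

```python
import numpy as np

# EM: alternate reasoning (E-step) and learning (M-step). The data is
# hypothetical: each row is 10 flips of one of two coins, coin unobserved.
heads = np.array([9, 8, 2, 1, 8, 2])      # heads out of 10 flips per row
n = 10
p = np.array([0.6, 0.4])                  # initial guesses for the two biases

for _ in range(50):
    # E-step (reasoning): posterior responsibility of each coin for each row.
    lik = (p[None, :] ** heads[:, None]) * ((1 - p[None, :]) ** (n - heads[:, None]))
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step (learning): re-estimate each coin's bias from the soft counts.
    p = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * n)

print("estimated biases:", p.round(3))    # converges near 0.83 and 0.17
```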

**Where Next**

Perception and understanding

Planning and decision making

Rich language for actions that represents uncertainty about the future

Efficient algorithms for decision making under uncertainty that exploit representation structure

Techniques for learning action models from interaction with the real world

Professors often meet with students

Jane is probably a student

Professors like to explain

“She” is probably Prof. Sarah

Statistical NLP reveals semantic patterns in words, but the semantics is implicit

We want to go from words to concepts: represent, learn, & use patterns at the semantic level

Divide the problem into well-defined pieces

Make progress on each one

Build bridges to create a unified whole