1
|
- Daphne Koller
- Stanford University
|
2
|
- My collaborators on some of this work:
- Nir Friedman, Lise Getoor, Avi Pfeffer, Eran Segal, Ben Taskar
- The many people whose work influenced mine
- My friends and colleagues, and especially the DAGS group, for their help and support
- ONR, DARPA, NSF, Sloan
- Stuart Russell, Joe Halpern, IJCAI Trustees
- My family for putting up with me while I prepared it
|
3
|
- Building Intelligence: 2001
- Representation
- Reasoning
- Learning
- Where Next
|
4
|
- Scheduling & planning
- Navigating in uncertain environment
- Chess
- Diagnosis & maintenance
- Speech recognition
- Parsing natural language
- Recognizing faces
- Many more …
|
7
|
- Building Intelligence: 2001
- Representation
- Reasoning
- Learning
- Where Next
|
9
|
- Possible worlds are indivisible “atoms”
- e.g., states in a search algorithm
- Algorithms must treat worlds as black boxes
- e.g., access only via goal test, successor function
- Enumeration is all you can do!
|
10
|
- More transparent representation:
- World = assignment of values to attributes / truth values to propositional symbols
- Knowledge base relates these propositions
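As a concrete illustration (symbols and rule invented here, not from the talk), a world is just one truth assignment, and the knowledge base is a set of constraints over the propositions:

```python
# A "world": one truth assignment to all propositional symbols.
world = {"smart": True, "works_hard": False, "gets_A": False}

# Tiny knowledge base relating the propositions:
# (smart AND works_hard) -> gets_A
kb = [lambda w: not (w["smart"] and w["works_hard"]) or w["gets_A"]]

print(all(rule(world) for rule in kb))  # True: this world satisfies the KB
```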
|
11
|
- Worlds are relational interpretations:
- Objects in the domain
- Properties of these objects
- Relations (links) between objects
- Relational languages allow universals:
- General patterns that apply to all objects
|
13
|
- All universals are false
- Smart students don’t always get A’s…
- True universals are rarely useful
- Smart students get either A, B, C, D, or F
|
15
|
- Probabilistic semantics:
- A set of possible worlds
- Each world associated with a probability
|
17
|
- Propositional representation:
- Avoid enumeration of worlds!
- Represent distribution compactly
- Bayesian networks use locality
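A minimal sketch of the locality idea, with invented names and numbers: each variable stores only a conditional table over its parents, and the probability of a whole world is the product of the local pieces:

```python
# Local models only: P(Difficult), P(Smart), P(A | Difficult, Smart).
# All numbers are illustrative.
p_difficult = 0.4
p_smart = 0.3
p_A = {(True, True): 0.5, (True, False): 0.1,     # keyed by (difficult, smart)
       (False, True): 0.9, (False, False): 0.5}

def joint(difficult, smart, gets_a):
    """P(d, s, a) = P(d) * P(s) * P(a | d, s): a product of local factors."""
    pd = p_difficult if difficult else 1 - p_difficult
    ps = p_smart if smart else 1 - p_smart
    pa = p_A[(difficult, smart)]
    return pd * ps * (pa if gets_a else 1 - pa)

# The 8 worlds sum to 1 even though no global 8-entry table was stored;
# with n variables the global table needs ~2^n entries, local tables stay small.
print(sum(joint(d, s, a) for d in (0, 1) for s in (0, 1) for a in (0, 1)))
```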
|
20
|
- Semantics:
- Probability distribution over possible worlds
- Worlds = first-order interpretations
- Representation (syntax):
- Probabilistic relational models (PRMs)
|
22
|
- Specifies types of objects in domain, attributes of each type of object, & types of links between objects
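A hedged reading of the schema in code (class and attribute names simply mirror the university example):

```python
from dataclasses import dataclass

@dataclass
class Professor:
    teaching_ability: str        # attribute of every Professor object

@dataclass
class Course:
    instructor: Professor        # typed link: Course -> Professor
    difficulty: str

@dataclass
class Student:
    intelligence: str

@dataclass
class Registration:              # link type connecting Student and Course
    student: Student
    course: Course
    grade: str
```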
|
24
|
- Many possible worlds for a given university
- All possible assignments of all attributes of all objects
- Infinitely many potential universities
- Each associated with a very different set of worlds
|
25
|
- Universals: Probabilistic patterns hold for all objects in class
- Locality: Represent direct probabilistic dependencies
- Links give us potential interactions!
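Continuing the schema sketch above, a probabilistic universal is one local model shared by every object of a class, with parents reached through links; the numbers here are invented:

```python
import random

# One shared CPD -- the universal pattern -- used for *every* Registration:
# P(grade = "A" | course.difficulty, student.intelligence)
P_A = {("hard", "smart"): 0.5, ("hard", "average"): 0.1,
       ("easy", "smart"): 0.9, ("easy", "average"): 0.5}

def sample_grade(reg):
    """Same local model for all registrations; inputs arrive via links."""
    p = P_A[(reg.course.difficulty, reg.student.intelligence)]
    return "A" if random.random() < p else "below A"
```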
|
27
|
- PRM can be applied to any university
- Any set of students, courses, registrations, faculty, …
|
28
|
- Objects in the resulting model are not independent
- Links between objects induce correlations between them
- The result is a “web of influence” that spans many objects of different types
|
32
|
- Objects & links now first-class citizens in the model
- ⇒ Can introduce probabilistic model over them
- Different worlds can have different objects & links!
- Example reasoning pattern:
- CS221 is easy — everyone gets an A
- No, it’s not easy — it’s a graduate class, and only smart students take it
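A minimal sketch of uncertainty over structure itself (the registration probability is invented): each draw produces a different set of links, i.e., a different world:

```python
import random

def sample_registrations(students, courses, p_take=0.2):
    """Link existence is itself random: worlds differ in their link sets."""
    return [(s, c) for s in students for c in courses
            if random.random() < p_take]
```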
|
34
|
- Also need to represent:
- Time
- Events
- Actions
- Space
- Mental states of agents
- …
- There is much work on representing these concepts
- Need to integrate these with probabilistic language
- New questions are bound to arise…
|
35
|
- Building Intelligence: 2001
- Representation
- Reasoning
- Learning
- Where Next
|
37
|
- The Complexity Monster: First Blood [Kautz, 1989]
- The Complexity Monster Strikes Back [Russell, 1995]
- The Return of the Complexity Monster [Koller, 2001]
|
38
|
- Corollary: Use less expressive representations
- Problem: “Poly(size)” is great, but not if size is exponential!
- Expressive language can capture structure, which can be exploited for faster inference
|
42
|
- Our language exploits locality:
- All information is presented in terms of local influences
- Locality also allows divide & conquer inference
- In Bayesian networks
- In constraint satisfaction
- In logical reasoning
- Divide & conquer inference extends to hierarchical structures, if there are only a few long-range interactions
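A minimal divide & conquer sketch on a chain A → B → C (numbers invented): pushing sums inward answers the query at a cost linear in the chain length rather than exponential in the number of variables:

```python
# P(c) = sum_b P(c|b) * [ sum_a P(b|a) * P(a) ] -- sums pushed inward.
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # (b, a)
P_C_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # (c, b)

# Local step 1: eliminate A, leaving a factor over B alone.
P_B = {b: sum(P_B_given_A[(b, a)] * P_A[a] for a in (0, 1)) for b in (0, 1)}
# Local step 2: eliminate B.
P_C = {c: sum(P_C_given_B[(c, b)] * P_B[b] for b in (0, 1)) for c in (0, 1)}
print(P_C)  # {0: 0.725, 1: 0.275}
```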
|
43
|
- Luckily, much of the world is local & hierarchical
- Departments in a university
- Geographical proximity
- Interactions between objects are often limited
|
45
|
- Problem: At inference time, many algorithms
- convert objects to propositions
- recreate the repetition we wanted to avoid
- Robinson [1965]: Lifted theorem proving
- Do proof using variables x
- Only make distinctions about x that matter to proof
|
46
|
- Question: Can we use lifting in probabilistic case?
- Do probabilistic inference about “generic” objects x
- Only make distinctions that affect relevant probabilities
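A toy illustration of the hoped-for payoff (prior invented): when n students are interchangeable, one computation about a generic student answers a query that a grounded model would spread over exponentially many joint assignments:

```python
p_smart = 0.3   # prior for a generic, exchangeable student (made up)

def p_any_smart(n: int) -> float:
    """P(at least one of n students is smart), computed once at the
    generic level instead of enumerating 2**n grounded worlds."""
    return 1.0 - (1.0 - p_smart) ** n

print(p_any_smart(100))
```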
|
50
|
- Question 1: What does this mean?
- In deterministic setting, unclear
- Probabilities provide formal framework:
- Weak influence = small effect on probabilities
- Question 2: Does it really help?
- If the world is stochastic, influence decays
- Over “distance” — long chains of influence
- Over “time” — as system evolves
- Over “quantity” — as many influences combine
- Recently, many successful algorithms that exploit this idea
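A small sketch of decay over “time” (transition numbers invented): a two-state Markov chain forgets its starting state geometrically fast, so distant history has vanishing influence:

```python
def step(p: float, stay: float = 0.9) -> float:
    """One update of P(state = 1) for a chain that stays put w.p. 0.9."""
    return p * stay + (1.0 - p) * (1.0 - stay)

p_from_one, p_from_zero = 1.0, 0.0   # two runs, opposite starting states
for t in range(30):
    p_from_one, p_from_zero = step(p_from_one), step(p_from_zero)

# The two runs agree to ~0.8**30: the start state's influence has decayed.
print(abs(p_from_one - p_from_zero))
```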
|
52
|
- Building Intelligence: 2001
- Representation
- Reasoning
- Learning
- Where Next
|
53
|
- Many possible knowledge bases:
- Different dependency structures
- Different parameters
- Which knowledge base is “good”?
- Probabilistic performance metric
- How do we find a good one?
- Many learning algorithms have been developed
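As a hedged sketch of the standard metric, score a model by the (log-)probability it assigns to observed data, and fit parameters by counting; the data here are invented:

```python
import math
from collections import Counter

data = ["A", "B", "A", "A", "C", "B", "A"]                      # observed grades
params = {g: c / len(data) for g, c in Counter(data).items()}   # ML estimates

log_likelihood = sum(math.log(params[g]) for g in data)         # the "score"
print(params, log_likelihood)
```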
|
61
|
- Use reasoning to bootstrap learning
- Learn even when we never find out everything
- Substantial improvement in learned models
- “Lifelong” learning, from any data obtained
- Learn as a natural byproduct of reasoning
- Discover new concepts in data …
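One classic instance of reasoning bootstrapping learning is the EM pattern, sketched here on invented data: two coins of unknown bias, ten flips per trial, and we never observe which coin was used:

```python
from math import comb

trials = [9, 8, 2, 1, 7, 3, 8, 2]   # heads out of 10 flips (made-up data)
p1, p2 = 0.6, 0.4                   # initial guesses for the two biases

def binom(h, p):
    return comb(10, h) * p**h * (1 - p) ** (10 - h)

for _ in range(50):
    # E-step: *reason* about the hidden coin identity for each trial
    # (assumes each coin is equally likely a priori).
    w = [binom(h, p1) / (binom(h, p1) + binom(h, p2)) for h in trials]
    # M-step: *learn* new biases from the expected counts.
    p1 = sum(wi * h for wi, h in zip(w, trials)) / (10 * sum(w))
    p2 = sum((1 - wi) * h for wi, h in zip(w, trials)) / (10 * (len(w) - sum(w)))

print(p1, p2)   # settles near the heads rates of the two hidden clusters
```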
|
67
|
- Building Intelligence: 2001
- Representation
- Reasoning
- Learning
- Where Next
|
68
|
- Perception and understanding
- Planning and decision making
- Rich language for actions that represents uncertainty about the future
- Efficient algorithms for decision making under uncertainty that exploit representation structure
- Techniques for learning action models from interaction with the real world
|
74
|
- Professors often meet with students
- Jane is probably a student
- Professors like to explain
- “She” is probably Prof. Sarah
|
75
|
- Statistical NLP reveals semantic patterns in words:
- Semantics is implicit
- We want to go from words to concepts
- represent, learn, & use patterns at the semantic level
|
76
|
- Divide the problem into well-defined pieces
- Make progress on each one
- Build bridges to create a unified whole
|