1
|
- Daphne Koller
- Stanford University
- Joint work with:
|
2
|
- The real world is composed of objects that have properties and are
related to each other
- Natural language is all about objects and how they relate to each other
- “George got an A in Geography 101”
|
3
|
|
4
|
|
5
|
- World = relational interpretation:
- Objects in the domain
- Properties of these objects
- Relations (links) between objects
|
6
|
- All universals are false:
- “Smart students get A’s in easy classes”
- True universals are rarely useful:
- “Smart students get either A, B, C, D, or F”
|
7
|
|
8
|
- Probabilistic semantics:
- A set of possible worlds
- Each world associated with a probability
|
9
|
|
10
|
- Bayesian Networks
- Representation & Semantics
- Reasoning
- Probabilistic Relational Models
- Collective Classification
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
|
11
|
|
12
|
- Compact & natural representation:
- nodes have ≤ k parents ⇒ O(2^k · n) vs. O(2^n) params
- parameters natural and easy to elicit
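To make the counting concrete, here is a minimal sketch (standard arithmetic, not from the slides) comparing the two parameter counts for n binary variables:

```python
# Parameters needed for n binary variables: full joint distribution vs. a
# Bayesian network in which every node has at most k parents.
def full_joint_params(n: int) -> int:
    """Independent parameters in the full joint over n binary variables."""
    return 2 ** n - 1

def bn_params_bound(n: int, k: int) -> int:
    """Upper bound: each of the n nodes has a CPT with at most 2**k rows."""
    return n * 2 ** k

print(full_joint_params(30))    # 1073741823
print(bn_params_bound(30, 3))   # 240
```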
|
13
|
|
14
|
|
15
|
- BN inference is NP-hard
- Inference can exploit graph structure:
- Graph separation ⇒ conditional independence
- Do separate inference in parts
- Results combined over interface
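A tiny worked instance of "combined over interface" (my own illustration, not from the talk): in the chain A → B → C, B separates A from C, so reasoning about C given A factors through B.

```python
import numpy as np

# Chain A -> B -> C: B d-separates A from C, so
# P(c | a) = sum_b P(b | a) * P(c | b)
# -- two local computations combined over the interface variable B.
P_B_given_A = np.array([[0.9, 0.1],   # rows index a; columns index b
                        [0.2, 0.8]])
P_C_given_B = np.array([[0.7, 0.3],   # rows index b; columns index c
                        [0.4, 0.6]])
P_C_given_A = P_B_given_A @ P_C_given_B
print(P_C_given_A)   # each row is P(C | a); rows sum to 1
```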
|
16
|
- Belief propagation is an iterative message passing algorithm for
approximate inference in BNs
- Each iteration (until “convergence”):
- Nodes pass “beliefs” as messages to neighboring nodes
- Cons:
- Limited theoretical guarantees
- Might not converge
- Pros:
- Linear time per iteration
- Works very well in practice, even for dense networks
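Below is a minimal sum-product loopy BP sketch for pairwise models (my own illustration of the algorithm described on this slide, not the speaker's code); note each iteration costs time linear in the number of edges:

```python
import numpy as np

def loopy_bp(unary, pairwise, n_iters=50):
    """unary: {node: potential vector}; pairwise: {(i, j): potential matrix
    indexed [state_i, state_j]}. Returns approximate marginal beliefs."""
    nbrs = {v: [] for v in unary}
    for i, j in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # directed messages, initialized uniform
    msgs = {(i, j): np.ones_like(unary[j], dtype=float) / unary[j].size
            for i, j in list(pairwise) + [(j, i) for i, j in pairwise]}
    for _ in range(n_iters):                  # until "convergence"
        new = {}
        for (i, j) in msgs:
            pre = unary[i].copy()             # local evidence at i ...
            for k in nbrs[i]:
                if k != j:
                    pre *= msgs[(k, i)]       # ... times other incoming messages
            psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            m = psi.T @ pre                   # sum out the states of i
            new[(i, j)] = m / m.sum()         # normalize for numerical stability
        msgs = new
    beliefs = {}
    for v in unary:                           # belief = unary * all incoming
        b = unary[v].copy()
        for k in nbrs[v]:
            b *= msgs[(k, v)]
        beliefs[v] = b / b.sum()
    return beliefs

# Example: three binary variables in a cycle, each pair preferring agreement;
# the evidence at "a" propagates around the loop to "b" and "c".
agree = np.array([[2.0, 1.0], [1.0, 2.0]])
unary = {"a": np.array([0.9, 0.1]),
         "b": np.array([0.5, 0.5]),
         "c": np.array([0.5, 0.5])}
pairwise = {("a", "b"): agree, ("b", "c"): agree, ("a", "c"): agree}
print(loopy_bp(unary, pairwise))
```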
|
17
|
- Bayesian Networks
- Probabilistic Relational Models
- Language & Semantics
- Web of Influence
- Collective Classification
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
|
18
|
- Bayesian nets use a propositional representation
- The real world has objects that are related to each other
|
19
|
- Combine advantages of relational logic & BNs:
- Natural domain modeling: objects, properties, relations
- Generalization over a variety of situations
- Compact, natural probability models
- Integrate uncertainty with relational model:
- Properties of domain entities can depend on properties of related
entities
- Uncertainty over relational structure of domain
|
20
|
|
21
|
- Specifies types of objects in domain, attributes of each type of object
& types of relations between objects
|
22
|
- Universals: Probabilistic patterns hold for all objects in class
- Locality: Represent direct probabilistic dependencies
- Links define potential interactions
|
23
|
|
24
|
|
25
|
|
26
|
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Learning models from data
- Collective classification of webpages
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
|
27
|
|
28
|
- Parameter estimation:
- Probabilistic model with shared parameters
- Grades for all students share same model
- Can use standard techniques for max-likelihood or Bayesian parameter
estimation
- Structure learning:
- Define scoring function over structures
- Use combinatorial search to find high-scoring structure
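For the parameter-estimation bullet, a minimal sketch of shared-parameter maximum likelihood, pooling all registrations to fit one CPD P(grade | intelligence, difficulty) (reusing the toy schema sketched earlier; smoothing is omitted):

```python
from collections import Counter, defaultdict

def fit_shared_cpd(registrations):
    """Max-likelihood CPD P(grade | student.intelligence, course.difficulty),
    with a single parameter set shared by every registration."""
    counts = defaultdict(Counter)
    for reg in registrations:
        parent_vals = (reg.student.intelligence, reg.course.difficulty)
        counts[parent_vals][reg.grade] += 1
    return {pv: {g: c / sum(gc.values()) for g, c in gc.items()}
            for pv, gc in counts.items()}
```

Bayesian estimation would simply replace the raw counts with counts plus Dirichlet pseudo-counts.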
|
29
|
|
30
|
- WebKB dataset
- Four CS department websites
- Bag of words on each page
- Links between pages
- Anchor text for links
- Experimental setup
- Trained on three universities
- Tested on fourth
- Repeated for all four combinations
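The setup described above is a leave-one-school-out protocol; a sketch (the four schools in the WebKB subset are, to my knowledge, Cornell, Texas, Washington, and Wisconsin; `train_fn` and `eval_fn` are placeholders):

```python
SCHOOLS = ["cornell", "texas", "washington", "wisconsin"]

def leave_one_school_out(pages_by_school, train_fn, eval_fn):
    """Train on three schools, test on the fourth, for all four splits."""
    results = {}
    for held_out in SCHOOLS:
        train = [p for s in SCHOOLS if s != held_out
                 for p in pages_by_school[s]]
        model = train_fn(train)
        results[held_out] = eval_fn(model, pages_by_school[held_out])
    return results
```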
|
31
|
|
32
|
|
33
|
|
34
|
|
35
|
|
36
|
|
37
|
|
38
|
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Markov Networks
- Relational Markov Networks
- Collective Classification Revisited
- PRMs for NLP
|
39
|
- Acyclicity constraint limits expressive power:
- Two objects linked to by a student are probably not both professors
- Allow arbitrary patterns over sets of objects & links
|
40
|
|
41
|
- Universals: Probabilistic patterns hold for all groups of objects
- Locality: Represent local probabilistic dependencies
- Sets of links give us possible interactions
|
42
|
- Instantiated RMN ⇒ MN
- variables: attributes of all objects
- dependencies: determined by links & RMN
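A sketch of the instantiation step (my own minimal version): the instance's link graph supplies the edges, and every edge gets the same shared template potential, producing exactly the unary/pairwise structure consumed by the loopy BP sketch earlier.

```python
import numpy as np

def unroll_rmn(objects, links, template_potential, n_states):
    """Ground an RMN template into a Markov network: one variable per object,
    one copy of the shared potential per link (parameters tied across links)."""
    unary = {o: np.ones(n_states) for o in objects}
    pairwise = {(i, j): template_potential for (i, j) in links}
    return unary, pairwise
```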
|
43
|
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited
- Discriminative training of RMNs
- Webpage classification
- Link prediction
- PRMs for NLP
|
44
|
- Parameter estimation has no closed form
- Convex problem ⇒ unique global maximum
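Since the conditional log-likelihood of a log-linear model is concave in the weights, plain gradient ascent reaches that unique global maximum. A minimal logistic sketch (for RMNs the per-example term becomes empirical-minus-expected feature counts, with the expectation computed by inference such as belief propagation):

```python
import numpy as np

def train_loglinear(X, y, lr=0.1, n_iters=500):
    """Gradient ascent on the concave conditional log-likelihood of
    P(y = 1 | x) = sigmoid(w . x)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * (X.T @ (y - p)) / len(y)   # empirical - expected features
    return w
```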
|
45
|
|
46
|
|
47
|
|
48
|
|
49
|
|
50
|
- WebKB data set size
- 1300 entities
- 180K attributes
- 5800 links
- Network size / school:
- Directed model
- 200,000 variables
- 360,000 edges
- Undirected model
- 40,000 variables
- 44,000 edges
- Difference in training time decreases substantially when:
- some training data is unobserved
- we want to model hidden variables
|
51
|
- Even more interesting are the relationships between objects
- e.g., verbs are almost always relationships
|
52
|
|
53
|
|
54
|
|
55
|
|
56
|
|
57
|
|
58
|
|
59
|
- Four new department websites:
- Berkeley, CMU, MIT, Stanford
- Labeled page type (8 types):
- faculty, student, research scientist, staff, research group, research
project, course, organization
- Labeled hyperlinks and virtual links (6 types):
- advisor, instructor, TA, member, project-of, NONE
- Data set size:
- 11K pages
- 110K links
- 2 million words
|
60
|
- Error measured over links predicted to be present
- Link presence cutoff is at precision/recall break-even point (≈30% for all models)
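For reference, the break-even cutoff can be found by sweeping the ranked link scores until precision equals recall (a generic sketch of the metric, not the authors' evaluation code):

```python
def precision_recall_break_even(scores, labels):
    """Return (precision, recall) at the rank where they are closest;
    they coincide when the cutoff admits exactly as many links as are true."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos, tp, best = sum(labels), 0, (0.0, 1.0)
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        p, r = tp / k, tp / n_pos
        if abs(p - r) < abs(best[0] - best[1]):
            best = (p, r)
    return best
```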
|
61
|
- PRMs inherit key advantages of probabilistic graphical models:
- Coherent probabilistic semantics
- Exploit structure of local interactions
- Relational models inherently more expressive
- “Web of influence”: use all available information to reach powerful
conclusions
- Exploit both relational information and power of probabilistic reasoning
|
62
|
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP
- Word-Sense Disambiguation
- Relation Extraction
- Natural Language Understanding (?)
|
63
|
- Neighboring words alone may not provide enough information to
disambiguate
- We can gain insight by considering compatibility between senses of
related words
|
64
|
- Objects: words in text
- Attributes: sense, gender, number, pos, …
- Links:
- Grammatical relations (subject-object, modifier,…)
- Close semantic relations (is-a, cause-of, …)
- Same word in different sentences (one-sense-per-discourse)
- Compatibility parameters:
- Learned from tagged data
- Based on prior knowledge (e.g., WordNet, FrameNet)
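A sketch of how the link structure above might be assembled (a hypothetical helper; grammatical relations would come from a parser, and the last rule encodes one-sense-per-discourse):

```python
def build_wsd_links(tokens, grammatical_pairs):
    """tokens: word occurrences in order; grammatical_pairs: (i, j) index
    pairs from a parser (subject-object, modifier, ...)."""
    links = list(grammatical_pairs)
    last_seen = {}
    for idx, tok in enumerate(tokens):
        w = tok.lower()
        if w in last_seen:                # same word earlier in the discourse
            links.append((last_seen[w], idx))
        last_seen[w] = idx
    return links
```

The resulting links, together with compatibility potentials over pairs of senses, instantiate a Markov network over the words' sense variables.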
|
65
|
|
66
|
|
67
|
|
68
|
- Professors often meet with students
- Jane is probably a student
- Professors like to explain
- “She” is probably Prof. Sarah
|
69
|
- Statistical NLP reveals patterns:
- Standard models learn patterns at word level
- But word-patterns are only implicit surrogates for underlying semantic
patterns
- “Teacher” objects tend to participate in certain relationships
- Can use this pattern for objects not explicitly labeled as a teacher
|
70
|
|
71
|
- Represent statistical patterns at semantic level
- What types of objects participate in what types of relationships
- Learn statistical models of semantics from text
- Reason using the models to obtain global semantic understanding of the text
|