1
Probabilistic Models
of Relational Data
  • Daphne Koller
  • Stanford University


  • Joint work with:
2
Why Relational?
  • The real world is composed of objects that have properties and are related to each other


  • Natural language is all about objects and how they relate to each other
    • “George got an A in Geography 101”
3
Attribute-Based Worlds
5
Object-Relational Worlds
  • World = relational interpretation:
    • Objects in the domain
    • Properties of these objects
    • Relations (links) between objects
6
Why Probabilities?
  • All universals are false
    • Smart students get A’s in easy classes
  • True universals are rarely useful
    • Smart students get either A, B, C, D, or F
8
Probable Worlds
  • Probabilistic semantics:
    • A set of possible worlds
    • Each world associated with a probability
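A minimal sketch of this semantics in Python (all numbers made up): a distribution assigns a probability to each complete world, and any query is a sum over the worlds where it holds.

    # Toy distribution over possible worlds (hypothetical numbers);
    # each world fixes intelligence, difficulty, and grade, and the
    # probabilities over all worlds sum to 1.
    worlds = {
        ("smart", "easy", "A"): 0.40,
        ("smart", "hard", "B"): 0.20,
        ("average", "easy", "B"): 0.25,
        ("average", "hard", "C"): 0.15,
    }

    # probability of an event = total mass of worlds where it holds
    def prob(event):
        return sum(p for w, p in worlds.items() if event(w))

    print(prob(lambda w: w[2] == "A"))      # 0.40
    print(prob(lambda w: w[0] == "smart"))  # 0.60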
9
Representation: Design Axes
10
Outline
  • Bayesian Networks
    • Representation & Semantics
    • Reasoning
  • Probabilistic Relational Models
  • Collective Classification
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP
11
Bayesian Networks
12
BN semantics
  • Compact & natural representation:
    • nodes have ≤ k parents ⇒ 2^k·n vs. 2^n params
    • parameters natural and easy to elicit
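A quick check of those parameter counts (n and k are arbitrary here):

    # CPT size for a BN over n binary variables, each with <= k parents,
    # versus the full joint table over the same variables
    n, k = 20, 3
    bn_params = n * 2**k             # one entry per parent configuration
    joint_params = 2**n - 1          # full joint distribution
    print(bn_params, joint_params)   # 160 vs. 1048575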
13
Reasoning using BNs
15
BN Inference
  • BN Inference is NP-hard
  • Inference can exploit graph structure:
    • Graph separation ⇒ conditional independence
    • Perform separate inference in parts
    • Combine results over the interface
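A minimal sketch of this idea on a chain A → B → C with made-up CPTs: each variable is summed out locally, so the full joint table is never built.

    import numpy as np

    p_a = np.array([0.6, 0.4])            # P(A)
    p_b_given_a = np.array([[0.7, 0.3],   # P(B | A=0)
                            [0.2, 0.8]])  # P(B | A=1)
    p_c_given_b = np.array([[0.9, 0.1],   # P(C | B=0)
                            [0.4, 0.6]])  # P(C | B=1)

    p_b = p_a @ p_b_given_a               # sum out A
    p_c = p_b @ p_c_given_b               # sum out B
    print(p_c)                            # P(C), never forming the 2^3 joint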
16
Approximate BN Inference
  • Belief propagation is an iterative message passing algorithm for approximate inference in BNs
  • Each iteration (until “convergence”):
    • Nodes pass “beliefs” as messages to neighboring nodes
  • Cons:
    • Limited theoretical guarantees
    • Might not converge
  • Pros:
    • Linear time per iteration
    • Works very well in practice, even for dense networks
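A minimal sketch of the message-passing loop on a tiny pairwise model (potentials are made up; practical implementations add damping and smarter scheduling):

    import numpy as np

    nodes = ["x1", "x2", "x3"]
    edges = [("x1", "x2"), ("x2", "x3"), ("x3", "x1")]    # a loop
    phi = {v: np.array([0.6, 0.4]) for v in nodes}        # unary potentials
    psi = np.array([[1.2, 0.8],
                    [0.8, 1.2]])                          # shared pairwise potential

    def neighbors(v):
        return [b if a == v else a for a, b in edges if v in (a, b)]

    # directed messages, initialized uniform
    m = {}
    for a, b in edges:
        m[(a, b)] = np.ones(2)
        m[(b, a)] = np.ones(2)

    for _ in range(50):                   # iterate until "convergence"
        new = {}
        for i, j in m:
            h = phi[i].copy()             # local potential at i ...
            for k in neighbors(i):
                if k != j:                # ... times messages from others
                    h = h * m[(k, i)]
            msg = psi.T @ h               # sum over states of i
            new[(i, j)] = msg / msg.sum()
        m = new

    # belief at each node: local potential times incoming messages
    for v in nodes:
        b = phi[v].copy()
        for u in neighbors(v):
            b = b * m[(u, v)]
        print(v, b / b.sum())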


17
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
    • Language & Semantics
    • Web of Influence
  • Collective Classification
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP
18
Bayesian Networks: Problem
  • Bayesian nets use propositional representation
  • Real world has objects, related to each other
19
Probabilistic Relational Models
  • Combine advantages of relational logic & BNs:
    • Natural domain modeling: objects, properties, relations
    • Generalization over a variety of situations
    • Compact, natural probability models


  • Integrate uncertainty with relational model:
    • Properties of domain entities can depend on properties of related entities
    • Uncertainty over relational structure of domain
20
St. Nordaf University
21
Relational Schema
  • Specifies types of objects in domain, attributes of each type of object & types of relations between objects
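A sketch of such a schema, with class and attribute names in the spirit of the talk's running university example (illustrative only):

    from dataclasses import dataclass

    @dataclass
    class Professor:
        teaching_ability: str     # attribute of Professor

    @dataclass
    class Course:
        instructor: Professor     # relation: Course -> Professor
        difficulty: str

    @dataclass
    class Student:
        intelligence: str

    @dataclass
    class Registration:           # relation linking Student and Course
        student: Student
        course: Course
        grade: str
        satisfaction: str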


22
Probabilistic Relational Models
  • Universals: Probabilistic patterns hold for all objects in class
  • Locality: Represent direct probabilistic dependencies
    • Links define potential interactions
23
PRM Semantics
24
The Web of Influence
26
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
    • Learning models from data
    • Collective classification of webpages
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP
27
Learning PRMs
  • Parameter estimation:
    • Probabilistic model with shared parameters
      • Grades for all students share same model
    • Can use standard techniques for max-likelihood or Bayesian parameter estimation





  • Structure learning:
    • Define scoring function over structures
    • Use combinatorial search to find high-scoring structure
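A sketch of the shared-parameter idea for maximum likelihood, on a made-up training set: grades from all registrations are pooled into one conditional table.

    from collections import Counter

    # hypothetical data: one (intelligence, difficulty, grade) triple
    # per registration; every registration shares the same CPD
    data = [("smart", "easy", "A"), ("smart", "easy", "A"),
            ("smart", "hard", "B"), ("average", "easy", "B"),
            ("average", "hard", "C"), ("average", "hard", "B")]

    counts = Counter(data)
    parent_counts = Counter((i, d) for i, d, _ in data)

    # max-likelihood estimate of P(grade | intelligence, difficulty)
    cpd = {(i, d, g): c / parent_counts[(i, d)]
           for (i, d, g), c in counts.items()}
    print(cpd[("average", "hard", "B")])   # 0.5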
29
A Web of Data
30
Web Classification Experiments
  • WebKB dataset
    • Four CS department websites
    • Bag of words on each page
    • Links between pages
    • Anchor text for links
  • Experimental setup
    • Trained on three universities
    • Tested on fourth
    • Repeated for all four combinations
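The protocol, sketched with placeholder training and evaluation functions (the actual models are the PRMs described above):

    # leave-one-school-out evaluation; train_model and evaluate are
    # hypothetical stand-ins for the real learning pipeline
    schools = ["cornell", "texas", "washington", "wisconsin"]

    def train_model(train_schools):        # placeholder
        return {"trained_on": train_schools}

    def evaluate(model, held_out_school):  # placeholder
        return 0.0

    for held_out in schools:
        train = [s for s in schools if s != held_out]
        model = train_model(train)
        print(held_out, evaluate(model, held_out))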
31
Standard Classification
32
Exploiting Links
33
Collective Classification
34
Learning with Missing Data: EM
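Since the slide only names the algorithm, here is a minimal EM sketch on a toy two-cluster problem, where the hidden cluster assignments play the role of the missing data (everything below is illustrative):

    import numpy as np

    # toy 1-D data from two hypothetical clusters
    x = np.array([0.9, 1.1, 1.0, 4.9, 5.2, 5.0])
    mu = np.array([0.0, 1.0])              # initial cluster means

    for _ in range(20):
        # E-step: soft responsibilities under unit-variance Gaussians
        d = np.exp(-0.5 * (x[:, None] - mu[None, :])**2)
        r = d / d.sum(axis=1, keepdims=True)
        # M-step: re-estimate means from responsibility-weighted data
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

    print(mu)   # converges near [1.0, 5.03]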
35
Discovering Hidden Types
38
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
    • Markov Networks
    • Relational Markov Networks
  • Collective Classification Revisited
  • PRMs for NLP
39
Directed Models: Limitations
  • Acyclicity constraint limits expressive power:
    • Two objects linked to by a student are probably not both professors


  • Allow arbitrary patterns over sets of objects & links
40
Markov Networks
41
Relational Markov Networks
  • Universals: Probabilistic patterns hold for all groups of objects
  • Locality: Represent local probabilistic dependencies
    • Sets of links give us possible interactions
42
RMN Semantics
  • Instantiated RMN ⇒ MN
    • variables: attributes of all objects
    • dependencies: determined by links & RMN
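A sketch of that unrolling, assuming binary page labels and a single shared link potential (all values illustrative):

    import numpy as np

    pages = ["p1", "p2", "p3"]
    links = [("p1", "p2"), ("p2", "p3")]
    psi_link = np.array([[2.0, 0.5],
                         [0.5, 2.0]])   # one template, shared by all links

    # ground Markov net: one variable per page label,
    # one copy of the template potential per link
    cliques = [((a, b), psi_link) for a, b in links]

    def score(labels):                  # unnormalized joint probability
        s = 1.0
        for (a, b), psi in cliques:
            s *= psi[labels[a], labels[b]]
        return s

    print(score({"p1": 0, "p2": 0, "p3": 1}))   # 2.0 * 0.5 = 1.0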
43
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
  • Collective Classification Revisited
    • Discriminative training of RMNs
    • Webpage classification
    • Link prediction
  • PRMs for NLP
44
Learning RMNs
  • Parameter estimation has no closed form
    • Convex problem ⇒ unique global maximum
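A sketch of the gradient-ascent view on the simplest conditional log-linear model, logistic regression (toy data; the RMN gradient has the same observed-minus-expected-features form):

    import numpy as np

    X = np.array([[1.0, 0.0], [1.0, 1.0],
                  [0.0, 1.0], [1.0, 1.0]])   # toy feature vectors
    y = np.array([0.0, 1.0, 0.0, 1.0])       # toy labels
    w = np.zeros(2)

    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # P(y=1 | x)
        w += 0.1 * (X.T @ (y - p))           # observed - expected features

    print(w)   # unique optimum of the concave log-likelihood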
45
Flat Models
46
Exploiting Links
47
More Complex Structure
49
Collective Classification: Results
50
Scalability
  • WebKB data set size
    • 1300 entities
    • 180K attributes
    • 5800 links
  • Network size / school:
    • Directed model
      • 200,000 variables
      • 360,000 edges
    • Undirected model
      • 40,000 variables
      • 44,000 edges
  • Difference in training time decreases substantially when:
    • some training data is unobserved
    • we want to model hidden variables
51
Predicting Relationships
  • Even more interesting are the relationships between objects
    • e.g., verbs are almost always relationships
52
Flat Model
54
Collective Classification: Links
55
Link Model
56
Triad Model
59
WebKB++
  • Four new department web sites:
    • Berkeley, CMU, MIT, Stanford
  • Labeled page type (8 types):
    • faculty, student, research scientist, staff, research group, research project, course, organization
  • Labeled hyperlinks and virtual links (6 types):
    • advisor, instructor, TA, member, project-of, NONE
  • Data set size:
    • 11K pages
    • 110K links
    • 2 million words
60
Link Prediction: Results
  • Error measured over links predicted to be present
  • Link presence cutoff is at precision/recall break-even point (≈30% for all models)
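For concreteness, a sketch of locating a precision/recall break-even cutoff from ranked link scores (scores and labels below are made up):

    def breakeven(scores, labels):
        # walk down the ranking; return precision where the
        # precision and recall curves cross
        pairs = sorted(zip(scores, labels), reverse=True)
        total_pos, tp = sum(labels), 0
        for k, (_, lab) in enumerate(pairs, start=1):
            tp += lab
            precision, recall = tp / k, tp / total_pos
            if precision <= recall:
                return precision
        return tp / total_pos

    print(breakeven([0.9, 0.8, 0.6, 0.4, 0.2], [1, 0, 1, 1, 0]))  # ~0.67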
61
Summary
  • PRMs inherit key advantages of probabilistic graphical models:
    • Coherent probabilistic semantics
    • Exploit structure of local interactions
  • Relational models inherently more expressive
  • “Web of influence”: use all available information to reach powerful conclusions
  • Exploit both relational information and power of probabilistic reasoning
62
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
  • Collective Classification Revisited
  • PRMs for NLP
    • Word-Sense Disambiguation
    • Relation Extraction
    • Natural Language Understanding (?)
63
Word Sense Disambiguation
  • Neighboring words alone may not provide enough information to disambiguate
  • We can gain insight by considering compatibility between senses of related words
64
Collective Disambiguation
  • Objects: words in text
  • Attributes: sense, gender, number, pos, …
  • Links:
    • Grammatical relations (subject-object, modifier,…)
    • Close semantic relations (is-a, cause-of, …)
    • Same word in different sentences (one-sense-per-discourse)
  • Compatibility parameters:
    • Learned from tagged data
    • Based on prior knowledge (e.g., WordNet, FrameNet)
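A sketch of the compatibility idea: candidate senses of linked words are scored jointly, so a confident word can disambiguate its neighbors (words, senses, and weights below are illustrative):

    senses = {"bank": ["river_bank", "money_bank"],
              "deposit": ["sediment", "payment"]}

    # compatibility of sense pairs across a grammatical link,
    # e.g. learned from tagged data or derived from WordNet
    compat = {("money_bank", "payment"): 2.0,
              ("river_bank", "sediment"): 1.5,
              ("money_bank", "sediment"): 0.2,
              ("river_bank", "payment"): 0.2}

    best = max(((s1, s2) for s1 in senses["bank"] for s2 in senses["deposit"]),
               key=lambda p: compat[p])
    print(best)   # ('money_bank', 'payment')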
66
Relation Extraction
67
Understanding Language
68
Resolving Ambiguity
  • Professors often meet with students
    • Jane is probably a student
  • Professors like to explain
    • “She” is probably Prof. Sarah
69
Acquiring Semantic Models
  • Statistical NLP reveals patterns:







  • Standard models learn patterns at word level
  • But word-patterns are only implicit surrogates for underlying semantic patterns
    • “Teacher” objects tend to participate in certain relationships
    • Can use this pattern for objects not explicitly labeled as a teacher
70
Complementary Approaches
71
Statistics: from Words to Semantics
  • Represent statistical patterns at semantic level
    • What types of objects participate in what types of relationships

  • Learn statistical models of semantics from text


  • Reason using the models to obtain global semantic understanding of the text