1

 Daphne Koller
 Stanford University
 Joint work with:

2

 The real world is composed of objects that have properties and are
related to each other
 Natural language is all about objects and how they relate to each other
 “George got an A in Geography 101”

5

 World = relational interpretation:
 Objects in the domain
 Properties of these objects
 Relations (links) between objects
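A relational interpretation can be sketched directly as data; the object and relation names below are illustrative, echoing the deck's "George got an A in Geography 101" example:

```python
# A toy relational interpretation: objects, their properties, and
# relations (links) between them. All names here are illustrative.
world = {
    "objects": {
        "George": {"type": "Student", "intelligence": "high"},
        "Geography101": {"type": "Course", "difficulty": "low"},
    },
    "relations": {
        # (relation, subject, object) triples, with optional attributes
        ("registered_in", "George", "Geography101"): {"grade": "A"},
    },
}

def objects_of_type(world, t):
    """Return the names of all objects of a given type."""
    return [name for name, props in world["objects"].items()
            if props["type"] == t]
```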

6

 All universals are false
 Smart students get A’s in easy classes
 True universals are rarely useful
 Smart students get either A, B, C, D, or F

8

 Probabilistic semantics:
 A set of possible worlds
 Each world associated with a probability
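The possible-worlds semantics can be made concrete with a toy distribution; the worlds and probabilities below are invented for illustration, using the deck's intelligence/difficulty/grade example:

```python
# Each possible world fixes values for all propositions; the semantics
# attaches a probability to each world. Worlds/probabilities invented.
worlds = {
    # (intelligence, difficulty, grade) -> probability of that world
    ("smart", "easy", "A"): 0.40,
    ("smart", "hard", "A"): 0.15,
    ("smart", "hard", "B"): 0.15,
    ("average", "easy", "B"): 0.20,
    ("average", "hard", "C"): 0.10,
}

def prob(event):
    """Probability of an event = sum over the worlds where it holds."""
    return sum(p for w, p in worlds.items() if event(w))

# P(grade = A) sums the worlds whose grade coordinate is "A"
p_A = prob(lambda w: w[2] == "A")
```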

10

 Bayesian Networks
 Representation & Semantics
 Reasoning
 Probabilistic Relational Models
 Collective Classification
 Undirected discriminative models
 Collective Classification Revisited
 PRMs for NLP

12

 Compact & natural representation:
 nodes have ≤ k parents ⇒ 2^k·n vs. 2^n parameters
 parameters natural and easy to elicit
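The parameter-count claim is easy to check numerically: with binary variables, an unrestricted joint over n variables needs 2^n − 1 free parameters, while a BN whose nodes each have at most k parents needs at most n·2^k.

```python
def full_joint_params(n):
    # free parameters of an unrestricted joint over n binary variables
    return 2 ** n - 1

def bn_params(n, k):
    # upper bound: each of n binary nodes has <= k binary parents, so
    # each CPT needs at most 2**k free parameters (one per parent config)
    return n * 2 ** k

# e.g. n = 30 variables, at most k = 3 parents each:
# full joint needs over a billion parameters, the BN only 240
```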

15

 BN inference is NP-hard
 Inference can exploit graph structure:
 Graph separation ⇒ conditional independence
 Do inference separately in parts
 Combine results over the interface

16

 Belief propagation is an iterative message passing algorithm for
approximate inference in BNs
 Each iteration (until “convergence”):
 Nodes pass “beliefs” as messages to neighboring nodes
 Cons:
 Limited theoretical guarantees
 Might not converge
 Pros:
 Linear time per iteration
 Works very well in practice, even for dense networks
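The message-passing loop above can be sketched for a pairwise model; this is a minimal sum-product implementation, with toy potentials, not code from the tutorial:

```python
import numpy as np

def loopy_bp(nodes, edges, unary, pairwise, iters=100, tol=1e-8):
    """Sum-product loopy belief propagation on a pairwise model.

    nodes: variable names; unary[v] is v's local potential vector.
    edges: (u, v) pairs; pairwise[(u, v)] is a potential matrix whose
    rows index u's states and columns index v's states.
    Returns approximate marginal "beliefs" for every node.
    """
    def pot(u, v):
        # pairwise potential oriented with rows = u's states
        return pairwise[(u, v)] if (u, v) in pairwise else pairwise[(v, u)].T

    nbrs = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    # messages m[(u, v)](x_v), initialised uniform
    m = {(u, v): np.full(len(unary[v]), 1.0 / len(unary[v]))
         for a, b in edges for u, v in [(a, b), (b, a)]}

    for _ in range(iters):
        new = {}
        for u, v in m:
            # product of u's potential and incoming messages, excluding v
            prod = unary[u].astype(float)
            for w in nbrs[u]:
                if w != v:
                    prod = prod * m[(w, u)]
            msg = pot(u, v).T @ prod          # sum out u's states
            new[(u, v)] = msg / msg.sum()
        delta = max(np.abs(new[k] - m[k]).max() for k in m)
        m = new
        if delta < tol:                        # "convergence"
            break

    beliefs = {}
    for v in nodes:
        b = unary[v].astype(float)
        for w in nbrs[v]:
            b = b * m[(w, v)]
        beliefs[v] = b / b.sum()
    return beliefs
```

On a tree-structured network the beliefs are exact marginals; on loopy graphs they are the approximation the slide describes, linear time per iteration.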

17

 Bayesian Networks
 Probabilistic Relational Models
 Language & Semantics
 Web of Influence
 Collective Classification
 Undirected discriminative models
 Collective Classification Revisited
 PRMs for NLP

18

 Bayesian nets use propositional representation
 Real world has objects, related to each other

19

 Combine advantages of relational logic & BNs:
 Natural domain modeling: objects, properties, relations
 Generalization over a variety of situations
 Compact, natural probability models
 Integrate uncertainty with relational model:
 Properties of domain entities can depend on properties of related
entities
 Uncertainty over relational structure of domain

21

 Specifies types of objects in domain, attributes of each type of object
& types of relations between objects
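Such a schema can be sketched in code; the class and attribute names below follow the deck's student/course running example but are otherwise illustrative:

```python
from dataclasses import dataclass

# Illustrative PRM-style schema: object classes with attributes, plus a
# relation class linking them (the student/course running example).
@dataclass
class Student:
    name: str
    intelligence: str = "unknown"   # e.g. "low" / "high"

@dataclass
class Course:
    name: str
    difficulty: str = "unknown"     # e.g. "easy" / "hard"

@dataclass
class Registration:
    student: Student
    course: Course
    grade: str = "unknown"          # depends on intelligence & difficulty

# a tiny instantiation of the schema
george = Student("George", intelligence="high")
geo101 = Course("Geography 101", difficulty="easy")
reg = Registration(george, geo101, grade="A")
```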

22

 Universals: Probabilistic patterns hold for all objects in class
 Locality: Represent direct probabilistic dependencies
 Links define potential interactions

26

 Bayesian Networks
 Probabilistic Relational Models
 Collective Classification & Clustering
 Learning models from data
 Collective classification of webpages
 Undirected discriminative models
 Collective Classification Revisited
 PRMs for NLP

28

 Parameter estimation:
 Probabilistic model with shared parameters
 Grades for all students share same model
 Can use standard techniques for maximum-likelihood or Bayesian parameter estimation
 Structure learning:
 Define scoring function over structures
 Use combinatorial search to find high-scoring structure
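Because all registrations share one grade model, maximum-likelihood estimation just pools counts across objects; a minimal sketch (the toy training triples are invented):

```python
from collections import Counter

def mle_shared_cpd(registrations):
    """Pooled max-likelihood estimate of P(grade | intelligence, difficulty).

    All registrations share the same parameters, so we count grades per
    parent configuration across every object at once.
    """
    counts = Counter()
    totals = Counter()
    for intel, diff, grade in registrations:
        counts[(intel, diff, grade)] += 1
        totals[(intel, diff)] += 1
    return {key: c / totals[key[:2]] for key, c in counts.items()}

# invented training data: (intelligence, difficulty, grade) triples
data = [("high", "easy", "A"), ("high", "easy", "A"),
        ("high", "easy", "B"), ("low", "hard", "C")]
cpd = mle_shared_cpd(data)
```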

30

 WebKB dataset
 Four CS department websites
 Bag of words on each page
 Links between pages
 Anchor text for links
 Experimental setup
 Trained on three universities
 Tested on fourth
 Repeated for all four combinations

38

 Bayesian Networks
 Probabilistic Relational Models
 Collective Classification & Clustering
 Undirected Discriminative Models
 Markov Networks
 Relational Markov Networks
 Collective Classification Revisited
 PRMs for NLP

39

 Acyclicity constraint limits expressive power:
 Two objects linked to by a student are probably not both professors
 Allow arbitrary patterns over sets of objects & links

41

 Universals: Probabilistic patterns hold for all groups of objects
 Locality: Represent local probabilistic dependencies
 Sets of links give us possible interactions

42

 Instantiated RMN → MN
 variables: attributes of all objects
 dependencies: determined by links & RMN
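Unrolling the template: each object attribute becomes a ground variable, and each link instantiates a copy of the shared template potential over the linked objects' attributes. A schematic sketch (all names invented):

```python
def ground_markov_net(pages, links, template_potential):
    """Instantiate an RMN-style template into a ground Markov network.

    pages: object names; links: (u, v) pairs between objects.
    Returns the ground variables (one label variable per page) and the
    pairwise cliques, each carrying the shared template potential.
    """
    variables = [f"Label({p})" for p in pages]
    cliques = [((f"Label({u})", f"Label({v})"), template_potential)
               for u, v in links]
    return variables, cliques

# toy instantiation: three linked pages, one shared link potential
pages = ["P1", "P2", "P3"]
links = [("P1", "P2"), ("P2", "P3")]
template = {("student", "professor"): 2.0, ("student", "student"): 1.0}
vars_, cliques = ground_markov_net(pages, links, template)
```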

43

 Bayesian Networks
 Probabilistic Relational Models
 Collective Classification & Clustering
 Undirected Discriminative Models
 Collective Classification Revisited
 Discriminative training of RMNs
 Webpage classification
 Link prediction
 PRMs for NLP

44

 Parameter estimation is not closed form
 Convex problem ⇒ unique global maximum
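For a log-linear model the conditional log-likelihood is concave, so plain gradient ascent reaches the unique global maximum. A minimal sketch on a toy two-class problem (data, step size, and iteration count are invented; the relational structure is omitted):

```python
import numpy as np

def train_loglinear(X, y, lr=0.5, steps=500):
    """Gradient ascent on the concave conditional log-likelihood of
    p(y=1 | x; w) = exp(w.x) / (1 + exp(w.x)).

    The gradient is empirical feature counts minus model-expected
    feature counts, the same form used in discriminative training
    of log-linear models.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p1 = 1.0 / (1.0 + np.exp(-X @ w))   # model P(y=1 | x)
        grad = X.T @ (y - p1)               # empirical - expected
        w += lr * grad / len(y)
    return w
```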

50

 WebKB data set size
 1300 entities
 180K attributes
 5800 links
 Network size / school:
 Directed model
 200,000 variables
 360,000 edges
 Undirected model
 40,000 variables
 44,000 edges
 Difference in training time decreases substantially when:
 some training data is unobserved
 we want to model hidden variables

51

 Even more interesting are the relationships between objects
 e.g., verbs are almost always relationships

59

 Four new department web sites:
 Berkeley, CMU, MIT, Stanford
 Labeled page type (8 types):
 faculty, student, research scientist, staff, research group, research
project, course, organization
 Labeled hyperlinks and virtual links (6 types):
 advisor, instructor, TA, member, project-of, NONE
 Data set size:
 11K pages
 110K links
 2 million words

60

 Error measured over links predicted to be present
 Link presence cutoff is at precision/recall breakeven point (≈30% for all models)
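The precision/recall breakeven point is the cutoff at which precision equals recall; equivalently, predict "present" for the top-k scored candidates, where k is the number of truly present links, so that both measures equal hits/k. A minimal sketch (scores and labels invented):

```python
def breakeven(scores, labels):
    """Precision/recall breakeven accuracy.

    Predict 'present' for the k highest-scored candidates, where
    k = number of truly present links; at that cutoff precision and
    recall are both hits / k.
    """
    k = sum(labels)
    ranked = sorted(zip(scores, labels), reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / k

# invented link scores and gold labels (1 = link truly present)
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
labels = [1,   0,   1,   1,   0,   0]
# top-3 predictions catch 2 of the 3 true links
```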

61

 PRMs inherit key advantages of probabilistic graphical models:
 Coherent probabilistic semantics
 Exploit structure of local interactions
 Relational models inherently more expressive
 “Web of influence”: use all available information to reach powerful
conclusions
 Exploit both relational information and power of probabilistic reasoning

62

 Bayesian Networks
 Probabilistic Relational Models
 Collective Classification & Clustering
 Undirected Discriminative Models
 Collective Classification Revisited
 PRMs for NLP
 Word-Sense Disambiguation
 Relation Extraction
 Natural Language Understanding (?)

63

 Neighboring words alone may not provide enough information to
disambiguate
 We can gain insight by considering compatibility between senses of
related words

64

 Objects: words in text
 Attributes: sense, gender, number, pos, …
 Links:
 Grammatical relations (subject-object, modifier, …)
 Close semantic relations (is-a, cause-of, …)
 Same word in different sentences (one-sense-per-discourse)
 Compatibility parameters:
 Learned from tagged data
 Based on prior knowledge (e.g., WordNet, FrameNet)


68

 Professors often meet with students
 Jane is probably a student
 Professors like to explain
 “She” is probably Prof. Sarah

69

 Statistical NLP reveals patterns:
 Standard models learn patterns at word level
 But wordpatterns are only implicit surrogates for underlying semantic
patterns
 “Teacher” objects tend to participate in certain relationships
 Can use this pattern for objects not explicitly labeled as a teacher

71

 Represent statistical patterns at semantic level
 What types of objects participate in what types of relationships
 Learn statistical models of semantics from text
 Reason using the models to obtain global semantic understanding of the text
