[Figure: a webpage represented by the words it contains, e.g., professor, department, extract, information, computer, science, machine, learning, …]
Standard Classification
Categories: faculty, course, project, student, other
[Figure: bar chart, y-axis from 0 to 0.35, with a bar labeled "words only"]
[Figure: naïve Bayes model of a page: a Category node with child nodes Word1, …, WordN]
Let us consider the more limited task of simply recognizing which type of entity each webpage corresponds to. The standard approach is to classify each webpage into one of several categories (faculty, student, project, etc.), using the words on the page as features, e.g., with a naïve Bayes model.
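To make the words-as-features approach concrete, here is a minimal sketch of a naïve Bayes page classifier. It picks the category C that maximizes log P(C) + Σ_i log P(word_i | C), treating the words on a page as independent given the category. The specific choices here are assumptions, not the method behind the slides: a multinomial word model, Laplace (add-alpha) smoothing, a whitespace tokenizer, and a tiny hypothetical training set. The function names `train_naive_bayes` and `classify` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(pages):
    """pages: list of (text, category) pairs. Returns count tables."""
    class_counts = Counter()             # number of pages per category
    word_counts = defaultdict(Counter)   # word frequencies per category
    vocab = set()
    for text, category in pages:
        class_counts[category] += 1
        words = tokenize(text)
        word_counts[category].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(text, model, alpha=1.0):
    """Return the category with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    n_pages = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category in class_counts:
        # log P(category), estimated from page counts
        score = math.log(class_counts[category] / n_pages)
        total = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Laplace-smoothed log P(word | category)
            score += math.log(
                (word_counts[category][word] + alpha) / (total + alpha * len(vocab))
            )
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training data (not from the original source).
training_pages = [
    ("professor department computer science research machine learning", "faculty"),
    ("professor office hours department publications", "faculty"),
    ("course syllabus homework lecture exam", "course"),
    ("lecture notes homework assignments course schedule", "course"),
    ("student graduate advisor research interests resume", "student"),
    ("project members funding extract information research", "project"),
]
model = train_naive_bayes(training_pages)
print(classify("homework and lecture schedule for the course", model))  # course
```

Working in log space avoids numerical underflow when a page has many words, and smoothing keeps a single unseen (category, word) pair from driving a category's probability to zero.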