Of course, data is not
always so nicely arranged for us as in a relational database. Let us consider the biggest source of data:
the World Wide Web. Consider the
webpages in a computer science department.
Here is one webpage, which links to another. This second webpage links to a third, which
links back to the first two. There is
also a webpage with a lot of outgoing links to webpages on this site. This is not nice, clean data. Nobody labels these webpages for us or
tells us what they are. We would like
to learn to understand this data and conclude from it that there is a
“Professor Tom Mitchell”, one of whose interests is a project called “WebKB”. “Sean Slattery” is one of the students on
the project, and Professor Mitchell is his advisor. Finally, Tom Mitchell is a member of the CMU
CS faculty, which contains many other faculty members. How do we get from the raw data to this
type of analysis?