A more interesting approach is based on the realization that this domain is relational. It has objects that are linked to each other, and the links are very meaningful in and of themselves. For example, a student's webpage is very likely to point to the page of that student's advisor. In the Kansas slide, I showed that by making links first-class citizens in the model, we can introduce a probabilistic model over them. Indeed, we can represent precisely this dependency by asserting that the existence of a link depends on the categories of the two pages it connects. This allows us to use, for example, a webpage that we are fairly sure is a student page as evidence that a page it points to is a faculty webpage. In fact, the two pages give evidence about each other, giving rise to a form of “collective classification”.
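To make the dependency concrete, here is a minimal sketch of the two-page case, computed by brute-force enumeration. Everything in it is an assumption made for illustration: the two categories, the prior values (imagined as coming from a text-only classifier), the link probabilities in `p_link`, and the helper name `posterior_given_link`. It also ignores all other pages and links, which a real model would not.

```python
from itertools import product

CATEGORIES = ("student", "faculty")

# Hypothetical priors, e.g. from a classifier that looks only at page text.
prior_a = {"student": 0.9, "faculty": 0.1}   # page A: fairly sure it is a student page
prior_b = {"student": 0.5, "faculty": 0.5}   # page B: no idea

# Hypothetical P(link exists | category of source page, category of target page).
# Students linking to faculty (their advisors) is made the most likely case.
p_link = {
    ("student", "student"): 0.05,
    ("student", "faculty"): 0.30,
    ("faculty", "student"): 0.10,
    ("faculty", "faculty"): 0.05,
}

def posterior_given_link(prior_src, prior_dst, p_link):
    """Posterior category marginals for both pages after observing a link src -> dst."""
    # Unnormalized joint over the two categories, weighted by the link likelihood.
    joint = {
        (s, d): prior_src[s] * prior_dst[d] * p_link[(s, d)]
        for s, d in product(CATEGORIES, repeat=2)
    }
    z = sum(joint.values())
    post_src = {c: sum(v for (s, _), v in joint.items() if s == c) / z for c in CATEGORIES}
    post_dst = {c: sum(v for (_, d), v in joint.items() if d == c) / z for c in CATEGORIES}
    return post_src, post_dst

post_a, post_b = posterior_given_link(prior_a, prior_b, p_link)
print("A is student:", round(post_a["student"], 3))
print("B is faculty:", round(post_b["faculty"], 3))
```

With these made-up numbers, observing the link raises the belief that B is a faculty page, and it also raises the belief that A is a student page. Both labels move together, which is the "collective" part.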

Links are also useful when we explicitly model the notion of a directory page. If a page is a faculty directory, it is highly likely to point to faculty webpages. Thus, we can use evidence about a faculty webpage that we are fairly certain about to infer that a page pointing to it is probably a faculty directory and, based on that, increase our belief that the other pages it points to are also faculty pages.
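The directory-page reasoning can be sketched the same way. In the toy setup below, one hub page links to two pages: one we are fairly sure is a faculty page, and one we are unsure about. All names and numbers (`prior_directory`, `prior_faculty`, `p_link`, `posteriors`) are hypothetical, and the model considers only these three pages and the two observed links.

```python
from itertools import product

# Hypothetical prior that the hub page is a faculty directory.
prior_directory = 0.2
# Hypothetical priors that each linked-to page is a faculty page.
prior_faculty = [0.95, 0.30]

# Hypothetical P(hub links to page | hub is a directory, page is faculty).
p_link = {
    (True, True): 0.50,    # directories very often point to faculty pages
    (True, False): 0.02,
    (False, True): 0.05,
    (False, False): 0.05,
}

def bernoulli(p, value):
    """Probability of a boolean value under a Bernoulli(p) prior."""
    return p if value else 1.0 - p

def posteriors(prior_directory, prior_faculty, p_link):
    """Posterior marginals given that the hub links to every listed page."""
    n = len(prior_faculty)
    z = 0.0
    mass_directory = 0.0
    mass_faculty = [0.0] * n
    # Enumerate every joint assignment of the hub label and the page labels.
    for is_dir in (True, False):
        for labels in product((True, False), repeat=n):
            w = bernoulli(prior_directory, is_dir)
            for prior, is_fac in zip(prior_faculty, labels):
                w *= bernoulli(prior, is_fac) * p_link[(is_dir, is_fac)]
            z += w
            if is_dir:
                mass_directory += w
            for i, is_fac in enumerate(labels):
                if is_fac:
                    mass_faculty[i] += w
    return mass_directory / z, [m / z for m in mass_faculty]

post_directory, post_faculty = posteriors(prior_directory, prior_faculty, p_link)
print("hub is a directory:", round(post_directory, 3))
print("second page is faculty:", round(post_faculty[1], 3))
```

Under these assumed numbers, the confident faculty page pushes up the belief that the hub is a directory, and that in turn pushes up the belief that the second, uncertain page is a faculty page. Enumeration only works for a handful of pages; it is used here just to show the direction in which the evidence flows.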

This is just the web of influence applied to this domain!