Let us consider the more limited task of simply recognizing which type of entity each webpage corresponds to. The standard approach is to classify each webpage into one of several categories (faculty, student, project, etc.), using the words in the webpage as features, e.g., with a naïve Bayes model.
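As an illustration of this bag-of-words approach, here is a minimal sketch of a multinomial naïve Bayes page classifier. It assumes a crude lowercase/whitespace tokenizer, add-one (Laplace) smoothing, and a tiny hypothetical training set; the class name `NaiveBayesPageClassifier` and the example pages are invented for illustration, and a real system would use a proper tokenizer and a labeled corpus. Each page is scored by $\log P(c) + \sum_{w} \log P(w \mid c)$, and the highest-scoring category is returned.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes over bag-of-words features (add-alpha smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; alpha=1 is Laplace smoothing

    def fit(self, pages, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)  # number of pages per category
        self.word_counts = {c: Counter() for c in self.classes}
        for page, label in zip(pages, labels):
            self.word_counts[label].update(tokenize(page))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, page):
        n_pages = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # log prior: log P(c)
            score = math.log(self.class_counts[c] / n_pages)
            denom = self.total_words[c] + self.alpha * vocab_size
            for w in tokenize(page):
                if w not in self.vocab:
                    continue  # skip words never seen during training
                # log likelihood of each word under category c: log P(w | c)
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, page):
        scores = self.log_scores(page)
        return max(scores, key=scores.get)


# Hypothetical toy training data: one short "page" per line.
pages = [
    "professor teaching research publications office hours",
    "associate professor courses taught research grants",
    "phd student advisor thesis research interests",
    "graduate student resume advisor hobbies",
    "project funded research group members software",
    "project goals publications members demos",
]
labels = ["faculty", "faculty", "student", "student", "project", "project"]

clf = NaiveBayesPageClassifier().fit(pages, labels)
print(clf.predict("professor office hours courses"))  # faculty
print(clf.predict("phd student advisor"))             # student
print(clf.predict("project members demos"))           # project
```

Working in log space avoids numerical underflow when a page contains many words; the "naïve" independence assumption is what lets the per-word terms simply be summed.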