Probabilistic Supervised Learning and Clustering in Relational Data (2001) by B. Taskar, E. Segal, and D. Koller
Supervised and unsupervised learning methods have traditionally focused on data consisting of independent instances of a single type. However, many real-world domains are best described by relational models in which instances of multiple types are related to each other in complex ways. For example, in a scientific paper domain, papers are related to each other via citation, and are also related to their authors. In this case, the label of one entity (e.g., the topic of the paper) is often correlated with the labels of related entities. We propose a general class of models for classification and clustering in relational domains that capture probabilistic dependencies between related instances. We show how to learn such models efficiently from data. We present empirical results on two real-world data sets. Our experiments in a transductive classification setting indicate that accuracy can be significantly improved by modeling relational dependencies. Our algorithm automatically induces a very natural behavior, where our knowledge about one instance helps us classify related ones, which in turn help us classify others. In an unsupervised setting, our models produced coherent clusters with a very natural interpretation, even for instance types that do not have any attributes.
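The propagation behavior described in the abstract, where knowledge about one instance helps classify related ones, can be illustrated with a small toy sketch. This is not the authors' method (the paper learns its models from data); it is a simple iterative relaxation scheme swapped in purely for illustration, and the two topics, the compatibility table `COMPAT`, the priors, the three-paper graph, and the function `propagate` are all hypothetical.

```python
# Toy sketch of relational label propagation on a citation graph.
# Hypothetical throughout: the topics, COMPAT table, priors, and graph are
# made up for illustration; this is not the algorithm from the paper.

TOPICS = ("AI", "Theory")

# Compatibility between a paper's topic and a linked paper's topic:
# linked papers are assumed to usually share a topic.
COMPAT = {
    ("AI", "AI"): 0.8, ("AI", "Theory"): 0.2,
    ("Theory", "AI"): 0.2, ("Theory", "Theory"): 0.8,
}


def normalize(dist):
    """Rescale a topic -> score mapping so the scores sum to 1."""
    total = sum(dist.values())
    return {t: s / total for t, s in dist.items()}


def propagate(edges, priors, observed, iterations=20):
    """Iteratively refine topic beliefs for papers whose topic is unknown.

    edges:    paper -> list of linked papers (citations, treated as undirected)
    priors:   paper -> {topic: local evidence}, e.g. from the paper's words
    observed: paper -> known topic; these beliefs are clamped, never updated
    """
    beliefs = {}
    for paper, prior in priors.items():
        if paper in observed:
            beliefs[paper] = {t: float(t == observed[paper]) for t in TOPICS}
        else:
            beliefs[paper] = normalize(prior)
    for _ in range(iterations):
        updated = {}
        for paper, prior in priors.items():
            if paper in observed:
                updated[paper] = beliefs[paper]
                continue
            score = dict(prior)
            for neighbour in edges.get(paper, []):
                for t in TOPICS:
                    # Expected compatibility under the neighbour's current belief.
                    score[t] *= sum(COMPAT[(t, u)] * beliefs[neighbour][u] for u in TOPICS)
            updated[paper] = normalize(score)
        beliefs = updated  # synchronous update: all papers use last round's beliefs
    return beliefs


# Chain A - B - C: only A's topic is known; B and C have uninformative priors.
edges = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
priors = {p: {"AI": 0.5, "Theory": 0.5} for p in edges}
observed = {"A": "AI"}

beliefs = propagate(edges, priors, observed)
for paper in sorted(beliefs):
    print(paper, {t: round(p, 3) for t, p in beliefs[paper].items()})
```

Clamping A to "AI" pulls its direct neighbour B toward "AI", and C, which is linked to A only through B, is pulled less strongly. The update is a heuristic: B and C keep feeding their beliefs back to each other, so evidence can be counted more than once, and the resulting numbers should not be read as calibrated probabilities.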
B. Taskar, E. Segal, and D. Koller (2001). "Probabilistic Supervised Learning and Clustering in Relational Data." Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 870-876).
author = "B. Taskar and E. Segal and D. Koller",
booktitle = "Proceedings of the Seventeenth International Joint
Conference on Artificial Intelligence (IJCAI)",
title = "Probabilistic Supervised Learning and Clustering in
Relational Data",
publisher = "Morgan Kaufmann",
address = "Seattle, Washington",
pages = "870--876",
year = "2001",