Learning Bayesian Network Structure from Massive Datasets: The ``Sparse
N. Friedman, D. Pe'er, and I. Nachman
To appear, Proc. Fifteenth Conf. on Uncertainty in Artificial Intelligence
Learning Bayesian networks is often cast as an optimization problem, where
the computational task is to find a structure that maximizes a statistically
motivated score. By and large, existing learning tools address this
optimization problem using standard heuristic search techniques.
Since the search space is extremely large, such search procedures can spend
most of their time examining candidates that are extremely unreasonable.
This problem becomes critical when we deal with data sets that are large
in either the number of instances or the number of attributes.
In this paper, we introduce an algorithm that achieves faster learning
by restricting the search space. This iterative algorithm restricts
the parents of each variable to belong to a small subset of candidates.
We then search for a network that satisfies these constraints. The learned
network is then used for selecting better candidates for the next iteration.
We evaluate this algorithm on both synthetic and real-life data. Our results
show that it is significantly faster than alternative search procedures,
without loss of quality in the learned structures.
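
The iterative scheme described in the abstract (restrict each variable's
parents to a few candidates, search under that restriction, then reselect
candidates using the learned network) can be illustrated with a minimal
sketch. Everything specific here is an assumption made for illustration and
is not taken from the paper: the function names, the BIC-style score, the
greedy hill-climbing search, and the ranking of candidates by score gain.

```python
import math
from collections import Counter


def family_score(data, child, parents):
    """BIC-style score of one variable given a parent set (assumed score)."""
    n = len(data)
    parents = tuple(sorted(parents))
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    arity = len({r[child] for r in data})
    # Approximation: count only parent configurations seen in the data.
    n_params = (arity - 1) * len(marg)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, child, new_parent):
    """True if the edge new_parent -> child would close a directed cycle."""
    stack, seen = [new_parent], set()
    while stack:  # walk the ancestors of new_parent
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def select_candidates(data, parents, k):
    """Keep current parents, then fill up to k slots by score gain."""
    n_vars = len(data[0])
    candidates = {}
    for child in range(n_vars):
        current = parents[child]
        base = family_score(data, child, current)
        others = [v for v in range(n_vars) if v != child and v not in current]
        others.sort(key=lambda v: family_score(data, child, current | {v}) - base,
                    reverse=True)
        candidates[child] = set(current) | set(others[:max(0, k - len(current))])
    return candidates


def restricted_search(data, candidates, parents):
    """Greedy hill-climbing; edges may only come from the candidate sets."""
    while True:
        best_gain, best_move = 1e-9, None
        for child, cands in candidates.items():
            base = family_score(data, child, parents[child])
            for p in cands:
                if p in parents[child]:
                    new = parents[child] - {p}      # try removing the edge
                elif not creates_cycle(parents, child, p):
                    new = parents[child] | {p}      # try adding the edge
                else:
                    continue
                gain = family_score(data, child, new) - base
                if gain > best_gain:
                    best_gain, best_move = gain, (child, new)
        if best_move is None:
            return parents
        parents[best_move[0]] = best_move[1]


def sparse_candidate_sketch(data, k=2, max_iters=5):
    """Alternate candidate selection and restricted search until stable."""
    parents = {v: frozenset() for v in range(len(data[0]))}
    for _ in range(max_iters):
        candidates = select_candidates(data, parents, k)
        before = dict(parents)
        parents = restricted_search(data, candidates, parents)
        if parents == before:
            break
    return parents
```

Calling `sparse_candidate_sketch(rows, k=2)` on a list of equal-length tuples
of discrete values returns a dictionary mapping each column index to its set
of parent indices. In this sketch the restricted search only ever scores
parent sets drawn from at most `k` candidates per variable, which is where
the savings over an unrestricted search would come from.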