On the Sample Complexity of Learning Bayesian Networks
N. Friedman and Z. Yakhini
In Proc. Twelfth Conf. on
Uncertainty in Artificial Intelligence (UAI 96).
Postscript version
(184K)
PDF version.
Abstract
In recent years there has been an increasing interest in learning
Bayesian networks from data. One of the most effective methods
for
learning such networks is based on the minimum description
length (MDL) principle. Previous work has shown that this learning
procedure is asymptotically successful: with probability one,
it will converge to the target distribution, given a sufficient number of
samples. However, the rate of this convergence has been hitherto unknown.
In this work we examine the sample complexity of MDL based learning
procedures for Bayesian networks. We show that the number of samples needed to
learn an $\epsilon$-close approximation (in terms of entropy distance)
with confidence $\delta$ is
$O\left((\frac{1}{\epsilon})^{\frac{4}{3}}\log\frac{1}{\epsilon}
\log\frac{1}{\delta}\log\log\frac{1}{\delta}\right)$.
This means that the
sample complexity is a low-order polynomial in the error threshold and
sub-linear in the confidence bound.
We also discuss how the constants in this term depend on the
complexity of the target distribution. Finally, we address questions
of asymptotic minimality and propose a method for using the sample
complexity results to speed up the learning process.
Back to Nir's
publications page
nir@cs.huji.ac.il