Estimating Continuous Distributions in Bayesian Classifiers

George H. John                      Pat Langley
Computer Science Dept.              Robotics Laboratory
Stanford University, Stanford, CA 94305
{gjohn,langley}@cs.stanford.edu
http://robotics.stanford.edu/{~gjohn,~langley}/

Abstract

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

Citation: John, George H. and Langley, Pat (1995). Estimating Continuous Distributions in Bayesian Classifiers. In P. Besnard and S. Hanks (Eds.), Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers.
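To make the abstract's comparison concrete, the following is a minimal sketch (not the authors' code) of a naive Bayesian classifier with a pluggable per-attribute density estimator, contrasting the two methods described above: a single Gaussian fit per class-conditional distribution, and a nonparametric kernel estimate that averages Gaussian kernels centered at the training values. The bandwidth heuristic sigma = 1/sqrt(n) and all function names are assumptions for illustration only.

import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def single_gaussian_density(train, x):
    """Parametric estimate: fit one Gaussian to the training values."""
    return gaussian_pdf(x, train.mean(), train.std(ddof=1))

def kernel_density(train, x):
    """Nonparametric estimate: average of Gaussian kernels, one per point."""
    sigma = 1.0 / np.sqrt(len(train))  # assumed bandwidth heuristic
    return gaussian_pdf(x, train, sigma).mean()

class NaiveBayes:
    """Naive Bayes over continuous attributes with a pluggable estimator."""
    def __init__(self, density=kernel_density):
        self.density = density

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = {c: np.mean(y == c) for c in self.classes}
        self.train = {c: X[y == c] for c in self.classes}
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes:
                # Naive independence assumption: multiply per-attribute densities.
                cond = [self.density(self.train[c][:, j], x[j])
                        for j in range(X.shape[1])]
                scores[c] = self.priors[c] * np.prod(cond)
            preds.append(max(scores, key=scores.get))
        return np.array(preds)

# Toy usage: two overlapping one-dimensional classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 1)), rng.normal(2, 1, (50, 1))])
y = np.array([0] * 50 + [1] * 50)
for density in (single_gaussian_density, kernel_density):
    acc = (NaiveBayes(density).fit(X, y).predict(X) == y).mean()
    print(density.__name__, acc)

On data that really are Gaussian, as in this toy example, the two estimators should perform similarly; the kernel estimate is expected to help when the true class-conditional distributions are skewed or multimodal, which is the situation the paper investigates.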