Automatic Parameter Selection by Minimizing Estimated Error

Ron Kohavi and George H. John
Computer Science Department, Stanford University
Stanford, CA 94305
{ronnyk,gjohn}@cs.Stanford.EDU
http://robotics.stanford.edu/~{ronnyk,gjohn}

We address the problem of finding the parameter settings that result in optimal performance of a given learning algorithm when a particular dataset is used as training data. We describe a "wrapper" method that treats the selection of the best parameters as a discrete function optimization problem. The method uses best-first search and cross-validation to wrap around the basic induction algorithm: the search explores the space of parameter values, running the basic algorithm many times on training and holdout sets produced by cross-validation to estimate the expected error of each parameter setting. The final selected parameter settings are thus tuned to the specific induction algorithm and dataset under study. We report experiments with this method on 33 datasets selected from the UCI and StatLog collections, using C4.5 as the basic induction algorithm. At a 90% confidence level, our method improves the performance of C4.5 on nine domains, degrades performance on one, and is statistically indistinguishable from C4.5 on the rest. On the sample of datasets used for comparison, our method yields an average 13% relative decrease in error rate. We expect similar performance improvements when our method is used with other machine learning algorithms.

Citation: Kohavi, R., and John, G. H. (1995), "Automatic Parameter Selection by Minimizing Estimated Error", in Prieditis & Russell, eds., Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA.
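
To make the wrapper idea concrete, the following is a minimal sketch of parameter selection by minimizing cross-validated error, not the authors' implementation: it assumes a scikit-learn-style estimator (DecisionTreeClassifier, a CART-based tree, stands in only roughly for C4.5), and the names cv_error, neighbors, best_first_search, the parameter grid, and the max_stale stopping rule are all illustrative assumptions rather than details from the paper.

import heapq

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def cv_error(params, X, y, folds=10):
    # Estimated error of one parameter setting: 1 - mean CV accuracy.
    clf = DecisionTreeClassifier(random_state=0, **params)
    return 1.0 - cross_val_score(clf, X, y, cv=folds).mean()

def neighbors(params, grid):
    # Adjacent states: move exactly one parameter to a neighboring grid value.
    for name, values in grid.items():
        i = values.index(params[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                yield {**params, name: values[j]}

def best_first_search(X, y, grid, max_stale=5):
    # Repeatedly expand the open node with the lowest estimated error; stop
    # after max_stale consecutive expansions fail to improve the best error.
    start = {name: values[0] for name, values in grid.items()}
    seen = {tuple(sorted(start.items()))}
    tie = 0  # unique tiebreaker so the heap never compares dicts
    frontier = [(cv_error(start, X, y), tie, start)]
    best_err, best = frontier[0][0], start
    stale = 0
    while frontier and stale < max_stale:
        err, _, params = heapq.heappop(frontier)
        if err < best_err:
            best_err, best, stale = err, params, 0
        else:
            stale += 1
        for nb in neighbors(params, grid):
            k = tuple(sorted(nb.items()))
            if k not in seen:
                seen.add(k)
                tie += 1
                heapq.heappush(frontier, (cv_error(nb, X, y), tie, nb))
    return best, best_err

A usage example on a dataset that ships with scikit-learn (the grid values below are arbitrary choices for illustration):

from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
grid = {"max_depth": [2, 4, 8, 16, None],
        "min_samples_leaf": [1, 2, 5, 10]}
params, err = best_first_search(X, y, grid)
print(params, err)

Because every candidate setting is scored by rerunning the learner under cross-validation, the selected parameters are tuned to this particular algorithm-dataset pair, which is the essence of the wrapper approach described above.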