Data Mining and Statistics in Medicine: An Application in Prostate Cancer Detection

Vida S. Tigrani
UCSF School of Medicine
S245
San Francisco, CA 94143-0454
vtigran@itsa.ucsf.edu

George H. John
Epiphany
2300 Geng Rd
Palo Alto, CA 94303
gjohn@epiphany.com

Data mining is an umbrella term for the process of discovering patterns in data, typically with the aid of powerful algorithms that automate part of the search. These methods come from disciplines such as statistics, machine learning (artificial intelligence), pattern recognition, neural networks, and databases. Two data analysts with different backgrounds may approach the same problem quite differently. In particular, this paper shows how one problem, prostate cancer detection, is approached by an M.D. and by a data mining analyst with a background in machine learning. The former uses hypothesis testing, and the latter uses bagged classification models. We then survey the medical data analysis literature, describing the statistical methods commonly employed by physicians in clinical studies and the advances in data mining and machine learning research that medical data analysis has motivated.

Citation: Tigrani, Vida and John, George H. (1998) Data Mining and Statistics in Medicine: An Application in Prostate Cancer Detection. In JSM98, the Proceedings of the Joint Statistical Meetings, Section on Physical and Engineering Sciences. American Statistical Association.