June 9, 2010

Señor Jaime Oncins, there is an interesting history to that philosophical distinction related to alternative hypotheses, which I first became aware of when reading this paper by Clayton Scott and Rob Nowak.

As Lehmann describes, there was apparently a huge ongoing fracas between Fisher, whose view included p-values and all the stuff that science researchers worry about, and Neyman, whose view included type I errors (false alarms) and type II errors (missed detections) and all the stuff that radar engineers worry about.  (Apparently both of them hated Bayesian hypothesis testing.)  Here’s a passage from Lehmann: “Neyman did not believe in the need for a special inductive logic but felt that the usual processes of deductive thinking should suffice.  More specifically, he had no use for Fisher’s idea of likelihood.  In his discussion of Fisher’s 1935 paper (Neyman, 1935, p. 74, 75) he expressed the thought that it should be possible ‘to construct a theory of mathematical statistics … based solely upon the theory of probability,’ and went on to suggest that the basis for such a theory can be provided by ‘the conception of frequency of errors in judgment.’  This was the approach that he and Pearson had earlier described as ‘inductive behavior’; in the case of hypothesis testing, the behavior consisted of either rejecting the hypothesis or (provisionally) accepting it.”

I personally feel much more comfortable with the Neyman view and associated objects such as receiver operating characteristics (ROCs), and get an uneasiness in my stomach when t-tests and p-values are bandied about.

As you intimated right after mentioning the philosophical distinction, some work that I did recently with Ryan Prenger, Tracy Lemmond, Barry Chen, and Bill Hanley considers false alarms, missed detections, and ROCs.  Our paper, entitled Class-Specific Error Bounds for Ensemble Classifiers, which will be presented in July at the KDD conference, develops loose but highly predictive generalization error bounds for false alarms and missed detections at all operating points for ensemble classifiers such as random forests.  These Prenger bounds provide guidelines for how to push up the ROC to reduce missed detections in the ultra-low missed detection regime, when missed detections are really costly, or to push out the ROC to reduce false alarms in the ultra-low false alarm regime, when false alarms are really costly.
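
For readers who haven't traced out an ROC by hand, the operating points mentioned above can be sketched in a few lines of Python.  This is a toy illustration, not code from our paper: the `roc_points` helper, the interpretation of scores as a random forest's vote fractions, and the example data are all hypothetical.

```python
# Hypothetical sketch: operating points of an ensemble classifier.
# 'votes' plays the role of a random forest's fraction of trees voting
# "signal"; sweeping a threshold over it traces out the ROC.

def roc_points(votes, labels):
    """Return (threshold, false_alarm_rate, missed_detection_rate) tuples.

    votes  -- ensemble scores in [0, 1], higher means more "signal"
    labels -- 1 for a true signal example, 0 for background
    """
    thresholds = sorted(set(votes), reverse=True)
    n_signal = sum(labels)
    n_background = len(labels) - n_signal
    points = []
    for t in thresholds:
        # Declare "signal" whenever the vote fraction reaches the threshold.
        false_alarms = sum(1 for v, y in zip(votes, labels) if v >= t and y == 0)
        detections = sum(1 for v, y in zip(votes, labels) if v >= t and y == 1)
        points.append((t,
                       false_alarms / n_background,   # type I error rate
                       1 - detections / n_signal))    # type II error rate
    return points

# Toy data: four background examples and four signal examples.
votes = [0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
for t, fa, md in roc_points(votes, labels):
    print(f"threshold={t:.1f}  false_alarm_rate={fa:.2f}  missed_detection_rate={md:.2f}")
```

Each threshold is one operating point; a high threshold sits in the low false alarm regime, a low threshold in the low missed detection regime, and the bounds in the paper are about how reliably a classifier's test-set curve like this one predicts its behavior on new data.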

There is some p-value stuff going on in our KDD paper, but it does not have a statistically significant effect on my stomach uneasiness.


