ROC Analysis for Evaluation of Machine Learning Algorithms
Larry Holder
School of Electrical Engineering and Computer Science
Washington State University
References
Provost et al., “The Case Against Accuracy Estimation for Comparing Induction Algorithms,” International Conference on Machine Learning, 1998.
Rob Holte’s talk on ROC analysis at www.cs.ualberta.ca/~holte/Learning/ROCtalk/
Motivation
Most comparisons of machine learning algorithms use classification accuracy
Problems with this approach:
There may be different costs associated with false positive and false negative errors
Training data may not reflect the true class distribution
Motivation
Perhaps maximizing accuracy is still okay:
Alter the class distribution to reduce FP/FN costs
Problems:
Only works in the 2-class case
Assigning true costs is difficult
The true class distribution is unknown
So, we must show that classifier L1 is better than L2 under more general conditions
ROC Analysis
Receiver Operating Characteristic (ROC)
Originated in signal detection theory
Common in medical diagnosis
Becoming common in ML evaluations
ROC curves assess predictive behavior independent of error costs or class distributions
Confusion Matrix
                Classified As
True Class      Positive    Negative
Positive        # TP        # FN
Negative        # FP        # TN

True Positive rate: TP = #TP / #P
False Positive rate: FP = #FP / #N
These rates are independent of the class distribution
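As a quick illustration, both rates can be computed directly from the four confusion-matrix counts; the counts below are made up for the example, not taken from the slides:

```python
# Compute TP and FP rates from confusion-matrix counts.
# #P = #TP + #FN (actual positives), #N = #FP + #TN (actual negatives).

def tp_fp_rates(n_tp, n_fn, n_fp, n_tn):
    """Return (TP rate, FP rate) from confusion-matrix counts."""
    n_pos = n_tp + n_fn  # actual positives, # P
    n_neg = n_fp + n_tn  # actual negatives, # N
    return n_tp / n_pos, n_fp / n_neg

# Illustrative counts: 50 positives, 50 negatives
tp_rate, fp_rate = tp_fp_rates(n_tp=40, n_fn=10, n_fp=5, n_tn=45)
print(tp_rate, fp_rate)  # 0.8 0.1
```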
ROC Curves
ROC space:
False positive (FP) rate on X axis
True positive (TP) rate on Y axis
Each classifier is represented by a point in ROC space corresponding to its (FP, TP) pair
For models with continuous output, classifiers are defined by varying a threshold on that output
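The thresholding idea can be sketched as follows: each distinct score is tried as a decision threshold, and each threshold yields one (FP, TP) point in ROC space. Scores and labels below are illustrative:

```python
# Sweep a threshold over continuous classifier scores to generate ROC points.

def roc_points(scores, labels):
    """Return (fp_rate, tp_rate) pairs, one per candidate threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    # Use each distinct score as a threshold: predict positive if score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3]  # illustrative continuous outputs
labels = [1, 1, 0, 1, 0]            # true classes
print(roc_points(scores, labels))
```

Lowering the threshold moves the classifier up and to the right along its ROC curve, trading more false positives for more true positives.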
Example ROC Curve
[Figure: ROC curves for learners L1, L2, and L3, plus the random-guessing diagonal; false positive rate on the X axis and true positive rate on the Y axis, each gridded at 0.25, 0.5, 0.75, 1.0]
Domination in ROC Space
Learner L1 dominates L2 if L2’s ROC curve lies beneath L1’s curve
If L1 dominates L2, then L1 is better than L2 for all possible costs and class distributions
If neither dominates (e.g., L2 and L3), then there are conditions under which L2 maximizes accuracy but does not minimize cost
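A simple way to test domination, assuming both curves have already been sampled at the same FP-rate grid (the curves below are illustrative, not the ones from the figure):

```python
def dominates(tp_a, tp_b):
    """True if curve A's TP rate is >= curve B's at every sampled FP rate.

    tp_a and tp_b are TP-rate values sampled at the same FP-rate grid.
    """
    return all(a >= b for a, b in zip(tp_a, tp_b))

# Illustrative curves sampled at FP = 0.0, 0.25, 0.5, 0.75, 1.0
l1 = [0.0, 0.6, 0.8, 0.9, 1.0]
l2 = [0.0, 0.4, 0.7, 0.85, 1.0]
l3 = [0.0, 0.5, 0.6, 0.95, 1.0]
print(dominates(l1, l2))                     # True: L1 dominates L2
print(dominates(l2, l3), dominates(l3, l2))  # False False: neither dominates
```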
Expected ROC Curve
Perform k-fold cross-validation on each learner
Treat the ROC curve from each fold i as a function R_i such that TP = R_i(FP)
Define the mean curve R̂(FP) = mean_i(R_i(FP))
Generate the expected ROC curve by evenly sampling R̂ along the FP axis
Compute confidence intervals, according to a binomial distribution, over the resulting TP values
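A minimal sketch of this vertical-averaging step using NumPy interpolation (the fold curves are illustrative, and the binomial confidence intervals are omitted):

```python
import numpy as np

def expected_roc(fold_curves, n_samples=11):
    """Vertically average per-fold ROC curves.

    fold_curves: list of (fp, tp) arrays, one per fold, each sorted by fp.
    Returns (fp_grid, mean_tp): the expected curve sampled evenly along FP.
    """
    fp_grid = np.linspace(0.0, 1.0, n_samples)
    # Interpolate each fold's curve R_i at the common FP grid, then average.
    tp_matrix = np.array([np.interp(fp_grid, fp, tp) for fp, tp in fold_curves])
    return fp_grid, tp_matrix.mean(axis=0)

# Two illustrative folds, each given as (FP values, TP values)
fold1 = (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0]))
fold2 = (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.6, 1.0]))
fp_grid, mean_tp = expected_roc([fold1, fold2], n_samples=3)
print(mean_tp)  # mean TP at FP = 0, 0.5, 1 is [0.0, 0.7, 1.0]
```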
Accuracy vs. ROC Curves
Hypothesis: standard learning algorithms produce dominating ROC models
Answer: No
Results on 10 datasets from the UCI repository show only one instance of a dominating model
Thus, learners that maximize accuracy typically do not dominate in ROC space
Thus, they are worse than others for some costs and class distributions
Non-dominating ROC curves can still identify regions of superiority for different learners
Summary
Results comparing the accuracy of learning algorithms are questionable
Especially in scenarios with non-uniform costs and class distributions
ROC curves provide a better look at where different learners minimize cost
Recommends proper ROC analysis for