TY - JOUR
T1 - A CROC stronger than ROC
T2 - Measuring, visualizing and optimizing early retrieval
AU - Swamidass, S. Joshua
AU - Azencott, Chloé Agathe
AU - Daily, Kenny
AU - Baldi, Pierre
N1 - Funding: Laurel Wilkening Faculty Innovation award; an NIH Biomedical Informatics Training grant (LM-07443-01); NSF grants EIA-0321390, CCF-0725370, and IIS-0513376 (to P.B.); IBM PhD Fellowship (to C.A.); Physician Scientist Training Program of the Washington University Pathology Department (to S.J.S.).
PY - 2010/04/07
Y1 - 2010/04/07
N2 - Motivation: The performance of classifiers is often assessed using Receiver Operating Characteristic (ROC) curves [or accumulation curve (AC) or enrichment curve] and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest, and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this 'early retrieval' problem. Results: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived, and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization, the CROC(exp), an exponential transform of the ROC curve, as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way for measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix. Availability: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/. Contact: pfbaldi@ics.uci.edu.
UR - http://www.scopus.com/inward/record.url?scp=77952832818&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btq140
DO - 10.1093/bioinformatics/btq140
M3 - Article
C2 - 20378557
AN - SCOPUS:77952832818
VL - 26
SP - 1348
EP - 1356
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 10
M1 - btq140
ER -