Abstract
Background: In spite of the recognized diagnostic potential of biomarkers, the quest for squelching noise and wringing in information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that - assuming each molecular biomarker to be a diagnostic test - enriches the diagnostic performance of an optimized set of independent biomarkers employing established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared i) subjects having cancer with those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with same cancer resulting in differential time to metastasis. Results: Our algorithm comprises of three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods that are available in most statistical packages, we observed that the diagnostic accuracy of our approach was 100%, 99.94%, 96.67% and 93.92% for the real datasets used in the study. These estimates were comparable to or better than the ones previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were indeed truly differentially expressed. Conclusion: The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
Original language | English |
---|---|
Article number | 442 |
Journal | BMC bioinformatics |
Volume | 7 |
DOIs | |
State | Published - Oct 10 2006 |