TY - JOUR
T1 - Optimization of multi-classifiers for computational biology
T2 - Application to gene finding and expression
AU - Romero-Zaliz, Rocío
AU - Rubio-Escudero, Cristina
AU - Zwir, Igor
AU - del Val, Coral
N1 - Funding Information:
This work was supported in part by the Spanish Ministry of Science and Technology (MEC) under project TIN-2006-12879 and the Consejeria de Innovacion, Investigacion y Ciencia de la Junta de Andalucia under project TIC-02788. I. Zwir is a senior research scientist supported by the Howard Hughes Medical Institute and the “Ramon y Cajal” program of the MEC; C. del Val was supported by the “Programa de Retorno de Investigadores” from the Junta de Andalucia.
Funding Information:
We applied our methodology to both problems. For the gene finding problem, we used the datasets from the ENCODE Genome Annotation Assessment Project (EGASP). These datasets contain manually curated fragments of the human genome originating from the ENCODE project. EGASP selected these data because the genes encoded in these regions had not been used to train any particular gene predictor, so the dataset is not biased. For the microarray analysis, we used a dataset derived from longitudinal blood expression profiles of human volunteers treated with intravenous endotoxin, compared with those of volunteers given a placebo, in order to study inflammation and the human response to injury. This dataset was part of a Large-scale Collaborative Research Project sponsored by the National Institute of General Medical Sciences.
PY - 2010/3
Y1 - 2010/3
N2 - Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome and their expression. We propose a multi-objective methodology to combine state-of-the-art algorithms into an aggregation scheme in order to obtain optimal methods' aggregations. The results obtained show a major improvement in sensitivity when our methodology is compared to the performance of individual methods for gene finding and gene expression problems. The methodology proposed here is an automatic method generator, and a step forward to exploit all already existing methods, by providing alternative optimal methods' aggregations to answer concrete queries for a certain biological problem with a maximized accuracy of the prediction. As more approaches are integrated for each of the presented problems, de novo accuracy can be expected to improve further.
AB - Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome and their expression. We propose a multi-objective methodology to combine state-of-the-art algorithms into an aggregation scheme in order to obtain optimal methods' aggregations. The results obtained show a major improvement in sensitivity when our methodology is compared to the performance of individual methods for gene finding and gene expression problems. The methodology proposed here is an automatic method generator, and a step forward to exploit all already existing methods, by providing alternative optimal methods' aggregations to answer concrete queries for a certain biological problem with a maximized accuracy of the prediction. As more approaches are integrated for each of the presented problems, de novo accuracy can be expected to improve further.
KW - Gene expression
KW - Gene finding
KW - Multiobjective
UR - http://www.scopus.com/inward/record.url?scp=77949266573&partnerID=8YFLogxK
U2 - 10.1007/s00214-009-0648-3
DO - 10.1007/s00214-009-0648-3
M3 - Article
AN - SCOPUS:77949266573
SN - 1432-881X
VL - 125
SP - 599
EP - 611
JO - Theoretical Chemistry Accounts
JF - Theoretical Chemistry Accounts
IS - 3-6
ER -