TY - GEN
T1 - Gradient-based feature selection for conditional random fields and its applications in computational genetics
AU - Chen, Minmin
AU - Chen, Yixin
AU - Brent, Michael R.
AU - Tenney, Aaron E.
PY - 2009
Y1 - 2009
N2 - Gene prediction is one of the first and most important steps in understanding the genome of a species, and different approaches have been proposed. In 2007, a de novo gene predictor called CONTRAST, based on Conditional Random Fields (CRFs), was introduced and shown to substantially outperform previous predictors. However, the oversized feature set used in the model poses several problems, such as overfitting and excessive computational demand. To resolve these issues, we conducted a thorough survey of two existing feature selection methods for CRFs, namely the gain-based and gradient-based methods, and applied the latter to CONTRAST. The results show that with the gradient-based feature selection scheme, we are able to achieve comparable or even better prediction accuracy on testing data, using only a very small fraction of the features from the candidate pool. The feature selection method also helps researchers better understand the underlying structure of genomic sequences, and further provides insight into the function and evolutionary dynamics of genomes.
AB - Gene prediction is one of the first and most important steps in understanding the genome of a species, and different approaches have been proposed. In 2007, a de novo gene predictor called CONTRAST, based on Conditional Random Fields (CRFs), was introduced and shown to substantially outperform previous predictors. However, the oversized feature set used in the model poses several problems, such as overfitting and excessive computational demand. To resolve these issues, we conducted a thorough survey of two existing feature selection methods for CRFs, namely the gain-based and gradient-based methods, and applied the latter to CONTRAST. The results show that with the gradient-based feature selection scheme, we are able to achieve comparable or even better prediction accuracy on testing data, using only a very small fraction of the features from the candidate pool. The feature selection method also helps researchers better understand the underlying structure of genomic sequences, and further provides insight into the function and evolutionary dynamics of genomes.
UR - http://www.scopus.com/inward/record.url?scp=77949510876&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2009.82
DO - 10.1109/ICTAI.2009.82
M3 - Conference contribution
AN - SCOPUS:77949510876
SN - 9781424456192
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 750
EP - 757
BT - ICTAI 2009 - 21st IEEE International Conference on Tools with Artificial Intelligence
T2 - 21st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2009
Y2 - 2 November 2009 through 5 November 2009
ER -