This work describes ANN-Spec, a machine learning algorithm, and its application to discovering ungapped patterns in DNA sequences. The approach uses an Artificial Neural Network and a Gibbs sampling method to define the Specificity of a DNA-binding protein. ANN-Spec searches for the parameters of a simple network (or weight matrix) that maximize the specificity for binding sequences in a positive set compared to a background sequence set. The resulting weight matrix is used to find binding sites in the positive data set, and these sites are then used to define a local multiple sequence alignment. Training complexity is O(lN), where l is the width of the pattern and N is the size of the positive training data. A quantitative comparison of ANN-Spec and a few related programs is presented. The comparison shows that ANN-Spec finds patterns of higher specificity when training with a background data set. The program and documentation are available from the authors for UNIX systems.
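The site-finding step described above (scoring candidate sites with a weight matrix and stacking the best site from each sequence into an ungapped local alignment) can be illustrated with a minimal sketch. This is not the ANN-Spec implementation and does not cover its training or Gibbs sampling: the function names, the example matrix values, and the example sequences are all hypothetical, and the sketch assumes a matrix that has already been trained.

```python
from typing import Dict, List, Tuple

# A weight matrix of width l: one dict of per-base scores for each position.
WeightMatrix = List[Dict[str, float]]

# Hypothetical, hand-written matrix of width 3 that favours the site "TGA".
# In practice the weights would come from training, not from this table.
EXAMPLE_MATRIX: WeightMatrix = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
]

# Hypothetical positive sequences, each containing one "TGA" site.
EXAMPLE_SEQUENCES: List[str] = ["CCTGACC", "ATGATT", "GGGTGA"]


def score_site(matrix: WeightMatrix, site: str) -> float:
    """Sum the per-position weights for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, site))


def best_site(matrix: WeightMatrix, seq: str) -> Tuple[int, float]:
    """Return (start position, score) of the highest-scoring window in seq."""
    width = len(matrix)
    best_pos, best_score = 0, float("-inf")
    # Slide a window of the matrix width along the sequence.
    for pos in range(len(seq) - width + 1):
        score = score_site(matrix, seq[pos:pos + width])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


def align_sites(matrix: WeightMatrix, seqs: List[str]) -> List[str]:
    """Collect the best site from each sequence as an ungapped alignment."""
    width = len(matrix)
    sites = []
    for seq in seqs:
        pos, _ = best_site(matrix, seq)
        sites.append(seq[pos:pos + width])
    return sites
```

With the example data, `align_sites(EXAMPLE_MATRIX, EXAMPLE_SEQUENCES)` picks out the `TGA` window from each of the three sequences. Taking the single best window per sequence is a simplifying assumption of this sketch.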
|Pacific Symposium on Biocomputing
|Published - 2000