TY - GEN
T1 - Mining and predicting CpG islands
AU - Previti, Christopher
AU - Harari, Oscar
AU - Del Val, Coral
PY - 2007
Y1 - 2007
N2 - A DNA sequence can be described as a string composed of four symbols: A, T, C and G. Each symbol represents a chemically distinct nucleotide molecule. Combinations of two nucleotides are called dinucleotides, and CpG islands are regions of a DNA sequence (certain substrings) that are enriched in CpG dinucleotides (C followed by G). CpG islands represent a prominent and enigmatic feature of vertebrate genomes. They are associated with the promoters of more than 60% of all human genes and represent a critical target for transcriptional control. Methylation of these CpG islands leads to structural changes in the DNA that stop the expression of any associated gene (gene silencing). The factors that provoke or impede methylation are currently all but unknown. In general, the maintenance of a particular pattern of methylated CpG dinucleotides represents a critical regulatory system during a host of normal developmental processes, but the erroneous methylation of CpG islands and the resulting gene silencing can lead to the development of cancer. In this work, we present a novel unsupervised machine learning method that is capable of distinguishing biologically significant classes of CpG islands, including the separation of methylated and unmethylated CpG islands. This method represents an important novel approach that will aid in the computational prediction of methylation, which is commonly used in the pre-selection of worthwhile sequences for methylation experiments.
UR - http://www.scopus.com/inward/record.url?scp=50249140777&partnerID=8YFLogxK
U2 - 10.1109/FUZZY.2007.4295540
DO - 10.1109/FUZZY.2007.4295540
M3 - Conference contribution
AN - SCOPUS:50249140777
SN - 1424412102
SN - 9781424412105
T3 - IEEE International Conference on Fuzzy Systems
BT - 2007 IEEE International Conference on Fuzzy Systems, FUZZY
T2 - 2007 IEEE International Conference on Fuzzy Systems, FUZZY
Y2 - 23 July 2007 through 26 July 2007
ER -