TY - JOUR
T1 - Independent component analysis for initial approximation determination in identification of active modules in biological graphs
AU - Gainullina, Anastasiia N.
AU - Sukhov, Vladimir D.
AU - Shalyto, Anatoly A.
AU - Sergushichev, Alexey A.
N1 - Publisher Copyright:
© 2020, ITMO University. All rights reserved.
PY - 2020
Y1 - 2020
AB - Subject of Research. Identifying active modules in biological graphs, such as gene graphs, is an important approach to interpreting experimental biological data. One way to solve this problem is an algorithm for joint clustering in network and correlation spaces. The algorithm finds groups of genes that are both close to one another in the gene graph and highly pairwise correlated according to the matrix of gene expression values. The algorithm is iterative, and one of its key parameters is the initial approximation, which affects both the run time and the quality of the results. We consider the problem of determining an initial approximation for this algorithm and propose a procedure based on independent component analysis. Method. In the first step of the proposed procedure, independent component analysis is applied to the centered matrix of expression values. Then, for each component, the genes specific to that component at a given level of statistical significance are identified. The gene groups obtained for all independent components are taken as the initial approximation. Main Results. The procedure based on independent component analysis reduces the number of gene groups in the initial approximation without loss of accuracy. This, in turn, reduces the running time of the clustering algorithm by an order of magnitude while maintaining the quality of the results. Practical Relevance. Accelerating the algorithm for joint clustering in network and correlation spaces without loss of result quality makes it significantly more convenient and simplifies its application to the interpretation of transcriptome data in bioinformatics and computational biology.
KW - Clustering
KW - Correlation
KW - Gene expression
KW - Graphs
KW - Independent component analysis
UR - http://www.scopus.com/inward/record.url?scp=85097559421&partnerID=8YFLogxK
U2 - 10.17586/2226-1494-2020-20-6-888-892
DO - 10.17586/2226-1494-2020-20-6-888-892
M3 - Article
AN - SCOPUS:85097559421
SN - 2226-1494
VL - 20
IS - 6
SP - 888
EP - 892
JO - Scientific and Technical Journal of Information Technologies, Mechanics and Optics
JF - Scientific and Technical Journal of Information Technologies, Mechanics and Optics
ER -