MOTIVATION: Small non-coding RNAs (sRNAs) have major roles in the post-transcriptional regulation in prokaryotes. The experimental validation of a relatively small number of sRNAs in few species requires developing computational algorithms capable of robustly encoding the available knowledge and using this knowledge to predict sRNAs within and across species.
RESULTS: We present a novel methodology designed to identify bacterial sRNAs by incorporating the knowledge encoded by different sRNA prediction methods and optimally aggregating them as potential predictors. Because some of these methods emphasize specificity, whereas others emphasize sensitivity while detecting sRNAs, their optimal aggregation constitutes trade-off solutions between these two contradictory objectives that enhance their individual merits. Many non-redundant optimal aggregations uncovered by using multiobjective optimization techniques are then combined into a multiclassifier, which ensures robustness during detection and prediction even in genomes with distinct nucleotide composition. By training with sRNAs in Salmonella enterica Typhimurium, we were able to successfully predict sRNAs in Sinorhizobium meliloti, as well as in multiple and poorly annotated species. The proposed methodology, like a meta-analysis approach, may begin to lay a possible foundation for developing robust predictive methods across a wide spectrum of genomic variability.
AVAILABILITY AND IMPLEMENTATION: Scripts created for the experimentation are available at http://m4m.ugr.es/SupInfo/sRNAOS/sRNAOSscripts.zip.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.