Diagnostic yield of targeted next generation sequencing in various cancer types: An information-theoretic approach

Ian S. Hagemann, Patrick K. O'Neill, Ivan Erill, John D. Pfeifer

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


The information-theoretic concept of Shannon entropy can be used to quantify the information provided by a diagnostic test. We hypothesized that in tumor types with stereotyped mutational profiles, the results of next generation sequencing (NGS) testing would yield lower average information than in tumors with more diverse mutations. To test this hypothesis, we estimated the entropy of NGS testing in various cancer types, using results obtained from clinical sequencing. A set of 238 tumors was subjected to clinical targeted NGS across all exons of 27 genes. There were 120 actionable variants in 109 cases, occurring in the genes KRAS, EGFR, PTEN, PIK3CA, KIT, BRAF, NRAS, IDH1, and JAK2. Sequencing results for each tumor were modeled as a dichotomized genotype (actionable mutation detected or not detected) for each of the 27 genes. Based upon the entropy of these genotypes, sequencing was most informative for colorectal cancer (3.235 bits of information/case) followed by high grade glioma (2.938 bits), lung cancer (2.197 bits), pancreatic cancer (1.339 bits), and sarcoma/STTs (1.289 bits). In the most informative cancer types, the information content of NGS was similar to surgical pathology examination (modeled at approximately 2-3 bits). Entropy provides a novel measure of utility for laboratory testing in general and for NGS in particular. This metric is, however, purely analytical and does not capture the relative clinical significance of the identified variants, which may also differ across tumor types.
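As an illustration of the quantity described above, the following is a minimal sketch of a plug-in Shannon entropy estimate, H = -Σ p(g) log2 p(g), taken over the dichotomized genotype vectors observed in one tumor type. It assumes the entropy is computed over the joint genotype (one 0/1 flag per gene); the abstract does not specify the estimator, so this is one plausible reading rather than the authors' exact procedure. The function name `genotype_entropy` and the toy cases are hypothetical and are not data from the study.

```python
from collections import Counter
from math import log2


def genotype_entropy(genotypes):
    """Plug-in Shannon entropy, in bits, of a collection of genotype vectors.

    Each genotype is a sequence of 0/1 flags, one per gene
    (1 = actionable mutation detected, 0 = not detected).
    Assumption: probabilities are estimated by observed frequencies.
    """
    counts = Counter(tuple(g) for g in genotypes)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy data: 8 cases, 3 genes (not taken from the paper).
toy_cases = [
    (1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 0, 0),
    (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0),
]

if __name__ == "__main__":
    print(f"{genotype_entropy(toy_cases):.3f} bits per case")
```

Under this estimator, a tumor type in which every case shows the same genotype scores 0 bits, while a type whose cases are spread evenly over four distinct genotypes scores 2 bits, which matches the intuition that stereotyped mutational profiles yield less information per test.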

Original language: English
Pages (from-to): 441-447
Number of pages: 7
Journal: Cancer Genetics
Issue number: 9
State: Published - Sep 1 2015


  • Comparative effectiveness research
  • High-throughput nucleotide sequencing
  • Information theory
  • Molecular diagnostic techniques
  • Neoplasms

