Optimization of Population Frequency Cutoffs for Filtering Common Germline Polymorphisms from Tumor-Only Next-Generation Sequencing Data

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Clinical next-generation sequencing assays are often run on tumor specimens without a matched normal specimen, which complicates the differentiation of germline from somatic variants. In tumor-only testing, population data are often used to infer germline status, though no consensus exists on the exact population frequency (PF) cutoff above which a variant should be considered likely germline. In this study, five population databases plus the Catalog of Somatic Mutations in Cancer were used to demonstrate the impact of changing the PF cutoff on assignment of variants as germline versus somatic. The 1% to 2% PF cutoffs widely used in bioinformatic pipelines resulted in high sensitivity for classification of somatic variants, but unnecessarily reduced sensitivity for germline variants. Using optimized PF cutoffs, the source of variants in The Cancer Genome Atlas (TCGA) data could be predicted with >95% accuracy. Further exploration of four TCGA cancer data sets indicated that the optimal cutoff is influenced by both cancer type and the assay region of interest. Comparing TCGA data to data generated from a clinical, hybridization capture test (approximately 615 kb capture space) showed that PF cutoffs may not be transferable between assays, even when the gene set is held constant. Thus, filtering approaches need to be carefully designed and optimized, and should be assay-specific to support tumor-only testing until tumor-normal testing becomes routine in the clinical setting.
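The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes each variant carries a single maximum population frequency across the consulted databases and a known origin (for example, from a matched normal). The `Variant` class, the function names, the strict "greater than" comparison, and the toy values are all hypothetical; none of them come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str      # hypothetical identifier
    max_pf: float  # assumed: highest population frequency across databases (0.0 if absent)
    truth: str     # "germline" or "somatic", assumed known from a matched normal


def predict_origin(variant: Variant, pf_cutoff: float) -> str:
    """Call a variant likely germline when its frequency is above the cutoff."""
    return "germline" if variant.max_pf > pf_cutoff else "somatic"


def sensitivities(variants: list[Variant], pf_cutoff: float) -> dict[str, float]:
    """Fraction of true germline and true somatic variants classified correctly."""
    result = {}
    for origin in ("germline", "somatic"):
        truths = [v for v in variants if v.truth == origin]
        hits = sum(predict_origin(v, pf_cutoff) == origin for v in truths)
        result[origin] = hits / len(truths) if truths else float("nan")
    return result


# Invented toy data, for illustration only (not taken from TCGA or the paper).
TOY_VARIANTS = [
    Variant("v1", 0.25, "germline"),
    Variant("v2", 0.004, "germline"),
    Variant("v3", 0.0002, "germline"),
    Variant("v4", 0.0, "somatic"),
    Variant("v5", 0.00001, "somatic"),
    Variant("v6", 0.0, "somatic"),
]

if __name__ == "__main__":
    for cutoff in (0.01, 0.001, 0.0001):
        print(cutoff, sensitivities(TOY_VARIANTS, cutoff))
```

In this toy set, lowering the cutoff from 1% recovers more of the rare germline variants while somatic sensitivity stays unchanged, until the cutoff becomes low enough to capture somatic variants that also appear at very low frequency in population data. Where that balance lies for a real assay would have to be determined empirically, as the abstract emphasizes.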

Original language: English
Pages (from-to): 903-912
Number of pages: 10
Journal: Journal of Molecular Diagnostics
Volume: 21
Issue number: 5
State: Published - Sep 2019
