TY - JOUR
T1 - Combining accurate tumor genome simulation with crowdsourcing to benchmark somatic structural variant detection
AU - ICGC-TCGA DREAM Somatic Mutation Calling Challenge Participants
AU - Lee, Anna Y.
AU - Ewing, Adam D.
AU - Ellrott, Kyle
AU - Hu, Yin
AU - Houlahan, Kathleen E.
AU - Bare, J. Christopher
AU - Espiritu, Shadrielle Melijah G.
AU - Huang, Vincent
AU - Dang, Kristen
AU - Chong, Zechen
AU - Caloian, Cristian
AU - Yamaguchi, Takafumi N.
AU - Kellen, Michael R.
AU - Chen, Ken
AU - Norman, Thea C.
AU - Friend, Stephen H.
AU - Guinney, Justin
AU - Stolovitzky, Gustavo
AU - Haussler, David
AU - Margolin, Adam A.
AU - Stuart, Joshua M.
AU - Boutros, Paul C.
AU - Barnes, Bret D.
AU - Birol, Inanc
AU - Chen, Xiaoyu
AU - Chiu, Readman
AU - Cox, Anthony J.
AU - Ding, Li
AU - Fritz, Markus H.Y.
AU - Grigoriev, Andrey
AU - Hach, Faraz
AU - Kawash, Joseph K.
AU - Korbel, Jan O.
AU - Kruglyak, Semyon
AU - Liao, Yang
AU - McPherson, Andrew
AU - Nip, Ka Ming
AU - Rausch, Tobias
AU - Sahinalp, S. Cenk
AU - Sarrafi, Iman
AU - Saunders, Christopher T.
AU - Schulz-Trieglaff, Ole
AU - Shaw, Richard
AU - Shi, Wei
AU - Smith, Sean D.
AU - Song, Lei
AU - Wang, Difei
AU - Ye, Kai
N1 - Funding Information:
This study was conducted with the support of the Ontario Institute for Cancer Research to P.C.B. through funding provided by the Government of Ontario. This work was supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation (Grant #RS2014-01). This study was conducted with the support of Movember funds through Prostate Cancer Canada and with the additional support of the Ontario Institute for Cancer Research, funded by the Government of Ontario. This project was supported by Genome Canada through a Large-Scale Applied Project contract to P.C.B., S.P. Shah, and R.D. Morin. This work was supported by the Discovery Frontiers: Advancing Big Data Science in Genomics Research program, which is jointly funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Canadian Institutes of Health Research (CIHR), Genome Canada, and the Canada Foundation for Innovation (CFI). P.C.B. was supported by a Terry Fox Research Institute New Investigator Award and a CIHR New Investigator Award. K.E.H. was supported by a CIHR Computational Biology Undergraduate Summer Student Health Research Award. A.D.E. was supported by an Australian Research Council Discovery Early Career Researcher Award DE150101117 and by the Mater Foundation. The following National Institutes of Health (NIH) grants supported this work: R01-CA180778 (J.M.S.) and U24-CA143858 (J.M.S.). The funders played no role in study design, data collection, data analysis, data interpretation, or in writing of this manuscript.
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/11/6
Y1 - 2018/11/6
N2 - Background: The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic structural variant detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic structural variant detection methods has been challenged by the lack of gold standards, extensive resource requirements, and difficulties arising from the need to share personal genomic information. Results: To facilitate structural variant detection algorithm evaluations, we create a robust simulation framework for somatic structural variants by extending the BAMSurgeon algorithm. We then organize and enable a crowdsourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of structural variant benchmarking on three different tumors, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for new ways to aggregate somatic structural variant detection approaches. Conclusions: The synthetic tumors and somatic structural variant detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon.
KW - Benchmarking
KW - Cancer genomics
KW - Crowdsourcing
KW - Simulation
KW - Somatic mutations
KW - Structural variants
KW - Whole-genome sequencing
UR - http://www.scopus.com/inward/record.url?scp=85056286059&partnerID=8YFLogxK
U2 - 10.1186/s13059-018-1539-5
DO - 10.1186/s13059-018-1539-5
M3 - Article
C2 - 30400818
AN - SCOPUS:85056286059
SN - 1474-7596
VL - 19
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 188
ER -