TY - JOUR
T1 - Expanding Access to Large-Scale Genomic Data While Promoting Privacy
T2 - A Game Theoretic Approach
AU - Wan, Zhiyu
AU - Vorobeychik, Yevgeniy
AU - Xia, Weiyi
AU - Clayton, Ellen Wright
AU - Kantarcioglu, Murat
AU - Malin, Bradley
N1 - Publisher Copyright: © 2017 American Society of Human Genetics
PY - 2017/2/2
Y1 - 2017/2/2
AB - Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals—the Sequence and Phenotype Integration Exchange (SPHINX)—and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.
KW - Electronic Medical Records and Genomics Network
KW - Sequence and Phenotype Integration Exchange
KW - adversarial modeling
KW - game theory
KW - genetic algorithm
KW - genomic data privacy
KW - genomic data sharing policy
KW - re-identification risk
KW - sensitivity analysis
KW - summary statistics
UR - https://www.scopus.com/pages/publications/85008517738
U2 - 10.1016/j.ajhg.2016.12.002
DO - 10.1016/j.ajhg.2016.12.002
M3 - Article
C2 - 28065469
AN - SCOPUS:85008517738
SN - 0002-9297
VL - 100
SP - 316
EP - 322
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 2
ER -