TY - JOUR
T1 - Securely measuring the overlap between private datasets with cryptosets
AU - Swamidass, S. Joshua
AU - Matlock, Matthew
AU - Rozenblit, Leon
N1 - Publisher Copyright: © 2015 Swamidass et al.
PY - 2015/2/25
Y1 - 2015/2/25
AB - Many scientific questions are best approached by sharing data (collected by different groups or across large collaborative networks) into a combined analysis. Unfortunately, some of the most interesting and powerful datasets (like health records, genetic data, and drug discovery data) cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
UR - http://www.scopus.com/inward/record.url?scp=84923770806&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0117898
DO - 10.1371/journal.pone.0117898
M3 - Article
C2 - 25714898
AN - SCOPUS:84923770806
SN - 1932-6203
VL - 10
JO - PLOS ONE
JF - PLOS ONE
IS - 2
M1 - e0117898
ER -