TY - JOUR
T1 - Spot the difference
T2 - Comparing results of analyses from real patient data and synthetic derivatives
AU - Foraker, Randi E.
AU - Yu, Sean C.
AU - Gupta, Aditi
AU - Michelson, Andrew P.
AU - Pineda Soto, Jose A.
AU - Colvin, Ryan
AU - Loh, Francis
AU - Kollef, Marin H.
AU - Maddox, Thomas
AU - Evanoff, Bradley
AU - Dror, Hovav
AU - Zamstein, Noa
AU - Lai, Albert M.
AU - Payne, Philip R.O.
N1 - Publisher Copyright: © 2020 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Background: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. Methods: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3). Results: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions. Discussion and conclusion: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
KW - data analysis
KW - electronic health records and systems
KW - precision health care
KW - protected health information
KW - synthetic data
UR - http://www.scopus.com/inward/record.url?scp=85113703443&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooaa060
DO - 10.1093/jamiaopen/ooaa060
M3 - Article
AN - SCOPUS:85113703443
SN - 2574-2531
VL - 3
SP - 557
EP - 566
JO - JAMIA Open
JF - JAMIA Open
IS - 4
ER -