TY - JOUR
T1 - Characterizing substructure via mixture modeling in large-scale genetic summary statistics
AU - Colorado Center for Personalized Medicine
AU - Stoneman, Hayley R.
AU - Price, Adelle M.
AU - Trout, Nikole Scribner
AU - Lamont, Riley
AU - Tifour, Souha
AU - Pozdeyev, Nikita
AU - Anderson, Heather D.
AU - Aquilante, Christina L.
AU - Arbogast, Kelsey
AU - Arehart, Christopher H.
AU - Brooks, Ian M.
AU - Brunetti, Tonya M.
AU - Brutus-Lestin, Judith
AU - Burke, Elizabeth E.
AU - Casteel, Emily M.
AU - Cole, Joanne B.
AU - Coughlin, Curtis R.
AU - Crooks, Kristy
AU - Crawford, Jacob
AU - Culver, Erin
AU - Edelmann, Michelle N.
AU - Fisher, Matthew J.
AU - Franklin, Alan W.
AU - Frye, Teresa C.
AU - George, Hunter
AU - Gignoux, Chris R.
AU - Gilliland, Elizabeth K.
AU - Greene, Casey S.
AU - Hawkes, Brooke
AU - Hearst, Emily
AU - Hendricks, Audrey E.
AU - Johnson, Randi K.
AU - Julian, Colleen G.
AU - Kao, Dave
AU - Konigsberg, Iain
AU - Ku, Lisa
AU - Kudron, Elizabeth L.
AU - Lacy, Rashawnda
AU - Lange, Ethan M.
AU - Lee, Yee Ming
AU - Lesny, Joe A.
AU - Lin, Meng
AU - Lowery, Jan T.
AU - Vargas, Luciana B.
AU - Maldonado, Betzaida L.
AU - Marceau, Darcy
AU - Martin, James L.
AU - Gates, Brianna L.
AU - Mayer, David
AU - McDaniel, Nicole L.
AU - Monte, Andrew
AU - Moore, Ethan
AU - Nadrash, Ann
AU - Pattee, Jack
AU - Pozdeyev, Nikita
AU - Radwan, Alaa
AU - Rafaels, Nick
AU - Raghavan, Sridharan
AU - Rasouli, Neda
AU - Shalowitz, Elise L.
AU - Sherif, Hoda
AU - Shortt, Johnathan A.
AU - Stewart, Adrian M.
AU - Sutton, Kristen J.
AU - Swartz, Carolyn T.
AU - Tanaka, Anna
AU - Taylor, Matthew R.G.
AU - Teague, Candace
AU - Todd, Emily B.
AU - Trinkley, Katy E.
AU - Wiley, Laura K.
AU - Crooks, Kristy
AU - Lin, Meng
AU - Rafaels, Nicholas
AU - Gignoux, Christopher R.
AU - Marker, Katie M.
AU - Hendricks, Audrey E.
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2025/2/6
Y1 - 2025/2/6
N2 - Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations. There is a need for methods to enable the harmonization of summary data where the underlying substructure is matched between datasets. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to enable the harmonization of genetic summary data by estimating and adjusting for substructure. In extensive simulations and application to public data, we show that Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and scans for potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse, publicly available summary data, resulting in improved and more equitable research.
KW - admixed
KW - confounding
KW - equitable research
KW - federated learning
KW - genetic similarity
KW - genetic summary data
KW - harmonization
KW - local ancestry
KW - population stratification
KW - selection
KW - substructure
KW - summary data
UR - http://www.scopus.com/inward/record.url?scp=85216698822&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2024.12.007
DO - 10.1016/j.ajhg.2024.12.007
M3 - Article
C2 - 39824191
AN - SCOPUS:85216698822
SN - 0002-9297
VL - 112
SP - 235
EP - 253
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 2
ER -