TY - JOUR
T1 - Replication
T2 - The need for matched data, part 2
AU - Culverhouse, R.
AU - Hinrichs, T.
AU - Stassen,
AU - Begleiter, H.
AU - Reich, T.
PY - 2000/8/7
Y1 - 2000/8/7
AB - Reproducibility of results has proved to be a major issue for molecular genetic studies of complex diseases. Crucial factors are phenotypic heterogeneity, unknown ethnic variation, differences in family structure, environmental impacts, and secular trends. If two samples are structurally dissimilar, linkage results derived from them may not be compatible. This problem can be addressed by pooling the data sets and, using a suitable similarity measure, redividing them so that the new sets are better matched with respect to the crucial factors. We applied this method to Wave I and Wave II data from the Collaborative Study on the Genetics of Alcoholism (COGA), assigning a vector to each pedigree representing its family structure and pattern of missing genotype data. We paired families in such a way that the sum of the squared vector length differences between matched pairs was minimized. The two families in each matched pair were then randomly allocated to two new data sets. One measure of our success on the COGA data is that, while the first division of the data displayed significantly different allele frequency distributions at over 50 markers, the matched data sets differed significantly at only one of the 351 markers. The two sets in the new data division were also well matched regarding ethnicity, gender, family size, and affection status. We believe that this method can counteract some of the impediments to reproducing linkage signals for complex diseases.
UR - http://www.scopus.com/inward/record.url?scp=33749113050&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:33749113050
SN - 1552-4841
VL - 96
SP - 569
JO - American Journal of Medical Genetics - Neuropsychiatric Genetics
JF - American Journal of Medical Genetics - Neuropsychiatric Genetics
IS - 4
ER -