Replication: The need for matched data, part 2

R. Culverhouse, T. Hinrichs, Stassen, H. Begleiter, T. Reich

Research output: Contribution to journal › Article › peer-review


Reproducibility of results has proved to be a major issue for molecular genetic studies of complex diseases. Crucial factors are phenotypic heterogeneity, unknown ethnic variation, differences in family structure, environmental impacts, and secular trends. If two samples are structurally dissimilar, linkage results derived from them may not be compatible. This problem can be addressed by using a suitable similarity measure, pooling the data sets, and redividing them so that the new sets are better matched with respect to the crucial factors. We applied this method to Wave I and Wave II data from the Collaborative Study on the Genetics of Alcoholism (COGA), assigning to each pedigree a vector representing its family structure and pattern of missing genotype data. We paired families in such a way that the sum of the squared vector length differences between matched pairs was minimal. The two families from each matched pair were then randomly allocated to two new data sets. One measure of our success on the COGA data is that, while the first division of the data displayed significantly different allele frequency distributions at over 50 markers, the matched data sets differed significantly at only one of the 351 markers. The two sets in the new data division were also well matched regarding ethnicity, gender, family size, and affection status. We believe that this method can counteract some of the impediments to reproducing linkage signals for complex diseases.
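The pairing and reallocation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each pedigree is summarized by a short numeric vector and that the mismatch between two families is the squared Euclidean distance between their vectors, which is one possible reading of "squared vector length differences". The function names (`squared_distance`, `match_pairs`, `split_matched`) and the example vectors are hypothetical. The exact search used here is a small bitmask dynamic program that is practical only for a couple of dozen families; a data set the size of COGA would need a dedicated minimum-weight matching algorithm.

```python
import random
from functools import lru_cache


def squared_distance(u, v):
    """Squared Euclidean distance between two family vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_pairs(vectors):
    """Pair all families so the summed squared distance is minimal.

    Exact bitmask dynamic program; feasible only for a small, even
    number of families.
    """
    n = len(vectors)
    if n % 2:
        raise ValueError("need an even number of families")
    cost = [[squared_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    @lru_cache(maxsize=None)
    def best(mask):
        # Bit i of mask is set while family i is still unpaired.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unpaired index
        rest = mask & ~(1 << i)
        best_total, best_pairs = float("inf"), ()
        for j in range(i + 1, n):
            if rest & (1 << j):
                sub_total, sub_pairs = best(rest & ~(1 << j))
                total = cost[i][j] + sub_total
                if total < best_total:
                    best_total, best_pairs = total, ((i, j),) + sub_pairs
        return best_total, best_pairs

    total, pairs = best((1 << n) - 1)
    return list(pairs), total


def split_matched(pairs, seed=None):
    """Randomly send one family of each matched pair to each new data set."""
    rng = random.Random(seed)
    set_a, set_b = [], []
    for i, j in pairs:
        if rng.random() < 0.5:
            i, j = j, i
        set_a.append(i)
        set_b.append(j)
    return set_a, set_b


# Hypothetical vectors: (family size, number of missing genotypes).
families = [(4, 0), (4, 1), (10, 3), (9, 3)]
pairs, total = match_pairs(families)
set_a, set_b = split_matched(pairs, seed=0)
```

In this toy input, families 0 and 1 and families 2 and 3 are the closest pairs, so each new data set receives one small and one large family. Which member of a pair lands in which set depends on the random seed.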

Original language: English
Pages (from-to): 569
Number of pages: 1
Journal: American Journal of Medical Genetics - Neuropsychiatric Genetics
Issue number: 4
State: Published - Aug 7 2000

