Objective: Rapid developments in understanding the molecular mechanisms underlying cognitive deficits in neurodevelopmental disorders have increased expectations for targeted, mechanism-based treatments. However, translation from preclinical models to human clinical trials has proven challenging. Poor reproducibility of cognitive endpoints may provide one explanation for this finding. We evaluated the suitability of cognitive outcomes for clinical trials in children with neurofibromatosis type 1 (NF1) by assessing the test-retest reliability of the measures and by applying data reduction techniques to improve reproducibility.

Methods: Data were analyzed from the STARS clinical trial (n = 146), a multi-center, double-blind, placebo-controlled phase II trial of lovastatin conducted by the NF Clinical Trials Consortium. To determine test-retest reliabilities, intra-class correlation coefficients were calculated between pre- and post-treatment performance (16-week interval) on neuropsychological endpoints in the placebo group. Confirmatory factor analysis was used to reduce the data into cognitive domains and to account for measurement error.

Results: Test-retest reliabilities were highly variable, with most endpoints demonstrating unacceptably low reproducibility. Data reduction confirmed four distinct neuropsychological domains: executive functioning/attention, visuospatial ability, memory, and behavior. Test-retest reliabilities of the latent factors improved to levels acceptable for clinical trials. The applicability and utility of our model were demonstrated by homogeneous effect sizes in the reanalyzed efficacy data.

Interpretation: These data demonstrate that single observed endpoints are not appropriate for determining efficacy, partly accounting for the poor test-retest reliability of cognitive outcomes in clinical trials in neurodevelopmental disorders. Recommendations to improve reproducibility are outlined to guide future trial design.
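
As an illustration of the test-retest analysis described in the Methods, the following is a minimal sketch of an intra-class correlation coefficient computed from a subjects-by-sessions score matrix. The abstract does not state which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, following Shrout and Fleiss); the function name `icc_2_1` and the example scores are hypothetical and are not taken from the STARS data.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is an (n_subjects, k_sessions) array with no missing values.
    Assumption: this ICC form is chosen for illustration only.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for subjects (rows), sessions (columns), and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    # Corresponding mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical baseline and week-16 scores for five participants.
    example = [[98, 101], [85, 90], [110, 104], [92, 95], [105, 99]]
    print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

Because ICC(2,1) measures absolute agreement, a systematic shift between sessions (for example, a practice effect) lowers the coefficient; a consistency form such as ICC(3,1) would not penalize such a shift.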