Objective: To perform a longitudinal analysis of clinical features associated with neurofibromatosis type 1 (NF1) based on demographic and clinical characteristics, and to apply a machine learning strategy to determine the feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.

Methods: Using NF1 as a model system, we performed retrospective data analyses using a manually curated NF1 clinical registry and electronic health record (EHR) information, and developed machine learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.

Results: Males and females were evenly represented in the cohort. White children were more likely to develop OPG than their non-White peers (odds ratio [OR]: 2.11, 95% confidence interval [CI]: 1.11-4.00, p = 0.02). Median age at diagnosis of OPG was 6.5 years (1.7-17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95% CI: 1.33-2.70, p < 0.001), and males were diagnosed earlier than females. The gradient boosting classification model predicted diagnosis of ADHD with an area under the receiver operating characteristic curve (AUROC) of 0.74 and predicted diagnosis of OPG with an AUROC of 0.82.

Conclusions: Using readily available clinical and EHR data, we successfully recapitulated several important and clinically relevant patterns in NF1 semiology based specifically on demographic and clinical characteristics. Naive machine learning techniques can potentially be used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.
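
To make the AUROC figures above easier to interpret, the following minimal Python sketch computes AUROC from its pairwise definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function name `auroc` and the labels and scores are hypothetical illustrations, not the study's code or data.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = diagnosis present).
    scores: iterable of predicted scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(round(auroc(labels, scores), 3))
```

Under this reading, an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable for larger datasets.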