A predictive algorithm to identify ever smoking in medical claims-based epidemiologic studies

Irene Faust, Mark Warden, Alejandra Camacho-Soto, Brad A. Racette, Susan Searles Nielsen

Research output: Contribution to journal › Article › peer-review


Purpose: To develop and validate an algorithm to estimate probability of ever smoking using administrative claims. Methods: Using population-based samples of Medicare-aged individuals (121,278 Behavioral Risk Factor Surveillance System survey respondents and 207,885 Medicare beneficiaries), we developed a logistic regression model to predict probability of ever smoking from demographic and claims data. We applied the model in 1,657,266 additional Medicare beneficiaries and calculated area under the receiver operating characteristic curve (AUC) using presence or absence of a tobacco-specific diagnosis or procedure code as our “gold standard.” We used these “gold standard” codes and lung/laryngeal cancer codes to override the predicted probability, setting it to 100%. We calculated Spearman's rho between probability from this full algorithm and smoking assessed in prior Parkinson disease studies by substituting our observed and prior (“true”) smoking-Parkinson disease odds ratios into the attenuation equation. Results: The predictive model contained 23 variables, including basic demographics, high alcohol consumption, asthma, cardiovascular disease and associated risk factors, selected cancers, and indicators of routine medical usage. The AUC was 67.6% (95% confidence interval 67.5%-67.7%) comparing smoking probability to tobacco-specific diagnosis or procedure codes. Spearman's rho for the full algorithm was 0.82. Conclusions: Ever smoking might be approximated in administrative data for use as a continuous, probabilistic variable in epidemiologic analyses.
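As a rough illustration of two steps named in the Methods, the sketch below shows how a model-predicted probability could be overridden to 100% when a tobacco-specific or lung/laryngeal cancer code is present, and how an AUC could be computed for the model probability against the tobacco-code “gold standard.” It is a minimal sketch, not the authors' implementation: the `Beneficiary` record, its field names, and both function names are hypothetical, and the 23-variable logistic model itself is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Beneficiary:
    # All field names are hypothetical, chosen only for this illustration.
    model_probability: float              # logistic-model output in [0, 1]
    has_tobacco_code: bool                # tobacco-specific diagnosis/procedure code
    has_lung_laryngeal_cancer_code: bool  # lung or laryngeal cancer code


def full_algorithm_probability(b: Beneficiary) -> float:
    """Return the model probability, overridden to 1.0 when either code type is present."""
    if b.has_tobacco_code or b.has_lung_laryngeal_cancer_code:
        return 1.0
    return b.model_probability


def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs in which the positive
    scores higher; ties count as half. Quadratic time, fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    people = [
        Beneficiary(0.90, True, False),
        Beneficiary(0.80, False, False),
        Beneficiary(0.30, True, False),
        Beneficiary(0.10, False, True),
    ]
    # AUC uses the model probability before any override, with the
    # tobacco-specific code as the reference label.
    print(auc([p.model_probability for p in people],
              [p.has_tobacco_code for p in people]))
    print([full_algorithm_probability(p) for p in people])
```

In this sketch the AUC is evaluated before the override, since afterwards every beneficiary with a tobacco-specific code would have a probability of 100% by construction.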

Original language: English
Pages (from-to): 59-67.e6
Journal: Annals of Epidemiology
State: Published - Sep 2023


  • Administrative claims
  • Healthcare
  • Medicare
  • Smoking
  • Tobacco

