Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction

Jeffrey J. Czajka, Tolutola Oyetunde, Yinjie J. Tang

Research output: Contribution to journalArticlepeer-review

34 Scopus citations


Predicting bioproduction titers from microbial hosts has been challenging due to complex interactions between microbial regulatory networks, stress responses, and suboptimal cultivation conditions. This study integrated knowledge mining, feature extraction, genome-scale modeling (GSM), and machine learning (ML) to develop a model for predicting Yarrowia lipolytica chemical titers (i.e., organic acids, terpenoids, etc.). First, Y. lipolytica production data, including cultivation conditions, genetic engineering strategies, and product information, was manually collected from literature (~100 papers) and stored as either numerical (e.g., substrate concentrations) or categorical (e.g., bioreactor modes) variables. For each case recorded, central pathway fluxes were estimated using GSMs and flux balance analysis (FBA) to provide metabolic features. Second, a ML ensemble learner was trained to predict strain production titers. Accurate predictions on the test data were obtained for instances with production titers >1 g/L (R2 = 0.87). However, the model had reduced predictability for low performance strains (0.01–1 g/L, R2 = 0.29) potentially due to biosynthesis bottlenecks not captured in the features. Feature ranking indicated that the FBA fluxes, the number of enzyme steps, the substrate inputs, and thermodynamic barriers (i.e., Gibbs free energy of reaction) were the most influential factors. Third, the model was evaluated on other oleaginous yeasts and indicated there were conserved features for some hosts that can be potentially exploited by transfer learning. The platform was also designed to assist computational strain design tools (such as OptKnock) to screen genetic targets for improved microbial production in light of experimental conditions.

Original languageEnglish
Pages (from-to)227-236
Number of pages10
JournalMetabolic Engineering
StatePublished - Sep 2021


  • Computational strain design
  • FBA
  • Machine learning
  • Pathway bottlenecks
  • Yarrowia lipolytica


Dive into the research topics of 'Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction'. Together they form a unique fingerprint.

Cite this