TY - JOUR
T1 - A comparative evaluation of machine learning algorithms for predicting syngas fermentation outcomes
AU - Roell, Garrett W.
AU - Sathish, Ashik
AU - Wan, Ni
AU - Cheng, Qianshun
AU - Wen, Zhiyou
AU - Tang, Yinjie J.
AU - Bao, Forrest Sheng
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/8
Y1 - 2022/8
N2 - Clostridium carboxidivorans can use syngas to produce acids and alcohols. However, simulating gas fermentation dynamics remains challenging. This study employed data transformation and machine learning (ML) approaches to predict syngas fermentation behavior. Syngas composition and fermentative metabolite concentrations (features) were paired with the production rates (prediction targets) of acetate, ethanol, butyrate, and butanol at each time point. This transformation avoided the use of time as a feature. Data augmentation by polynomial smoothing of experimental measurements was used to create a database for supervised learning of 836 rate instances from 10 gas compositions. Seven families of ML algorithms were compared, including neural networks, support vector machines, random forests, elastic nets, lasso regressors, k-nearest neighbors, and Bayesian ridge regressors. These algorithms predicted production rates for training data with Pearson correlation coefficients (R2 > 0.9), but they showed poorer performance for predicting unseen test data. Among the algorithms, random forests and support vector machines produced the most accurate predictions for the test data, which could regenerate product concentration curves (R2 ≈ 0.85). In contrast, neural networks had a higher risk of overfitting. Additionally, ML-based feature importance analysis highlighted the significant impacts of CO and H2 on alcohol production, which offers guidance for model predictive control. Together, these findings can help direct future applications of ML algorithms to complex bioprocesses with limited data.
AB - Clostridium carboxidivorans can use syngas to produce acids and alcohols. However, simulating gas fermentation dynamics remains challenging. This study employed data transformation and machine learning (ML) approaches to predict syngas fermentation behavior. Syngas composition and fermentative metabolite concentrations (features) were paired with the production rates (prediction targets) of acetate, ethanol, butyrate, and butanol at each time point. This transformation avoided the use of time as a feature. Data augmentation by polynomial smoothing of experimental measurements was used to create a database for supervised learning of 836 rate instances from 10 gas compositions. Seven families of ML algorithms were compared, including neural networks, support vector machines, random forests, elastic nets, lasso regressors, k-nearest neighbors, and Bayesian ridge regressors. These algorithms predicted production rates for training data with Pearson correlation coefficients (R2 > 0.9), but they showed poorer performance for predicting unseen test data. Among the algorithms, random forests and support vector machines produced the most accurate predictions for the test data, which could regenerate product concentration curves (R2 ≈ 0.85). In contrast, neural networks had a higher risk of overfitting. Additionally, ML-based feature importance analysis highlighted the significant impacts of CO and H2 on alcohol production, which offers guidance for model predictive control. Together, these findings can help direct future applications of ML algorithms to complex bioprocesses with limited data.
KW - Clostridium carboxidivorans
KW - Data transformation
KW - Model predictive control
KW - Neural network
KW - Random forest
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85136492271&partnerID=8YFLogxK
U2 - 10.1016/j.bej.2022.108578
DO - 10.1016/j.bej.2022.108578
M3 - Article
AN - SCOPUS:85136492271
SN - 1369-703X
VL - 186
JO - Biochemical Engineering Journal
JF - Biochemical Engineering Journal
M1 - 108578
ER -