Light-duty vehicle emissions regulations worldwide impose stringent limits on particulate matter (PM) emissions, necessitating accurate modelling and prediction of particulate emissions across a range of sizes, down to particle diameters of 10 nm. Previous work has shown that Random Forest, a decision tree-based ensemble machine learning technique, can accurately predict particle size, concentration, and accumulation mode geometric standard deviation (GSD) for particles as small as 23 nm emitted from a highly boosted gasoline direct injection (GDI) engine operating on a single fuel. Because decision trees are interpretable, the technique also offers insight into the underlying factors of emissions production.
This work builds on the prior Random Forest research: it compares the performance of five decision tree-based machine learning techniques in predicting these particulate emission parameters and extends the predictions down to 10 nm particles. In addition to Random Forest, the selected techniques comprise four gradient boosting models: GBM, XGBoost, LightGBM, and CatBoost. The influence of fuel chemistry is assessed using data from 13 gasoline fuel blends, including blends with ethanol and methanol, which are common bio- and e-fuels. The results show that the CatBoost model achieves the highest prediction accuracy (R² between 0.77 and 0.932), even when the feature set is reduced to improve computational efficiency. Random Forest and LightGBM are also shown to be suitable for PM emissions estimation. Permutation feature importance is used to highlight the dependence of PM emissions on both fuel and engine operating parameters, offering new insights into the effect of fuel properties on particulate emissions and their formation in highly boosted engines.
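
For readers unfamiliar with permutation feature importance, the following minimal Python sketch illustrates the general idea: shuffle one input feature at a time and record how much a fitted model's R² drops. It is not the implementation used in this work. The feature names, the synthetic data, and the `toy_model` function (a stand-in for a trained tree-based model) are all hypothetical and chosen only for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical feature names and synthetic data (assumptions, not study data).
FEATURES = ["engine_load", "ethanol_fraction", "unused_feature"]
data_rng = random.Random(42)
X = [[data_rng.random() for _ in FEATURES] for _ in range(200)]
y = [3.0 * a + 1.0 * b + data_rng.gauss(0.0, 0.05) for a, b, _ in X]


def toy_model(row):
    """Stand-in for a trained regressor; it ignores the third feature."""
    return 3.0 * row[0] + 1.0 * row[1]


baseline, importances = permutation_importance(toy_model, X, y)
print(f"baseline R^2: {baseline:.3f}")
for name, importance in zip(FEATURES, importances):
    print(f"{name}: {importance:.3f}")
```

In this toy setting the feature with the largest coefficient shows the largest drop in R², and the feature the model ignores scores zero. With a real model, any regressor's prediction function could be passed in place of `toy_model`, since the procedure depends only on predictions.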