Semi-active suspension systems enhance ride comfort and handling performance by
adaptively modulating damping characteristics. However, conventional model-based
controllers often fail to maintain optimal performance under uncertain and
time-varying vehicle conditions. This article proposes Bayesian
Optimization–Tuned Proximal Policy Optimization with Non-Parametric Rewards
(BO-NRPPO), a novel reinforcement learning (RL) framework that integrates
Bayesian Optimization (BO) with Proximal Policy Optimization (PPO) and a
non-parametric reward function (NRF). The proposed approach enables adaptive
self-tuning, data-driven reward shaping, and uncertainty-aware policy learning.
Moreover, a Trapezoidal Simple Moving Average (TSMA)–based reward normalization
scheme is introduced to accelerate convergence and stabilize training.
Simulation results across diverse driving scenarios demonstrate that BO-NRPPO
outperforms the passive suspension, the classical Linear Quadratic Regulator
(LQR), and PPO with parametric rewards. Specifically, relative to the passive
suspension and the LQR baseline, BO-NRPPO improves handling stability by up to
6.63% and 5.14%, respectively, and ride comfort by up to 46.96% and 42.55%,
respectively. For real-world vehicle applications, this adaptive self-tuning
capability reduces the time-consuming manual calibration typically required in
chassis development. Furthermore, hardware-in-the-loop (HiL) validation
confirms the controller's real-time applicability and robustness under
uncertain driving conditions, highlighting its potential as a scalable
intelligent suspension control solution.
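
For readers unfamiliar with the PPO component named above, the following is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. It illustrates generic PPO only, not the BO-NRPPO reward design, the TSMA normalization, or the Bayesian tuning loop, none of which are specified in this abstract. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions; the clipping range is the kind of hyperparameter an outer tuner might plausibly adjust, but which hyperparameters BO actually tunes is not stated here.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (assumed default, not taken from the article).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree, the ratio is 1 and the objective reduces to the mean advantage. When the ratio leaves the clipping range in the direction that would increase the objective, the clipped term caps the gain, which limits the size of each policy update.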