Accurate state of health (SOH) estimation is essential for the safe and reliable
deployment of lithium-ion batteries. However, most existing data-driven methods
rely exclusively on a single data modality, such as voltage-capacity or
incremental capacity (IC) curves, which often fails to capture the complex
battery degradation process as a whole. To address this limitation, this paper
proposes a novel multi-modal feature fusion network.
The network combines three distinct but complementary data modalities:
historical point features, voltage-capacity and IC sequence features, and
degraded image features. The framework uses a one-dimensional convolutional
neural network (1D-CNN) to analyze point features, a Transformer encoder to
process sequence features, and a ResNet to identify spatio-temporal patterns in
degraded images. A fusion network then integrates these heterogeneous
features. This model was validated on the CSIE
dataset, where it outperformed single-modal methods. At a 0.5C discharge rate,
the fusion model achieved an average RMSE of 0.34%, a MAPE of 0.31%, and an R²
of 0.9937. The method is also robust across discharge rates, maintaining high
accuracy even at a high rate of 3C.
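
For readers unfamiliar with the three reported metrics, the following is a
minimal sketch of how RMSE, MAPE, and R² are conventionally computed. It is not
the paper's evaluation code. It assumes SOH is expressed in percent, so that
RMSE comes out in percentage points. The function names and the sample values
are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the inputs."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOH values in percent (assumption: not taken from the paper).
soh_true = [100.0, 95.0, 90.0, 85.0, 80.0]
soh_pred = [99.0, 96.0, 90.0, 84.0, 81.0]

print(f"RMSE = {rmse(soh_true, soh_pred):.4f}")
print(f"MAPE = {mape(soh_true, soh_pred):.4f}%")
print(f"R2   = {r2(soh_true, soh_pred):.4f}")
```

Lower RMSE and MAPE indicate smaller estimation error, while an R² close to 1
indicates that the predictions explain nearly all of the variance in the
measured SOH.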