Bearings are fundamental components in automotive systems, ensuring smooth operation, efficiency, and longevity. They are widely used in wheel hubs, transmissions, engines, and steering systems. Early detection of bearing defects during End-of-Line (EOL) testing and in operation is crucial for preventive maintenance and helps avert system malfunctions. In the era of Industry 4.0, vibration sensors, accelerometers, and other IoT sensors are routinely used to capture performance data and identify defects. These sensors generate vast amounts of data, enabling the development of advanced data-driven applications built on deep learning models. While deep learning approaches have shown promising results in bearing fault diagnosis, they often require extensive data, complex model architectures, and specialized hardware. This study proposes a novel method leveraging the capabilities of Vision Language Models (VLMs) and Large Language Models