This paper addresses the problems of long-term signal loss in localization and cumulative drift in SLAM-based online mapping and localization for autonomous valet parking scenarios. A GPS, INS, and SLAM fusion localization framework is proposed that enables centimeter-level localization with broad adaptability across scenes of multiple scales. The framework couples LiDAR with an Inertial Measurement Unit (IMU) to build a point cloud map of the parking environment. IMU pre-integration provides coarse pose priors for point cloud frames; motion distortion is then corrected, and line and plane features are extracted for pose estimation. During mapping, the map is optimized and aligned with a global coordinate system, while a visual Bag-of-Words model is built with dynamic features removed. For in-scene localization, prior map knowledge is fused with multiple sensors: a GPS-fused Bag-of-Words model initializes the vehicle pose, and an Error-State Kalman Filter then fuses point cloud matching results with IMU pre-integration to produce accurate filtered poses. In the Bag-of-Words-based localization approach, YOLO object detection is used to exclude keyframes that may contain dynamic features. When the vehicle revisits a similar scene, pose optimization is triggered to improve the accuracy and stability of the initial localization. The proposed SLAM system is validated on multiple sequences of the KITTI dataset to demonstrate the accuracy of the prior maps. Finally, a vehicle platform was built for localization experiments in parking scenarios. In the absence of sufficient GPS signal, the best trajectory RMSE reaches 5.24 cm, with an angular error within 0.35°.
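To illustrate the fusion step described above, the following is a minimal, hypothetical one-dimensional sketch of an error-state Kalman update: the nominal state plays the role of the IMU pre-integrated pose (which drifts), and a point-cloud-matching result against the prior map acts as the correcting measurement. All function names and numeric values are illustrative assumptions, not the paper's implementation.

```python
def eskf_update(nominal, P, meas, R):
    """Fuse a scan-matching measurement into the IMU-propagated nominal state.

    nominal : IMU pre-integrated estimate (accumulates drift)
    P       : covariance of the error state
    meas    : pose from point cloud matching against the prior map
    R       : measurement noise covariance
    """
    H = 1.0                       # measurement observes the state directly
    K = P * H / (H * P * H + R)   # Kalman gain
    delta = K * (meas - nominal)  # estimated error state
    nominal = nominal + delta     # inject the error back into the nominal state
    P = (1.0 - K * H) * P         # covariance update
    return nominal, P

# Drifted IMU estimate (10.0) corrected by a scan-matching fix (10.5):
pos, P = eskf_update(10.0, 1.0, 10.5, 0.25)
print(round(pos, 2), round(P, 2))  # → 10.4 0.2
```

In the full system the state would be a 6-DoF pose with matrix-valued covariances, but the structure of the predict-correct cycle is the same: IMU pre-integration propagates, point cloud matching corrects.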