Traffic participants in the RGB images used for roadside perception span many categories and scales, appear against complex backgrounds, and frequently occlude one another, so object detectors produce a considerable number of false and missed detections. Cascade R-CNN, a two-stage object detection network, performs comparatively well on such images. In this paper, we make several optimizations to this network to improve its detection accuracy. The specific improvements are as follows. ResNeXt50 replaces the original ResNet50 as the backbone. We obtain aspect ratios that meet the requirements of roadside perception images by collecting statistics on the aspect ratios of the annotation boxes. By extending the feature pyramid network to PANet, we further integrate multi-scale information, which improves the ability to extract low-level object information. Because Smooth L1 loss treats the bounding-box coordinates independently, we introduce GIoU to improve the regression accuracy of the bounding box. To evaluate the roadside perception algorithm more comprehensively and accurately, this paper adds quantitative metrics such as detection rate, correct detection rate, missed detection rate, and false detection rate to the common metrics (AP, Precision, Recall). On the self-built roadside dataset Vanjee_RS_2D, at the same FLOPs, using ResNeXt50 alone increases the mAP by 0.95%. Adding GIoU and PANet increases the mAP by 0.32% and 0.76%, respectively. The mAP of the fully optimized model is 2.03% higher, and the other metrics improve accordingly.
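
To make the GIoU term concrete, the following is a minimal sketch of the standard GIoU definition for two axis-aligned boxes, together with the usual loss form 1 - GIoU. It is an illustration only and not the implementation used in this paper: the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for the example, and both boxes are assumed to have positive area. Unlike Smooth L1 loss, which penalizes each coordinate separately, the value depends on the box as a whole.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 for both boxes (hypothetical input format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    # Penalize the part of the enclosing box covered by neither input.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Loss form commonly used for regression: 1 - GIoU, in the range [0, 2]."""
    return 1.0 - giou(pred_box, target_box)
```

For two disjoint boxes the IoU is zero regardless of their distance, whereas GIoU becomes more negative as the boxes move apart, so the loss still distinguishes near misses from far misses.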