Perception is a key component of automated vehicles (AVs). However, sensors
mounted to the AVs often encounter blind spots due to obstructions from other
vehicles, infrastructure, or objects in the surrounding area. While recent
advancements in planning and control algorithms help AVs react to objects that
suddenly emerge from blind spots at low speeds and in less complex scenarios,
challenges remain at high speeds and at complex intersections.
Vehicle-to-infrastructure (V2I) technology promises to enhance scene
representation for connected and automated vehicles (CAVs) in complex
intersections, providing sufficient time and distance to react to adversary
vehicles violating traffic rules. Most existing methods for infrastructure-based
vehicle detection and tracking rely on LIDAR, RADAR, or sensor fusion methods,
such as LIDAR–camera and RADAR–camera. Although LIDAR and RADAR provide accurate
spatial information, the sparsity of their point cloud data limits their ability
to capture detailed contours of distant objects, resulting in inaccurate 3D
object detection. Furthermore, LIDAR and RADAR are not available at every
intersection, and installing them increases the cost of implementing V2I
technology. To address these
challenges, this article proposes a V2I framework that utilizes monocular
traffic cameras at road intersections to detect 3D objects. The results from the
roadside unit (RSU) are then combined with the on-board system using an
asynchronous late fusion method to enhance scene representation. Additionally,
the proposed framework includes a time delay compensation module that accounts
for the processing and transmission delays of the RSU. Lastly, the V2I
framework is tested by simulating and validating a scenario similar to the one
described in an industry report by Waymo. The results show that the proposed
method improves scene representation and extends the CAV’s perception range,
giving it enough time and space to react to the adversary vehicles.
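
The abstract does not state how the time delay compensation or the asynchronous late fusion is implemented. As a minimal sketch only, the following assumes a constant-velocity extrapolation of each RSU detection to the vehicle's current time, followed by a simple distance-gated merge of the two object lists. The `Detection` format, the function names, and the 2 m gate are hypothetical choices made for illustration and are not taken from the article.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection format: ground-plane state in a shared map frame."""
    x: float      # position in metres
    y: float
    vx: float     # velocity in metres per second
    vy: float
    stamp: float  # capture time in seconds


def compensate_delay(det: Detection, now: float) -> Detection:
    """Extrapolate a delayed detection to time `now`.

    Assumes constant velocity over the delay (an illustrative assumption).
    """
    dt = now - det.stamp
    return Detection(det.x + det.vx * dt, det.y + det.vy * dt,
                     det.vx, det.vy, now)


def late_fuse(onboard, rsu, now, gate=2.0):
    """Object-level (late) fusion of two detection lists.

    Keeps every on-board detection and adds each delay-compensated RSU
    detection that lies farther than `gate` metres from all on-board ones.
    On-board detections are assumed to be already valid at time `now`.
    """
    fused = list(onboard)
    for det in rsu:
        det = compensate_delay(det, now)
        if all(math.hypot(det.x - o.x, det.y - o.y) > gate for o in onboard):
            fused.append(det)
    return fused


# Usage: the RSU message was captured 0.2 s before the vehicle's current time.
onboard = [Detection(2.5, 0.0, 10.0, 0.0, 0.2)]
rsu = [
    Detection(0.0, 0.0, 10.0, 0.0, 0.0),    # same vehicle the CAV already sees
    Detection(50.0, 0.0, -10.0, 0.0, 0.0),  # occluded vehicle seen only by the RSU
]
fused = late_fuse(onboard, rsu, now=0.2)
```

In this sketch the first RSU detection is moved 2 m forward by the compensation step and then falls within the gate of the on-board detection, so it is treated as a duplicate; the second is added to the fused list. A real system would also need to handle velocity estimation, coordinate transforms, and clock synchronization, none of which are modeled here.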