With the development of vehicles equipped with automated driving systems, the need for systematic evaluation of autonomous vehicle (AV) performance has become increasingly pressing. According to ISO 34502, one of the safety test objectives is to confirm the minimum performance levels required for diverse scenarios. To address this need, this paper combines two essential methodologies, scenario-based testing procedures and scoring systems, to systematically evaluate the behavioral competence of AVs. In this study, we conduct comprehensive testing across diverse scenarios in a simulation environment following the Mcity AV Driver Licensing Test procedure. These scenarios span several common real-world driving situations, including background vehicle (BV) Cut-in, BV Lane Departure into Vehicle Under Test (VUT) Path from Opposite Direction, BV Left Turn Across VUT Path, and BV Right Turn into VUT Path. Furthermore, the test cases are divided into different risk levels, allowing the AV to be tested across a range of risk conditions, with a focus on high-value test cases to increase testing efficiency. Our evaluation leverages the driving assessment (DA) methodology, which provides a quantitative framework for objectively assessing the performance of the AV system. Within this framework, scores are systematically assigned based on the comprehensive testing results, yielding a precise understanding of the capabilities of Autoware.ai. This assessment is especially valuable given Autoware.ai's status as open-source AV software. The results of our testing offer a promising assessment of Autoware.ai's performance. The combination of the Mcity AV Driver Licensing Test and the DA methodology contributes to a holistic understanding of Autoware.ai's behavioral competence and demonstrates its capacity to handle safety-critical events in the test scenarios. These findings not only deepen understanding of the performance of the VUT but also help developers identify issues in their AV software.