Real-time traffic event information is essential for various applications,
including travel service improvement, vehicle map updating, and road management
decision optimization. With the rapid advancement of the Internet, text published
on online platforms has become a crucial data source for urban road traffic
events owing to its timeliness, wide spatiotemporal coverage, and low
acquisition cost. However, because of the complexity of massive, multi-source
web text and the diversity of spatial scenes in traffic events, current methods
cannot accurately and comprehensively extract and geographize traffic events in
a multi-dimensional, fine-grained manner, so this information cannot be fully
and efficiently utilized. Therefore, in this study,
we propose a “data preparation - event extraction - event geographization”
framework focused on traffic events, integrating geospatial information to
achieve efficient text extraction and spatial representation. First, the text
data is preprocessed, with road-related information extracted and summarized to
prepare for subsequent tasks. Next, a step-wise method for automated extraction
is introduced. Trigger words and spatial-relationship rules are defined to
identify spatial elements within the text, and dictionaries of proper and
general names are then applied to recognize candidate entities. Finally, the
candidate entities are disambiguated by introducing spatial constraints such
as direction. Based on spatial scenes, entities representing different elements
are organized to perform spatial computing, realizing the multi-dimensional
geographization of events. A case study in Shanghai demonstrated the
effectiveness of the proposed method, showing that it improves the completeness
and accuracy of traffic event extraction while enhancing the diversity and
accuracy of geographization.
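
As a minimal illustrative sketch of the trigger-word extraction and direction-constrained disambiguation steps summarized above (not the implementation used in this study), the following Python snippet matches dictionary names that follow a trigger word and then selects, among several same-named candidates, the one whose bearing from a reference entity best agrees with the stated direction. The gazetteer entries, planar coordinates, trigger words, and function names are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical toy gazetteer: name -> candidate locations as (east_m, north_m).
# The coordinates are invented for illustration, not real Shanghai positions.
GAZETTEER = {
    "Zhongshan Road": [(0.0, 500.0), (0.0, -800.0)],
    "Century Park": [(0.0, 0.0)],
}
# Assumed trigger words and direction terms (the study's actual lists may differ).
TRIGGERS = ("at", "on", "near")
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}

# Longest names first so that longer dictionary entries win over shorter ones.
_NAMES = "|".join(re.escape(n) for n in sorted(GAZETTEER, key=len, reverse=True))
_LOCATED = re.compile(r"\b(?:" + "|".join(TRIGGERS) + r")\s+(" + _NAMES + r")")
_RELATION = re.compile(r"\b(" + "|".join(BEARINGS) + r")\s+of\s+(" + _NAMES + r")")


def extract(text):
    """Return (event_entity, (direction, reference_entity)); parts may be None."""
    located = _LOCATED.search(text)
    relation = _RELATION.search(text)
    return (
        located.group(1) if located else None,
        relation.groups() if relation else None,
    )


def bearing(origin, target):
    """Compass bearing in degrees (0 = north, 90 = east) from origin to target."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def disambiguate(name, direction, reference):
    """Pick the candidate of `name` whose bearing from `reference` best fits `direction`."""
    anchor = GAZETTEER[reference][0]
    wanted = BEARINGS[direction]

    def angular_error(candidate):
        # Smallest absolute difference between two bearings, in [0, 180].
        return abs((bearing(anchor, candidate) - wanted + 180.0) % 360.0 - 180.0)

    return min(GAZETTEER[name], key=angular_error)


text = "Accident on Zhongshan Road, north of Century Park"
entity, relation = extract(text)
if entity and relation:
    location = disambiguate(entity, relation[0], relation[1])
```

In this toy example the direction constraint resolves the two same-named road candidates to the one lying north of the reference entity; a fuller treatment would use real geometries and additional spatial constraints.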