May 13, 2024

AI and Visualisations: A Data-Driven, All-Round Approach to Road Safety

Unveiling accident triggers: how we uncovered patterns and signals in a collision avoidance dataset through Exploratory Data Analysis and Machine Learning.

This post explores the intersection of AI and road safety, highlighting a collaborative project that leveraged data insights to enhance road safety measures. We delve into the use of AI-based Advanced Driver Assistance Systems and the analysis of various alert types to develop solutions such as the Vehicle Risk Score and Hotspot Identification tools.

Roads are the lifeline of the modern economy. Life unveils itself in all its vibrancy on roads. From serving the very purpose of transport to facilitating unforgettable long drives, roads see umpteen stories unfold on them every single day. However, as with all stories, it is not always sunshine and roses. The same roads that sustain livelihoods bear silent witness to mishaps that are the antithesis of life: accidents. Road accidents have rippling effects and shatter more lives than is visible, yet, barring exceptional circumstances, they are largely avoidable. The nature and scale of this problem are so dire that any remedy that alleviates it is welcome. This holds especially true for a country like India, which, with a population of 1.3 billion and growing, needs scalable solutions for the 461,000 accidents that plague its roads every year.

Enter AI! While the talk of the town is LLMs and ChatGPT, the strengths of AI long predate these technologies. From defeating chess grandmaster Garry Kasparov in 1997 to defeating the top-ranked Go player Ke Jie in 2017, AI has had its moments in history. Before you get your hopes up, though, AI is not a magic wand that can single-handedly reduce the number of road accidents. AI experts often underscore the importance of domain expertise in devising realistic solutions. After all, AI-based systems do not operate in a vacuum. With this perspective, the AI for Road Safety Challenge began as a collaborative project of IRASTE, FruitPunch AI and the International Institute of Information Technology, Hyderabad. Nine data professionals from different parts of the world, split into three teams, worked together with the aim of obtaining insights that could pave the way for custom solutions to make our roads safer.

This challenge centred on alerts: the warning bells sent to the driver when an anomalous event occurs on the road. At the heart of these alerts are the Collision Avoidance System (CAS) and the Driver Monitoring System (DMS) of AI-based Advanced Driver Assistance Systems (ADAS). These systems consider factors such as pedestrian traffic, vehicular traffic and vehicle speed to alert drivers to potential collisions. The dataset provided comprised 1.8 million alerts collected over a period of one month. These alerts are classified as follows:

  1. Front Collision Warning (FCW): Alert for a potential collision with another vehicle in the lane, in front of the driven vehicle
  2. Headway-Monitoring Warning (HMW): Alert when the distance to another vehicle in the lane is less than the safe distance for the driven vehicle at its current speed
  3. Lane Departure Warning (LDW): Alert when the driven vehicle moves out of a lane without using a lane-change indicator
  4. Pedestrian-Collision Warning (PCW): Alert for a potential collision with a pedestrian, in front of the driven vehicle
  5. Hard Brake Warning
  6. Asleep Warning
  7. Drowsy Warning
  8. On Phone Warning
  9. Distracted Warning
  10. No Seat Belt Warning
  11. Smoking Warning

The dataset therefore contained the different alerts along with the parameters characterising each alert. Its features are as follows.

  1. GPS Coordinates, i.e. Latitude and Longitude, of the alert (Spatial)
  2. Date and Time of the alert (Temporal)
  3. Speed of the vehicle during the alert (Continuous)

With this dataset in place, we set out to understand and explore the data as thoroughly as possible. It is worth noting that this challenge had no explicit goal, such as developing a particular algorithm. The stakeholders recognise the need for all-round solutions that address the various facets of road safety, as opposed to tailor-made solutions targeting one particular aspect. Hence, participants were free to discover their own paths, all of which would ultimately lead to the goal of road safety.

Complementary Datasets:

While the data itself was a great starting point for the analysis, we recognised the importance of supplementing the source dataset with complementary datasets. To this end, we used the following two datasets in pursuit of more impactful insights.

  1. *Weather Data for the state of Telangana*: Everyone knows that driving on crisp sunny mornings is far more pleasant than driving with rain battering the car. We wanted to see whether this anecdotal sense of comfort could be translated into data-backed evidence on alerts. Hence, we obtained data on weather parameters such as rainfall, temperature (minimum and maximum) and humidity (minimum and maximum) from the Government of Telangana.
  2. *Data on health providers (all place categories matching hospitals or clinics) and education centres*: A comprehensive dataset of the geographical coordinates, locality, region, name and category of the health providers and education centres of Nagpur was obtained from the Places data layer of the Overture Maps Foundation (OMF).

Tools and Packages Used:

The following tools were used to facilitate the analysis and results that are described later on.

  1. Geopy: Geopy is a popular Python client for multiple geocoding web services. Its *reverse* method, which resolves a pair of coordinates into an address, made it possible to obtain location-specific insights.
  2. Folium: Folium is a Python library for creating interactive Leaflet maps of several types. These maps were helpful in visualising the alert locations and the hotspots.
  3. Streamlit: Streamlit is an open-source Python library used to rapidly build and share data science apps. This tool was used to implement the Vehicle Risk Score and road-level Hotspot Mapping as web apps.
  4. H3 Indexing: H3 is a multi-resolution grid system that indexes coordinates into hexagonal cells. In this project, Uber's H3 indexing system was used at resolution 8, where each hexagon covers an area of less than 1 km². This resolution strikes a balance between granularity and coverage.
  5. is a powerful open-source library developed by Uber Technologies for creating interactive and customizable geospatial visualizations.

With these datasets and tools in place, we carried out the following types of analysis to let the data tell its story.

  1. Exploratory Data Analysis (EDA): Very basic, yet very necessary. The insights from this analysis validate many popular observations. For example, because there is less traffic at night, vehicles may be driven at higher speeds, and our analysis bore this out. We also explored the relationship between speed and alerts. The average speed for LDW alerts was higher than for HMW alerts, but alerts are triggered at lower speeds as well. Hence, considerable risk exists even when the vehicle is moving slowly.

    Figure 1: Distribution of Speed across the different types of alerts i.e. (a) LDW (b) HMW
    When splitting the road networks into Highways and Residential Neighbourhoods, it can be observed that neighbourhoods. To provide context, highways are an extensive network of roads that connect major cities within a state (State Highways) and across states (National Highways). These roads are generally marked by sparse civilian presence and high vehicle speeds. Residential roads, as the name suggests, are at the heart of civilian activity and are hence less prone to high speeds.
    Figure 2: Spatial Representation of the Distribution of Alerts across Highways and Residential Neighbourhoods 
    Figure 3: Animation of trips in Nagpur developed using showing the highest traffic density on highways

  2. Alert Sequencing: We analysed the sequence of alerts, focusing on the probability and patterns of one alert leading to another within a 5-second window. In many cases, there is a significant probability of one alert being followed by another. For example, 25% of headway monitoring warnings (cas_hmw) are followed by another warning within a mere 5 seconds. This may indicate a behavioural pattern in which drivers disregard the alerts, which calls for interventional measures. Our analysis also yielded a surprising result: a warning is never followed by a Hard Brake Warning within 5 seconds. This contradicts our initial expectations, as Forward Collision Warnings would typically prompt drivers to brake hard in certain situations.

Figure 4: Heatmap of Alert Sequences Probabilities within 5 seconds
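The sequencing statistic can be sketched in a few lines of Python. This is a minimal illustration and not the team's code. It assumes each alert carries a vehicle identifier (the `vehicle_id` field is hypothetical, because the post lists only location, time and speed), and it pairs each alert only with the next alert from the same vehicle. Each cell of a heatmap like Figure 4 would then be the estimated probability that alert type A is followed by alert type B within 5 seconds.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    vehicle_id: str      # hypothetical field: the post does not name the vehicle identifier
    timestamp: datetime
    alert_type: str      # e.g. "cas_hmw", "cas_fcw"


def transition_probabilities(alerts, window=timedelta(seconds=5)):
    """Estimate P(next alert is B within `window` | current alert is A).

    Only the immediately following alert from the same vehicle is considered.
    """
    by_vehicle = defaultdict(list)
    for alert in alerts:
        by_vehicle[alert.vehicle_id].append(alert)

    totals = Counter()        # occurrences of each alert type
    transitions = Counter()   # (A, B) pairs where B follows A within the window
    for vehicle_alerts in by_vehicle.values():
        vehicle_alerts.sort(key=lambda a: a.timestamp)
        # Pair each alert with its successor; the last alert has no successor.
        for current, nxt in zip(vehicle_alerts, vehicle_alerts[1:] + [None]):
            totals[current.alert_type] += 1
            if nxt is not None and nxt.timestamp - current.timestamp <= window:
                transitions[(current.alert_type, nxt.alert_type)] += 1

    return {(a, b): count / totals[a] for (a, b), count in transitions.items()}
```

Considering only the immediately following alert keeps each alert in at most one pair, so the probabilities in any row sum to at most 1. Counting every alert inside the window would answer a slightly different question.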

Tools Developed:

The following tools were developed leveraging the data.

  1. Vehicle Risk Score: Addressing the crucial need for personalised risk assessment, the team introduced a Vehicle Risk Score. This tool calculates a risk score for each vehicle on a scale of 1 to 5, with 5 representing the highest level of risk. The score is dynamic and customisable at the vehicle level, allowing stakeholders to filter and view risk scores between specified dates or for the entire month. This flexibility lets fleet managers and city officials tailor risk assessments to their specific needs.

Table 1: Risk Score Table

Figure 5: A snippet of the Vehicle Risk Analysis
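The post does not spell out how the score is computed. The following is therefore only a hypothetical sketch of one way to produce a 1-to-5 score with date filtering. It counts each vehicle's alerts in the chosen date range and then assigns scores by rank, so the vehicles with the most alerts fall into band 5. The input schema (`(vehicle_id, alert_date)` tuples) and the rank-based banding are assumptions. A production score might also weight alert types differently.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def vehicle_risk_scores(
    alerts: Iterable[Tuple[str, date]],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Dict[str, int]:
    """Hypothetical 1-5 risk score from per-vehicle alert counts in [start, end]."""
    # Count alerts per vehicle, keeping only those inside the date range.
    counts = Counter(
        vehicle_id
        for vehicle_id, alert_date in alerts
        if (start is None or alert_date >= start) and (end is None or alert_date <= end)
    )
    n = len(counts)
    sorted_counts = sorted(counts.values())
    scores = {}
    for vehicle_id, count in counts.items():
        # Number of vehicles with strictly fewer alerts, in the range 0..n-1.
        fewer = bisect_left(sorted_counts, count)
        # Split the ranking into five equal bands: 1 (lowest risk) .. 5 (highest).
        scores[vehicle_id] = 1 + (5 * fewer) // n
    return scores
```

Vehicles with equal alert counts receive the same score, because the band depends only on how many vehicles have strictly fewer alerts.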

  2. Hotspot Identification using Spatial Tools: Recognising the need to visualise alerts from a geographical perspective, the team developed the following tools.

a) Hotspot Identification using R-tree Spatial Indexing: A hotspot is defined as an area from which the number of alerts exceeds a threshold of (number of data points / 1000). The algorithm works as follows. Using the R-tree index, each coordinate is examined to determine the number of points that fall within a specified radius. Potential hotspots are thus identified from the density of nearby points. The algorithm also retains overlapping hotspots instead of eliminating or merging them. This gives a comprehensive view of all high-alert areas and preserves the spatial information. The result shows how potential hotspots are distributed and concentrated.
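A rough, standard-library-only sketch of the radius-count rule follows. It swaps the R-tree for a simpler uniform grid index, which plays the same role of restricting each radius query to nearby candidates. The 200 m radius is a hypothetical value, because the post does not state the radius used. With the 1.8 million alerts in this dataset, the threshold would be 1,800 alerts.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_hotspots(points, radius_m=200.0):
    """Return (centre, count) for every point whose neighbourhood exceeds len(points) / 1000.

    A uniform grid stands in for the R-tree. Cell sizes are chosen so that every
    neighbour within radius_m lies in the 3x3 block of cells around the query
    point. Assumes data away from the poles and the antimeridian.
    Overlapping hotspots are all kept; nothing is merged.
    """
    if not points:
        return []
    threshold = len(points) / 1000
    ang = radius_m / EARTH_RADIUS_M  # radius as an angle in radians
    max_lat = math.radians(max(abs(lat) for lat, _ in points))
    lat_step = math.degrees(ang)
    # Bounding-box half-width in longitude, at the highest latitude present.
    lon_step = math.degrees(math.asin(min(1.0, math.sin(ang) / math.cos(max_lat))))

    def cell_of(p):
        return math.floor(p[0] / lat_step), math.floor(p[1] / lon_step)

    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p)].append(p)

    hotspots = []
    for p in points:
        ci, cj = cell_of(p)
        # Count points (including p itself) within the radius in neighbouring cells.
        count = sum(
            1
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            for q in grid.get((ci + di, cj + dj), [])
            if haversine_m(p, q) <= radius_m
        )
        if count > threshold:
            hotspots.append((p, count))
    return hotspots
```

A dedicated spatial index, such as the R-tree the team used or a KD-tree, would replace the hand-rolled grid at scale. The hotspot rule itself stays the same.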

b) City-level Hotspot Identification Tool: Using Streamlit and Folium, the team developed a dynamic tool for city-level hotspot identification. It allows users to obtain views at the alert-type, district, city or road level. Stakeholders, including city officials, can use these views strategically to identify high-risk spots in a city or on specific roads.

Figure 6: Clustered Hotspot Analysis
c) Spatial Distribution of Alerts using H3 Indexing and Using the demo (which allows users to upload data sources directly and analyse data), animations depicting the trips of each vehicle throughout a day were generated. In addition, a Spatial Risk Index was formulated by combining the total number of alerts within each spatial aggregation area with the counts of hospitals and schools in that area. The expectation was that high-risk areas would exhibit a high number of alerts, few hospitals and several schools. This spatial risk index provides a holistic perspective for prioritising interventions and resource allocation to enhance road safety.
Figure 7: Spatial Distribution of Risk Index in Nagpur
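One simple way to combine the three ingredients is a weighted sum of min-max normalised counts per spatial cell, where alerts and schools raise the score and hospitals lower it. The formula and the equal weights below are assumptions for illustration, because the post does not give the exact formula. The `cell_id` could be an H3 index at resolution 8, but any aggregation key works.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    cell_id: str     # e.g. an H3 index string (hypothetical input)
    alerts: int
    schools: int
    hospitals: int


def _min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def spatial_risk_index(cells: List[Cell], w_alerts=1 / 3, w_schools=1 / 3,
                       w_hospitals=1 / 3) -> Dict[str, float]:
    """Hypothetical risk score in [0, 1] per cell when the weights sum to 1."""
    if not cells:
        return {}
    a_norm = _min_max([c.alerts for c in cells])
    s_norm = _min_max([c.schools for c in cells])
    h_norm = _min_max([c.hospitals for c in cells])
    # More alerts and more schools raise risk; more hospitals lower it.
    return {
        c.cell_id: w_alerts * a + w_schools * s + w_hospitals * (1 - h)
        for c, a, s, h in zip(cells, a_norm, s_norm, h_norm)
    }
```

Min-max scaling keeps the three counts on a comparable footing. The weights are the natural knobs for stakeholders who consider one factor more important than the others.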


Having done all this, we now elaborate on how our analysis can help the different stakeholders.

The drivers: The heatmap in Figure 4 shows why we still need to engage with drivers. Many alerts are followed by another alert within the short span of 5 seconds. This suggests that drivers are not paying attention to the alerts and continue to engage in risky behaviour. Thus, more robust measures are needed to compel drivers to abide by the rules and refrain from risky behaviour.

The Policy Makers: The spatial risk index provides a holistic perspective for prioritising interventions and resource allocation. The Vehicle Risk Score, in turn, lets fleet managers and city officials tailor risk assessments to their specific needs. The hotspot analysis also showed distinct patterns by alert type. HMW alerts occur most frequently in urban areas, LDW alerts are frequent on highways, and PCW alerts are rampant at public hotspots such as bus stands and hospitals. This helps policy makers make tailored decisions for improved road safety instead of a single decision that might not be robust. Even the basic EDA carried out here can inform improved safety measures on highways, which are marked by intense traffic density.

Engineers, both Data and Transport: The analysis also surfaced several glaring observations. A warning is never followed by a Hard Brake Warning within 5 seconds, and Hard Brake Warnings consistently occur at a speed of 0. Forward Collision Warnings would typically prompt drivers to brake hard in certain situations, yet Hard Brake Warnings are unlikely to genuinely occur at a speed of 0. This suggests a potential discrepancy or inconsistency in the data. Exploring this anomaly further is essential to ensure the reliability and integrity of the dataset. It is therefore imperative that Data and Transport Engineers come together to address the accuracy of the data collected.
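A natural first step in exploring such an anomaly is a per-type check of how often alerts are recorded at zero speed. The sketch below assumes alerts are available as `(alert_type, speed)` pairs, and the type labels are hypothetical. A share near 1.0 for a motion-dependent alert such as a hard brake may point to a recording or labelling issue rather than to driver behaviour.

```python
from collections import Counter


def zero_speed_share_by_type(alerts):
    """Map each alert type to the fraction of its alerts recorded at speed 0.

    `alerts` is an iterable of (alert_type, speed) pairs; labels are hypothetical.
    """
    totals, zeros = Counter(), Counter()
    for alert_type, speed in alerts:
        totals[alert_type] += 1
        if speed == 0:
            zeros[alert_type] += 1
    return {t: zeros[t] / totals[t] for t in totals}
```

Running this over the full month and sorting by share would quickly show whether zero-speed recordings are confined to hard brakes or affect other alert types too.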

With these findings in place, we hope to have shown how road safety can be approached from the perspective of AI. Interestingly, when presented with the same dataset, the three teams came up with three different solutions that nonetheless echoed the same sentiment. Road safety is a broad subject that needs tailor-made solutions, not a one-size-fits-all approach. We hope this is the first of many steps that leverage the power of AI to improve the roads for all of their users.

Author: Sharadhi Alape Suryanarayana
