February 10, 2023

How we detect oil spills on open sea and support response teams

By using computer vision and segmentation, we aim to support the drones of the Rijkswaterstaat response team, enabling a quicker oil clean-up that requires fewer chemicals.

Impact on marine life

Smaller, often unnoticed oil spills have long-lasting effects on marine organisms: because these spills are not detected in a timely manner, many organisms can be oiled or poisoned in the meantime. Birds and marine mammals are severely affected because they need to make regular contact with the surface of the water, where much of the oil floats. These animals risk ingesting or inhaling oil, or getting covered in it, all of which are harmful. When an oil spill reaches the shore, it also affects the birds and mammals that nest on the coast.

Need for timely response

When oil spills happen, a fraction of the oil evaporates quickly, but the rest of the oil weathers and forms a persistent oil slick on the water surface, which can spread out in the direction of the wind. It becomes difficult to clean up oil spills once they have dispersed into very thin films and broken down into smaller particles. So it is important to begin cleanup as fast as possible to reduce the impact of a spill.

In addition to being a menace to marine life and the environment, small and sometimes deliberate oil spills are an economic burden for governments. Detected spills are usually cleaned up by oil spill response groups tasked by government agencies, and these cleanups require massive resources.

AI against Oil Spills Challenge

Rijkswaterstaat, the Directorate-General for Public Works and Water Management in the Netherlands, is responsible for water quality and for cleaning up oil spills in inland waterways and ports. It teamed up with FruitPunch AI to improve its response speed to spills in the AI against Oil Spills Challenge.

A group of 15 AI for Good engineers from different parts of the world joined hands in this Challenge. We split into two subgroups, each focusing on a different type of oil spill (the “Inland Team” for spills in inland ports and waterways and the “Sea Team” for spills on the open sea), and started our 10-week crunch.

Our goals for this Challenge were threefold:

  1. Detect the presence of an oil spill and map the area containing the oil spill in images captured with drones (for inland waterways) or satellite images (for seas).
  2. Classify the oil spill areas based on thickness and identify the areas containing the “thickest” portions.
  3. Estimate the volume of oil spills as per the BAPOL agreement.

This blog describes the journey of the Sea Team in detecting oil spills in open seas and oceans.

Dataset hunting

We identified a source of potential oil spill data in the US Office of Satellite and Product Operations - NESDIS Marine Pollution Products, but found that it lacked the associated satellite images that serve as input for training machine learning models. So we set out to find ready-made datasets that could help us in our task.

We soon realised that oil spill detection from satellite images is a budding research area, and no “curated” oil spill datasets exist in the public domain. So we took it upon ourselves to curate an open-access dataset based on Marine Pollution Surveillance Reports (MPSR) from US NESDIS Marine Pollution Products and make it available for the research community and data science enthusiasts.

We initially identified 800+ oil spill events from the archives of NESDIS Marine Pollution Products; during the course of the Challenge, we expanded the list to 2200+ events covering potential oil spills in the North American region, sourced from 20+ satellites.

Research phase

Multispectral images (MSI) were key to estimating thickness, but no large-scale implementations were available, so we decided to create a baseline using synthetic aperture radar (SAR) images from Sentinel 1 and then compare it with the results from MSI images from Sentinel 2 and Landsat 8. Given the time and resources available, we decided to explore a few thickness estimation ideas on MSI images.

Though different research papers used different metrics, we decided to evaluate our models with two overlap metrics: Intersection over Union (IoU, aka Jaccard Index) and F1 Score (aka Dice Coefficient). These were the best suited to our use case because our datasets have a high imbalance between the oil spill and sea classes: unlike pixel accuracy, both metrics ignore the abundant correctly classified sea pixels, so a model cannot score well simply by predicting “sea” everywhere.
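To make the two metrics concrete, here is a minimal NumPy sketch that computes IoU and F1 for the oil class of a binary mask. It is our illustration rather than the team's evaluation code; the function name `iou_and_f1` and the convention of scoring two oil-free masks as a perfect match are assumptions.

```python
import numpy as np


def iou_and_f1(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Return (IoU, F1) for the positive (oil spill) class of two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # oil predicted as oil
    fp = np.logical_and(pred, ~target).sum()  # sea predicted as oil
    fn = np.logical_and(~pred, target).sum()  # oil predicted as sea
    # True negatives (sea predicted as sea) appear in neither formula,
    # so a large sea background cannot inflate the scores.
    if tp + fp + fn == 0:
        return 1.0, 1.0  # no oil in either mask: scored as a perfect match (a convention)
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)


if __name__ == "__main__":
    target = np.zeros((100, 100), dtype=np.uint8)
    target[40:60, 40:60] = 1  # 400 oil pixels in a 10,000-pixel scene
    pred = np.zeros_like(target)
    pred[45:65, 40:60] = 1    # same size, shifted down by 5 rows
    iou, f1 = iou_and_f1(pred, target)
    accuracy = (pred == target).mean()
    print(f"IoU={iou:.2f}, F1={f1:.2f}, pixel accuracy={accuracy:.2f}")
    # prints: IoU=0.60, F1=0.75, pixel accuracy=0.98
```

In this toy scene the prediction misses a quarter of the slick, yet pixel accuracy still reads 0.98 because almost every pixel is sea; IoU and F1 expose the error.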


Satellite images are well known for their vibrant false color rendering, but they can be difficult to work with in the oil spill context: each satellite product was huge (1 to 2 GB) and needed complex preprocessing before it could be used in machine learning models. Moreover, the actual oil spills were restricted to a tiny area within the large image captured by the radar.
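The text does not say exactly how the spill areas were cut out of the full scenes, so the following is only a minimal sketch of one common approach: split the scene into fixed-size tiles and keep those whose mask contains some oil. The function name `extract_oil_patches`, the 256-pixel tile size, and the 1% oil threshold are hypothetical choices.

```python
import numpy as np


def extract_oil_patches(image, mask, patch=256, min_oil_fraction=0.01):
    """Split a scene into non-overlapping patch x patch tiles and keep the tiles
    whose mask contains at least `min_oil_fraction` oil pixels.

    Returns a list of (image_tile, mask_tile, (row, col)) tuples. Pixels at the
    right/bottom edge that do not fill a whole tile are skipped for simplicity.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must cover the same pixels")
    h, w = mask.shape
    kept = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            m = mask[r:r + patch, c:c + patch]
            if m.mean() >= min_oil_fraction:  # fraction of oil pixels in this tile
                kept.append((image[r:r + patch, c:c + patch], m, (r, c)))
    return kept


if __name__ == "__main__":
    scene = np.random.default_rng(0).random((1024, 1024), dtype=np.float32)
    oil = np.zeros((1024, 1024), dtype=np.uint8)
    oil[300:340, 600:700] = 1  # a small slick: 4,000 of ~1M pixels
    tiles = extract_oil_patches(scene, oil)
    print(len(tiles), tiles[0][2])  # 1 (256, 512)
```

In practice one would probably also keep a sample of oil-free tiles so that the model still sees plenty of open sea.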

We found a savior in the Alaska Satellite Facility, which provided a cloud-based API (HyP3 SDK) that could perform preprocessing, such as Radiometric Terrain Correction and Speckle Filtering, at the push of a button on all Sentinel 1 products.
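HyP3 handles speckle filtering server-side. To illustrate what such a step does, here is a sketch of the classic Lee filter in NumPy; it is not necessarily the filter HyP3 applies. It assumes a linear-scale (not dB) intensity image with multiplicative speckle of unit mean and variance `1/looks`; the function names and default window size are our own.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(img, size):
    """Mean over a size x size window centred on each pixel (reflected edges)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def lee_filter(intensity, size=7, looks=1.0):
    """Classic Lee speckle filter for a linear-scale SAR intensity image."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    img = intensity.astype(np.float64)
    cu2 = 1.0 / looks  # squared speckle coefficient of variation for L-look intensity
    mean = local_mean(img, size)
    var = np.clip(local_mean(img ** 2, size) - mean ** 2, 0.0, None)
    ci2 = var / np.maximum(mean ** 2, 1e-12)  # squared local coefficient of variation
    # Weight ~0 where local variation is explained by speckle alone (smooth the pixel),
    # approaching 1 near edges and bright targets (keep the original pixel).
    weight = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return mean + weight * (img - mean)
```

The filter smooths homogeneous sea surface toward its local mean while preserving sharp transitions, such as slick boundaries, where local variation clearly exceeds the speckle level.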

We used Level-1 Ground Range Detected (GRD) products and chose VV polarization for training deep learning models. We tried several semantic segmentation architectures for the Sentinel 1 pipeline and then narrowed our search space to UNet++, DeepLabV3, and DeepLabV3+ with EfficientNet (B0-B3) backbones initialized with ImageNet pre-trained weights, as these provided the best results. The best metrics obtained on the test dataset were an IoU of 0.65 and an F1 Score of 0.79.
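As a quick consistency check on these two numbers: if both scores are computed from the same pooled true positive (TP), false positive (FP), and false negative (FN) pixel counts over the test set (an assumption; averaging per-image scores breaks the exact identity), then F1 is a fixed function of IoU. Writing S = FP + FN and substituting, the reported IoU of 0.65 corresponds to an F1 of about 0.79:

```latex
\mathrm{IoU} = \frac{TP}{TP + S}, \qquad
F_1 = \frac{2\,TP}{2\,TP + S}
\quad\Longrightarrow\quad
F_1 = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}, \qquad
\frac{2 \times 0.65}{1 + 0.65} = \frac{1.30}{1.65} \approx 0.788 \approx 0.79
```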

New spectral indices for MSI images

Based on an analysis of spectral reflectance across all bands, we identified two new spectral indices, one each for Sentinel 2 and Landsat 8, that help in quickly identifying oil spills in MSI images and also provide a compact feature representation for use as input to machine learning models.
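The formulae of the two new indices are not reproduced in this text, so the sketch below shows only the general pattern that many spectral indices (for example NDVI and NDWI) follow: a per-pixel normalized difference of two bands, which collapses a multi-band image into a single channel. The band values and the function name `normalized_difference` are hypothetical; this is not one of the team's oil spill indices.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); pixels where a + b == 0 are set to 0."""
    a = np.asarray(band_a, dtype=np.float64)
    b = np.asarray(band_b, dtype=np.float64)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


if __name__ == "__main__":
    # Hypothetical reflectance values for two bands; NOT the team's oil spill index.
    band_x = np.array([[0.30, 0.10], [0.00, 0.25]])
    band_y = np.array([[0.10, 0.30], [0.00, 0.25]])
    index = normalized_difference(band_x, band_y)
    # A single-channel (1, H, W) model input instead of a full multi-band stack
    model_input = index[np.newaxis, ...]
    print(model_input.shape)  # (1, 2, 2)
```

Because the result is bounded to [-1, 1] and is a single channel, such an index is a compact input compared with stacking all raw bands.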

After further rigorous statistical analysis and testing on a geographically diverse dataset, the following indices could be officially rolled out for use by the general public (the numbers in the formulae represent band numbers):

For the Sentinel 2 pipeline, the new oil spill index created for Sentinel 2 MSI was chosen as input for training semantic segmentation models, whereas for the Landsat 8 pipeline, the new oil spill index created for Landsat 8 MSI was chosen as input. The following table gives a comparative view of different band combinations and the effectiveness of options that use a newly identified spectral index:


We trained several models within the short time frame of 10 weeks. The models that achieved the best IoU and F1 Score on the test dataset are shown in the figure below:

The following chart compares the performance of different segmentation architectures used in the Sentinel 1 pipeline in terms of IoU and F1 Score on the test dataset (the encoder, encoder weights, batch size, etc. were kept the same as in the best-performing model):

The following are a few sample predictions on the test dataset for the different pipelines:
Sentinel 1
Sentinel 2
Landsat 8

Key Takeaways

The models based on SAR images (Sentinel 1) gave better results in this Challenge than the models based on MSI images (Sentinel 2 / Landsat 8). This seems to be in line with the fact that the majority of the literature on oil spill detection is based on SAR imagery.

However, it is also important to note that the number of samples available for the MSI pipelines was only 40–60% of the number available for the SAR pipeline, and this may also have played a major role in the lower metrics of the MSI-based models.

Since we used a dataset based on oil spill events around the United States, the resulting models may be biased toward the conditions prevalent in North America. We think it is important to create a more geographically diverse dataset that contains oil spill events from different regions of the world.

In addition, it is essential for spill response teams to define a workflow for adding the shapefiles and satellite images collected during their cleanup operations to a common oil spill dataset. This will greatly aid in collecting sufficient data samples for building accurate machine learning models in the future.

Future Direction

We came up with several other ideas and approaches that could not be implemented in this AI for Good Challenge due to time and computational resource constraints. A few notable ideas are listed below for exploration in future Challenges or projects:

  1. Training semantic segmentation models on de-correlated RGB bands of Sentinel 2 and Landsat 8 (oil spill areas only) and segmenting thicker oil spill areas.
  2. Training semantic segmentation models using Level-2 Science Product - Provisional Aquatic Reflectance data from Landsat 8.
  3. Training binary semantic segmentation models on oil spill look-alikes in the Ten Geo Phenomena SAR Dataset, which has 37,000 samples (the number of samples with oil spill look-alikes may be much smaller than this), and repurposing the model for oil spill detection by fine-tuning it on a smaller oil spill dataset.
  4. Exploring options to utilize MARIDA: Marine Debris Archive (MSI data) in a way similar to the Ten Geo Phenomena SAR Dataset to fine-tune the Sentinel 2 model.
  5. Harmonizing MSI imagery across multiple satellites to create a single consolidated dataset, and training models on the harmonized dataset.
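We read “de-correlated RGB bands” in idea 1 as a decorrelation stretch, a classic remote sensing enhancement that removes the correlation between bands and gives them equal spread, which exaggerates subtle color differences between highly correlated bands. The sketch below is one common PCA-whitening variant under that assumption; the function name, the optional `mask` argument for computing statistics over oil spill areas only, and the single shared `target_std` are our choices, not part of the original plan.

```python
import numpy as np


def decorrelation_stretch(rgb, mask=None, target_std=None, eps=1e-12):
    """Decorrelate the bands of an (H, W, C) image and give them equal spread.

    Band statistics are computed over `mask` pixels if a mask is given
    (e.g. oil spill areas only), otherwise over the whole image, and the
    resulting transform is applied to every pixel.
    """
    h, w, c = rgb.shape
    x = rgb.reshape(-1, c).astype(np.float64)
    stats = x if mask is None else x[np.asarray(mask, dtype=bool).reshape(-1)]
    mean = stats.mean(axis=0)
    cov = np.cov(stats, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov = V diag(lambda) V^T
    # Whitening matrix W = V diag(1/sqrt(lambda)) V^T, so W^T cov W = I
    whiten = eigvecs @ np.diag(1.0 / np.sqrt(np.maximum(eigvals, eps))) @ eigvecs.T
    if target_std is None:
        target_std = float(np.sqrt(np.diag(cov)).mean())  # keep the average band contrast
    stretched = (x - mean) @ whiten * target_std + mean
    return stretched.reshape(h, w, c)
```

After the stretch the band covariance is `target_std**2` times the identity, so values can fall outside the original range; clipping or rescaling before display or model training would likely be needed.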

Personal Experiences


I think the magic of this Challenge lay in the fact that it was an A-to-Z problem: we started with a problem statement, some initial data, and little prior knowledge, and had three difficult and ambitious goals ahead of us. Because we had to find most of the data ourselves, build multiple machine learning pipelines, perform data analysis, and convey meaningful information from our results to stakeholders, all with our own (small) team within the timeline of the Challenge, we had to overcome a lot of obstacles along the way. This is probably why we learned as much as we did. Given the intensity of the project and the expectations we had of it, we had to adapt to the different scenarios, domains of expertise, and workflows that we encountered throughout the weeks.

I feel like all of us gained a lot of new knowledge and many new insights through this. Personally, since the start of the AI against Oil Spills Challenge, I have acquired knowledge about the programmatic aspects of Geographic Information Systems (GIS), data collection (and its importance in domain-specific problems), machine learning pipelines (in particular the segmentation models at the heart of our pipelines), and collaborative work in a multidisciplinary team.

All in all, this has been an inspiring Challenge for me, and it took me through a period in which I learned multiple new skills and further developed several others.


Personally, this was a difficult challenge that took me out of my comfort zone, given my limited knowledge of satellite images and machine learning, as well as my other commitments. Because each member had a different level of experience and expertise, we each moved at our own pace.

Building production-ready machine learning models from scratch in only 10 weeks with a team of 5 was definitely challenging; however, thanks to the team’s hard work, we were able to deliver results. Through the AI against Oil Spills Challenge, I had the opportunity to get my feet wet in AI/ML, segmentation models, and image processing, with the help and patience of my team members.

I believe what brought us all together was our interest in and passion for machine learning and for using it to tackle what we care about: real-life problems. The challenge was definitely difficult yet inspiring, and, most importantly, I learned a lot about myself and about what I knew and didn’t know, so that I can further improve my skills and knowledge for upcoming Challenges!


This Challenge gave me an opportunity to apply machine learning skills to meaningful real-world problems and to enhance my skills in the process. The masterclasses provided by Rijkswaterstaat helped us learn the foundations of the domain in a very short time. One of the best aspects of this Challenge was the coming together of learning-oriented people with varied skill sets and levels of experience. This helped me learn new skills from my peers, gain intuition for various ML concepts by working closely with experienced team members, and bounce ideas around, not just within our subgroup but also between the two subgroups in this Challenge. Having no prior knowledge of GIS, I found it challenging to work with imagery from multiple satellites, but at the same time it was exciting, interesting, and very fulfilling. Though ours was a very small team, everyone chipped in at the right time with the right contribution and helped meet our key objectives. More than anything, contributing to a good cause gave me great satisfaction, and I have to thank FruitPunch for creating this wonderful AI for Good platform.

AI for Oil Spills Members: Aliaksandr Hancharenka, Agustin Iniguez Rabago, Chi Nguyen, Emile Dhifallah, Ponniah Kameswaran, Bram de Wit, Leonardo Iheme, Resham Sundar, Sahil Chachra, Shubham Baid, Timothy Malche, Muhammad Uzair Ghous
