Sunday, August 28, 2022
Edge Computing
AI for Wildlife

Model Optimization and Pruning of Poacher-detecting YOLOv5

Optimizing a YOLOv5 model for the NVIDIA Jetson Nano to increase inference speed and reduce memory footprint, focusing on inference speed rather than absolute mAP.

This is an account of the 10-week effort of the AI for Wildlife Challenge 3 engineers, who worked on a poacher-detecting model running on the edge hardware of an autonomous drone. The group was split into 4 teams: Model Optimization, CI/CD, Hardware and Autonomous Flight. In this article, you'll learn about the results of the Model Optimization team.

The Model Optimization team’s goal for the challenge was to optimize the YOLOv5 model for the NVIDIA Jetson Nano, a small computer for AI IoT applications, to i) increase the inference speed and ii) reduce memory footprint. Our focus was mainly on the inference speed, not the absolute mAP per se.

We started by reviewing the literature, and our preliminary research resulted in 4 potential paths to explore:

  • Convert the YOLOv5 model to TensorRT format, which could enable better optimization
  • Look at PyTorch native optimizations
  • Apply Ultralytics’ optimization / pruning techniques: review the YOLOv5 repo to find out exactly what it does in terms of optimization / pruning during training and conversion (e.g. mixed-precision training) in comparison with the native PyTorch optimization toolkit
  • Review ONNX conversion and optimizations: convert the trained YOLO model to ONNX and use the ONNX optimization toolkit
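
To make the pruning idea in the list above concrete, here is a minimal, framework-free sketch of L1 unstructured (magnitude) pruning: the weights with the smallest absolute values are set to zero. It only illustrates the general technique; the function names and the example weights are hypothetical, and it is not a reproduction of what the YOLOv5 repo or PyTorch do internally.

```python
def l1_unstructured_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest magnitude."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    n_prune = int(round(amount * len(weights)))
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical flattened weights of a single layer (illustration only).
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1, 0.07, -0.9]
pruned = l1_unstructured_prune(layer, amount=0.3)
print(pruned)
print(f"sparsity: {sparsity(pruned):.0%}")
```

Unstructured pruning like this leaves the tensor shapes unchanged, so any speedup depends on a runtime that can exploit the zeros. Structured pruning removes whole channels or filters instead.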

We experimented with optimizing sparse models with ONNX. For CPU inference, the DeepSparse Engine produced a speedup. However, it was still much slower than native PyTorch on a GPU.
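
Comparisons like this depend on measuring latency consistently across runtimes. The following is a minimal timing harness sketch, assuming the model can be wrapped in a plain Python callable; `benchmark`, `dummy_model` and the warm-up and run counts are hypothetical choices for illustration, not the team's actual benchmarking code.

```python
import statistics
import time


def benchmark(infer, sample, warmup=10, runs=100):
    """Time `infer(sample)` and return latency statistics in milliseconds."""
    # Warm-up calls are discarded, since the first calls often include
    # one-off costs such as lazy initialisation or cache population.
    for _ in range(warmup):
        infer(sample)

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def dummy_model(x):
    # Stand-in workload, not a real detector.
    return sum(v * v for v in x)


stats = benchmark(dummy_model, list(range(10_000)), warmup=5, runs=50)
print({name: round(value, 3) for name, value in stats.items()})
```

When the callable launches GPU work asynchronously, it has to wait for that work to finish before returning, otherwise the timer only measures the launch overhead.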

We ran experiments with the NeuralMagic and Nebullvm recipe-based optimization libraries. The former did not significantly improve the results; the latter proved cumbersome when setting up the environment.

“At the beginning of the challenge, I felt I did not belong or not as skilled or knowledgeable as the other members. I could barely understand the jargon and how I would be of value to the team or the challenge. However, through engagement and asking questions (everyone was friendly and helpful), I quickly understood that FruitPunchAI challenges are about learning, impact and networks. I came to understand it is a platform to enhance my DS / ML skills and our society with AI. By the end of the challenge, I had gained confidence and an appreciation of how not knowing is an opportunity to learn. It is also encouraging that our contributions will be helping the rangers.” - Sabelo Mcebo Makhanya, Model Optimization Team

We also tried converting the YOLOv5 models from the previous challenge to a TensorRT engine with INT8 calibration on the Jetson Nano 4GB itself. It failed. TensorRT engines turned out to be hardware-specific: one cannot build an INT8 engine on one device and run inference with it on the Jetson Nano. However, we could build and run FP16 engines on the Nano.
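
For readers unfamiliar with why INT8 needs a calibration step while FP16 does not, here is a minimal sketch of symmetric INT8 quantization. It assumes the simplest possible calibration rule (map the largest observed magnitude to 127); TensorRT's own calibrators are more sophisticated, and all names and values below are hypothetical.

```python
def calibrate_scale(calibration_values):
    """Symmetric INT8 scale: the largest observed magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    if max_abs == 0.0:
        raise ValueError("calibration data is all zeros")
    return max_abs / 127.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in q_values]


# Hypothetical activations collected from representative images.
calibration = [-1.9, 0.4, 2.0, -0.7, 1.2]
scale = calibrate_scale(calibration)

# 10.0 lies outside the calibrated range and is clipped to the largest code.
inputs = [0.5, -1.0, 2.0, 10.0]
codes = quantize(inputs, scale)
print(codes)
print([round(v, 4) for v in dequantize(codes, scale)])
```

The scale depends entirely on the calibration data, which is why INT8 conversion needs representative inputs, while FP16 conversion is a plain cast.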

Our results showed that YOLOv5 Small with an image size of 640 x 640 in FP16 mode was the most sensible choice for the current dataset on a desktop GPU.

Final Jetson Nano Results

TensorRT experiments, 10 Watt mode

5W mode vs 10W mode - YOLOv5 Small FP32

The results showed that we don’t have to use the higher power mode. We can use the lower power mode without sacrificing performance or accuracy. 

We concluded that YOLOv5s was the better choice given the accuracy and inference speed in the results. An input image size of 640 x 640 is suitable for the current dataset(s). FP16 precision was the go-to choice, since we didn’t lose accuracy while boosting inference speed. On the Jetson Nano, TensorRT performed best of all the optimizations we tried. It is also well suited to CI/CD automation of the codebase.
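
As a rough intuition for why FP16 can preserve accuracy, the sketch below rounds a few values to IEEE 754 half precision using Python's `struct` module and reports the largest relative rounding error. The weight values are hypothetical, and this only illustrates per-value rounding; the finding that detection accuracy was preserved comes from the team's experiments, not from this calculation.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def max_relative_error(values):
    """Largest relative rounding error when storing `values` in FP16."""
    return max(abs(to_fp16(v) - v) / abs(v) for v in values if v != 0.0)


# Hypothetical weights in a typical magnitude range (illustration only).
weights = [0.1, -0.25, 0.7, 1.5, -3.2, 0.003]
print([to_fp16(w) for w in weights])
print(f"max relative error: {max_relative_error(weights):.2e}")
```

Half precision keeps 11 significant bits, so values in its normal range are stored with a relative error of at most 2^-11. Very large values (beyond about 65504) cannot be represented, and very small ones lose precision.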

Next Steps

As for the next steps, it would be worthwhile to explore different hardware accelerators. On-drone tests should come in handy to test the baseline and define SMART goals for further model optimization. And with the YOLOv5 architecture constantly evolving, there’s always potential to explore structured pruning.

Model Optimization Team

Sahil Chachra, Manu Chauhan, Jaka Cikač, Sabelo Makhanya, Sinan Robillard

AI for Wildlife 3 Engineers:

Model Optimization - Sahil Chachra, Manu Chauhan, Jaka Cikač, Sabelo Makhanya, Sinan Robillard

CI/CD - Aisha Kala, Mayur Ranchod, Sabelo Makhanya, Rowanne Trapmann, Ethan Kraus, Barbra Apilli, Samantha Biegel, Adrian Azoitei

Hardware - Kamalen Reddy, Estine Clasen, Maks Kulicki, Thembinkosi Malefo, Michael Bernhardt

Autonomous Flight - Thanasis Trantas, Emile Dhifallah, Nima Negarandeh, Gerson Foks, Ryan Wolf 

How did the CI/CD, Hardware and the Autonomous Flight teams approach the problem of the Challenge?

Discover more in the full version of the AI for Wildlife 3 case study.
