Developing a CI/CD pipeline that automatically retrains the existing poacher-detection model so that an improved version can be deployed to the drone.
This is an account of the 10-week effort by the AI for Wildlife Challenge 3 engineers, who worked on a poacher-detection model running on the edge hardware of an autonomous drone. The group was split into four teams: Model Optimization, CI/CD, Hardware and Autonomous Flight. In this article, you'll learn about the results of the CI/CD team.
The aim of the CI/CD team was to develop a functioning CI/CD pipeline that automatically processes new data to retrain an existing model, so that an improved version can be deployed to the SPOTS drone.
To do so, we established the following subgoals:
Create a CI/CD pipeline to automate the building, testing and deployment of the code used to train the model.
Create an MLOps pipeline to run the complete machine learning workflow - fetching data, pre-processing, model training, evaluating, optimizing and deploying - using code deployed from GitLab.
Allow a user to trigger the MLOps pipeline manually, ideally with a single click of a button.
Keep the pipeline abstract and loosely coupled to the infrastructure it runs on, so that it can be moved from one cloud service provider to another, or to an on-premise set-up, with minimal overhead.
We started off with research into the available technologies and chose GitLab to create the CI/CD pipeline. GitLab is also where we stored the code and converted it into Docker images. These images were stored in IBM Cloud Container Registry, from which the components in Kubeflow pull them. Together, these components form the machine learning pipeline that fetches the data, preprocesses it, trains and evaluates the model, and then stores it in Google Drive.
“The challenge not only improved my machine learning and coding skills but led to my growth as an individual. I learned how to better work with people, to collaborate with team members from all over the world in a virtual environment, all that while balancing other areas of my life. I’m glad I could contribute to saving endangered species.” - Barbra Apilli, CI/CD Team
Easy-to-use MLOps Pipeline
After a thorough consideration of how the user will interact with the system in the future, we created a workflow:
The user first uploads new data to Google Drive.
The user then starts a new run from the Kubeflow dashboard, which picks up the new data.
After the run, code in the pipeline uploads the model to Google Drive.
From there, the model is fetched and uploaded onto the drone.
When a new run is started, code from GitLab is converted into Docker images by a process known as containerisation. Code for the different machine learning components resides in individual branches of the GitLab repository. Each branch contains code informing Kubeflow about the pipeline structure and the paths to the inputs and outputs of each component; this code is known as the pipeline definitions. Any modification of the code triggers the CI/CD pipeline to generate a new image with the updates, which is then pushed to IBM Cloud Container Registry.
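To make this concrete, a GitLab CI job that builds a component image and pushes it to IBM Cloud Container Registry might look roughly like the sketch below. This is an illustrative assumption, not the team's actual configuration: the job name, registry region host and namespace placeholder, and the `IBM_CLOUD_API_KEY` variable are all invented for the example.

```yaml
# Hedged sketch of a .gitlab-ci.yml job; all names and variables are illustrative.
stages:
  - build

build-component-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # Authenticate against IBM Cloud Container Registry with an IAM API key
    - docker login -u iamapikey -p "$IBM_CLOUD_API_KEY" us.icr.io
    # Build the image for the component held in this branch, tagged by commit
    - docker build -t us.icr.io/<namespace>/$CI_COMMIT_REF_SLUG:$CI_COMMIT_SHORT_SHA .
    # Push it so the Kubeflow components can pull it
    - docker push us.icr.io/<namespace>/$CI_COMMIT_REF_SLUG:$CI_COMMIT_SHORT_SHA
```

Tagging images by branch and commit, as sketched here, lets each GitLab branch map onto one pipeline component while keeping old images available for rollback.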
The MLOps pipeline in Kubeflow retrieves these images and runs the code in the four machine learning workflow components:
Data Retrieval contains the code that fetches the newly uploaded data from Google Drive.
Data Preprocessing splits the uploaded data into the datasets used to develop the model: training, validation and testing.
The data was cleaned: gray and blurry images (presumed to be large bodies of water) were removed, since they didn’t contain relevant information.
The cleaned data was augmented.
The datasets' directory structure was then modified to fit the layout expected by the YOLOv5 models.
Model Training. The preprocessed training dataset is used to train the model. The training code is obtained from the official Ultralytics YOLOv5 docker image.
Model Deployment. When the pipeline run has finished, the trained model is pushed to Google Drive.
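The four components above can be sketched as plain Python functions chained into one run. This is a minimal conceptual sketch, not the team's code: in the real pipeline each step runs in its own container orchestrated by Kubeflow, and every function name, threshold and data structure here is an illustrative assumption. The cleaning step mimics the gray/blurry filter by dropping low-variance frames.

```python
import statistics

def fetch_data():
    """Stand-in for Data Retrieval (really: download new data from Google Drive)."""
    # Each "image" is a flat list of pixel intensities (0-255) for illustration.
    return [
        [12, 200, 45, 230, 90, 160],    # varied pixels -> likely informative
        [128, 128, 128, 128, 128, 128], # uniform gray -> presumed water, drop it
    ]

def preprocess(images, min_variance=50.0):
    """Stand-in for Data Preprocessing: clean, then split into datasets."""
    # Drop gray/blurry frames, approximated here as low pixel variance.
    cleaned = [img for img in images if statistics.pvariance(img) >= min_variance]
    # A real run would also augment the data and lay out YOLOv5 directories;
    # here we just bucket everything into a toy train/val/test split.
    return {"train": cleaned, "val": [], "test": []}

def train(datasets):
    """Stand-in for Model Training (really: the Ultralytics YOLOv5 image)."""
    return {"weights": "best.pt", "trained_on": len(datasets["train"])}

def deploy(model):
    """Stand-in for Model Deployment (really: push the model to Google Drive)."""
    return f"drive://models/{model['weights']}"

def run_pipeline():
    """Chain the four components, as Kubeflow does via the pipeline definitions."""
    return deploy(train(preprocess(fetch_data())))
```

In the actual system the chaining is declared in the pipeline definitions rather than in a single function, and each step exchanges data through the component inputs and outputs described above.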
The Kubeflow framework can be deployed on all major cloud platforms as well as on-premise. The cloud platform we used was IBM Cloud Kubernetes Service.
The biggest challenge for us was to grasp GitLab CI/CD and Kubeflow pipelines and to ensure that all the technologies worked well together. We were able to create the pipeline, yet there is still room for improvement.
We’ve outlined a follow-up path for the next CI/CD team:
Implement model evaluation and create tests for the different components.
Fetch an available pre-trained model from Google Drive to start a new run.
Automate the manual processes within the pipeline so non-technical users don’t have to worry about picking up the tech.
Ensure a safe and secure storage of credentials used to run the pipeline.
Create a web application to trigger the machine learning workflow.