With rising global temperatures, the world is facing more and more natural disasters: extreme droughts and the fires that follow them, as well as extreme rainfall and flooding. Amidst this climate crisis, every organization has a responsibility to implement sustainable business practices. One company that helps organizations become more sustainable is Metabolic.
Metabolic supports companies in a variety of ways, one of which is systems mapping: mapping an entire business and its impacts on the environment. As part of systems mapping, large quantities of product descriptions, varying in level of detail, language, and context, are processed to estimate their environmental impact. Central Product Classification (CPC) codes and Life Cycle Assessment (LCA) classes are used as standardized methods for environmental impact assessment. Matching product descriptions to these codes, however, involves a lot of manual work, precisely because the descriptions vary so much in detail, language, and context.
This raises the question: how can we automatically translate these diverse product descriptions into CPC codes and LCA classes? One option Fruitpunch AI was eager to try was the emerging field of LLMs and zero-shot learning. Curious to see how? Read on!
To carry out this challenge, three teams were established, totalling 20 AI engineers. Each team set out to explore a different part of the solution, and all of them tried out different applications of LLMs to see where these models could be most valuable and effective.
The following teams worked on this challenge: a data preprocessing and enrichment team, a supervised learning team, and a zero-shot learning team.
We worked with three subsets of data, divided into separate sheets, containing information extracted from client invoices. Two of the subsets had columns with text describing the product listed on the invoice, along with matching CPC codes or LCA names; the third had not been matched beforehand.
The Central Product Classification (CPC) uses a hierarchical and decimal-based coding system. It's structured into different levels: sections (identified by the first digit), divisions (identified by the first two digits), groups (identified by the first three digits), classes (identified by the first four digits), and subclasses (identified by all five digits together).
For instance, the codes for the sections range from 0 to 9. Each section can be divided into up to nine divisions at the second digit of the code. At the third digit, each division can, in turn, be further divided into up to nine groups. This pattern continues, with groups being divided into classes and classes into subclasses.
In total, there are 10 sections, 71 divisions, 329 groups, 1,299 classes, and 2,887 subclasses in the CPC system.
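To make this decimal structure concrete, a small helper (our own, for illustration) can split a five-digit subclass code into its constituent levels:

```python
def cpc_levels(code: str) -> dict:
    """Split a five-digit CPC subclass code into its hierarchy levels.
    Helper name and return format are ours, for illustration."""
    return {
        "section":  code[:1],  # first digit
        "division": code[:2],  # first two digits
        "group":    code[:3],  # first three digits
        "class":    code[:4],  # first four digits
        "subclass": code[:5],  # all five digits
    }

# cpc_levels("01111") -> {'section': '0', 'division': '01', 'group': '011',
#                         'class': '0111', 'subclass': '01111'}
```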
The first step in any data science project is data preprocessing, which is crucial to ensure the dataset's cleanliness and uniformity. For this challenge, that meant cleaning and standardizing the free-text product descriptions before any features were added.
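As a minimal sketch of what such a cleaning pass can look like, assuming a pandas workflow with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical file and column names; the actual sheets came from client invoices.
df = pd.read_excel("client_invoices.xlsx", sheet_name="subset_1")

df = df.dropna(subset=["product_text"]).copy()
df["product_text"] = (
    df["product_text"]
    .astype(str)
    .str.strip()                            # trim surrounding whitespace
    .str.lower()                            # normalize casing
    .str.replace(r"\s+", " ", regex=True)   # collapse repeated whitespace
)
df = df.drop_duplicates(subset="product_text")
```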
With the dataset preprocessed, the team moved on to feature extraction: valuable dimensions were added to the dataset using ChatGPT-generated content (GPT-3.5 Turbo), as sketched below.
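A minimal sketch of how such enrichment could be requested from GPT-3.5 Turbo via the OpenAI Python SDK; the requested fields and prompt wording are our own illustrative assumptions, not the team's exact setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def enrich_description(product_text: str) -> str:
    """Ask GPT-3.5 Turbo to add extra dimensions to a raw invoice line.
    The requested fields are illustrative, not the team's exact feature set."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output helps when building a dataset
        messages=[
            {
                "role": "system",
                "content": "You enrich product descriptions for classification.",
            },
            {
                "role": "user",
                "content": (
                    "For the following invoice line, return JSON with the fields "
                    "'english_translation', 'product_category' and 'material': "
                    f"{product_text}"
                ),
            },
        ],
    )
    return response.choices[0].message.content
```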
An example of the resulting dataset can be found in Figure 3.
In addition to enhancing the dataset's features, we also enriched our target labels, the Central Product Classification v2.1 taxonomy, with explanatory notes. This taxonomy comprises over 4,000 labels, and for many of them explanatory notes were added. These notes offer detailed context and clarification for the target labels, giving our machine-learning models a better grasp of the classification system. We expected this enrichment not only to improve model interpretability but also to aid more accurate predictions and analyses.
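As an illustration of this enrichment step, assuming the taxonomy and its explanatory notes live in two hypothetical CSV files, a left join keeps every label and attaches a note where one exists:

```python
import pandas as pd

# Hypothetical file and column names for the CPC v2.1 taxonomy and its notes.
taxonomy = pd.read_csv("cpc_v21_titles.csv")            # columns: code, title
notes = pd.read_csv("cpc_v21_explanatory_notes.csv")    # columns: code, note

# Left join keeps all labels; labels without a note get NaN.
enriched = taxonomy.merge(notes, on="code", how="left")
enriched["label_text"] = [
    f"{title}. {note}" if pd.notna(note) else title
    for title, note in zip(enriched["title"], enriched["note"])
]
```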
Once the dataset was enriched, it could be fed into supervised learning methods to predict the CPC codes. There are a variety of potential approaches, such as Claude.AI, ChatGPT Code Interpreter, and hierarchical models. Due to the time constraints of this challenge, however, the team focused their experiments mainly on Claude.AI.
Claude.AI is a language model with one remarkable feature: a large context window. This characteristic allows Claude.AI to process and understand large files, making it an invaluable tool for businesses dealing with extensive textual data. To put its capabilities to the test, the team ran three experiments.
Claude.AI achieved an impressive 80% accuracy in matching client text to the correct CPC title when using the train/test classification setup. With the zero-shot method, it may provide reasonable answers but is prone to making errors. The good news is that it can efficiently handle large batches of 100 to 200 records at a time. Interestingly, the response size doesn't have a significant impact on errors and hallucinations; it's the method that plays the more crucial role.
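As a sketch of what such batch classification could look like through Anthropic's Python SDK; the model name, prompt wording, and helper function are our assumptions, and the team's exact setup may have differed:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

def classify_batch(client_texts: list[str], cpc_titles: str) -> str:
    """Send 100-200 client texts in a single prompt, leaning on the large
    context window. Model name and prompt wording are illustrative."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(client_texts))
    message = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": (
                    "Match each numbered client text to the single best CPC "
                    "title from the list. Answer as 'number: CPC title'.\n\n"
                    f"CPC titles:\n{cpc_titles}\n\n"
                    f"Client texts:\n{numbered}"
                ),
            }
        ],
    )
    return message.content[0].text
```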
Besides the more traditional supervised method, our teams also experimented with zero-shot learning. In zero-shot learning, the AI is fed data belonging to classes that were not observed during training and is asked to predict those classes. The zero-shot team took a model-centric focus: rather than modifying the data, they looked for the best ways to use existing models on untreated data. This approach typically makes sense at the beginning of a project because it requires the fewest changes to the data and model; in a typical project lifecycle, it would be followed by data enrichment and fine-tuning. In our experiments, we looked at two approaches: prompt engineering with GPT-3.5 and text retrieval using embeddings.
We used GPT-3.5-turbo, a conversational LLM, with prompts that described the task and the desired output and provided an example. The biggest limitation of this approach was the number of classes at the deeper CPC levels: in our experiments, the level 3 classes already exceeded the context window limit. As context windows grow in future models, this may become less of an issue.
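A minimal sketch of such a zero-shot prompt, assuming the OpenAI Python SDK; the wording and the worked example are illustrative. Note how the entire candidate class list must fit inside the prompt, which is exactly why deeper CPC levels break the context window:

```python
from openai import OpenAI

client = OpenAI()

def zero_shot_cpc(product_text: str, division_titles: list[str]) -> str:
    """Zero-shot classification: the full list of candidate classes has to fit
    inside the prompt, which is why deeper CPC levels quickly become infeasible."""
    prompt = (
        "Classify the product into exactly one of the CPC divisions below.\n"
        "Divisions:\n" + "\n".join(division_titles) + "\n\n"
        "Example: 'fresh bananas' -> '01 Products of agriculture, horticulture "
        "and market gardening'\n\n"
        f"Product: {product_text}\n"
        "Answer with the division only."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```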
Figure 5 shows the results of the zero-shot classification efforts, split out by CPC level as well as by whether a first, second, or partial match was made. The findings demonstrate that prompting may struggle with a high number of classes and limited context windows. The potential for built-in data enrichment is promising, and further refinement of the prompt engineering could lead to more robust results.
Several variations of turning the CPC classes into vector embeddings and comparing their cosine similarity with the client texts were also attempted. The results of this approach could still be useful as a baseline and alternative methodology. The experiments revealed that matching text against all classes outperforms matching only against level 1 classes. The observed discrepancies between client-text and class embeddings underscore the need for more precise alignment strategies, such as splitting embeddings by level. Data augmentation was shown to enhance correct embedding matching, particularly for fine-grained classifications.
Lastly, while favouring the more frequently occurring classes among the top matches showed promise in selecting the closest match from a pool of candidates, it was unable to rectify higher-level misclassifications.
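A minimal sketch of the embedding-based matching combined with a frequency vote over the top matches, assuming OpenAI embeddings (the model choice and helper names are ours):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of strings; the model choice here is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def match_by_vote(client_text: str, codes: list[str], code_vecs: np.ndarray,
                  k: int = 10) -> str:
    """Cosine-match a client text against all subclass embeddings, then return
    the division (2-digit prefix) occurring most often among the top-k hits."""
    q = embed([client_text])[0]
    sims = code_vecs @ q / (np.linalg.norm(code_vecs, axis=1) * np.linalg.norm(q))
    top_codes = [codes[i] for i in np.argsort(sims)[::-1][:k]]
    divisions = [c[:2] for c in top_codes]
    return max(set(divisions), key=divisions.count)

# Usage: code_vecs = embed(label_texts), where label_texts pair each subclass
# code with its (optionally note-enriched) title.
```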
This extensive experimentation will help the engineers at Metabolic in developing their new Impact Assessment platform. Automating impact assessment will provide quick and meaningful insights to companies that genuinely want to implement change and contribute to a sustainable future. We are very happy that we could contribute to this cause.
We could not have done this Challenge without the amazing efforts of Alma Liezenga, Bram Cals, Rizdi Aprilian, Vivek V, Sabelo Makhanya, Jathin SN, Saloni Sharma, Dave Parr, Genrry Hernández, Mariano Lazarte, Obiageli Umeugochukwu, Yuri Shlyakhter, Graeme Harris, Muhammad Yahiya, Teodora Bujaroska, Dennis Beemsterboer, Freek Boelders, and Justin Zarb.