Machine Learning: Visual Identification of Damaged Boxes

This project was developed in ".ipynb" notebooks written in Portuguese, so the files in its repository are in Portuguese as well.

To access the project repository, with all its files in Portuguese, click HERE

The notebook used to develop the project is called "machine_learning_damaged_boxid.ipynb". You can also open the Google Colaboratory link to view the project.

To run the .ipynb file, follow these steps:

- Access the Google Drive link and download the zipped "dataset-desafio" folder
- Unpack the "dataset-desafio" folder
- Add the .ipynb file to the "dataset-desafio" folder
- Upload the "dataset-desafio" folder to Google Drive, at the following path:
  - gdrive/MyDrive/Colab Notebooks

Inside the "dataset-desafio" folder there is a folder called "dataset-desafio-MEduarda". It contains all the results from my own run of the program, such as the images generated by augmentation (each in its respective folder) and the saved checkpoints.
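Once the folder is uploaded, the notebook can reach it by mounting Google Drive in Colab. The following is a minimal sketch, not the notebook's own code: it assumes Drive is mounted at "/content/gdrive" (the source only gives the path relative to "gdrive"), and the helper names `mount_drive_if_in_colab` and `dataset_root` are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed mount point; the source only gives the path relative to "gdrive".
MOUNT_POINT = "/content/gdrive"


def mount_drive_if_in_colab(mount_point=MOUNT_POINT):
    """Mount Google Drive when running inside Colab; return True if mounted."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def dataset_root(mount_point=MOUNT_POINT):
    """Build the path of the uploaded "dataset-desafio" folder."""
    return (
        PurePosixPath(mount_point)
        / "MyDrive"
        / "Colab Notebooks"
        / "dataset-desafio"
    )


if __name__ == "__main__":
    print("Drive mounted:", mount_drive_if_in_colab())
    print("Dataset folder:", dataset_root())
```

Outside Colab the mount step is skipped and only the expected path is printed, which makes it easy to check the folder location before running the notebook.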

This page will present, in English, the project's goal, the process used to develop it and the results.

Project Goal

The goal of the project is to develop machine learning models that identify and classify damaged packages from pictures
Three models were trained and compared to identify the one that best met this objective
The data were provided during a postgraduate course, and the images were generated artificially

Project Development

The technical development of this project was divided into four phases:
- Data Load
- Pre-processing data
- Model Training
- Interpretability Map

Data Load

The data was placed in "gdrive/MyDrive/Colab Notebooks", as explained above, and loaded from there
The dataset contains images of "intact" and "damaged" medicine boxes, photographed from two views: top and side
The complete dataset is split into training and test data:
- The "Interpretability" folder contains the test files: 20 damaged images and 20 intact images in total, half of each being "top" views and half "side" views
- The folders "DAMAGE" and "INTACT" contain the training images for damaged and intact boxes, respectively. There are 180 images of each class, 90 "top" and 90 "side" in both cases
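Before training, it can help to confirm that the uploaded folders hold the expected number of images. The sketch below is only an illustration: the folder names "DAMAGE" and "INTACT" and the count of 180 come from the description above, while the file extensions and the helper names `count_images` and `check_counts` are assumptions.

```python
from pathlib import Path

# Class folder names come from the text; the extensions are an assumption.
CLASS_DIRS = ("DAMAGE", "INTACT")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(root, class_dirs=CLASS_DIRS):
    """Return {class_name: number of image files under root/class_name}."""
    counts = {}
    for name in class_dirs:
        folder = Path(root) / name
        if not folder.is_dir():
            counts[name] = 0  # a missing folder counts as empty
            continue
        counts[name] = sum(
            1
            for path in folder.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_counts(counts, expected_per_class=180):
    """Return only the classes whose count differs from the expected value."""
    return {name: n for name, n in counts.items() if n != expected_per_class}
```

An empty dictionary from `check_counts(count_images(root))` would mean both training folders contain the 180 images described above.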

Pre-processing data

The models used for classification are pre-trained; however, the images must be pre-processed before the models can be applied to them
PyTorch transforms were used to apply data augmentation to the dataset. This gave the images within each category greater variety, allowing better classification
With the dataset ready, the training and test DataLoaders were created; these are the objects that feed the images to the model for classification
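The idea behind augmentation and batching can be illustrated without PyTorch. The sketch below is not the notebook's pipeline: the source does not say which transforms were applied, so a random horizontal flip stands in as an assumption, images are tiny nested lists, and `augment` and `batches` are hypothetical helper names.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def augment(image, rng, p=0.5):
    """Return a flipped copy with probability p, else an unchanged copy."""
    if rng.random() < p:
        return horizontal_flip(image)
    return [list(row) for row in image]


def batches(samples, batch_size, rng=None):
    """Yield lists of at most batch_size samples, shuffled when rng is given."""
    order = list(samples)
    if rng is not None:
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    image = [[1, 2, 3],
             [4, 5, 6]]
    dataset = [(augment(image, rng), "damage") for _ in range(5)]
    for batch in batches(dataset, batch_size=2, rng=rng):
        print(len(batch), "samples in batch")
```

Because the flip is drawn at random each time an image is requested, the model sees slightly different versions of the same box across epochs, which is the extra variety described above.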

Model Training

Three pre-trained models were used to start the classification process:
- ResNet 18D
- DenseNet 121
- SE-ResNet 50
The models were obtained from the timm library
The models were then trained for 15 epochs to achieve good classification results
Finally, for each model, the parameters from the epoch with the best results were selected and saved to define the final model
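The step of keeping the best epoch can be sketched independently of any framework. In this minimal sketch, `train_one_epoch` and `evaluate` are hypothetical callables standing in for the notebook's training and evaluation code, the parameters are a plain dictionary, and accuracy is assumed to be the selection metric (the source does not say which metric was used).

```python
import copy


def train_with_best_checkpoint(params, train_one_epoch, evaluate, epochs=15):
    """Train for `epochs` epochs and return the best parameters seen.

    train_one_epoch(params) -> updated params   (hypothetical callable)
    evaluate(params)        -> accuracy score   (hypothetical callable)
    """
    best = {"epoch": None, "accuracy": float("-inf"), "params": None}
    for epoch in range(1, epochs + 1):
        params = train_one_epoch(params)
        accuracy = evaluate(params)
        if accuracy > best["accuracy"]:
            # Deep copy so later epochs cannot overwrite the saved state.
            best = {
                "epoch": epoch,
                "accuracy": accuracy,
                "params": copy.deepcopy(params),
            }
    return best
```

With a PyTorch model, the saved state would typically be the model's `state_dict`, written to disk with `torch.save` whenever the score improves.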

Interpretability Map

In addition to the provided images, the dataset also contained the bounding box for each of them
- The bounding box of an image is the region that most influences its classification
The idea of the interpretability map is to generate, from the predicted classification of an image, a heat map that shows the regions that most influenced the prediction.
A model's result was considered more relevant when the bounding box coincided with the relevant regions of the interpretability map
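One way to turn this visual comparison into a number is to measure how much of the heat map falls inside the bounding box. This is a sketch of a hypothetical metric, not something the project reports: the heat map is assumed to be a 2D grid of non-negative values, and the box is assumed to be given as (row_min, col_min, row_max, col_max) with exclusive upper bounds.

```python
def heat_inside_box(heat_map, box):
    """Fraction of the total heat that lies inside the bounding box.

    heat_map: list of rows of non-negative numbers
    box: (row_min, col_min, row_max, col_max), upper bounds exclusive
    """
    row_min, col_min, row_max, col_max = box
    total = sum(sum(row) for row in heat_map)
    if total == 0:
        return 0.0  # an empty map highlights nothing
    inside = sum(
        value
        for r, row in enumerate(heat_map)
        for c, value in enumerate(row)
        if row_min <= r < row_max and col_min <= c < col_max
    )
    return inside / total


if __name__ == "__main__":
    heat_map = [
        [0, 0, 0, 0],
        [0, 6, 2, 0],
        [0, 1, 1, 0],
        [10, 0, 0, 0],  # strong response on the background
    ]
    print(heat_inside_box(heat_map, box=(1, 1, 3, 3)))
```

A value close to 1 would mean the model looked almost only at the box, while a low value would point to regions outside it, such as the background.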

Results

The images below show the accuracy and loss curves across the epochs for all three models, for both image positions (top and side)

Using the parameters from the best epoch, the accuracy and F1 values were obtained for each model.
The table and graph below present a summary of these results:

MODEL	TOP OR SIDE	TEST ACC	TEST F1
resnet18d	top	0.50	0.479167
resnet18d	side	0.95	0.949875
densenet121	top	0.50	0.494949
densenet121	side	0.80	0.791667
seresnet50	top	0.50	0.479167
seresnet50	side	0.85	0.846547
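The accuracy and F1 columns can be related through a confusion matrix. The sketch below assumes 20 test images per view (10 damaged and 10 intact, following the test-set description) and a macro-averaged F1; the source does not state the averaging scheme. The two matrices are hypothetical examples, not the project's actual confusion matrices, chosen because they reproduce the values reported for resnet18d.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def macro_f1(matrix):
    """Unweighted mean of per-class F1; rows are true labels, columns predictions."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp
        fn = sum(matrix[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical matrices, ordered [damage, intact]; not taken from the project.
side_example = [[10, 0],
                [1, 9]]
top_example = [[3, 7],
               [3, 7]]

print(accuracy(side_example), round(macro_f1(side_example), 6))  # 0.95 0.949875
print(accuracy(top_example), round(macro_f1(top_example), 6))    # 0.5 0.479167
```

The second example also shows why an accuracy of 0.50 on a balanced two-class test set is no better than guessing.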

Finally, the confusion matrix generated for each classification model can be examined

Interpretability Map

The images below show examples of how the bounding box fits the original images in the given dataset. The first row shows the DAMAGE items and the second the INTACT ones. Both are TOP views of the box.

An interpretability map was generated for each model. As an example, the maps for the "ResNet 18D" model on the TOP images are shown below. The first row shows the DAMAGE items and the second the INTACT ones.

Conclusion

The ResNet18D and SE-ResNet50 models gave the best classification results, with the highest accuracy and F1 values in the test phase (0.95 and 0.85 accuracy on side images, against 0.80 for DenseNet121). They identified boxes in the side position more easily than boxes in the top position
The confusion matrices show that, for all three models, classification is easier when the box is in the side position. In addition, the most frequent error was classifying intact items as damaged
None of the models performed well on top-position images, as the low accuracy (0.50 for all three) shows. In this case, the ResNet18D and SE-ResNet50 models also had difficulty classifying damaged images, labeling them as intact.
Regarding the interpretability map, the models behaved similarly: the region of interest on the maps is closer to the bounding box when the image is damaged. For intact images, the region of interest differs more from the bounding box
- The models were consistent in their results, regardless of the label (damage or intact) or the view (top or side). Models 01 and 03 (ResNet18D and SE-ResNet50, respectively) identified the region of interest better, fitting it more closely to the bounding box, while model 02 (DenseNet121) gave the worst result.
- A frequent error in all models, for both side and top views, was the influence of the image background, stronger in some cases than in others. The program often identified a strong region of interest outside the bounding box, which indicates that it was giving importance to the background of the image and not only to the box, as would be ideal.
- These maps can guide tests of new models in the future:
  - The image background can be removed before training, so that it does not influence the decision region
  - In addition, data augmentation can be used to increase the number of "INTACT" images, since they were harder to identify


Maria Eduarda E. Neves
