PyTorch Training: Indoor Image Classification (from MIT data)

This project was developed in ".ipynb" notebooks written in Portuguese, so the files in its repository are in Portuguese as well.

To access the project repository, with all its files in Portuguese, click HERE

The ".ipynb" file used to develop the project is called "pytorch_indoor_img_class.ipynb". You can also follow the Google Colaboratory link to view the project.

This page will present, in English, the project's goal, the process used to develop it and the results.

Project Goal

The goal of this project is to use PyTorch to classify indoor images.
The data was collected from an MIT website.

Project Development

The technical development of this project was divided into three phases:
- Data Load
- Pre-processing data
- PyTorch Model application

Data Load

The data was downloaded directly from the MIT website with the notebook commands below, without the need to save it on the local computer first:
- Command to download the dataset: !wget http://groups.csail.mit.edu/vision/LabelMe/NewImages/indoorCVPR_09.tar
- Command to unpack the images: !tar -xvf indoorCVPR_09.tar
After this process, the images were available for use.
The full dataset contains 67 indoor categories and a total of 15,620 images. The number of images varies across categories.
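The per-category counts can be checked with a short script. The sketch below assumes the archive unpacks into one folder per category (the root folder name "Images" in the usage comment is an assumption, as is the helper name `count_images_per_category`); it is not taken from the project notebook.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption for this sketch).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_category(root):
    """Count image files in each immediate subfolder of `root`."""
    counts = Counter()
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        counts[category_dir.name] = sum(
            1
            for f in category_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


# Example usage (folder name is hypothetical):
# counts = count_images_per_category("Images")
# print(len(counts), "categories,", sum(counts.values()), "images")
# print(counts.most_common(5))  # the five largest categories
```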
An example of the class "winecellar" can be seen below.

Pre-processing data

The model used for training is a pre-trained model, but some pre-processing was applied to the data before using it.
PyTorch transforms were used to implement data augmentation, in order to have more variation among the images within each category.
The dataset was split into training and test sets, with the test set receiving 25% of the total data.
Once the dataset was ready, the training and test DataLoader objects were created; these are what the model uses to run the image classification.
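The 75/25 split and the two loaders can be sketched as follows. The helper name `make_loaders`, the batch size and the random seed are assumptions for illustration, and `random_split` is only one way to do the split; the notebook may have used another.

```python
import torch
from torch.utils.data import DataLoader, random_split


def make_loaders(dataset, test_fraction=0.25, batch_size=32, seed=42):
    """Split `dataset` into train/test subsets and wrap each in a DataLoader."""
    n_test = int(round(len(dataset) * test_fraction))
    n_train = len(dataset) - n_test
    # A seeded generator makes the split reproducible across runs.
    generator = torch.Generator().manual_seed(seed)
    train_set, test_set = random_split(
        dataset, [n_train, n_test], generator=generator
    )
    # Shuffle only the training data; keep the test order fixed.
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
```

Note that both subsets produced by `random_split` share the transform of the underlying dataset, so using different training and test transforms requires wrapping the subsets or building two dataset objects over the same files.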
After that, the graph below shows the number of images per category in the training and test sets.

The graph shows that the categories are unbalanced: some have many more images than others. The categories with fewer images may be harder to classify.

PyTorch Model application

To start the classification process, a pre-trained model called "ResNet-18D" was used.
This model was obtained from the timm library.
After that, the model was trained for 10 epochs, and the results were saved after each epoch so that the best accuracy could be identified.
The final model chosen was the one with the best accuracy on the test set.
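The "train, evaluate, keep the best" procedure can be sketched as a short loop. The optimizer (Adam), the learning rate and the function names are assumptions for illustration; the sketch keeps the best weights in memory, whereas the notebook may have written them to disk (for example with `torch.save`).

```python
import copy

import torch
from torch import nn


def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.numel()
    return correct / max(total, 1)


def train_and_keep_best(model, train_loader, test_loader,
                        epochs=10, lr=1e-3, device="cpu"):
    """Train for `epochs` epochs and restore the weights with the best accuracy."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        # Evaluate after every epoch and remember the best weights so far.
        acc = evaluate_accuracy(model, test_loader, device)
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_acc
```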

Results

The best model presented the following results:
- Training loss: 0.1856
- Test accuracy: 0.6804
- Test F1: 0.6758
The image below shows the confusion matrix for these results.
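For reference, accuracy and F1 can both be derived from the confusion matrix. The sketch below uses plain Python and macro averaging for F1; the page does not say which averaging was used for the reported value, so that choice is an assumption, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, but wrong
        fn = sum(matrix[c]) - tp                       # true c, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n if n else 0.0
```

Macro averaging gives every category the same weight, so small categories influence the score as much as large ones, which matters for an unbalanced dataset like this one.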

Conclusion

From these results, it is possible to conclude that the model performed acceptably but not excellently, with a test accuracy of about 68%.
The category with the most classification problems was "41" (livingroom), which was mostly confused with "6" (bedroom).
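Pairs like this can be read off a confusion matrix programmatically by looking for the largest off-diagonal entries. The sketch below assumes the matrix is a list of rows with true classes as rows and predicted classes as columns; the function name `most_confused_pairs` is hypothetical.

```python
def most_confused_pairs(matrix, top_k=3):
    """Return the `top_k` (true_class, predicted_class, count) mistakes."""
    pairs = [
        (true_cls, pred_cls, count)
        for true_cls, row in enumerate(matrix)
        for pred_cls, count in enumerate(row)
        if true_cls != pred_cls and count > 0  # skip correct predictions
    ]
    # Largest counts first.
    pairs.sort(key=lambda item: item[2], reverse=True)
    return pairs[:top_k]


# Example usage with a hypothetical list of category names:
# for true_cls, pred_cls, count in most_confused_pairs(matrix):
#     print(class_names[true_cls], "->", class_names[pred_cls], count)
```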


Maria Eduarda E. Neves