Bacteria classification using the fastai library

Tram Ho

Introduction

fastai is a modern deep learning library that provides high-level APIs to help practitioners build deep learning models for problems such as classification and segmentation, and quickly achieve good results with just a few lines of code. In addition, because it is built on top of PyTorch, fastai also provides low-level components that researchers can use to develop new models, and it is fully compatible with PyTorch components.

In this article, I will introduce some features of fastai and apply them to build a classification model. Let's get started!

Install fastai

You can install fastai on your machine with pip, for example by running pip install fastai.

Once installed, run the following code to import fastai and necessary libraries:
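As a minimal sketch, the import step might look like the following. fastai notebooks usually use the star import `from fastai.vision.all import *`; the explicit names below are just the ones this article relies on, and the `FASTAI_AVAILABLE` guard is my own addition so the snippet still runs on a machine where fastai is missing.

```python
import importlib.util

# Assumption: this guard is not part of fastai; it only lets the snippet
# run (and report the problem) when the library is not installed.
FASTAI_AVAILABLE = importlib.util.find_spec("fastai") is not None

if FASTAI_AVAILABLE:
    # The names used in the rest of the article. In a notebook,
    # `from fastai.vision.all import *` is the more common form.
    from fastai.vision.all import (
        CategoryBlock, DataBlock, ImageBlock, RandomSplitter,
        RegexLabeller, Resize, accuracy, cnn_learner,
        get_image_files, resnet50,
    )
else:
    print("fastai not found; install it before running the rest of the code")
```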

Importing fastai also brings in popular libraries such as numpy, pandas, and matplotlib, so there is no need to import them again.

Data

I will use the bacteria image dataset from this website. You can download the data from the website to your computer and unzip it, or, if you use Google Colab, you can use this code:
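A rough sketch of the download step, using only the Python standard library. `DATASET_URL` is a placeholder, not the real address, so replace it with the download link from the dataset's website; the helper name `download_and_unzip` and the `data` folder are also my own choices.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical placeholder: substitute the dataset's actual download link.
DATASET_URL = "https://example.com/bacteria-dataset.zip"


def download_and_unzip(url: str, dest: str = "data") -> Path:
    """Download a zip archive from `url` and extract it into `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "dataset.zip"
    urlretrieve(url, archive)          # fetch the archive to disk
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)        # unpack next to the archive
    return dest_dir


# Usage (commented out so nothing is downloaded on import):
# path = download_and_unzip(DATASET_URL)
```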

Our data consists of 692 images:
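One way to count the images is sketched below with plain `pathlib`. fastai's own `get_image_files(path)` collects image files recursively in a similar way; the extension list and the helper name `list_image_files` here are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def list_image_files(root):
    """Recursively collect image files under `root`, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage: fnames = list_image_files("data"); print(len(fnames))
```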

Create Dataloader

fastai provides an API for creating PyTorch DataLoaders simply and quickly:

The command above returns a DataBlock object. Let's look at what each parameter does:
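A minimal sketch of such a DataBlock is shown here. The regular expression assumes filenames of the form `<label>_<number>.tif`, and the 224-pixel size, 20% validation split, and seed are illustrative values rather than settings confirmed by the article. The imports sit inside the function only so that the file can be loaded without fastai installed.

```python
def build_datablock(image_size: int = 224):
    """Build a DataBlock for image classification (sketch, see assumptions)."""
    # Lazy imports: fastai is only needed when the function is called.
    from fastai.vision.all import (
        CategoryBlock, DataBlock, ImageBlock,
        RandomSplitter, RegexLabeller, Resize,
    )
    from fastcore.basics import using_attr

    return DataBlock(
        # Each item is an (image, category) pair.
        blocks=(ImageBlock, CategoryBlock),
        # Assumed filename pattern: "<label>_<number>.tif".
        get_y=using_attr(RegexLabeller(r"(.+)_\d+\.tif$"), "name"),
        # Random train/validation split; 20% validation is an assumption.
        splitter=RandomSplitter(valid_pct=0.2, seed=42),
        # Resize every image so items can be collated into batches.
        item_tfms=Resize(image_size),
    )


# Usage: dblock = build_datablock()
```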

  • blocks : defines what the DataLoader returns. Since this is a classification problem, each item is a pair: an image and its corresponding label.
  • get_y : how to get the label from the filename. An image's label is part of its filename. fastai provides a RegexLabeller class that uses a regular expression to extract the label from the filename.

  • splitter : how to divide the dataset into training and validation sets.
  • item_tfms : because the images have different sizes, we need to resize them to the same size before they can be packed into batches.
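To make the get_y step concrete, here is a small standard-library sketch of the idea behind RegexLabeller: search the filename with a regular expression and return the first captured group. The pattern and the sample filename are hypothetical and assume names of the form `<label>_<number>.tif`.

```python
import re
from pathlib import Path

# Assumed filename pattern: everything before the final "_<digits>.tif".
LABEL_PATTERN = re.compile(r"(.+)_\d+\.tif$")


def label_from_filename(fname) -> str:
    """Return the label embedded in an image filename."""
    match = LABEL_PATTERN.search(Path(fname).name)
    if match is None:
        raise ValueError(f"filename does not match the pattern: {fname}")
    return match.group(1)


# Hypothetical example filename:
print(label_from_filename("data/Escherichia.coli_0007.tif"))  # Escherichia.coli
```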

Once you have the DataBlock, call its dataloaders method, passing the data source (a list of image files) and the batch size. The method returns a DataLoaders object. As the name implies, a DataLoaders object holds several DataLoader instances (one for training and one for validation). You can index into dls to access them: dls[0] (training) and dls[1] (validation).
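The call described above can be sketched as follows. The wrapper name `make_dataloaders` and the batch size of 32 are assumptions; `dblock` is the DataBlock and `fnames` is the list of image files.

```python
def make_dataloaders(dblock, fnames, batch_size: int = 32):
    """Create train/validation loaders from a DataBlock and a list of files."""
    # `bs` is the batch-size keyword accepted by DataBlock.dataloaders.
    return dblock.dataloaders(fnames, bs=batch_size)


# Usage:
# dls = make_dataloaders(dblock, fnames)
# train_dl, valid_dl = dls[0], dls[1]
```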

We can check how many classes the data set has:
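A short sketch of that check, assuming `dls` is the DataLoaders object built earlier. For a classification task, `dls.c` holds the number of classes and `dls.vocab` the class names; the helper name `class_summary` is mine.

```python
def class_summary(dls):
    """Return (number of classes, list of class names) for a DataLoaders."""
    return dls.c, list(dls.vocab)


# Usage:
# n_classes, names = class_summary(dls)
# print(n_classes, names)
```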

Training

Model training is handled by the Learner class. For a classification problem, you can create a Learner with the cnn_learner function:
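A minimal sketch of creating the Learner, assuming `dls` is the DataLoaders object from the previous section. As far as I know, more recent fastai releases expose the same function under the name `vision_learner` and keep `cnn_learner` as a deprecated alias, so treat the exact name as version-dependent.

```python
def build_learner(dls):
    """Create a ResNet50-based Learner that reports accuracy (sketch)."""
    # Lazy imports: fastai is only needed when the function is called.
    from fastai.vision.all import accuracy, cnn_learner, resnet50

    # Arguments: the data, the CNN architecture, and the metric(s) to report.
    return cnn_learner(dls, resnet50, metrics=accuracy)


# Usage: learn = build_learner(dls)
```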

The parameters include:

  • The DataLoaders object
  • The CNN architecture. Here I use ResNet50, but you can use any of the pretrained CNNs available in torchvision
  • A list of metrics

When a pretrained model is used, the Learner automatically adds a few Linear layers after the CNN body.

By default, the weights of the CNN body are frozen and are not updated during training.
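One way to see the effect of freezing is to count how many parameters are trainable. The sketch below works on any PyTorch-style model, for example `learn.model`; the helper name is my own. Note that, as I understand fastai's defaults, batch-norm layers in the body stay trainable, so the frozen count may not cover the whole body.

```python
def count_parameters(model):
    """Return (trainable, frozen) parameter counts for a PyTorch-style model."""
    trainable = frozen = 0
    for p in model.parameters():
        if p.requires_grad:
            trainable += p.numel()   # updated by the optimizer
        else:
            frozen += p.numel()      # kept fixed during training
    return trainable, frozen


# Usage:
# trainable, frozen = count_parameters(learn.model)
# print(f"trainable: {trainable:,}  frozen: {frozen:,}")
```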

Training the model is very simple:

fit_one_cycle(8, 1e-3) trains the model for 8 epochs using the 1-cycle policy. If you don't want to use a learning rate scheduler, you can use the fit method instead.
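The training call can be sketched like this, assuming `learn` is the Learner created above. The wrapper `train_head` is only for illustration; in a notebook the single line `learn.fit_one_cycle(8, 1e-3)` is enough.

```python
def train_head(learn, epochs: int = 8, lr: float = 1e-3):
    """Train the (unfrozen) head of the model with the 1-cycle policy."""
    learn.fit_one_cycle(epochs, lr_max=lr)
    return learn


# Usage: train_head(learn)
# Without a scheduler: learn.fit(8, 1e-3)
```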
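To give an intuition for the 1-cycle policy, here is a plain-Python sketch of a learning-rate schedule that warms up to a peak and then anneals, both with cosine curves. The default values (`div=25`, `div_final=1e5`, `pct_start=0.25`) mirror what I understand fastai's defaults to be, but treat them as assumptions; this sketch covers the learning rate only and is not fastai's actual implementation.

```python
import math


def cosine_interp(start: float, end: float, pos: float) -> float:
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lr(pct, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress `pct` in [0, 1]."""
    lr_start = lr_max / div        # small rate at the start of training
    lr_end = lr_max / div_final    # tiny rate at the end of training
    if pct < pct_start:
        # Warm-up phase: rise from lr_start to lr_max.
        return cosine_interp(lr_start, lr_max, pct / pct_start)
    # Annealing phase: fall from lr_max to lr_end.
    return cosine_interp(lr_max, lr_end, (pct - pct_start) / (1 - pct_start))


# Print the rate at a few points of training progress.
for pct in (0.0, 0.25, 0.5, 1.0):
    print(f"{pct:.2f} -> {one_cycle_lr(pct):.2e}")
```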

After just 8 epochs, the accuracy reaches 98.5%. Now we will unfreeze the CNN and train all of its weights:

To see the model's results, you can run learn.show_results().

The whole process, from loading the data to training the model, takes fewer than 10 lines of code.
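A sketch of the unfreezing step, assuming `learn` is the Learner trained above. The number of epochs and the learning-rate range are illustrative guesses, not the values used to obtain the results in this article; passing a `slice` asks fastai to use lower rates for earlier layers and higher rates for later ones.

```python
def unfreeze_and_train(learn, epochs: int = 4, lr_range=(1e-5, 1e-4)):
    """Unfreeze all layers and continue training (values are assumptions)."""
    learn.unfreeze()  # make the CNN body trainable again
    # A slice spreads learning rates across the layer groups.
    learn.fit_one_cycle(epochs, lr_max=slice(*lr_range))
    return learn


# Usage: unfreeze_and_train(learn)
```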

Epilogue

Above, I have shown you how to build a bacteria classification model with 98.5% accuracy using the fastai library. With fewer than 10 lines of code, we surpassed the 97% SOTA result on this dataset (you can check here). If you find this article useful, please leave me an upvote. Thank you all for reading, and see you in the next articles.

Source : Viblo