ID card alignment with PyTorch: instructions as easy as eating candy

Tram Ho

Introduction

Hello friends. This is the second PyTorch tutorial that I want to share with the Vietnamese PyTorch community, and I hope it gives you plenty of inspiration for using this framework. In today's article we will learn how to perform a very important step in a document digitization pipeline: Image Alignment. This is a pre-processing step applied to the input image before extracting the regions of interest that are passed to the OCR system downstream. I want to emphasize that it is very important and greatly affects the quality of the final OCR results. The method also applies to any OCR problem whose input has a fixed layout, such as identity cards, citizen ID cards, driver's licenses, and so on. OK, let's start.

The identity card recognition problem

This is not a new problem, and quite a few people treat it as a practice project for working with Deep Learning models. I recommend practicing on it because it introduces you to several different kinds of models, such as Object Detection, Instance Segmentation, and Optical Character Recognition. A basic pipeline works as follows:

  • Cropper, also known as image alignment: receives the raw input image, crops the region containing the identity card, and uses a geometric transform to rotate the card into the correct orientation.
  • Detector: detects the card's fields such as name, date of birth, etc.
  • Reader: an OCR module that reads the text content from the cropped fields.

Finally, the results are assembled and rearranged into a complete identity card record. OK, now let's get to the main part that we will solve today.

Alignment

Alignment is a common technique in image processing: the process of bringing different data into the same coordinate system. Pictures are taken with phones or other sensors from many different angles. There are many methods to accomplish this, such as feature-based matching or Template Matching, but the ultimate goal is always the same: obtain a neatly arranged image in a known coordinate system that is easy to process. For the ID card problem, we want something like the following image:

A photo taken from an arbitrary angle is rotated so that the card sits straight and neat

So how do we accomplish this? We can immediately think of geometric transformations of images, namely the Perspective Transformation: mapping 4 coordinates in the source image onto 4 coordinates in the target image. A classic example looks like this:

Here the chessboard has been flattened into the new coordinate system, making it easier to process. So, having chosen Perspective Transformation as the solution, our next question is:

How do we find the 4 points in the source image, specifically the 4 corners of the identity card?

There are many ways to answer this question, but the simplest is to build a Deep Learning model that learns the positions of the four corners. To do that, we need data, right? So let's see how to prepare the data.

Data preparation

Crawl data

You can crawl data from sources like Google Images or photos on FB. Try keywords such as "lost papers" or "dropped documents", or browse pawn-shop and credit-lending pages to gather more data. After collecting a small amount of data, save it all into one folder as follows:

The next step is to label the data.

Assign data labels

We use the LabelImg tool to annotate the data. Its homepage has a fairly detailed user guide, so you just need to install it, open the data directory, and start annotating.

Here we need to annotate the 4 corners as 4 corresponding classes, as follows:

The more time you invest in labeling, the sweeter the fruit: the model's accuracy improves when the data is diverse and sufficient. If you would rather skip this step, I am sharing a small pre-labeled demo set (only 150 photos) that you can download here. Because the set is small it is only good for a demo; if you want better results, you will have to put in the labeling work yourself.

After finishing labeling, you will get a folder with the corresponding XML files, as follows:

Training models with Detecto

Detecto is a library built on top of PyTorch that is very easy to use. It is aimed at lazy coders: with a minimal amount of code you can still train an object detection model. Detecto supports Transfer Learning on custom datasets like our identity card dataset, and it can run inference on both images and videos. You can see an illustration:

You can check out (and star) the author's repository here. Now, let's start the training step.

Import library

It is very simple; you just do the following:

Define dataset and classes

This is extremely simple too. You just need to declare your data directory path and the classes used in the annotation step above.

The Detecto library has solved the dataset definition and loading step for us; you no longer need to care about the label file format or the data transforms. Now let's move on to training the model.

Training model

Very simple, with only one line of code:

Then all that is left to do is wait.

After training, save the model for later use. Again, this is a single statement:

If you don't want to train yourself, here is a demo model saved at the 30th epoch. Of course, the results are not great, since it was trained for only a few epochs on a small dataset.

Test the results

After saving the model, let's check the results on another image downloaded from the web (not in the training set). Suppose it is this file.

We proceed to read the file

and see what the model predicts:

The model's output is a list of labels, the positions of the boxes, and the confidence scores. Printing these values, you will see that it returns quite a lot of different boxes:

Corresponding to the boxes

We can still draw the positions of these boxes:

The following results are obtained:

Commenting results

It can be seen that the model still makes quite a few mistakes, but it does output all 4 types of labels. We have the following options:

  • Collect more data so that training is more diverse
  • Train the model longer; 30 epochs is quite little
  • Post-process the results with non_max_suppression

Here we will examine the third option; the first two are left to you, depending on how hard you want to work. We will discuss this in the next section.

Post-processing of results

As seen above, we need to post-process the results to make the output usable. The first idea that comes to mind is to merge boxes that overlap substantially and belong to the same class. In this problem the true boxes are usually far apart, so two boxes with close coordinates are almost always the same class; if they are not, check whether the data has been mislabeled.

Non Max Suppression

We will merge the overlapping boxes and reassign the label of each merged box. We implement this with the following function:
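A sketch of one common variant, assuming boxes are `[xmin, ymin, xmax, ymax]` lists: for each cluster of overlapping boxes we keep only the highest-scoring one, whose label becomes the cluster's label. The IoU threshold of 0.3 is my assumption:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(labels, boxes, scores, iou_threshold=0.3):
    """Keep only the best-scoring box in each cluster of overlapping
    boxes; the kept box carries its own label."""
    order = np.argsort(scores)[::-1]  # indices, best score first
    kept_labels, kept_boxes, kept_scores = [], [], []
    for i in order:
        box = boxes[i]
        # Drop this box if it overlaps one we already kept.
        if any(iou(box, kb) > iou_threshold for kb in kept_boxes):
            continue
        kept_labels.append(labels[i])
        kept_boxes.append(box)
        kept_scores.append(scores[i])
    return kept_labels, kept_boxes, kept_scores
```

If you prefer a battle-tested implementation, `torchvision.ops.nms` does the suppression part on tensors.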

We try the function on the boxes and labels above:

We will get the following result:

Now the boxes have been merged into only 4 labels. Re-running the drawing code produces the following output image:

OK, now for the last part: we perform the Perspective Transformation to crop the identity card into the new coordinate system.

Perspective Transform

Determine the coordinates of the source points

First we need to determine the coordinates of the source points based on the locations of the detected boxes. We simplify this by taking the center point of each box as its source point, using the following function:
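A minimal sketch of such a helper, assuming `[xmin, ymin, xmax, ymax]` boxes:

```python
def get_center_point(box):
    """Return the center (x, y) of an [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2
```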

Next, we create the final_points list from the boxes collected above.

And to make the next step easier, we create a dictionary mapping each label to its box.
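A sketch of building that mapping from the post-NMS labels and boxes; the sample coordinates below are made up for illustration:

```python
def get_center_point(box):
    # Repeated here so the snippet is self-contained.
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2

def build_label_to_point(labels, boxes):
    """Map each corner label to the center point of its box."""
    return {label: get_center_point(box) for label, box in zip(labels, boxes)}

# Hypothetical post-NMS output: one box per corner.
labels = ["top_left", "top_right", "bottom_right", "bottom_left"]
boxes = [[10, 10, 30, 30], [200, 12, 220, 32],
         [198, 120, 218, 140], [8, 118, 28, 138]]
label2point = build_label_to_point(labels, boxes)
```

Looking a corner up by name (rather than by list position) makes it easy to feed the points to the transform in a fixed order.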

Transform to destination coordinates

We define this function as follows

Here we hard-code the target size as a 500×300 image, close to the aspect ratio of an identity card; readers can change it. Next, we apply the transform:

We get the following result

As you can see, this result is much easier to process. You can experiment further with new images.

Conclusion

This article is quite simple, with the hope that you now roughly understand the steps of performing alignment for identity cards. Using the Detecto library makes model training very simple, and gradually shows that Deep Learning is just another tool that helps us program and solve problems better, not something terrible as people may have thought. Have fun, and see you in the following articles.


Source : Viblo