Face detection and recognition system (Part 1)

Tram Ho

As introduced, in this first part I'd like to talk about the AI component of the system. This face recognition problem has a few requirements:

  • Because a large number of people come in and out, there is no time to train a model to recognize each person's face.
  • Each person's face data usually consists of only one or a few photos (often just one, for someone registered on the same day).

My solution is therefore to use a pretrained object detection model to detect faces. We then extract features from each face with a pretrained convolutional network. Put simply, a person looking at a face picks out features such as a high nose or a mole somewhere to tell people apart; the computer does the same kind of thing, except that it works on the image's pixel values. Finally, we use a search tree or search graph: each new person's feature vector is added to the graph so it can be searched later.
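To make the idea concrete, here is a minimal sketch of identification without per-person training, assuming each face has already been turned into a feature vector. The FaceGallery class, its method names and the 0.6 similarity threshold are hypothetical, and it uses brute-force cosine similarity with NumPy; the search graph discussed later replaces this linear scan.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    v = np.asarray(v, dtype=np.float32)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

class FaceGallery:
    """Hypothetical in-memory gallery: enrolling a person only stores a vector."""

    def __init__(self, dim: int = 512):
        self.dim = dim
        self.names: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def enroll(self, name: str, embedding: np.ndarray) -> None:
        # No training step: the new person's normalized embedding is appended.
        self.names.append(name)
        self.vectors = np.vstack([self.vectors, l2_normalize(embedding)[None, :]])

    def identify(self, embedding: np.ndarray, threshold: float = 0.6):
        """Return (name, similarity) of the closest face, or (None, similarity) if below threshold."""
        if not self.names:
            return None, 0.0
        sims = self.vectors @ l2_normalize(embedding)  # cosine similarity to every enrolled face
        best = int(np.argmax(sims))
        if sims[best] < threshold:
            return None, float(sims[best])
        return self.names[best], float(sims[best])
```

Enrolling a newcomer is a single append, which is what makes the "no time to train" requirement workable.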

AI explanation

Here, I'd like to explain a bit of the AI theory behind what I use. If you're only interested in the code, skip ahead to the code section. Skipping this section won't affect the code.

Here, I use MTCNN. I know you'll be like: "Ewww, why not use FaceNet or CenterNet-ResNet50 or something more serious?" The reason is that when I looked for a library with pretrained models so I could build the system quickly, the MTCNN repo written in PyTorch turned up first, and conveniently it can also be installed with pip. CenterNet's Docker setup is a mess, and FaceNet uses MXNet.
Original paper: https://arxiv.org/pdf/1604.02878v1.pdf

MTCNN consists of three networks:

  • P-Net
  • R-Net
  • O-Net

The input image is resized to several scales to form an image pyramid, which is then fed into P-Net:
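The pyramid scales can be computed as below. This is a sketch of the scale schedule used by common MTCNN implementations, assuming 12x12 P-Net windows and facenet-pytorch's default min_face_size=20 and factor=0.709; the function name is hypothetical.

```python
def pyramid_scales(height: int, width: int, min_face_size: int = 20,
                   factor: float = 0.709, cell_size: int = 12) -> list[float]:
    """Scales at which the image is resized before being fed to P-Net.

    P-Net looks at 12x12 windows, so the first scale shrinks a face of
    min_face_size pixels down to 12 pixels; each further level shrinks the
    image by `factor` until its shorter side would drop below 12 pixels.
    """
    m = cell_size / min_face_size
    min_side = min(height, width) * m
    scales = []
    k = 0
    while min_side >= cell_size:
        scales.append(m * factor ** k)
        min_side *= factor
        k += 1
    return scales
```

For a 240x240 image this gives eight scales, the first being 0.6 (a 20-pixel face becomes 12 pixels). A smaller min_face_size adds larger levels and makes detection slower.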

PNet architecture

As can be seen, P-Net is a fully convolutional network (FCN). Its job is to propose candidate windows that may contain a face: it produces many of them, quickly but imprecisely. For a 12x12 input window, its outputs are:

  • Face classification, with shape (1x1x2).
  • Bounding-box regression, with shape (1x1x4).
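Because P-Net is fully convolutional, a larger input simply produces a larger output map, where each cell scores one 12x12 window. The sketch below assumes the P-Net layer stack used in common MTCNN implementations (3x3 conv, 2x2 max-pool with stride 2, two more 3x3 convs, then 1x1 heads), which gives an overall stride of 2. The helper names are hypothetical, and real implementations also apply the regression offsets and non-maximum suppression to these windows.

```python
import math
import numpy as np

def pnet_output_size(n: int) -> int:
    """Spatial size of P-Net's output map for an n x n input (assumed layer stack)."""
    n = n - 2               # 3x3 conv, no padding
    n = math.ceil(n / 2)    # 2x2 max-pool, stride 2, ceil mode
    n = n - 2               # 3x3 conv
    n = n - 2               # 3x3 conv
    return n                # the 1x1 head convs keep the spatial size

def windows_from_map(face_prob: np.ndarray, scale: float, threshold: float = 0.6,
                     stride: int = 2, cell_size: int = 12) -> np.ndarray:
    """Map output cells above threshold back to [x1, y1, x2, y2, score] in original-image pixels."""
    rows, cols = np.nonzero(face_prob >= threshold)
    x1 = stride * cols / scale
    y1 = stride * rows / scale
    x2 = (stride * cols + cell_size) / scale
    y2 = (stride * rows + cell_size) / scale
    scores = face_prob[rows, cols]
    return np.stack([x1, y1, x2, y2, scores], axis=1)
```

A 12x12 input yields the 1x1 map described above, while a 24x24 input yields a 7x7 map, one score per overlapping window.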

R-Net and O-Net have similar structures and differ mainly in depth and output. R-Net takes the candidate boxes from P-Net as input, and O-Net takes the boxes kept by R-Net. Their task is to filter and refine the bounding boxes more precisely, using deeper models.
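A single refinement stage can be sketched as: keep the candidates this stage scores as faces, then shift each box by its predicted regression offsets. In the full pipeline each candidate is cropped and resized to 24x24 for R-Net (48x48 for O-Net) before scoring, and non-maximum suppression removes overlapping boxes; this simplified sketch skips both, and the function name and offset convention (fractions of box width and height) are assumptions.

```python
import numpy as np

def refine_boxes(boxes: np.ndarray, face_prob: np.ndarray, offsets: np.ndarray,
                 threshold: float) -> np.ndarray:
    """One cascade stage: keep confident boxes, then apply bbox-regression offsets.

    boxes:     (N, 4) candidates [x1, y1, x2, y2] from the previous stage
    face_prob: (N,)   face probability predicted by this stage
    offsets:   (N, 4) regression outputs, as fractions of box width/height
    """
    keep = face_prob >= threshold
    boxes, offsets = boxes[keep], offsets[keep]
    w = (boxes[:, 2] - boxes[:, 0])[:, None]
    h = (boxes[:, 3] - boxes[:, 1])[:, None]
    scale = np.hstack([w, h, w, h])  # x offsets scale with width, y offsets with height
    return boxes + offsets * scale
```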

RNet architecture

ONet architecture

These two networks end with fully connected layers, which is why their outputs are flat vectors:

  • Face classification, with shape (2).
  • Bounding-box regression, with shape (4).
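The difference between (1x1x2) and (2) comes down to input size. The sketch below does the shape arithmetic for R-Net, assuming the layer sizes in the MTCNN paper's R-Net figure as I understand them (3x3 convs with 28 and 48 channels, each followed by a 3x3, stride-2 max-pool, then a 2x2 conv with 64 channels feeding a 128-unit fully connected layer). The flattened size depends on the input size, so the fully connected weights pin R-Net to 24x24 crops, and its heads (128 to 2, 128 to 4) produce flat vectors. Function names are hypothetical.

```python
import math

def conv(n: int, k: int) -> int:
    """Output size of a k x k convolution with stride 1 and no padding."""
    return n - k + 1

def pool(n: int, k: int, s: int) -> int:
    """Output size of a k x k max-pool with stride s in ceil mode."""
    return math.ceil((n - k) / s) + 1

def rnet_flat_features(n: int = 24) -> int:
    """Number of features entering R-Net's first fully connected layer for an n x n input."""
    n = conv(n, 3)       # 28 channels
    n = pool(n, 3, 2)
    n = conv(n, 3)       # 48 channels
    n = pool(n, 3, 2)
    n = conv(n, 2)       # 64 channels
    return n * n * 64
```

rnet_flat_features() gives 3 * 3 * 64 = 576 for a 24x24 crop, while a 48x48 input would give 5184 and no longer fit the fully connected layer.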

Code implementation

You can clone the repo and use it yourself, or install it with pip. If you don't intend to modify the behavior of the layers in the repo, pip is more convenient (no config paths to set up, …). First, create a backend folder in the project folder, and create the following files:

  • face_detector.py
  • face_searcher.py

Feature extraction

We will create a class in face_detector.py that locates faces and extracts their features with a convolutional network.

A few parameters explained:

  • MTCNN:
    • image_size: size (in pixels) of the cropped face image
    • min_face_size: minimum face size (in pixels) in the original image to search for
    • thresholds: detection confidence thresholds, an array of three values, one for each network
    • factor: scale factor between successive levels of the image pyramid
  • InceptionResnetV1:
    • pretrained: which pretrained weights to load (e.g. 'vggface2' or 'casia-webface')
    • classify: if True, the network also runs the final logits layer and acts as a classifier. Here we set it to False to get the face's feature vector.
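The original code for face_detector.py is not reproduced here; the following is a minimal sketch of what such a class could look like, assuming the facenet-pytorch package (pip install facenet-pytorch). The class name FaceDetector and its method names are hypothetical, and the parameter values are the library's defaults rather than the author's settings. The imports sit inside the methods only so the file can be loaded before torch is installed.

```python
class FaceDetector:
    """Hypothetical sketch: MTCNN crops the face, InceptionResnetV1 embeds it."""

    def __init__(self, device: str = "cpu"):
        # Lazy import (assumption): keeps the module importable without torch.
        from facenet_pytorch import MTCNN, InceptionResnetV1
        self.device = device
        self.mtcnn = MTCNN(image_size=160, min_face_size=20,
                           thresholds=[0.6, 0.7, 0.7], factor=0.709,
                           device=device)
        # classify=False: return the embedding instead of class logits.
        self.resnet = InceptionResnetV1(pretrained="vggface2", classify=False,
                                        device=device).eval()

    def get_face(self, image):
        """Detect a face in a PIL image; return a (3, 160, 160) tensor, or None if no face."""
        return self.mtcnn(image)

    def get_embedding(self, face):
        """Embed one cropped face tensor; return a NumPy vector of shape (512,)."""
        import torch
        with torch.no_grad():
            emb = self.resnet(face.unsqueeze(0).to(self.device))  # add batch dimension
        return emb[0].cpu().numpy()

    def embed_image(self, image):
        """Convenience wrapper: image in, 512-d vector (or None) out."""
        face = self.get_face(image)
        return None if face is None else self.get_embedding(face)
```

In the facenet-pytorch versions I know, InceptionResnetV1 with classify=False returns L2-normalized embeddings, so inner product and cosine similarity rank neighbors the same way.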

Now we will continue with the face recognition code:

The face-cropping function returns the detected face as a torch tensor.

The embedding function returns a feature vector of shape (512,) in our problem. Next, we will write functions that get the vector directly from an image and from a folder (when the server starts, we need to load all the stored images back into the system).
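For the folder case, here is a sketch assuming a hypothetical layout root/<person_name>/<image files>, where each sub-folder name is the person's label. load_gallery and IMAGE_EXTS are hypothetical names; embed_fn is whatever turns an image path into a vector, so the loader itself does not depend on torch.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def load_gallery(root: str,
                 embed_fn: Callable[[Path], Optional[Sequence[float]]]):
    """Embed every image under root/<person_name>/ and return (names, vectors).

    embed_fn should open the image, detect the face and return its vector,
    or None when no face is found; such images are skipped.
    """
    names, vectors = [], []
    for person_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img_path in sorted(person_dir.iterdir()):
            if img_path.suffix.lower() not in IMAGE_EXTS:
                continue  # ignore non-image files
            vec = embed_fn(img_path)
            if vec is None:
                continue  # no face detected in this photo
            names.append(person_dir.name)
            vectors.append(vec)
    return names, vectors
```

In practice embed_fn could open the file with Pillow (Image.open(path).convert("RGB")), pass it through MTCNN and then InceptionResnetV1, and the returned vectors are what gets added to the search graph at startup.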

Adding faces to the search graph

For this part I use hnswlib, for two reasons:

  • It allows adding new items after the graph has been built; libraries like Annoy do not.
  • In benchmarks, hnswlib beats many other libraries in both speed and recall.

We will write a search class in face_searcher.py.

The parameters of hnswlib.Index are:

  • space: the distance metric: 'l2' (squared L2), 'ip' (inner product) or 'cosine' (cosine similarity)
  • dim: the dimension of the vectors being indexed, here 512

For other parameters such as ef, M, …, see the hnswlib documentation.

In the next part, we will build the server with Flask.


Source: Viblo