Steps to implement Mask R-CNN for the image segmentation problem

Tram Ho


  • Mask R-CNN is a state-of-the-art model for segmentation and object detection
  • Let’s find out how Mask R-CNN works
  • Let’s walk through the steps to implement Mask R-CNN for a segmentation problem


Having finished the series on the RASA chatbot, with articles on training a Rasa chatbot and on writing custom action functions within a conversation flow (you can find those articles on my page), today I would like to share a bit of what I know about Mask R-CNN in a series of articles about segmentation. In this article we will learn how Mask R-CNN works and go through the steps to implement it in a simple, easy-to-understand way. In the following article we will train a Mask R-CNN model for a specific problem.

III. Types of image segmentation problems

The image segmentation problem is divided into two categories:

  1. Semantic segmentation: segments the image by class. For example, if an image contains 3 classes (people, traffic lights, and cars), all people are assigned to one region, all cars to another, and all traffic lights to a third.
  2. Instance segmentation: segments each individual object within the same class. For example, if an image contains 3 people, the segmentation separates them into 3 different regions.

Which type of segmentation to apply depends on your problem. For example, for self-driving cars we only need semantic segmentation to group classes such as people and traffic lights, but to track the actions of people in a supermarket we must use instance segmentation to identify each person and follow them individually.
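To make the difference concrete, here is a minimal Python sketch on a toy grid. The grid, the `split_into_instances` helper, and the class id are all made up for illustration. Instances are separated with a simple connected-component flood fill, which is not how Mask R-CNN finds instances; the sketch only shows how the two kinds of output differ.

```python
from collections import deque

# Toy semantic mask: 0 = background, 1 = "person".
# Two separate people are present, but both carry the same class label.
semantic_mask = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def split_into_instances(mask, class_id):
    """Give every 4-connected blob of `class_id` its own instance number."""
    height, width = len(mask), len(mask[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != class_id or instances[r][c] != 0:
                continue
            # Unvisited pixel of the class: start a new instance here.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == class_id
                            and instances[ny][nx] == 0):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


instance_mask, person_count = split_into_instances(semantic_mask, class_id=1)
print("people found:", person_count)
for row in instance_mask:
    print(row)
```

In the semantic mask every person pixel has the value 1; in the instance mask the pixels of each person carry that person's own id.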

IV. Mask R-CNN

Mask R-CNN is essentially an extension of Faster R-CNN, which is widely used for object detection. Given an image, Faster R-CNN returns the label and bounding box of each object in it.
The original paper says, and I quote: “The Mask R-CNN framework is built on top of Faster R-CNN”. To put it simply: given an image, Mask R-CNN returns not only the label and bounding box of each object but also its mask.
There are already many articles about Faster R-CNN, so you can read those if you want to understand it in depth. Within the scope of this article, let’s go quickly through how Faster R-CNN works:

  1. First, a ConvNet extracts features from the input image.
  2. These features are then passed to a Region Proposal Network (RPN), which returns bounding boxes of various sizes for regions that may contain objects.
  3. An RoI pooling layer then brings the proposed boxes, which have different sizes, to the same fixed size.
  4. Finally, the result is passed to a fully connected layer, which classifies each object and outputs its bounding box.
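Step 3 can be illustrated with a small pure-Python sketch of RoI max pooling. The `roi_max_pool` helper, the 6×6 single-channel feature map, and the 2×2 output size are assumptions chosen for readability, not values from the article; real implementations work on multi-channel tensors. (The Mask R-CNN paper replaces this quantized pooling with RoIAlign, which is not shown here.)

```python
def roi_max_pool(feature_map, box, output_size=2):
    """Max-pool the region `box` = (y1, x1, y2, x2), end-exclusive,
    into a fixed output_size x output_size grid."""
    y1, x1, y2, x2 = box
    height, width = y2 - y1, x2 - x1
    if height < output_size or width < output_size:
        raise ValueError("box must span at least output_size cells per side")
    pooled = []
    for i in range(output_size):
        # Integer bin edges along the vertical axis.
        row_start = y1 + (i * height) // output_size
        row_end = y1 + ((i + 1) * height) // output_size
        pooled_row = []
        for j in range(output_size):
            # Integer bin edges along the horizontal axis.
            col_start = x1 + (j * width) // output_size
            col_end = x1 + ((j + 1) * width) // output_size
            pooled_row.append(max(
                feature_map[y][x]
                for y in range(row_start, row_end)
                for x in range(col_start, col_end)
            ))
        pooled.append(pooled_row)
    return pooled


# Hypothetical feature map whose cell value is row * 6 + col.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

small = roi_max_pool(feature_map, (0, 0, 4, 4))  # a 4x4 region
large = roi_max_pool(feature_map, (1, 2, 6, 5))  # a 5x3 region
print(small)
print(large)
```

Both regions have different shapes, yet each comes out as a 2×2 grid, which is what lets the fully connected layer that follows accept them.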

Backbone Model

Similar to the ConvNet in Faster R-CNN, Mask R-CNN uses the ResNet-101 architecture to extract features from the input image.

Region Proposal Network (RPN)

In this step, the extracted features are fed into the RPN, which predicts whether or not an object is present in each area. After this step we have bounding boxes for the areas that the model predicts may contain objects.
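The article does not spell out how the candidate areas are laid out. In Faster R-CNN style RPNs they are anchor boxes of several scales and aspect ratios placed at every feature-map location, and the network scores each one as object or not. The sketch below only generates such anchors; the `generate_anchors` helper, the stride, the scales, and the ratios are illustrative assumptions, not values taken from the article.

```python
import math


def generate_anchors(feature_height, feature_width, stride, scales, ratios):
    """Return anchors as (y1, x1, y2, x2) in image coordinates,
    centred on every feature-map cell."""
    anchors = []
    for row in range(feature_height):
        for col in range(feature_width):
            center_y = (row + 0.5) * stride
            center_x = (col + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    # Keep the area close to scale**2 while changing the shape.
                    height = scale / math.sqrt(ratio)
                    width = scale * math.sqrt(ratio)
                    anchors.append((
                        center_y - height / 2, center_x - width / 2,
                        center_y + height / 2, center_x + width / 2,
                    ))
    return anchors


anchors = generate_anchors(2, 3, stride=16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # cells * scales * ratios
print(anchors[1])    # first cell, scale 32, square anchor
```

An RPN would then assign each anchor an objectness score and a box refinement, and keep the highest-scoring ones as proposals.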

Region of Interest (RoI)

The bounding boxes of the proposed regions have different sizes, so this step brings them to one fixed size. These regions are then passed to a fully connected layer to predict class labels and bounding boxes. As mentioned, one object can have many bounding boxes of different sizes; these are filtered by computing the IoU (Intersection over Union) as follows:

IoU = (area of intersection) / (area of union), that is, the area of overlap between the predicted bounding box and the ground-truth bounding box divided by the area of their union. If the IoU is greater than or equal to 0.5, the region is kept as a region of interest; otherwise it is discarded. For example, if box1 and box2 have an IoU below 0.5, they are discarded, while box3 and box4, with an IoU greater than or equal to 0.5, are kept as regions of interest.
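The IoU filter can be sketched in a few lines of Python, with IoU taken as the intersection area divided by the union area. The coordinates below are hypothetical, chosen so that box1 and box2 fall below the 0.5 threshold and box3 and box4 pass it, as in the example; the `iou` helper is illustrative rather than taken from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth box and four predicted boxes.
ground_truth = (10, 10, 50, 50)
proposals = {
    "box1": (40, 40, 90, 90),
    "box2": (0, 0, 25, 25),
    "box3": (12, 12, 52, 52),
    "box4": (10, 10, 45, 50),
}

IOU_THRESHOLD = 0.5
kept = [name for name, box in proposals.items()
        if iou(ground_truth, box) >= IOU_THRESHOLD]

for name, box in proposals.items():
    print(name, round(iou(ground_truth, box), 3))
print("regions of interest:", kept)
```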

Segmentation Mask

Once we have the RoIs selected by their IoU values, a mask branch is added to the existing architecture to predict a mask for each of them.

At this point the model has segmented all the objects in the image. This is the final step of Mask R-CNN, where we predict a mask for every object in the image.
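As a rough sketch of what happens to one predicted mask, assume the mask branch outputs a small fixed-size grid of per-pixel probabilities for each RoI (the configuration discussed later in this article mentions 28×28). That grid is resized to the object's bounding box, thresholded, and placed into a full-size binary mask. The `paste_mask` helper, the 2×2 soft mask, and the 0.5 threshold below are assumptions for illustration, and nearest-neighbour resizing is a simplification of the smoother interpolation an actual implementation would typically use.

```python
def paste_mask(soft_mask, box, image_height, image_width, threshold=0.5):
    """Resize a small soft mask to `box` = (y1, x1, y2, x2), end-exclusive,
    threshold it, and place it in a full-size binary mask."""
    y1, x1, y2, x2 = box
    box_h, box_w = y2 - y1, x2 - x1
    mask_h, mask_w = len(soft_mask), len(soft_mask[0])
    full = [[0] * image_width for _ in range(image_height)]
    for dy in range(box_h):
        for dx in range(box_w):
            # Nearest-neighbour lookup into the small mask.
            src_y = dy * mask_h // box_h
            src_x = dx * mask_w // box_w
            if soft_mask[src_y][src_x] >= threshold:
                full[y1 + dy][x1 + dx] = 1
    return full


# Hypothetical 2x2 soft mask (a real one would be much larger).
soft_mask = [
    [0.1, 0.6],
    [0.9, 0.2],
]
full_mask = paste_mask(soft_mask, box=(1, 2, 5, 6), image_height=6, image_width=8)
for row in full_mask:
    print(row)
```

Each detected object gets its own full-size binary mask of this kind, which is what makes the result an instance segmentation.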

V. Steps to deploy Mask R-CNN

In this section, we will go through the steps to deploy Mask R-CNN:
Step 1: Clone the repository

Step 2: Install the required libraries

Step 3: Download the pre-trained weights (trained on MS COCO)

Step 4: Predict on our image. Finally, we use Mask R-CNN with the pre-trained weights to predict and create masks for our own images.

Let’s get started:

Next, we define the paths to the pretrained weights and to the image.

The configuration shows that the Mask R-CNN model uses a ResNet-101 backbone, returns masks of size 28×28, was trained on the COCO dataset, and has 81 classes in total including the background. Next, we create the model and load the pretrained weights.

Define the classes of the COCO dataset
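One possible definition is sketched below. It assumes the class ids follow the order commonly used with COCO-trained Mask R-CNN weights, with index 0 reserved for the background (`BG`). The `class_names` variable name and the example ids are assumptions, so check the ordering against the implementation you actually use.

```python
# 80 COCO object classes plus the background at index 0 (81 in total).
class_names = [
    "BG", "person", "bicycle", "car", "motorcycle", "airplane", "bus",
    "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard",
    "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork",
    "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
    "couch", "potted plant", "bed", "dining table", "toilet", "tv",
    "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
    "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
    "scissors", "teddy bear", "hair drier", "toothbrush",
]

# Hypothetical class ids, as a detector might return them for one image.
detected_ids = [1, 3, 10]
labels = [class_names[i] for i in detected_ids]
print(len(class_names))
print(labels)
```

The model predicts integer class ids, so this list is only needed to turn them into readable labels when displaying results.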

Let’s load the image and display it:

Let’s feed the image to the model and look at the results:

That concludes part 1 of my article on Mask R-CNN for the image segmentation problem. In part 2 we will train a Mask R-CNN model for a specific problem. If you found this interesting, please give me an upvote and follow me for new posts. The article surely still has many shortcomings, so please bear with them or leave a comment below to help improve it. Thank you for reading my article.

References -color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46


Source : Viblo