I. Overview
- Mask R-CNN is a state-of-the-art model for object detection and instance segmentation.
- Let’s find out how Mask R-CNN works.
- Let’s walk through the steps to run Mask R-CNN on a segmentation problem.
II. Introduction
Having finished the series on the RASA chatbot, with articles on training a Rasa chatbot and writing custom action functions for the conversation flow (you can find them on my page), today I would like to share a bit of what I know about Mask R-CNN as part of a series of articles on segmentation. In this article we will look at how Mask R-CNN works and at the steps to run it, in a simple and easy-to-understand way. In the following article we will train a Mask R-CNN model for a specific problem.
III. Types of image segmentation
The image segmentation problem is divided into two categories:
- Semantic segmentation: segments the image by class. For example, if an image contains 3 classes (people, traffic lights, cars), all people are assigned to one region, all cars to another, and all traffic lights to a third.
- Instance segmentation: segments each individual object, even within the same class. For example, if an image has 3 people, they are separated into 3 different regions.
Which segmentation type to apply depends on your problem. For example, for self-driving cars we only need semantic segmentation to group pixels into classes such as people and traffic lights, but to track the actions of people in a supermarket we must use instance segmentation to identify each person and follow them.
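To make the difference concrete, here is a minimal sketch that uses plain Python lists as stand-ins for image arrays. The toy masks, the `PERSON_CLASS_ID` value and the `to_semantic_map` helper are illustrative assumptions, not part of any library. Instance segmentation keeps one mask per person, while collapsing those masks by class gives the semantic view.

```python
# Toy 4x6 "image" with two people; 0 marks background pixels.
# Instance segmentation: one binary mask per object (values are made up).
person_a = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
person_b = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
instance_masks = [person_a, person_b]
PERSON_CLASS_ID = 1  # hypothetical class id for "person"


def to_semantic_map(masks, class_ids):
    """Collapse per-object masks into a single class-id map (semantic view)."""
    height, width = len(masks[0]), len(masks[0][0])
    semantic = [[0] * width for _ in range(height)]
    for mask, class_id in zip(masks, class_ids):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic


semantic_map = to_semantic_map(instance_masks, [PERSON_CLASS_ID, PERSON_CLASS_ID])
for row in semantic_map:
    print(row)
```

In the semantic map both people carry the same value, so they can no longer be told apart; the list of instance masks still separates them.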
IV. Mask R-CNN
Mask R-CNN is basically an extension of Faster R-CNN, which is widely used in object detection problems. When we feed an image into Faster R-CNN, it returns the label and bounding box of each object in the image.
The authors state this in the original paper, and I quote it in full: “The Mask R-CNN framework is built on top of Faster R-CNN”. To put it simply: when you feed in an image, in addition to the label and bounding box of each object, Mask R-CNN also returns a mask for each object.
There are already quite a few articles about Faster R-CNN if you want to understand it in more depth. Within the scope of this article, let’s quickly go over how Faster R-CNN works:
- First, a ConvNet extracts features from the input image.
- These features are then passed to a Region Proposal Network (RPN), which returns bounding boxes of different sizes for regions that may contain objects.
- Next, an RoI pooling layer converts the proposed regions, which have different sizes, to the same fixed size.
- Finally, the result is passed to fully connected layers, which output a class label and a bounding box for each object.
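The RoI pooling step in the list above can be sketched as follows. This is a simplified illustration on a nested-list feature map, assuming boxes are given as `(y1, x1, y2, x2)` in feature-map coordinates and are at least one cell high and wide. The `roi_max_pool` helper is a hypothetical name, and real implementations operate on multi-channel tensors. The Mask R-CNN paper refines this step with RoIAlign, which this sketch does not cover.

```python
def roi_max_pool(feature_map, box, output_size=2):
    """Max-pool the region box=(y1, x1, y2, x2) to output_size x output_size."""
    y1, x1, y2, x2 = box
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(output_size):
        # Integer bin edges along y; every bin covers at least one row.
        ys = y1 + (i * height) // output_size
        ye = max(ys + 1, y1 + ((i + 1) * height) // output_size)
        row = []
        for j in range(output_size):
            # Integer bin edges along x; every bin covers at least one column.
            xs = x1 + (j * width) // output_size
            xe = max(xs + 1, x1 + ((j + 1) * width) // output_size)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 4x4 single-channel feature map with values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]

# Two regions of different sizes both come out as 2x2 grids.
print(roi_max_pool(feature_map, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(feature_map, (1, 1, 4, 3)))  # [[5, 6], [13, 14]]
```

Because every region ends up with the same shape, the fully connected layers that follow can process all proposals in the same way.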
Backbone Model
Like the ConvNet in Faster R-CNN, Mask R-CNN uses a backbone network to extract features from the input image; here the ResNet-101 architecture is used.
Region Proposal Network (RPN)
In this step, the extracted features are fed to the RPN, which predicts whether or not a region contains an object. After this step we have bounding boxes for the regions that the model predicts may contain objects.
Region of Interest (RoI)
The bounding boxes of the proposed regions have different sizes, so this step converts them to a fixed size. Next, these regions are passed to fully connected layers to predict class labels and bounding boxes. As mentioned, one object will have many bounding boxes of different sizes; these are gradually filtered out by computing the IoU (Intersection over Union) as follows:
```
IoU = Area of the intersection / Area of the union
```
In other words, the IoU is the area where the predicted bounding box and the ground-truth bounding box overlap, divided by the area of their union. If the IoU is greater than or equal to 0.5, the region is kept as a region of interest; otherwise it is discarded. For example, if box1 and box2 have an IoU below 0.5, they are discarded, while box3 and box4, with an IoU greater than or equal to 0.5, are kept as regions of interest.
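The IoU rule can be written in a few lines of Python. This is a minimal sketch, assuming boxes are `(y1, x1, y2, x2)` tuples; the helper names `iou` and `keep_regions` and the example boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (y1, x1, y2, x2)."""
    ay1, ax1, ay2, ax2 = box_a
    by1, bx1, by2, bx2 = box_b
    # Height and width of the overlap; zero when the boxes do not intersect.
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    intersection = inter_h * inter_w
    area_a = (ay2 - ay1) * (ax2 - ax1)
    area_b = (by2 - by1) * (bx2 - bx1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def keep_regions(proposals, ground_truth, threshold=0.5):
    """Keep only the proposals whose IoU with the ground truth reaches the threshold."""
    return [box for box in proposals if iou(box, ground_truth) >= threshold]


ground_truth = (0, 0, 10, 10)
proposals = [
    (20, 20, 30, 30),  # no overlap, IoU = 0
    (5, 5, 15, 15),    # small overlap, IoU = 25 / 175
    (0, 0, 10, 5),     # half of the ground truth, IoU = 0.5
    (0, 0, 10, 10),    # perfect match, IoU = 1
]
print(keep_regions(proposals, ground_truth))
# [(0, 0, 10, 5), (0, 0, 10, 10)]
```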
Segmentation Mask
Once we have the RoIs selected by their IoU values, a mask branch is added to the existing architecture; it predicts a segmentation mask for each region that contains an object.
At this point, the model has segmented all the objects in the image. This is the final step in Mask R-CNN, where a mask is predicted for every object in the image.
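To give a feel for how a mask prediction becomes a full-image mask: the mask branch outputs a small soft mask per region (28×28 in the configuration used later in this article), which is then scaled to the size of the object's bounding box and thresholded. The sketch below is only a rough illustration under simplifying assumptions: it uses nearest-neighbour scaling on nested lists, a threshold of 0.5, and a hypothetical helper named `paste_mask`. A real implementation would use proper image resizing on arrays.

```python
def paste_mask(small_mask, box, image_shape, threshold=0.5):
    """Scale a small soft mask to box=(y1, x1, y2, x2) and paste it
    into a full-size binary mask of shape image_shape=(height, width)."""
    y1, x1, y2, x2 = box
    box_h, box_w = y2 - y1, x2 - x1
    src_h, src_w = len(small_mask), len(small_mask[0])
    full = [[False] * image_shape[1] for _ in range(image_shape[0])]
    for dy in range(box_h):
        sy = dy * src_h // box_h  # nearest source row
        for dx in range(box_w):
            sx = dx * src_w // box_w  # nearest source column
            full[y1 + dy][x1 + dx] = small_mask[sy][sx] >= threshold
    return full


# A 2x2 soft mask stands in for the 28x28 network output.
soft_mask = [
    [0.9, 0.2],
    [0.8, 0.1],
]
full_mask = paste_mask(soft_mask, box=(1, 1, 5, 5), image_shape=(6, 6))
for row in full_mask:
    print("".join("#" if v else "." for v in row))
```

Only pixels inside the bounding box can be set, and only where the soft mask value reaches the threshold.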
V. Steps to deploy Mask R-CNN
In this section, we will go through the steps to deploy Mask R-CNN:
Step 1: Clone the repository
```shell
git clone https://github.com/matterport/Mask_RCNN.git
```
Step 2: Install the related libraries below
```
numpy
scipy
Pillow
cython
matplotlib
scikit-image
tensorflow>=1.3.0
keras>=2.0.8
opencv-python
h5py
imgaug
IPython
```
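Before moving on, it can help to check that these libraries are importable. The snippet below is an optional sketch using only the standard library. The import names are my assumption about how each package is imported (for example, `Pillow` is imported as `PIL`, `scikit-image` as `skimage` and `opencv-python` as `cv2`), and `find_missing` is a hypothetical helper. It does not check version numbers.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = [
    "numpy", "scipy", "PIL", "cython", "matplotlib", "skimage",
    "tensorflow", "keras", "cv2", "h5py", "imgaug", "IPython",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```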
Step 3: Download the pre-trained weights (trained on MS COCO)
Download the COCO pre-trained weights from the link below:
https://github.com/matterport/Mask_RCNN/releases
Step 4: Predict on our image
Finally, we use Mask R-CNN with the pre-trained weights to predict and create masks for our own images.
Let’s get started:
```python
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

import warnings
warnings.filterwarnings("ignore")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline
```
Next we define the paths for the pre-trained weights and the images:
```python
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join('', "mask_rcnn_coco.h5")

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
```
```python
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
```
From the configuration printed above, we can see that the Mask R-CNN model uses a resnet101 backbone, returns masks of size 28×28, and was trained on the COCO dataset with a total of 81 classes, including the background. Next we create the model and load the pre-trained weights:
```python
# Create model object in inference mode.
# MODEL_DIR is the logs directory defined earlier.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)
```
Define the class names of the COCO dataset:
```python
# COCO Class names
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
```
Let’s load an image and display it:
```python
# Load the image to run detection on
image = skimage.io.imread('sample.jpg')

# Show the original image
plt.figure(figsize=(12,10))
skimage.io.imshow(image)
```
Let’s run the image through the model and enjoy the results:
```python
# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])
```
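Besides the visualization, the result dictionary can be inspected directly. The sketch below assumes the layout used by the code above: `rois` holds one `(y1, x1, y2, x2)` box per detection, and `class_ids` and `scores` hold one entry each per detection. The `summarize_detections` helper and the demo values are made up for illustration, so that the snippet runs without a trained model.

```python
def summarize_detections(result, class_names, min_score=0.0):
    """Return one (label, score, box_height, box_width) tuple per detection."""
    summary = []
    for box, class_id, score in zip(result["rois"], result["class_ids"], result["scores"]):
        if score < min_score:
            continue  # skip low-confidence detections
        y1, x1, y2, x2 = (int(v) for v in box)
        summary.append((class_names[int(class_id)], float(score), y2 - y1, x2 - x1))
    return summary


# Made-up detections standing in for r = results[0].
demo_class_names = ["BG", "person", "bicycle", "car"]
demo_result = {
    "rois": [[10, 20, 110, 70], [50, 200, 120, 380]],
    "class_ids": [1, 3],
    "scores": [0.98, 0.42],
}

for label, score, height, width in summarize_detections(demo_result, demo_class_names, min_score=0.5):
    print(f"{label}: score={score:.2f}, box={height}x{width}")
# person: score=0.98, box=100x50
```

With the real model, the same call would be `summarize_detections(r, class_names)`.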
That concludes part 1 of my article on Mask R-CNN for image segmentation. In part 2, we will train a Mask R-CNN model for a specific problem. If you found this interesting, please give me an upvote and follow me to catch new posts. The article surely still has shortcomings, so please bear with them or leave a comment below to help improve it. Thank you for reading!
References
https://arxiv.org/pdf/1703.06870.pdf
https://medium.com/@jonathan_hui/image-segmentation-with-mask-r-cnn-ebe6d793272
https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46
https://github.com/matterport/Mask_RCNN