Learn about BlazePose: On-device Real-time Body Pose tracking

Tram Ho

Introduction to the problem of Pose Estimation

Pose Estimation is one of the common problems in image processing, with earlier successful work in this area such as OpenPose and PoseNet. An important direction for improving these models is processing speed. In this paper, the authors from Google AI Research propose an architecture that runs in real time on mobile devices at roughly 30 FPS. Below, we look at the network architecture and at how BlazePose achieves these results.

How BlazePose performs tracking

Architectural overview

BlazePose consists of two main components:

  • Pose Detector: detects the region of the image containing the person
  • Pose Tracker: extracts keypoints from the cropped person region and predicts the person's position in the next frame.

For video input, the detector runs only on certain keyframes; in subsequent frames the person's position is followed instead of re-detected. This tracking is done by the Pose Tracker model.
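The detector/tracker hand-off can be sketched as follows. This is a minimal illustration, not the real implementation: `detect_person` and `track_pose` are hypothetical stand-ins for the two BlazePose models.

```python
# Sketch of the two-stage pipeline: the detector runs on a keyframe,
# then the tracker reuses its own predicted ROI on subsequent frames.
# detect_person() and track_pose() are hypothetical stand-in functions.

def detect_person(frame):
    """Detector: finds the region of interest (ROI) containing the person."""
    return (0, 0, 100, 200)  # dummy ROI

def track_pose(frame, roi):
    """Tracker: extracts keypoints inside the ROI and predicts the ROI
    for the next frame."""
    keypoints = [(50.0, 60.0)] * 33  # BlazePose outputs 33 keypoints
    next_roi = roi                   # in reality, shifted by predicted motion
    return keypoints, next_roi

roi = None
for frame in range(5):
    if roi is None:               # keyframe: run the (heavier) detector once
        roi = detect_person(frame)
    keypoints, roi = track_pose(frame, roi)  # later frames: track only

print(len(keypoints))  # 33
```

The key point is that the expensive detector runs once, while the cheaper tracker both extracts keypoints and supplies the ROI for the next frame.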

Pose Detector section

Instead of detecting the full body, the paper uses a face detector together with a few additional predicted quantities: the midpoint of the person's hips, the size of the circle circumscribing the whole body, and the incline angle (the angle between the vertical direction and the line connecting the mid-hip point and the mid-shoulder point). This keeps the detector simple and lightweight.

From the detected information, the detector performs alignment to rotate the person upright.
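The alignment step can be illustrated with a little geometry. The sketch below (hypothetical code, not from the paper) computes the incline angle between the mid-hip→mid-shoulder line and the vertical axis, then rotates a point about the mid-hip so the body axis becomes vertical:

```python
import math

# Hypothetical alignment sketch: compute the body's incline angle from the
# mid-hip and mid-shoulder points, then rotate to make the person upright.

def incline_angle(mid_hip, mid_shoulder):
    """Angle (radians) between the hip->shoulder line and the vertical axis.
    Image coordinates: y grows downward, so 'up' is negative y."""
    dx = mid_shoulder[0] - mid_hip[0]
    dy = mid_shoulder[1] - mid_hip[1]
    return math.atan2(dx, -dy)

def rotate_about(point, center, angle):
    """Rotate a point around a center by the given angle (radians)."""
    x, y = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)

mid_hip, mid_shoulder = (100.0, 200.0), (150.0, 100.0)
theta = incline_angle(mid_hip, mid_shoulder)          # person tilted to the right
upright = rotate_about(mid_shoulder, mid_hip, -theta) # rotate back to vertical
print(round(upright[0], 1))  # 100.0 -> shoulder midpoint now above the hip
```

In the real pipeline the same rotation would be applied to the whole cropped region, so the tracker always sees a vertically aligned person.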

Pose Tracker for tracking

The author's idea is that if the Pose Tracker can predict the person's position in the next frame, the Pose Detector no longer needs to run on every frame: the tracker's prediction is reused, and detection only runs again when the tracker's confidence falls below a certain threshold. The pipeline works as follows.

The Pose Tracking network is divided into two parts: a Keypoint Detection part and a Keypoint Regression part, as shown below.

First, the left and middle parts of the network are trained with the heatmap and offset losses. The regression part on the right is then trained while sharing features with the detection part (no gradients are back-propagated through the shared features; they are only reused). At inference time, the detection part is removed entirely and only the regression part is kept. The output of this network consists of 33 keypoints plus 2 extra points used for the alignment described in the Pose Detector section above.

How BlazePose produces smooth landmarks

This method is not covered in the original BlazePose paper, but you can refer to the implementation in the MediaPipe library, which uses velocity-based tracking: if the keypoint velocity exceeds a certain threshold, detection is run again. The velocity function is defined in the MediaPipe source.
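The idea of velocity-gated re-detection can be sketched like this. The function names and the threshold value are made up for illustration; MediaPipe's actual velocity computation lives in its filtering code:

```python
import math

# Hypothetical illustration of velocity-based re-detection: estimate the
# average keypoint speed between two frames and trigger detection again
# if it exceeds a threshold. The threshold value here is invented.

def mean_keypoint_speed(prev_kps, curr_kps, dt=1.0):
    """Average per-keypoint displacement per unit time."""
    total = sum(math.dist(p, c) for p, c in zip(prev_kps, curr_kps))
    return total / (len(prev_kps) * dt)

def should_redetect(prev_kps, curr_kps, threshold=10.0):
    return mean_keypoint_speed(prev_kps, curr_kps) > threshold

prev = [(0.0, 0.0), (5.0, 5.0)]
slow = [(x + 1.0, y) for x, y in prev]    # small motion between frames
fast = [(x + 50.0, y) for x, y in prev]   # sudden large motion

print(should_redetect(prev, slow))  # False: keep tracking
print(should_redetect(prev, fast))  # True: run the detector again
```

Small inter-frame motion keeps the pipeline in tracking mode; a sudden jump (fast motion, scene cut, tracker drift) triggers a fresh detection.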

Landmark smoothing is only applied in tracking mode, i.e. static_image_mode=False, because in static mode the input images are assumed to be unrelated to each other (they may be a batch of images from many different scenes and contexts). In static mode, both the Pose Detector and Pose Tracker models must therefore run on every image in the batch, and the smooth_landmarks parameter is ignored.

Meaning of parameters in Mediapipe library

static_image_mode (boolean)

  • In case static_image_mode = True: the input images are treated as a series of still, possibly unrelated images in the same batch. The person-detection model then runs on every frame, with no tracking.
  • In case static_image_mode = False: the input is treated as a video stream. The person's position is detected only on a few key frames; subsequent frames use the tracking algorithm to follow that position without detecting again. When the tracking model's output falls below a certain threshold (set by the min_tracking_confidence parameter), person detection is run again.

smooth_landmarks (boolean)

This parameter only takes effect when static_image_mode=False, i.e. when the input is a video stream and tracking is performed. When static_image_mode=True, smooth_landmarks is ignored.

  • In case smooth_landmarks = True: a landmark-filtering algorithm is applied to the keypoints to reduce jitter, keeping the keypoint positions stable across frames.
  • In case smooth_landmarks = False: the landmark filter is not applied.

min_detection_confidence (float in [0.0, 1.0])

The minimum confidence value for the person-detection model's output to be considered a successful detection. If the model's output is below this threshold, no keypoints are detected. Default value is 0.5.

min_tracking_confidence (float in [0.0, 1.0])

The minimum confidence value for the tracking model's output to be considered successful tracking. If the tracker's output falls below this threshold, tracking is considered to have failed (the person is lost), and person detection is performed again on the next frame. Setting this value high increases accuracy but also increases computation time, since the person must be re-detected more often. Default value is 0.5.
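How the two confidence thresholds gate the pipeline can be sketched as below. The scores and function are invented for illustration; in MediaPipe they come from the models themselves:

```python
# Hypothetical sketch: min_detection_confidence gates whether a detection
# is accepted; min_tracking_confidence gates whether tracking continues.

def process_frame(det_score, trk_score, tracking,
                  min_detection_confidence=0.5, min_tracking_confidence=0.5):
    """Return (new tracking state, whether the detector ran this frame)."""
    if not tracking:
        if det_score < min_detection_confidence:
            return False, True   # detection failed; no keypoints this frame
        return True, True        # detection succeeded; start tracking
    if trk_score < min_tracking_confidence:
        return False, False      # tracking lost; detector runs next frame
    return True, False           # tracking continues; detector skipped

state = False
detector_ran = []
# (detection score, tracking score) per frame; tracking drops on frame 3
for det, trk in [(0.9, 0.8), (0.9, 0.7), (0.9, 0.3), (0.9, 0.8)]:
    state, ran = process_frame(det, trk, state)
    detector_ran.append(ran)

print(detector_ran)  # [True, False, False, True]
```

The detector runs on the first frame, is skipped while tracking confidence stays above the threshold, and runs again right after tracking is lost.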

Application

BlazePose is widely used in applications such as fitness and yoga trackers. The examples below show squats and push-ups.


Source : Viblo