3D Photography using Context-aware Layered Depth Inpainting

Tram Ho

1. Introduction

You’ve probably heard of or used Facebook’s “3D” photo feature. From a regular 2D photo, Facebook 3D Photos creates a small animation that feels like a moving image or a short video. It builds on the Layered Depth Image (LDI) representation, which stores the color (RGB) and depth (D) of an image in layers. However, this method still has a major weakness: the layers are relatively rigid, so the reconstructed background does not look very good. The paper 3D Photography using Context-aware Layered Depth Inpainting proposes some interesting methods to solve this problem.

2. Some concepts

Before going into the paper, I will go over some terms used by the authors.

2.1. RGB-D images

A typical photo has 3 color channels (Red, Green and Blue) with values in the range [0, 255].

An RGB-D image is an RGB image combined with a depth map, and is often used in computer graphics. The depth map can be measured by a depth sensor (IR Depth Sensor), so RGB-D images can be captured with specialized depth cameras such as Kinect, Orbbec, VicoVR …
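To make the idea concrete, here is a minimal sketch of how an RGB-D image can be held in memory as a single array with four channels. The image size, the random colors, the synthetic depth values and the helper name make_rgbd are all assumptions made for illustration; a real depth map would come from a sensor or a depth estimation model.

```python
import numpy as np

# Hypothetical 4x6 image: random colors plus a synthetic depth map.
height, width = 4, 6
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)  # values in [0, 255]
depth = np.linspace(0.5, 4.0, height * width, dtype=np.float32).reshape(height, width)


def make_rgbd(rgb, depth):
    """Stack color and depth into one (H, W, 4) float32 array.

    Color is rescaled to [0, 1]; depth is kept in its original unit.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError("rgb and depth must have the same height and width")
    color = rgb.astype(np.float32) / 255.0
    return np.concatenate([color, depth[..., None].astype(np.float32)], axis=-1)


rgbd = make_rgbd(rgb, depth)
print(rgbd.shape)  # channels 0..2 are R, G, B; channel 3 is depth
```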

2.2. Layered Depth Image

The method used is the Layered Depth Image (LDI), introduced by Jonathan Shade et al. You can find the original paper here. The paper was published in 1998 and became the basis for many later graphics applications. LDI was developed because the Sprites with Depth technique of that time could not handle images with many occluded areas or strong parallax.

With LDI, each pixel can hold several points at different depths along its line of sight, so even when the viewpoint moves, the hidden layers can still be seen at their original location. Another plus is that LDI does not need a z-buffer, which makes rendering faster and more efficient.

This method is also used by Facebook 3D Photos when creating videos from 2D images. Put simply, it lets you render an image with several different depth layers.
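The data structure can be sketched roughly as follows. This is only an illustration of "several samples per pixel", not the layout used in the original LDI paper or in the 3D photography code; the class names LDIPixel and LayeredDepthImage and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LDIPixel:
    """One sample along a line of sight: a color plus its depth."""
    color: Tuple[int, int, int]
    depth: float


@dataclass
class LayeredDepthImage:
    width: int
    height: int
    # Maps (x, y) to the samples stored along that pixel's line of sight.
    samples: Dict[Tuple[int, int], List[LDIPixel]] = field(default_factory=dict)

    def add(self, x: int, y: int, color: Tuple[int, int, int], depth: float) -> None:
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("pixel coordinates outside the image")
        layers = self.samples.setdefault((x, y), [])
        layers.append(LDIPixel(color, depth))
        layers.sort(key=lambda s: s.depth)  # keep front-to-back order

    def layers_at(self, x: int, y: int) -> List[LDIPixel]:
        return self.samples.get((x, y), [])


ldi = LayeredDepthImage(width=2, height=2)
ldi.add(0, 0, (255, 0, 0), depth=1.0)  # visible foreground sample
ldi.add(0, 0, (0, 0, 255), depth=5.0)  # hidden background behind it
ldi.add(1, 0, (0, 255, 0), depth=5.0)  # pixel with a single layer
```

A regular depth map would only keep the first sample of pixel (0, 0); the LDI also keeps the background sample behind it, which is what becomes visible when the viewpoint moves.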

Example of Sprites with Depth splitting a depth map into layers:

a) Extract the sprite

b) Segment the image into areas (6 areas)

c) Merge the segmented regions based on the depth of the image

d) The image displayed as layers

e) The last remaining (residual) layer

f) Re-created background

g) Novel view without the residual layer

h) Novel view with the residual layer

2.3. Input and pre-process

As mentioned above, the input of the network is an RGB-D image. However, you do not always have a Kinect at hand to capture one, so the authors suggest generating the depth map with a depth estimation model such as MegaDepth, MiDaS …

MegaDepth is basically a pretrained model used to predict the depth map of an image. You can try the demo at the link below to get a better feel for it.


And here is the depth map of a car image, generated with MiDaS.

3. Pipeline

The pipeline of this method is illustrated clearly by the authors. The input RGB-D image is first converted into an LDI, which separates it into layers at different depths. Every LDI pixel stores a depth value in addition to its color.

Image pre-processing steps:

  1. Normalize the image
  2. Build the LDI from the predicted depth map above
  3. Sharpen the depth discontinuities with a bilateral median filter
  4. Detect the raw discontinuities (edges)
  5. Link the discontinuities into depth edges that follow the depth map
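The edge detection part of the steps above can be sketched as a simple threshold on the disparity difference between neighbouring pixels. This is a simplified illustration, not the authors' code: it skips the bilateral median filter and the linking of edges, and the function name depth_discontinuities and the threshold value 0.04 are assumptions chosen for the example.

```python
import numpy as np


def depth_discontinuities(depth, threshold=0.04):
    """Return a boolean mask of pixels that sit on a depth discontinuity.

    A pixel is marked when its disparity (1 / depth) differs from a
    horizontal or vertical neighbour by more than `threshold`.
    Both sides of each jump are marked.
    """
    disparity = 1.0 / np.maximum(depth, 1e-6)  # avoid division by zero
    edges = np.zeros(depth.shape, dtype=bool)

    jump_x = np.abs(disparity[:, 1:] - disparity[:, :-1]) > threshold
    jump_y = np.abs(disparity[1:, :] - disparity[:-1, :]) > threshold

    edges[:, :-1] |= jump_x  # left side of a horizontal jump
    edges[:, 1:] |= jump_x   # right side of a horizontal jump
    edges[:-1, :] |= jump_y  # upper side of a vertical jump
    edges[1:, :] |= jump_y   # lower side of a vertical jump
    return edges


# Synthetic depth map: a near object (depth 1) on the left, far background (depth 5) on the right.
depth = np.full((4, 6), 5.0)
depth[:, :3] = 1.0
edges = depth_discontinuities(depth)
print(edges.astype(int))
```

In this toy example only the two columns on either side of the object boundary are marked, which is where the LDI would later be cut into foreground and background.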

Main processing steps:

  1. Create the layered depth image
  2. Cut / separate the corresponding layers (background / subject)
  3. Build the background from a context region and a synthesis region
  4. Inpaint (re-create) the image

A big problem is that the depth map sometimes doesn’t match the regions and colors of the image exactly. Therefore, the authors propose using 3 separate models to solve 3 separate problems: restoring the edges, restoring the color and restoring the hidden depth.

The step of re-creating the background image is divided into 3 stages:

  • Edge inpainting
  • Color inpainting
  • Depth inpainting

The inpainted edges are concatenated with the context region to form the input of the following models.

Each of the 3 elements (edges, color and depth) has its own model. Network architecture:

The depth and color inpainting models use a U-Net architecture with Partial Convolution. Partial Convolution is a layer designed to re-create / restore missing regions of an image. I’ll talk about it more in a later post.
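As a rough illustration of the idea, here is a minimal single-channel partial convolution in NumPy. It follows the usual formulation (convolve only the known pixels, rescale by the fraction of known pixels in the window, then update the mask); it is a sketch and not the network code used in the paper. The function name partial_conv2d, the "valid" padding and the toy inputs are assumptions.

```python
import numpy as np


def partial_conv2d(image, mask, kernel, bias=0.0):
    """Single-channel partial convolution with 'valid' padding and stride 1.

    image, mask: (H, W) arrays; mask is 1 for known pixels and 0 for holes.
    kernel: square (k, k) array.
    Returns the output feature map and the updated mask.
    """
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    output = np.zeros((out_h, out_w), dtype=np.float64)
    new_mask = np.zeros((out_h, out_w), dtype=np.float64)
    window_size = float(k * k)

    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + k, j:j + k]
            valid = m.sum()
            if valid > 0:
                x = image[i:i + k, j:j + k]
                # Use known pixels only, then rescale to compensate for the holes.
                output[i, j] = (kernel * x * m).sum() * (window_size / valid) + bias
                new_mask[i, j] = 1.0  # at least one known pixel: location becomes valid
            # Otherwise the output stays 0 and the location stays a hole.
    return output, new_mask


# Toy input: a constant image with one missing pixel in the centre.
image = np.ones((5, 5))
mask = np.ones((5, 5))
image[2, 2] = 0.0
mask[2, 2] = 0.0
kernel = np.full((3, 3), 1.0 / 9.0)  # simple averaging kernel

out, new_mask = partial_conv2d(image, mask, kernel)
```

With this averaging kernel, every window contains 8 known pixels out of 9, and the rescaling brings the result back to the value of the surrounding pixels instead of letting the hole darken it. After one layer the hole has disappeared from the mask, which is how the holes shrink as the layers are stacked.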

You can refer to the paper and code here.

To improve the result, the authors apply inpainting several times until the image is as complete as possible.

Failure cases

Large, intricate structures or transparent and reflective objects can hurt the results. Experiments also show that small details, or colors close to the background, can distort the depth map and the inpainting process.


4. Conclusion

This is a pretty good paper, isn’t it? You can use the authors’ Colab link to try creating 3D photos with deep learning from your own pictures. 😄

That’s the end of my article. Thank you for reading.






Source : Viblo