(Paper Explained) Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Saturday, 26/11/2022

Tram Ho

Introduction

For the super-resolution problem, CNNs have proven their strength, reaching accuracy superior to traditional methods. With only a few convolution layers, the SRCNN network was able to outperform bicubic interpolation right from the beginning of training. In the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, the authors propose a new method that improves on both accuracy and processing speed. To achieve this, they use a technique called the sub-pixel convolution layer.

Problems with the SRCNN network

In the SRCNN network, a low-resolution (LR) input image is first upsampled with bicubic interpolation so that it has the same size as the high-resolution (HR) image. This has two disadvantages:

First, enlarging the input image to the output size multiplies the workload. The image must be interpolated before it enters the model, and the model then runs on the upsampled image, which is many times larger than the LR image. For the model in particular, if the image is upscaled by a factor of $n$ along each side, the amount of computation grows by a factor of $n^2$. This gives SRCNN a long runtime and makes it unsuitable for real-time applications [2].
Second, bicubic interpolation adds no new information for the model; it only estimates the missing pixels from the existing ones. Moreover, because the model works on the interpolated image, its output is also affected by the quality of this interpolation.

Therefore, the authors of Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network propose a new method that addresses both weaknesses. Instead of upscaling at the input to match the HR output size, they upscale at the end of the network, which reduces the computational cost of the model.
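The sketch below makes the cost argument concrete. It counts multiply-accumulate operations (MACs) for one convolution layer run on an LR image and for the same layer run on that image after $n$-times upscaling. The helper names, the 9x9 kernel, 3 input channels, 64 filters and the frame size are illustrative assumptions. The count ignores biases, activations and the interpolation itself.

```python
def conv_macs(height, width, kernel, in_ch, out_ch):
    """Multiply-accumulate count of one stride-1, 'same'-padded 2D convolution.

    Each of the height * width output positions needs kernel * kernel * in_ch
    MACs for every one of the out_ch filters.
    """
    return height * width * kernel * kernel * in_ch * out_ch


def upscale_cost_ratio(lr_h, lr_w, n, kernel=9, in_ch=3, out_ch=64):
    """How much more one conv layer costs on an n-times upscaled input than on the LR input."""
    lr_cost = conv_macs(lr_h, lr_w, kernel, in_ch, out_ch)
    hr_cost = conv_macs(n * lr_h, n * lr_w, kernel, in_ch, out_ch)
    return hr_cost / lr_cost


if __name__ == "__main__":
    lr_h, lr_w = 120, 160  # hypothetical LR frame size
    for n in (2, 3, 4):
        ratio = upscale_cost_ratio(lr_h, lr_w, n)
        print(f"n={n}: conv on the upscaled input costs {ratio:.0f}x more")
```

Every output position costs the same, so the ratio is exactly $n^2$ whatever the layer sizes are. Running the convolutions in LR space avoids paying that factor in every layer.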

Efficient Sub-Pixel Convolutional Neural Network (ESPCN) Network Architecture

In the ESPCN network, feature extraction is done with convolution layers, as in SRCNN. The difference is that ESPCN does not upscale the LR input with bicubic interpolation. Instead, the LR image goes directly through the hidden (convolution) layers, which extract feature maps. After this step we have feature maps in the low-resolution (LR) space. The next question is how to build the HR image from these LR feature maps.

Suppose we have an LR image $I_{LR}$ of size $H \times W \times C$ that we need to upscale to an HR image $I_{HR}$ of size $rH \times rW \times C$, where $r$ is the upscale factor.
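One way to follow this bookkeeping is to trace the tensor shapes through an ESPCN-style network. The sketch assumes stride-1 convolutions with 'same' padding and two hidden layers with 64 and 32 channels. Those channel counts and the function name `espcn_shapes` are illustrative assumptions, not a specification of the paper's model.

```python
def espcn_shapes(h, w, c, r, hidden_channels=(64, 32)):
    """Trace (height, width, channels) through an ESPCN-style network.

    Assumes stride-1, 'same'-padded convolutions, so only the channel count
    changes until the final pixel shuffle.
    """
    shapes = [("LR input", (h, w, c))]
    for i, ch in enumerate(hidden_channels, start=1):
        shapes.append((f"conv{i} (hidden)", (h, w, ch)))
    # Last convolution: still in LR space, but with r * r * c output channels.
    shapes.append(("conv (sub-pixel)", (h, w, r * r * c)))
    # Pixel shuffle: each group of r * r channels becomes an r x r spatial block.
    shapes.append(("pixel shuffle", (r * h, r * w, c)))
    return shapes


for name, shape in espcn_shapes(h=100, w=150, c=1, r=3):
    print(f"{name:18s} {shape}")  # the final shape is (300, 450, 1)
```

Every convolution keeps the $H \times W$ grid, so all the learned computation stays in LR space. Only the last step changes the spatial size.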

A first idea is to use a deconvolution layer (also called a transposed convolution). Convolution layers typically keep or reduce the spatial dimensions (height and width). A deconvolution layer does the reverse: it produces an output with a larger height and width than its input. In fact, the bicubic interpolation in SRCNN can be viewed as a special case of a deconvolution layer with a fixed, non-learned filter, since it also enlarges the input.
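A transposed convolution can be written as a scatter-add: each input sample stamps a scaled copy of the kernel into a larger output. The 1D sketch below is a minimal illustration with no padding, and the name `conv_transpose_1d` is hypothetical. With a fixed triangular kernel and stride 2 it behaves like linear interpolation. This follows the same logic as viewing bicubic interpolation as a deconvolution with a fixed filter.

```python
def conv_transpose_1d(x, kernel, stride):
    """1D transposed convolution (no padding), written as a scatter-add.

    Input sample x[m] adds value * kernel into the output starting at
    position m * stride. Output length: (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for m, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[m * stride + k] += value * weight
    return out


# Stride 2 with a fixed triangular kernel: interior outputs interpolate linearly.
print(conv_transpose_1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5], stride=2))
# → [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]
```

In a deconvolution layer the kernel is learned rather than fixed. The output is still produced on the enlarged grid, though.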

Sub-pixel

When a digital image is captured, the camera's imaging system projects the scene onto an image plane and then performs sampling and quantization to produce a digital image. Sampling discretizes the spatial coordinates at which pixels are recorded, and quantization discretizes the value of each pixel. Because of sensor limitations, an image is limited to a certain resolution, so it carries no information about what lies between two adjacent pixels. In the real scene, however, there is more detail between those two positions. The points between adjacent pixels are called sub-pixels. In the example figure, the red square points are sampled and appear in the image. The round black points between them are not sampled: these are the sub-pixels.
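The sampling picture can be mimicked on a small grid. Treat a fine 4x4 array as the "real" scene and keep only every $r$-th pixel, as a sensor would. The grid values and the helper `sample_phase` are hypothetical and only illustrate the idea. With $r = 2$, the camera keeps one of the four phase offsets, and the other three hold sub-pixel positions it never records.

```python
def sample_phase(hr, r, dy, dx):
    """Keep every r-th pixel of a 2D grid, starting at offset (dy, dx).

    (dy, dx) == (0, 0) mimics the sensor's sampled points; every other
    offset picks out sub-pixel positions the sensor never sees.
    """
    return [row[dx::r] for row in hr[dy::r]]


# Hypothetical 4x4 "fine" scene; values are just position labels.
scene = [[4 * y + x for x in range(4)] for y in range(4)]
r = 2
print("captured:", sample_phase(scene, r, 0, 0))  # captured: [[0, 2], [8, 10]]
for dy in range(r):
    for dx in range(r):
        if (dy, dx) != (0, 0):
            print(f"sub-pixel phase {(dy, dx)}:", sample_phase(scene, r, dy, dx))
```

Together the $r^2$ phases contain every pixel of the fine grid. This is why an $r$-times larger image has $r^2$ times as many pixels as the captured one.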

Efficient sub-pixel convolution layer

In this paper, the authors introduce a new layer type called the sub-pixel convolution layer. It has two steps. The first is an ordinary convolution, applied in LR space, whose output has size $H \times W \times r^2C$. The second shuffles the pixels to give an output of size $rH \times rW \times C$, matching the resolution of $I_{HR}$. The pixel shuffle treats the values at each position of the $r^2$ feature maps (per output channel) as sub-pixels. It rearranges them in a fixed order on the output image, so that each LR position fills one $r \times r$ block of the HR image. The following figure shows how the pixels are rearranged to produce the output image.
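Below is a minimal pure-Python sketch of the pixel shuffle step. It assumes the feature maps are stored as a list of $r^2C$ channels, each an $H \times W$ nested list. The exact channel order is a convention. This sketch follows the ordering of PyTorch's `torch.nn.PixelShuffle`, where channel $c r^2 + i r + j$ feeds output pixel $(y r + i, x r + j)$ of channel $c$. Other implementations, including the paper's own notation, may order the channels differently.

```python
def pixel_shuffle(features, r):
    """Rearrange r*r*C feature maps of size H x W into C maps of size rH x rW.

    features: list of r*r*C channels, each an H x W list of lists.
    Channel c*r*r + i*r + j supplies output pixel (y*r + i, x*r + j) of channel c,
    so the r*r values at one LR position become one r x r block of the HR image.
    """
    if len(features) % (r * r) != 0:
        raise ValueError("number of channels must be divisible by r*r")
    c_out = len(features) // (r * r)
    h, w = len(features[0]), len(features[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 2x2 feature maps (r = 2, C = 1); channel k is filled with the value k.
maps = [[[k, k], [k, k]] for k in range(4)]
for row in pixel_shuffle(maps, r=2)[0]:
    print(row)
# [0, 1, 0, 1]
# [2, 3, 2, 3]
# [0, 1, 0, 1]
# [2, 3, 2, 3]
```

The shuffle only moves values; it learns nothing and multiplies nothing. Frameworks usually implement it as a reshape plus a transpose rather than explicit loops. The loops here just make the index mapping visible.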

Using this layer has two main advantages:

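It avoids enlarging the input by inserting zeros between pixels (zero-padding). Those zeros add no information to the output but still have to be processed.
It is cheaper to compute. A deconvolution layer increases the computational cost because its convolution is performed in the high-resolution space, whereas the sub-pixel convolution layer performs its convolutions in the low-resolution space.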
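To see why convolving in LR space is enough, consider a stride-$r$ deconvolution in 1D. Implemented naively, it inserts $r - 1$ zeros between input samples and convolves in HR space, so most multiplications hit zeros. The sketch below checks numerically that the same output can be computed with $r$ ordinary convolutions in LR space, one per sub-pixel phase, followed by a 1D pixel shuffle. The helper names are hypothetical, and the kernel length is assumed to be a multiple of $r$.

```python
def conv_transpose_1d(x, kernel, stride):
    """Stride-`stride` 1D transposed convolution (no padding), as a scatter-add."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for m, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[m * stride + k] += value * weight
    return out


def full_conv_1d(x, g):
    """Ordinary 'full' 1D convolution in LR space: y[q] = sum_t x[q - t] * g[t]."""
    y = [0.0] * (len(x) + len(g) - 1)
    for q in range(len(y)):
        for t, weight in enumerate(g):
            if 0 <= q - t < len(x):
                y[q] += x[q - t] * weight
    return y


def subpixel_conv_1d(x, kernel, r):
    """Same output as conv_transpose_1d(x, kernel, r), computed in LR space.

    Step 1: r ordinary convolutions, one per sub-pixel phase j, each using the
            taps kernel[j], kernel[j + r], kernel[j + 2r], ...
    Step 2: a 1D pixel shuffle that interleaves the r outputs (phase j -> q*r + j).
    """
    if len(kernel) % r != 0:
        raise ValueError("kernel length must be a multiple of r")
    phases = [full_conv_1d(x, kernel[j::r]) for j in range(r)]
    return [phases[j][q] for q in range(len(phases[0])) for j in range(r)]


signal = [1.0, -2.0, 0.5, 3.0]
kernel = [0.25, 0.5, 1.0, -1.0, 0.75, 0.125]  # 6 taps = 2 taps per phase for r = 3
a = conv_transpose_1d(signal, kernel, stride=3)
b = subpixel_conv_1d(signal, kernel, r=3)
print(len(a) == len(b), all(abs(u - v) < 1e-12 for u, v in zip(a, b)))  # → True True
```

Each LR convolution uses only every $r$-th tap of the original kernel, so no work is spent on inserted zeros. This is a 1D illustration of the idea, not the paper's full 2D layer.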

Result

ESPCN achieves slightly better accuracy than other networks such as SRCNN and TNRD.

The real highlight of ESPCN, however, is its runtime. With an upscaling factor of 3, ESPCN runs much faster than SRCNN and the other networks it is compared with.

Conclusion

Thus, with only a sub-pixel convolution layer (a convolution followed by a pixel shuffle), ESPCN reduces super-resolution runtime many times over while still improving accuracy over its predecessor, SRCNN.

References:


Source: Viblo
