Upscaling Beyond Super–Resolution Using a Novel Deep–Learning System
Pablo Navarrete Michelini
pnavarre@boe.com.cn
Hanwen Liu
lhw@boe.com.cn BOE Technology Group Co., Ltd.
For example, a simple linear interpolation can be done with

F = [ 1/4  1/2  1/4
      1/2   1   1/2
      1/4  1/2  1/4 ]
An efficient implementation avoids multiplying by zeros: break F into several filters W_i.
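As a sketch of this polyphase idea (in 1D for clarity, with a made-up signal): zero-stuffing the input and convolving once with the interpolation kernel gives the same result as applying one small sub-filter per output phase, with no multiplications by zero.

```python
import numpy as np

# Hypothetical 1D illustration of splitting an interpolation filter
# into per-phase sub-filters W_i (the efficient implementation).
x = np.array([1.0, 4.0, 2.0, 8.0])
f = np.array([0.5, 1.0, 0.5])          # 1D linear-interpolation kernel

# Naive: insert zeros between samples, then convolve with f.
up = np.zeros(2 * len(x))
up[::2] = x
naive = np.convolve(up, f, mode="same")

# Efficient: one sub-filter per output phase (zero-padded at the border).
phase0 = x                              # W_0 = [1]
phase1 = 0.5 * (x + np.append(x[1:], 0.0))   # W_1 = [1/2, 1/2]
fast = np.empty_like(naive)
fast[::2], fast[1::2] = phase0, phase1
```

Both paths produce the same upscaled signal; the second touches only nonzero samples.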
Classic Upscalers: Nearest Neighbor, Linear, Bicubic, Lanczos, . . . Advanced Upscalers: Directional filters (NEDI), wavelets, . . .
(a) Original (b) Nearest Neighbor (c) Bicubic Figure: Classic Upscalers
SRCNN
Dong C., et al., “Learning a Deep Convolutional Network for Image Super–Resolution.” Sept 2014.
BOE MuxOut
Navarrete P., et al., “Upscaling with Deep Convolutional Networks and Muxout Layers.” May 2016.
Google RAISR
Romano Y., et al., “RAISR: Rapid and Accurate Image Super Resolution.” Jun 2016.
Twitter ESPCN
Shi W., et al., “Real–Time Single Image and Video Super–Resolution Using an Efficient Sub–Pixel Convolutional Neural Network.” Sept 2016.
Twitter GAN
Ledig C., et al., “Photo-Realistic Single Image Super–Resolution Using a Generative Adversarial Network.” Sept 2016.
Twitter GAN
Sønderby C.K., et al., “Amortised MAP Inference for Image Super-resolution.” Oct 2016.
Twitter ESPCN:
“Sub–pixel convolution layer” = “MuxOut r × r”. Differences:
MuxOut considers several groups of r² features. MuxOut is designed to factorize r and be used as several layers within the network.
Figure from: Shi W., et al., “Real–Time Single Image and Video Super–Resolution Using an Efficient Sub–Pixel Convolutional Neural Network.” Sept 2016.
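The sub-pixel rearrangement itself, taking r² feature channels to an r× larger image, fits in a few lines; the function name and channel ordering below are assumptions for illustration, not the papers' code.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (r*r*C, H, W) features into (C, r*H, r*W) pixels,
    as in ESPCN's sub-pixel layer / one MuxOut r x r stage."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split the channel axis into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 = 2x2 feature channels become one 2x-upscaled output channel.
x = np.arange(16.0).reshape(4, 2, 2)
y = pixel_shuffle(x, 2)
```

Each low-resolution pixel contributes an r × r block of the output, one value per feature group.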
Google RAISR:
Uses an ML approach to learn adaptive filters. Not based on convolutional networks.
Similarity:
We will show how to interpret the convolutional–network approach as an adaptive filter.
Figure from: Romano Y., “RAISR: Rapid and Accurate Image Super Resolution.” Jun 2016.
Problems of MuxOut:
Reduces the number of processing features. Works very well only with easy content (e.g. text). Why? Filter parameters W have two tasks: Downsampling: which combination of a–b–c–d works better? Filtering: which values work better for interpolation?
New Version:
Considers all (or most) possible combinations of features. Can keep the same number of processing features. Filter parameters can focus on interpolation. SGD algorithms converge quickly and stably.
A convolutional–layer block typically means:

conv(x)_cout = Σ_cin  x_cin ∗ W_cin,cout
activ(x_c) = σ(x_c + b_c)
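A minimal numerical sketch of this block, using cross-correlation rather than strict convolution, “valid” boundaries, and σ = ReLU (all simplifying assumptions; names are illustrative):

```python
import numpy as np

def conv_block(x, W, b):
    """conv(x)_cout = sum_cin x_cin * W[cin, cout], then sigma(x_c + b_c).
    Cross-correlation with 'valid' boundaries; sigma taken to be ReLU."""
    cin, h, w = x.shape
    _, cout, k, _ = W.shape
    out = np.zeros((cout, h - k + 1, w - k + 1))
    for co in range(cout):
        for ci in range(cin):
            for i in range(h - k + 1):
                for j in range(w - k + 1):
                    out[co, i, j] += np.sum(x[ci, i:i+k, j:j+k] * W[ci, co])
    return np.maximum(out + b[:, None, None], 0.0)

# Tiny check: a 1x1 "gain 2" filter with bias -1 on an all-ones image.
y = conv_block(np.ones((1, 2, 2)), np.full((1, 1, 1, 1), 2.0), np.array([-1.0]))
```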
And we use MuxOut like:
Problem: at large upscaling factors, color might become misaligned with luminance.
Idea: RGB input + RGB output. Problem: MuxOut mixes color channels. Need to process them separately.
Note: the human visual system (HVS) is less sensitive to the position and motion of color than of luminance.
Traditional approach:

Loss(X, Y) = MSE(X, Y) = (1 / (H·W)) Σ_{i,j} (X_i,j − Y_i,j)²

Problem: not well correlated with the HVS. Why not PSNR? → PSNR is unbounded.

Loss(X, Y) = SSIM(X, Y) = (2µ_X µ_Y + C1)(2σ_XY + C2) / ((µ_X² + µ_Y² + C1)(σ_X² + σ_Y² + C2))
Well correlated with HVS. Differentiable. Behaves well with SGD.
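A simplified sketch of 1 − SSIM as a loss, using global image statistics instead of the usual local Gaussian windows (an assumption made here for brevity):

```python
import numpy as np

def ssim_loss(x, y, C1=0.01**2, C2=0.03**2):
    """1 - SSIM over global statistics: a single-window simplification.
    C1, C2 are the usual stabilizing constants for images in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))
    return 1.0 - ssim
```

Identical images give loss 0; because every operation is smooth, the loss is differentiable and usable with SGD, as claimed above.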
(a) Standard (PSNR 24.82 dB – SSIM 0.8463) (b) Ours (PSNR 27.31 dB – SSIM 0.8990)
Linear systems: the interpolation filter is given by the impulse response. CN: not linear, because of ReLU.
(c) Activity Recorder (d) Mask Layer
Use an input image and record activity. Replace all activations (ReLU) by a “Mask layer”. The system becomes linear! Check impulse response.
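The linearization trick can be illustrated on a toy two-layer network: record the ReLU activity for one input, freeze it as a “Mask layer”, and the resulting map is exactly linear (all weights below are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

# Activity recorder: run one input through and store where ReLU fired.
x0 = rng.normal(size=8)
mask = (W1 @ x0 > 0).astype(float)

def masked_net(x):
    # Mask layer replaces ReLU: multiply by the recorded 0/1 pattern.
    return W2 @ (mask * (W1 @ x))

# The system is now linear: f(a*u + b*v) == a*f(u) + b*f(v),
# so its impulse response fully characterizes it.
a, b = 2.0, -3.0
u, v = rng.normal(size=8), rng.normal(size=8)
lhs = masked_net(a * u + b * v)
rhs = a * masked_net(u) + b * masked_net(v)
```

Probing this frozen network with unit impulses then reveals the adaptive filter the CN applied around x0.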
We say that x and y are aliases if Downscale(x) = Downscale(y). Many realistic images are aliased. MSE, SSIM, etc. aim for only one alias. The MSE/SSIM target removes the innovation process (e.g. linear regression). Give up the original content: we just want it to “look real”.
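A small numerical illustration of aliases under area downscaling: two different images whose 2×2 block means coincide downscale to the same image (values are made up):

```python
import numpy as np

def downscale_area(x, r=2):
    """Area ('box') downscaling by factor r: average non-overlapping blocks."""
    h, w = x.shape
    return x.reshape(h // r, r, w // r, r).mean(axis=(1, 3))

# x: a flat 4x4 image; y: x plus a zero-mean perturbation in every block.
x = np.array([[1.0, 3.0], [5.0, 7.0]]).repeat(2, 0).repeat(2, 1)
y = x + np.tile(np.array([[1.0, -1.0], [-1.0, 1.0]]), (2, 2))
```

Both downscale to [[1, 3], [5, 7]], yet only one of them is the “original”: a pixel-wise loss must pick a single alias.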
Generator (Upscaler) Discriminator
Increasing attention and significant progress in the last year. We will refer to the following important references:
WGAN:
Arjovsky M., et al., “Wasserstein GAN.” Jan 2017.
Improved WGAN:
Gulrajani I., et al., “Improved Training of Wasserstein GANs.” March 2017.
Losses:

L_D = E[D(x_fake)] − E[D(x_real)] + λ_gp · E[(‖∇_x̂ D(x̂)‖₂ − 1)²]
L_G = −E[D(x_fake)] + λ_LR · ∆(Downscale(G(x_LR)), x_LR)
We do not want to reveal the high–resolution content during the Upscaler’s training. We do not want to generate artificial images with no reference to the input. We ask the upscaler to be able to recover the low–resolution input with a standard downscaler (e.g. area):
∆(Downscale(G(x_LR)), x_LR), with ∆(x, y) = MSE(x, y) or ∆(x, y) = 1 − SSIM(x, y)
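The downscale-consistency term ∆(Downscale(G(x_LR)), x_LR) with ∆ = MSE can be sketched as follows (area downscaler, illustrative names):

```python
import numpy as np

def downscale_area(x, r):
    """Area ('box') downscaling by factor r."""
    h, w = x.shape
    return x.reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def consistency_loss(g_out, x_lr, r):
    """Delta(Downscale(G(x_LR)), x_LR) with Delta = MSE: the generated
    high-res image must reproduce the low-res input when downscaled."""
    return np.mean((downscale_area(g_out, r) - x_lr) ** 2)

# A G output that is exactly consistent with its low-res input:
x_lr = np.array([[1.0, 3.0], [5.0, 7.0]])
g_out = x_lr.repeat(2, 0).repeat(2, 1)
```

Any hallucinated detail that survives area downscaling is penalized, anchoring the generator to its input.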
(e) Standard (PSNR 29.78 dB) (f) Original (PSNR ∞) (g) Ours (PSNR 25.68 dB)
Overview:
System: proposed improved MuxOut.
Analysis: novel approach to visualize a CN as an adaptive filter.
Super–Resolution: proposed SSIM loss and processing of color input/output.
Hyper–Resolution: hallucinating details using a GAN can produce results comparable to the original content.
Next Steps: larger upscaling factors; use the analysis to improve design and test on other problems; improve generalization of the GAN approach.
LinkedIn:
https://www.linkedin.com/in/pnavarre
ResearchGate:
https://www.researchgate.net/profile/Pablo_Navarrete_Michelini