In the name of Allah
the compassionate, the merciful
Digital Video Processing
S. Kasaei
Room: CE 307
Department of Computer Engineering
Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei
Lab. Website: http://ipl.ce.sharif.edu
Acknowledgment
Most of the slides used in this course have been provided by:
- Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book: Video Processing & Communications, written by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].
Chapters 9 & 11
Video Coding using Motion Compensation
Outline
- Block-Based Hybrid Video Coding
- Overview of Block-Based Hybrid Video Coding
- Overlapped Block Motion Compensation
- Coding Mode Selection & Rate Control
- Loop Filtering
- Scalable Video Coding
- Motivation for Scalable Coding
- Basic Modes of Scalability
- Scalability in MPEG-2
- Fine Granularity Scalability in MPEG-4
Characteristics of Typical Videos
[Figure: Frame t-1 & Frame t.]
Adjacent frames are similar & changes are due to object or camera motion (on a static background).
Key Ideas in Video Compression
- Predicts a new frame from a previous frame & only codes the prediction error.
- The prediction error can be coded using the DCT method.
- Prediction errors have smaller energy than the original pixel values & can be coded with fewer bits.
- Those regions that cannot be predicted well will be coded directly using DCT.
- Works on each macroblock (MB) (16x16 pixels) independently to reduce the complexity.
- Motion compensation is done at the MB level.
- DCT coding of the error is done at the block level (8x8 pixels).
Different Coding Modes
Temporal Prediction
- No Motion Compensation:
- Works well in stationary regions.
$$\hat{f}(t, m, n) = f(t-1, m, n)$$
- Uni-directional Motion Compensation:
- Does not work well for regions uncovered by object motion.
$$\hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)$$
- Bi-directional Motion Compensation:
- Can better handle covered/uncovered regions.
$$\hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})$$
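The three predictors above can be sketched in a few lines of Python. This is only an illustration: the function names, the list-of-rows frame layout, the default weights of 0.5, and the clamping of out-of-frame positions are assumptions made for the sketch, not part of the slides.

```python
def _at(frame, m, n):
    """Read frame[m][n], clamping the position to the frame borders (assumption)."""
    m = min(max(m, 0), len(frame) - 1)
    n = min(max(n, 0), len(frame[0]) - 1)
    return frame[m][n]


def predict_pixel(prev, nxt, m, n, mode, d_b=(0, 0), d_f=(0, 0), w_b=0.5, w_f=0.5):
    """Predict pixel (m, n) of frame t.

    prev -- frame t-1, nxt -- frame t+1 (only used for mode "bi")
    d_b  -- (d_x, d_y) applied to frame t-1; d_f -- (d_x, d_y) applied to frame t+1
    """
    if mode == "none":   # no motion compensation: copy the co-located pixel
        return _at(prev, m, n)
    if mode == "uni":    # uni-directional: displaced pixel in frame t-1
        return _at(prev, m - d_b[0], n - d_b[1])
    if mode == "bi":     # bi-directional: weighted sum of both references
        return (w_b * _at(prev, m - d_b[0], n - d_b[1])
                + w_f * _at(nxt, m - d_f[0], n - d_f[1]))
    raise ValueError("unknown mode: " + mode)
```

With equal weights the bi-directional case is a plain average of the two displaced pixels, which is why it copes better with regions that are visible in only one of the two reference frames.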
Temporal Prediction
- Although bi-directional prediction can improve prediction accuracy & consequently coding efficiency, it incurs encoding delay & is typically not used in real-time applications.
- H.261/H.263 use only uni-directional prediction & a restricted bi-directional prediction (PB-mode).
- MPEG employs both uni- & bi-directional prediction.
Encoder Block Diagram of a Block-Based Hybrid Video Coder
[Hybrid: a combination of motion-compensated temporal prediction + transform coding.]
Decoder Block Diagram
Block-Based Hybrid Video Coding
- The encoder must emulate the decoder operation to deduce the same reconstructed frame as the decoder.
- Frame types:
- When a frame is coded entirely in the intra-mode: I-frame.
- When a previous frame is used for prediction in the inter-mode: P-frame.
- When a weighted sum of the previous & following frames is used for prediction in the inter-mode: B-frame.
- The mode information, MVs, & other side information (picture format, block location, …) are coded using VLC.
Block-Based Video Partitions
MB Structure in 4:2:0 Color Format
Each MB contains four 8x8 Y blocks, one 8x8 Cb block, & one 8x8 Cr block.
Block Matching Algorithm for Motion Estimation
[Figure: the MV & the search region, shown between Frame t-1 (reference frame) & Frame t (predicted frame).]
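A minimal sketch of an exhaustive-search block matching algorithm is given below. It assumes the MAD (mean absolute difference) criterion and the same displacement convention as the temporal prediction formulas (the reference pixel sits at m - d_x, n - d_y); the function names, the default search range of 7 pixels, and the rule of skipping candidates outside the frame are hypothetical choices for illustration.

```python
def mad(cur, ref, m0, n0, dx, dy, size):
    """Mean absolute difference between the block at (m0, n0) in `cur`
    and the block at (m0 - dx, n0 - dy) in `ref`."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[m0 + i][n0 + j] - ref[m0 - dx + i][n0 - dy + j])
    return total / (size * size)


def block_match(cur, ref, m0, n0, size=16, search=7):
    """Exhaustive search over displacements in [-search, search].

    Returns ((dx, dy), error) for the candidate with the smallest MAD.
    """
    rows, cols = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), float("inf")
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            # Skip candidates whose block would leave the reference frame.
            if not (0 <= m0 - dx and m0 - dx + size <= rows
                    and 0 <= n0 - dy and n0 - dy + size <= cols):
                continue
            err = mad(cur, ref, m0, n0, dx, dy, size)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The cost grows with the square of the search range, which is one reason practical coders restrict the search region.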
Macroblock Coding in I-Mode
- DCT transform each 8x8 block.
- Quantize the DCT coefficients (with properly chosen quantization matrices).
- Zig-zag order & run-length code the quantized DCT coefficients.
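The I-mode steps (DCT, quantization, zig-zag scan, run-length coding) can be sketched for one 8x8 block as follows. This is an illustrative sketch only: it uses a single uniform step size instead of a quantization matrix, an orthonormal DCT-II, and a hypothetical "EOB" marker for the trailing zeros; none of these details come from a particular standard.

```python
import math

N = 8  # block size


def dct2(block):
    """2-D DCT-II of an N x N block with orthonormal scaling."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coefs, step):
    """Uniform quantization with one step size (stand-in for a quantization matrix)."""
    return [[int(round(c / step)) for c in row] for row in coefs]


def zigzag(block):
    """Return the N*N entries scanned along anti-diagonals, low to high frequency."""
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u][v] for u, v in order]


def run_length(seq):
    """Encode as (zero_run, level) pairs; trailing zeros collapse into 'EOB'."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs
```

For a flat block only the DC coefficient is non-zero, so the whole block reduces to a single (run, level) pair followed by the end-of-block marker.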
Macroblock Coding in P-Mode
- Estimate one MV for each macroblock (16x16).
- Depending on the motion compensation error, determine the coding mode (intra, inter-with-no-MC, inter-with-MC, etc.).
- Original values (for intra-mode) or motion compensation errors (for inter-mode) in each of the DCT blocks (8x8) are DCT transformed, quantized, zig-zag/alternate scanned, & run-length coded.
Macroblock Coding in B-Mode
- Same as for the P-mode, except that a macroblock can be predicted from a previous frame, a following one, or both.
[Figure: MVs vf & vb.]
Overlapped Block Motion Compensation (OBMC)
- Conventional block motion compensation:
- One best matching block is found from a reference frame.
- The current block is replaced by the best matching block.
- OBMC:
- Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame.
- The corresponding pixels are determined by the MVs of the current as well as adjacent MBs.
- The weight for each corresponding pixel depends on the expected accuracy of the associated MV.
OBMC using 4-Neighboring MBs
- hm,k(x): weight assigned to the estimated value based on MV (dm,k).
- It should be inversely proportional to the distance between x & the center of the block that dm,k belongs to ([hm,1 & hm,4] > [hm,2 & hm,3]).
- K: total no. of neighboring blocks.
- Bm: block m.
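A minimal sketch of the OBMC prediction of one pixel follows. It assumes the weights at a given pixel position sum to one and that positions outside the reference frame are clamped to the border; the function name, argument layout, and the example weights are hypothetical and are not the H.263 values.

```python
def obmc_pixel(ref, m, n, mvs, weights):
    """Predict pixel (m, n) as sum_k h_k * ref[m - dx_k][n - dy_k].

    mvs     -- list of (dx, dy): the MV of the current MB followed by
               the MVs of its neighboring MBs
    weights -- list of h_k for this pixel position (assumed to sum to 1)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(ref), len(ref[0])
    value = 0.0
    for (dx, dy), h in zip(mvs, weights):
        r = min(max(m - dx, 0), rows - 1)   # clamp to the frame (assumption)
        c = min(max(n - dy, 0), cols - 1)
        value += h * ref[r][c]
    return value
```

With a single MV and weight 1 this reduces to conventional block motion compensation, which makes the relation between the two schemes explicit.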
Optimal Weighting Design
- Convert to an optimization problem:
- Minimize:
- Subject to:
- Optimal weighting functions:
[Equation labels: autocorrelation, cross-correlation.]
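One standard way to pose this design problem is sketched below; it is given under assumptions and may differ in notation from the course text. It assumes the OBMC predictor is a weighted sum of K displaced reference pixels and that the weights at each position x sum to one. The symbols R (autocorrelation matrix of the displaced reference pixels), r (cross-correlation vector between those pixels and the current pixel), 1 (all-ones vector), and λ (Lagrange multiplier) are introduced here for illustration.

```latex
% Constrained least-squares formulation (sketch, notation assumed)
\begin{aligned}
&\text{Minimize:}   && E\Big\{ \Big| f(t,\mathbf{x}) - \sum_{k=1}^{K} h_{m,k}(\mathbf{x})\, f(t-1,\mathbf{x}-\mathbf{d}_{m,k}) \Big|^2 \Big\} \\
&\text{Subject to:} && \sum_{k=1}^{K} h_{m,k}(\mathbf{x}) = 1 \\
&\text{Solution:}   && \mathbf{h} = \mathbf{R}^{-1}\,(\mathbf{r} - \lambda\,\mathbf{1}),
\qquad \lambda = \frac{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{r} - 1}{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}}
\end{aligned}
```

The solution follows from setting the gradient of the Lagrangian to zero, which gives Rh = r - λ1, and then choosing λ so that the weights sum to one.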
How to Determine MVs with OBMC
- Option 1: using conventional BMA, minimize the prediction error (MAD) within each MB independently.
- Option 2: minimize the prediction error assuming OBMC:
- Solve the MV for the current MB while keeping the MVs for the neighboring MBs found in the previous iterations.
How to Determine MVs with OBMC
- Option 3: using a weighted error criterion over a larger block:
[Equation label: window function.]
Weighting Coefficients used in H.263
[Figure panels: Left or Right; Top or Bottom; Current Block.]
Window Function Corresponding to H.263 Weights for OBMC
Coding Parameter Selection
- Coding modes:
- Intra vs. inter, quantization parameter (QP) for each MB, ME method (leading to different rates).
- Rate-distortion optimized selection, given a target rate:
- Minimize the distortion, subject to the target rate constraint:
- Simplified version:
- The optimal mode is such that each MB works at the same R-D slope:
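A common way to carry out this kind of selection is to turn the constrained problem into an unconstrained Lagrangian cost J = D + λR and pick, for each MB, the mode with the smallest J. The sketch below assumes that form; the function name, the tuple layout, and the example numbers are hypothetical. Using the same λ for every MB is what makes all MBs operate at the same R-D slope.

```python
def select_mode(candidates, lam):
    """Pick the candidate minimizing J = D + lam * R.

    candidates -- list of (mode_name, distortion, rate_in_bits) tuples,
                  one per coding mode tried for this MB
    lam        -- Lagrange multiplier shared by all MBs
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical measurements for one MB: (mode, distortion, bits).
candidates = [
    ("intra", 10.0, 120),
    ("inter_no_mc", 40.0, 20),
    ("inter_mc", 15.0, 60),
]
best = select_mode(candidates, lam=0.1)
```

A larger λ penalizes rate more heavily and pushes the choice toward cheaper modes; in practice λ is adjusted until the total rate meets the target.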
Rate Control
- Rate control:
- How to code a video so that the resulting bit stream satisfies a target bit rate?
- For pleasant visual perception, video should have a constant quality.
- But, the coding method necessarily yields a variable bit rate.
- So, at least the bit rate should be constant when averaged over a short period.
- Rate control is also necessary when the video is to be sent over a constant bit rate (CBR) channel.
- The fluctuation within the period can be smoothed by a buffer at the encoder output.
Rate Control
Rate control is accomplished in three steps:
- Step 1) Determine the target average bit rate at the frame, GOB, & MB level, based on the current buffer fullness.
- Step 2) Satisfy the frame-level target rate by varying the frame rate (skip frames when necessary).
- Step 3) Satisfy the GOB/MB-level target rate by varying the coding mode & QP at each MB (= rate-distortion optimized mode selection).
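The three steps can be illustrated with a toy buffer-driven controller. Everything specific in this sketch is an assumption made for illustration: the linear target rule, the thresholds 0.9, 0.6, and 0.4, the QP range 1 to 31, and the function name. A real encoder would use the rule prescribed by its rate-control model.

```python
def rate_control_step(buffer_bits, buffer_size, channel_rate, frame_rate, qp):
    """Return (target_bits, skip_frame, new_qp) for the next frame.

    buffer_bits  -- bits currently waiting in the encoder output buffer
    buffer_size  -- buffer capacity in bits
    channel_rate -- channel rate in bits per second
    frame_rate   -- frames per second
    qp           -- quantization parameter used for the previous frame
    """
    nominal = channel_rate / frame_rate      # average bits available per frame
    fullness = buffer_bits / buffer_size

    # Step 1: frame-level target, smaller when the buffer is fuller.
    target_bits = nominal * (1.5 - fullness)

    # Step 2: skip the frame when the buffer is close to overflowing.
    skip_frame = fullness > 0.9

    # Step 3: coarser quantization when the buffer is filling, finer when draining.
    if fullness > 0.6:
        qp = min(qp + 1, 31)
    elif fullness < 0.4:
        qp = max(qp - 1, 1)
    return target_bits, skip_frame, qp
```

The buffer smooths short-term fluctuations, while the QP adjustment keeps the long-term average rate near the channel rate.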
Loop Filtering
- Errors in previously reconstructed frames accumulate over time with motion-compensated temporal prediction.
- Error propagation leads to:
- Reduction of prediction accuracy.
- Increase of the bitrate needed for coding new frames.
Loop Filtering
- Loop filtering:
- Filters the reference frame before using it for prediction.
- Can be embedded in the motion compensation loop.
- Half-pel accuracy motion compensation.
- OBMC.
- Loop filtering can significantly improve coding efficiency.
- For theoretically optimal design of loop filters, see text.
Scalable Coding (Ch. 11)
- Scalability refers to the capability of recovering physically meaningful image (or video) information by decoding only a partial compressed bit stream.
- It is used when users try to access the same video through different communication links (bandwidth scalability).
- A scalable stream can also offer adaptivity to varying channel error characteristics & computing power at the receiving terminal.
- This includes: quality, spatial, temporal, & frequency scalability.
Scalable Coding
- Motivation:
- Real networks are heterogeneous in rate.
- Streaming video from home (56 kbps) using a modem vs. a corporate LAN (10-100 Mbps).
- Scalable video coding:
- Ideal goal (embedded stream): creating a bit stream that can be accessed at any rate.
- Practical video coder:
- Layered coder: base-layer provides basic quality, successive layers refine the quality incrementally.
- Coarse granularity (typically known as layered coder).
- Fine granularity (FGS).
Quality Scalability
- Base layer: coarse quantization.
- Enhancement layer: finer quantization of the difference between the original DCT coefficients & the coarsely quantized base-layer coefficients.
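A two-layer version of this idea can be sketched for a single DCT coefficient as follows. The function names and the step sizes in the example are hypothetical; a real coder would work on whole blocks and entropy-code the indices.

```python
def quantize(value, step):
    """Uniform quantizer: index of the nearest multiple of `step`."""
    return int(round(value / step))


def two_layer_encode(coef, coarse_step, fine_step):
    """Return (base_index, enhancement_index) for one coefficient."""
    base = quantize(coef, coarse_step)              # base layer: coarse quantization
    residual = coef - base * coarse_step            # what the base layer missed
    enhancement = quantize(residual, fine_step)     # enhancement layer: finer step
    return base, enhancement


def two_layer_decode(base, enhancement, coarse_step, fine_step, use_enhancement=True):
    """Reconstruct from the base layer alone, or from both layers."""
    value = base * coarse_step
    if use_enhancement:
        value += enhancement * fine_step
    return value
```

For example, a coefficient of 37 with steps 16 and 4 is reconstructed as 32 from the base layer alone and as 36 when the enhancement layer is also decoded.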
Spatial Scalability
Combined Spatial/Quality Scalability
Illustration of Scalable Coding
[Figure: decoded frames at 6.5 kbps, 133.9 kbps, 21.6 kbps, & 436.3 kbps; axes: Spatial Scalability & Quality (SNR) Scalability.]
Quality Scalability by Multistage Quantization
[Figure labels: coarse, fine.]
Spatial/Temporal Scalability through Down/Up-Sampling
[Figure labels: coarse, fine.]
Scalability in MPEG-2
- MPEG-2 is the earliest standard that offers scalability tools.
- Four types of scalability:
- Data partition (frequency scalability).
- SNR scalability (quality scalability).
- Temporal scalability (frame-rate scalability).
- Spatial scalability (resolution scalability).
Fine Granularity Scalability in MPEG-4
- MPEG-4 achieves fine granularity quality scalability through bit-plane coding.
- The DCT coefficients are represented losslessly in binary bits.
- The bit planes are coded successively, from the most significant bit to the least.
- The bit plane within each block is coded using run-length coding.
- Temporal scalability is accomplished by combining I, B, & P-frames.
- Spatial scalability is achieved by spatial down/up-sampling.
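The bit-plane idea can be sketched as below. The sketch handles non-negative coefficient magnitudes only and leaves out sign handling and the run-length coding of each plane; the function names and the example values are hypothetical.

```python
def to_bit_planes(magnitudes, num_planes):
    """Split non-negative integers into bit planes, most significant plane first."""
    return [[(v >> p) & 1 for v in magnitudes]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes):
    """Rebuild values from the planes received so far.

    `planes` may hold fewer than `num_planes` entries (a truncated stream);
    the missing low-order planes are treated as zero.
    """
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        values = [v | (b << shift) for v, b in zip(values, plane)]
    return values


# Hypothetical coefficient magnitudes for part of a block.
planes = to_bit_planes([5, 0, 3, 1], num_planes=3)
coarse = from_bit_planes(planes[:2], num_planes=3)   # stream cut after two planes
```

Because every additional plane halves the remaining uncertainty, the stream can be truncated after any plane and still decode to a usable approximation, which is what gives fine granularity.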
Bit-Plane Coding
Scalable Video Coding using Wavelet Transform
- Wavelet-based image coding:
- Full frame image transform (as opposed to block-based transform).
- Bit plane coding of the transform coefficients can lead to embedded bit streams.
- Embedded zero-tree wavelet (EZW).
- SPIHT.
Scalable Video Coding using Wavelet Transforms
- Wavelet-based video coding:
- Temporal filtering with & without motion compensation.
- Can achieve temporal, spatial, & quality scalability simultaneously.
- MPEG-4 uses DCT-based coding for natural videos, but uses WT-based coding for still images & graphics.
- Still an active research area!
Wavelet-Based Video Coding
Coded images by (a) JPEG baseline method (PSNRs for Y, U, and V components are 28.36, 34.74, 34.98 dB, respectively); (b) the MZTE method (PSNRs for Y, U, and V components are 30.98, 41.68, 40.14 dB, respectively). Both at a compression ratio of 45:1.
Wavelet-Based Video Coding
Comparison of SA-DWT (SA-ZTE in this case) coding with SA-DCT coding. SA: Shape-Adaptive.
Wavelet-Based Video Coding
Reconstructed object using SA-DCT & SA-ZTE: (a) SA-DCT (1.0042 bpp, PSNR-Y = 37.09 dB; PSNR-U = 42.14 dB; PSNR-V = 42.36 dB); (b) SA-ZTE (0.9538 bpp, PSNR-Y = 38.06 dB; PSNR-U = 43.43 dB; PSNR-V = 43.25 dB). SA: Shape-Adaptive.
Wavelet-Based Video Coding
Block diagram of a wavelet-based video codec.
Homework 9
Reading assignment:
- Sec. 9.3, 11.1
Written assignment:
- Prob. 9.13, 11.3, 11.4
Computer assignment:
- Prob. 9.11, 9.12
Optional: 9.15