
SLIDE 1

SLIDE 2

In the name of Allah

the compassionate, the merciful

SLIDE 3

Digital Video Processing

S. Kasaei

Room: CE 307
Department of Computer Engineering
Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei

Lab. Website: http://ipl.ce.sharif.edu

SLIDE 4

Acknowledgment

Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book:

Video Processing & Communications, by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].

SLIDE 5

Chapters 9 & 11

Video Coding using Motion Compensation

SLIDE 6

Outline

Block-Based Hybrid Video Coding
  Overview of Block-Based Hybrid Video Coding
  Overlapped Block Motion Compensation
  Coding Mode Selection & Rate Control
  Loop Filtering
Scalable Video Coding
  Motivation for Scalable Coding
  Basic Modes of Scalability
  Scalability in MPEG-2
  Fine Granularity Scalability in MPEG-4

Kasaei 6

SLIDE 7

Characteristics of Typical Videos

[Figure: frame t-1 and frame t.]

Adjacent frames are similar & changes are due to object or camera motion (on a static background).

SLIDE 8

Key Ideas in Video Compression

Predict a new frame from a previous frame & code only the prediction error.
The prediction error can be coded using the DCT method.
Prediction errors have smaller energy than the original pixel values & can be coded with fewer bits.
Regions that cannot be predicted well are coded directly using the DCT.
Work on each macroblock (MB) (16x16 pixels) independently to reduce the complexity.
Motion compensation is done at the MB level.
DCT coding of the error is done at the block level (8x8 pixels).

SLIDE 9

Different Coding Modes

SLIDE 10

Temporal Prediction

No Motion Compensation:
Works well in stationary regions.

$\hat{f}(t, m, n) = f(t-1, m, n)$

Uni-directional Motion Compensation:
Does not work well for regions uncovered by object motion.

$\hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)$

Bi-directional Motion Compensation:
Can better handle covered/uncovered regions.

$\hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})$
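A minimal Python sketch of the three predictors, for illustration only. It assumes frames stored as lists of rows (frame[n][m]) and clamps samples that fall outside the frame; both choices, the function names, and the toy one-row sequence are assumptions and do not come from the slides.

```python
def predict(ref, dx=0, dy=0):
    """f_hat(m, n) = ref(m - dx, n - dy); out-of-frame samples are clamped (assumption)."""
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(n - dy, 0), h - 1)][min(max(m - dx, 0), w - 1)]
             for m in range(w)] for n in range(h)]


def predict_bi(prev, nxt, d_b, d_f, w_b=0.5, w_f=0.5):
    """Weighted sum of a prediction from frame t-1 and one from frame t+1."""
    p_b = predict(prev, *d_b)
    p_f = predict(nxt, *d_f)
    return [[w_b * b + w_f * f for b, f in zip(rb, rf)] for rb, rf in zip(p_b, p_f)]


def error_energy(cur, pred):
    """Sum of squared prediction errors."""
    return sum((c - p) ** 2 for rc, rp in zip(cur, pred) for c, p in zip(rc, rp))


# Toy sequence: the content moves right by one pixel per frame.
frame_prev = [[10, 20, 30, 40, 50, 60, 70, 80]]   # frame t-1
frame_cur = [[5, 10, 20, 30, 40, 50, 60, 70]]     # frame t
frame_next = [[3, 5, 10, 20, 30, 40, 50, 60]]     # frame t+1

e_none = error_energy(frame_cur, predict(frame_prev))                # no MC
e_uni = error_energy(frame_cur, predict(frame_prev, dx=1, dy=0))     # uni-directional
e_bi = error_energy(frame_cur,
                    predict_bi(frame_prev, frame_next, (1, 0), (-1, 0)))  # bi-directional
```

On this toy input both motion-compensated predictors leave far less error energy than the predictor without motion compensation. The error that remains sits at the frame borders, where content enters or leaves the frame.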

SLIDE 11

Temporal Prediction

SLIDE 12

Temporal Prediction

Although bi-directional prediction can improve prediction accuracy & consequently coding efficiency, it incurs encoding delay & is typically not used in real-time applications.

H.261/H.263 use only uni-directional prediction, plus a restricted bi-directional prediction (the PB-mode of H.263).

MPEG employs both uni- & bi-directional prediction.

SLIDE 13

Encoder Block Diagram of a Block-Based Hybrid Video Coder

[Hybrid: a combination of motion-compensated temporal prediction + transform coding.]

SLIDE 14

Decoder Block Diagram

SLIDE 15

Block-Based Hybrid Video Coding

The encoder must emulate the decoder operation to deduce the same reconstructed frame as the decoder.

Frame types:
When a frame is coded entirely in the intra-mode: I-frame.
When a previous frame is used for prediction in the inter-mode: P-frame.
When a weighted sum of the previous & following frames is used for prediction in the inter-mode: B-frame.

The mode information, MVs, & other side information (picture format, block location, …) are coded using VLC.

SLIDE 16

Block-Based Video Partitions

SLIDE 17

MB Structure in 4:2:0 Color Format

Four 8x8 Y blocks.
One 8x8 Cb block.
One 8x8 Cr block.

SLIDE 18

Block Matching Algorithm for Motion Estimation

[Figure: a block in frame t (predicted frame) is matched against candidate blocks inside a search region of frame t-1 (reference frame); the displacement of the best match is the MV.]
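A minimal Python sketch of exhaustive-search block matching, for illustration only. It assumes the mean absolute difference (MAD) as the matching criterion, an integer-pel search range, and a displacement (dx, dy) for which cur(m, n) is compared with ref(m - dx, n - dy). The function names and the toy frames are hypothetical.

```python
def mad(cur, ref, top, left, dx, dy, size):
    """Mean absolute difference between the current block and the displaced reference block."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - ref[top - dy + r][left - dx + c])
    return total / (size * size)


def block_match(cur, ref, top, left, size, search):
    """Try every displacement in [-search, search]^2 and keep the one with the smallest MAD."""
    h, w = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top - dy, left - dx
            if r0 < 0 or c0 < 0 or r0 + size > h or c0 + size > w:
                continue  # candidate block falls outside the reference frame
            err = mad(cur, ref, top, left, dx, dy, size)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err


def make_frame(obj_top, obj_left, size=8):
    """Toy frame: zero background with a small 2x2 object."""
    frame = [[0] * size for _ in range(size)]
    patch = [[10, 20], [30, 40]]
    for r in range(2):
        for c in range(2):
            frame[obj_top + r][obj_left + c] = patch[r][c]
    return frame


ref_frame = make_frame(2, 2)   # frame t-1
cur_frame = make_frame(3, 4)   # frame t: object moved 2 right, 1 down
mv, err = block_match(cur_frame, ref_frame, top=2, left=3, size=4, search=2)
```

The exhaustive search costs (2 * search + 1)^2 MAD evaluations per block, which is why practical coders often use faster search strategies.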

SLIDE 19

Macroblock Coding in I-Mode

DCT transform each 8x8 DCT block.
Quantize the DCT coefficients (with properly chosen quantization matrices).
Zig-zag order & run-length code the quantized DCT coefficients.
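The I-mode steps can be sketched in Python as follows. This is an illustrative sketch under simplifying assumptions: a single uniform quantizer step stands in for the quantization matrix, the run-length stage emits (zero-run, level) pairs followed by a hypothetical "EOB" marker, and no variable-length coding is applied. All function names are hypothetical.

```python
import math

N = 8  # block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct form, not a fast algorithm)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size (assumption: no quantization matrix)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def zigzag(block):
    """Read the block along anti-diagonals, alternating direction."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in order]


def run_length(seq):
    """(number of preceding zeros, nonzero level) pairs, closed by an end-of-block marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


flat_block = [[16] * N for _ in range(N)]          # constant block: only a DC term
symbols = run_length(zigzag(quantize(dct2(flat_block), step=16)))
```

For a constant block only the DC coefficient survives quantization, so the whole block collapses to one (run, level) pair plus the end-of-block marker.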

SLIDE 20

Macroblock Coding in P-Mode

Estimate one MV for each macroblock (16x16).
Depending on the motion compensation error, determine the coding mode (intra, inter-with-no-MC, inter-with-MC, etc.).
Original values (for intra-mode) or motion compensation errors (for inter-mode) in each of the DCT blocks (8x8) are DCT transformed, quantized, zig-zag/alternate scanned, & run-length coded.

SLIDE 21

Macroblock Coding in B-Mode

Same as for the P-mode, except that a macroblock can be predicted from a previous frame, a following one, or both.

[Figure: forward (v_f) and backward (v_b) motion vectors.]

SLIDE 22

Overlapped Block Motion Compensation (OBMC)

Conventional block motion compensation:
One best matching block is found from a reference frame.
The current block is replaced by the best matching block.

OBMC:
Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame.
The corresponding pixels are determined by the MVs of the current as well as adjacent MBs.
The weight for each corresponding pixel depends on the expected accuracy of the associated MV.

SLIDE 23

OBMC using 4-Neighboring MBs

hm,k: weight assigned to the estimated value based on the MV (dm,k). It should be inversely proportional to the distance between x & the center of the corresponding neighboring block ([hm,1 & hm,4] > [hm,2 & hm,3]).

K: total no. of neighboring blocks.
Bm: block
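A minimal Python sketch of the OBMC prediction of a single pixel, for illustration only. It assumes the prediction is a weighted average of the reference pixels addressed by the MV of the current block and the MVs of its neighbors, with weights that sum to 1 and border clamping. The weight values and the function name are hypothetical and are not the H.263 weights.

```python
def obmc_pixel(ref, y, x, mvs, weights):
    """Weighted average of reference pixels, one per candidate MV (dx, dy).

    mvs: MVs of the current block and of its neighbors.
    weights: one weight per MV; assumed to sum to 1.
    """
    h, w = len(ref), len(ref[0])
    total = 0.0
    for (dx, dy), wk in zip(mvs, weights):
        yy = min(max(y - dy, 0), h - 1)   # clamp at the frame border (assumption)
        xx = min(max(x - dx, 0), w - 1)
        total += wk * ref[yy][xx]
    return total


ref_frame = [[10 * r + c for c in range(4)] for r in range(4)]

# Current-block MV gets the largest weight; two neighbor MVs share the rest.
blended = obmc_pixel(ref_frame, y=2, x=2,
                     mvs=[(0, 0), (1, 0), (0, 1)],
                     weights=[0.5, 0.25, 0.25])

# When all MVs agree, OBMC reduces to conventional block motion compensation.
same = obmc_pixel(ref_frame, y=2, x=2,
                  mvs=[(1, 0)] * 3,
                  weights=[0.5, 0.25, 0.25])
```

In a full coder the weights vary with the pixel position inside the block, so pixels near a block edge lean more on the neighbor's MV.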

SLIDE 24

Optimal Weighting Design

Convert to an optimization problem:
Minimize:
Subject to:
Optimal weighting functions:

[Equation labels: autocorrelation, cross-correlation]

SLIDE 25

How to Determine MVs with OBMC

Option 1: using conventional BMA, minimize the prediction error (MAD) within each MB independently.

Option 2: minimize the prediction error assuming OBMC:
Solve the MV for the current MB while keeping the MVs for the neighboring MBs found in the previous iterations.

SLIDE 26

How to Determine MVs with OBMC

Option 3: using a weighted error criterion over a larger block:

[Equation label: window function]

SLIDE 27

Weighting Coefficients used in H.263

[Figure: weighting matrices for the left or right, top or bottom, & current block.]

SLIDE 28

Window Function Corresponding to H.263 Weights for OBMC

SLIDE 29

Coding Parameter Selection

Coding modes:
Intra vs. inter, quantization parameter (QP) for each MB, ME method (different rates).

Rate-distortion optimized selection, given a target rate:
Minimize the distortion, subject to the target rate constraint:

simplified version:

The optimal mode is such that each MB works at the same R-D slope:
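A minimal Python sketch of rate-distortion optimized mode selection, for illustration only. The slide's own equations did not survive extraction, so the sketch assumes the common Lagrangian form: each MB independently picks the mode that minimizes D + lambda * R, and lambda (the shared R-D slope) is found by bisection so that the total rate meets the target. The candidate distortion and rate numbers are made up, and the function names are hypothetical.

```python
def choose_modes(candidates, lam):
    """For each MB, pick the (mode, distortion, rate) option with the smallest D + lam * R."""
    return [min(options, key=lambda o: o[1] + lam * o[2]) for options in candidates]


def totals(selection):
    """Total distortion and total rate of a selection."""
    return sum(o[1] for o in selection), sum(o[2] for o in selection)


def fit_lambda(candidates, target_rate, lo=0.0, hi=1000.0, iters=50):
    """Bisection on lambda: a larger lambda penalizes rate more, so fewer bits are spent.

    Assumes the initial `hi` already satisfies the target rate.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        _, rate = totals(choose_modes(candidates, mid))
        if rate > target_rate:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical (mode, distortion, rate) options for two MBs.
mb_options = [
    [("intra", 10, 100), ("inter", 20, 40), ("skip", 80, 1)],
    [("intra", 5, 120), ("inter", 8, 60), ("skip", 200, 1)],
]

lam = fit_lambda(mb_options, target_rate=100)
selection = choose_modes(mb_options, lam)
distortion, rate = totals(selection)
```

Because every MB is evaluated with the same lambda, all MBs end up operating at the same R-D slope, which is the condition stated on the slide.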

SLIDE 30

Rate Control

Rate control:
How to code a video so that the resulting bit stream satisfies a target bit rate?
For pleasant visual perception, video should have a constant quality.
But the coding method necessarily yields a variable bit rate.
So, at least the bit rate should be constant when averaged over a short period.
Rate control is also necessary when the video is to be sent over a constant bit rate (CBR) channel.
The fluctuation within the period can be smoothed by a buffer at the encoder output.

SLIDE 31

Rate Control

Rate control is accomplished in steps:

Step 1) Determine the target average bit rate at the frame, GOB, & MB level, based on the current buffer fullness.

Step 2) Satisfy the frame-level target rate by varying the frame rate (skip frames when necessary).

Step 3) Satisfy the GOB/MB-level target rate by varying the coding mode & QP at each MB.
(= Rate-distortion optimized mode selection.)
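A toy Python sketch of buffer-driven rate control, for illustration only. Everything specific here is an assumption: the bits-per-frame model (complexity / QP), the fullness thresholds (0.4, 0.6, 0.9), the QP range 1 to 31, and the function names. It only shows the feedback idea: buffer fullness drives QP, and a frame is skipped when the buffer is nearly full.

```python
def update_qp(qp, fullness, qp_min=1, qp_max=31, step=2):
    """Raise QP when the buffer is filling up, lower it when the buffer is draining."""
    if fullness > 0.6:
        qp += step
    elif fullness < 0.4:
        qp -= step
    return max(qp_min, min(qp_max, qp))


def simulate(complexities, channel_bits, buffer_size, qp=10):
    """Return one (qp, skipped, buffer_bits) entry per input frame."""
    buffer_bits, log = 0.0, []
    for complexity in complexities:
        fullness = buffer_bits / buffer_size
        if fullness > 0.9:
            bits, skipped = 0.0, True          # skip the frame
        else:
            qp = update_qp(qp, fullness)
            bits, skipped = complexity / qp, False   # toy rate model (assumption)
        # The channel drains a constant number of bits per frame interval.
        buffer_bits = max(0.0, buffer_bits + bits - channel_bits)
        log.append((qp, skipped, buffer_bits))
    return log


log = simulate([4000] * 20, channel_bits=300, buffer_size=1000)
```

A real controller would allocate targets at the frame, GOB, & MB levels and would choose the coding mode together with QP; this sketch adjusts a single QP per frame.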

SLIDE 32

Loop Filtering

Errors in previously reconstructed frames accumulate over time with motion-compensated temporal prediction.

Error propagation leads to:
Reduction of prediction accuracy.
Increase of the bitrate needed for coding new frames.

SLIDE 33

Loop Filtering

Loop filtering:
Filters the reference frame before using it for prediction.
Can be embedded in the motion compensation loop:
Half-pel accuracy motion compensation.
OBMC.
Loop filtering can significantly improve coding efficiency.
For theoretically optimal design of loop filters, see text.
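A minimal Python sketch of why half-pel motion compensation acts as a loop filter, for illustration only. It assumes bilinear interpolation of the reference frame, non-negative coordinates given in half-pel units, and border clamping; the function name is hypothetical, and the integer rounding used by real codecs is left out.

```python
def sample_half_pel(ref, y2, x2):
    """Sample ref at position (y2 / 2, x2 / 2) with bilinear interpolation.

    y2, x2 are non-negative integers in half-pel units. At integer positions the
    four taps coincide; at half positions neighboring pixels are averaged.
    """
    h, w = len(ref), len(ref[0])
    y0, x0 = y2 // 2, x2 // 2
    y1 = min(y0 + (y2 % 2), h - 1)   # clamp at the border (assumption)
    x1 = min(x0 + (x2 % 2), w - 1)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4


ref_frame = [[0, 10],
             [20, 30]]

at_integer = sample_half_pel(ref_frame, 0, 0)      # original pixel
half_right = sample_half_pel(ref_frame, 0, 1)      # average of 2 pixels
half_diag = sample_half_pel(ref_frame, 1, 1)       # average of 4 pixels
```

The averaging at half-pel positions is a low-pass filter on the reference frame, which attenuates accumulated coding noise before it is used for prediction.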

SLIDE 34

Scalable Coding (Ch. 11)

Scalability refers to the capability of recovering physically meaningful image (or video) information by decoding only a partial compressed bit stream.

It is used when users try to access the same video through different communication links (bandwidth scalability).

A scalable stream can also offer adaptivity to varying channel error characteristics & computing power at the receiving terminal.

This includes: quality, spatial, temporal, & frequency scalability.

SLIDE 35

Scalable Coding

Motivation:
Real networks are heterogeneous in rate.
Streaming video from home (56 kbps) using a modem vs. a corporate LAN (10-100 Mbps).

Scalable video coding:
Ideal goal (embedded stream): creating a bit stream that can be accessed at any rate.

Practical video coder:
Layered coder: the base layer provides basic quality, successive layers refine the quality incrementally.
Coarse granularity (typically known as layered coder).
Fine granularity (FGS).

SLIDE 36

Quality Scalability

[Figure: the base layer uses coarse quantization of the original DCT coefficients; the enhancement layer uses finer quantization of the difference between the original DCT coefficients & the coarsely quantized base-layer coefficients.]
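A minimal Python sketch of two-layer quality (SNR) scalability, for illustration only. It assumes uniform quantizers with hypothetical step sizes (16 for the base layer, 4 for the enhancement layer) applied to a short list of made-up DCT coefficients; the function names are hypothetical.

```python
def encode_layers(coeffs, coarse_step, fine_step):
    """Base layer: coarse quantization. Enhancement: finer quantization of what is left."""
    base = [round(c / coarse_step) for c in coeffs]
    base_rec = [b * coarse_step for b in base]
    enh = [round((c - r) / fine_step) for c, r in zip(coeffs, base_rec)]
    return base, enh


def decode_layers(base, coarse_step, enh=None, fine_step=None):
    """Decode the base layer alone, or refine it when the enhancement layer is available."""
    rec = [b * coarse_step for b in base]
    if enh is not None:
        rec = [r + e * fine_step for r, e in zip(rec, enh)]
    return rec


coeffs = [100.0, -37.0, 12.0, 3.0]
base, enh = encode_layers(coeffs, coarse_step=16, fine_step=4)

base_only = decode_layers(base, 16)
refined = decode_layers(base, 16, enh, 4)

err_base = max(abs(c - r) for c, r in zip(coeffs, base_only))
err_refined = max(abs(c - r) for c, r in zip(coeffs, refined))
```

A receiver with limited bandwidth decodes only `base`; one that also receives `enh` gets a reconstruction error bounded by half the fine step rather than half the coarse step.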

SLIDE 37

Spatial Scalability

SLIDE 38

Combined Spatial/Quality Scalability

SLIDE 39

Illustration of Scalable Coding

[Figure: decoded images at 6.5 kbps, 133.9 kbps, 21.6 kbps, & 436.3 kbps; axes: Spatial Scalability and Quality (SNR) Scalability.]

SLIDE 40

Quality Scalability by Multistage Quantization

[Figure: coarse & fine quantization stages.]

SLIDE 41

Spatial/Temporal Scalability through Down/Up-Sampling

[Figure: coarse & fine layers.]
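A minimal Python sketch of two-layer spatial scalability through down/up-sampling, for illustration only. It assumes 2x2 averaging for down-sampling, pixel replication for up-sampling, even frame dimensions, and an enhancement layer that stores the unquantized difference; a real coder would quantize and code both layers. The function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 neighborhood."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]


def upsample(img):
    """Double the resolution by pixel replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_spatial(img):
    """Base layer: low-resolution image. Enhancement: full-resolution difference."""
    base = downsample(img)
    prediction = upsample(base)
    enh = [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(img, prediction)]
    return base, enh


def decode_spatial(base, enh=None):
    """Return the up-sampled base layer, refined by the enhancement layer if present."""
    rec = upsample(base)
    if enh is not None:
        rec = [[p + e for p, e in zip(rp, re)] for rp, re in zip(rec, enh)]
    return rec


image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 4, 6, 8],
         [2, 4, 6, 8]]
base, enh = encode_spatial(image)
```

The same structure gives temporal scalability when the down/up-sampling is applied along the time axis (dropping and interpolating frames) instead of within a frame.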

SLIDE 42

Scalability in MPEG-2

MPEG-2 is the earliest standard that offers scalability tools.

Four types of scalability:
Data partition (frequency scalability).
SNR scalability (quality scalability).
Temporal scalability (frame-rate scalability).
Spatial scalability (resolution scalability).

SLIDE 43

Fine Granularity Scalability in MPEG-4

MPEG-4 achieves fine granularity quality scalability through bit-plane coding.
The DCT coefficients are represented losslessly in binary bits.
The bit planes are coded successively, from the most significant bit to the least.
The bit plane within each block is coded using run-length coding.

Temporal scalability is accomplished by combining I, B, & P-frames.

Spatial scalability is achieved by spatial down/up-sampling.
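A minimal Python sketch of bit-plane decomposition, for illustration only. It assumes non-negative integer coefficient magnitudes (signs would be sent separately) and a fixed number of bits; the run-length coding of each plane is left out. The function names and sample values are hypothetical.

```python
def to_bit_planes(values, n_bits):
    """Split magnitudes into bit planes, most significant plane first."""
    return [[(v >> b) & 1 for v in values] for b in range(n_bits - 1, -1, -1)]


def from_bit_planes(planes, n_bits):
    """Rebuild magnitudes from the planes received so far (MSB first)."""
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = n_bits - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


magnitudes = [10, 6, 3, 1]
planes = to_bit_planes(magnitudes, n_bits=4)

full = from_bit_planes(planes, n_bits=4)          # all planes received
coarse = from_bit_planes(planes[:2], n_bits=4)    # only the two most significant planes
```

Truncating the stream after any plane still yields a usable, coarser reconstruction, which is what makes the granularity fine.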

SLIDE 44

Bit-Plane Coding

SLIDE 45

Bit-Plane Coding

SLIDE 46

Bit-Plane Coding

SLIDE 47

Scalable Video Coding using Wavelet Transform

Wavelet-based image coding:
Full frame image transform (as opposed to block-based transform).
Bit plane coding of the transform coefficients can lead to embedded bit streams.
Embedded zero-tree wavelet (EZW).
SPIHT.

SLIDE 48

Scalable Video Coding using Wavelet Transforms

Wavelet-based video coding:
Temporal filtering with & without motion compensation.
Can achieve temporal, spatial, & quality scalability simultaneously.
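A minimal Python sketch of temporal filtering without motion compensation, for illustration only. It assumes an unnormalized Haar filter applied to a pair of frames (average as the low band, difference as the high band); the function names and the toy frames are hypothetical.

```python
def temporal_haar(frame_a, frame_b):
    """One Haar step along time: low band = average, high band = difference."""
    low = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    high = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    return low, high


def inverse_temporal_haar(low, high):
    """Recover both frames from the low and high temporal bands."""
    frame_a = [[l + h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    frame_b = [[l - h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    return frame_a, frame_b


frame_a = [[10, 20], [30, 40]]
frame_b = [[12, 20], [30, 44]]

low, high = temporal_haar(frame_a, frame_b)
rec_a, rec_b = inverse_temporal_haar(low, high)
```

Decoding only the low band gives a half-frame-rate sequence, which is the temporal scalability; with motion compensation the filter would be applied along motion trajectories instead of at fixed pixel positions.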

MPEG-4 uses DCT-based coding for natural videos, but uses WT-based coding for still images & graphics.

Still an active research area!

SLIDE 49

Wavelet-Based Video Coding

Coded images by (a) JPEG baseline method (PSNRs for Y, U, and V components are 28.36, 34.74, 34.98 dB, respectively); (b) the MZTE method (PSNRs for Y, U, and V components are 30.98, 41.68, 40.14 dB, respectively). Both at a compression ratio of 45:1.

SLIDE 50

Wavelet-Based Video Coding

Comparison of SA-DWT (SA-ZTE in this case) coding with SA-DCT coding. SA: Shape-Adaptive.

SLIDE 51

Wavelet-Based Video Coding

Reconstructed object using SA-DCT & SA-ZTE: (a) SA-DCT (1.0042 bpp, PSNR-Y = 37.09 dB; PSNR-U = 42.14 dB; PSNR-V = 42.36 dB); (b) SA-ZTE (0.9538 bpp, PSNR-Y = 38.06 dB; PSNR-U = 43.43 dB; PSNR-V = 43.25 dB). SA: Shape-Adaptive.

SLIDE 52

Wavelet-Based Video Coding

Block diagram of a wavelet-based video codec.

SLIDE 53

Homework 9

Reading assignment:
Sec. 9.3, 11.1

Written assignment:
Prob. 9.13, 11.3, 11.4

Computer assignment:
Prob. 9.11, 9.12
Optional: 9.15

SLIDE 54