[PPT] - In the name of Allah, the compassionate, the merciful Digital PowerPoint Presentation

SLIDE 1

SLIDE 2

In the name of Allah,

the compassionate, the merciful

SLIDE 3

Digital Video Processing

S. Kasaei

Room: CE 307
Department of Computer Engineering
Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei

Lab. Website: http://ipl.ce.sharif.edu

SLIDE 4

Acknowledgment

Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book: Video Processing & Communications, by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].

SLIDE 5

Chapter 6

2-D Motion Estimation

Part II: Advanced Techniques

SLIDE 6

Outline

Problems with EBMA
Deformable block matching algorithm (DBMA):
    Node-based motion model
Mesh-based motion estimation:
    Mesh-based motion representation
    Mesh-based motion estimation
Global motion estimation:
    Direct method
    Indirect method
Region-based motion estimation
Multiresolution motion estimation:
    Hierarchical block matching algorithm (HBMA)
Summary

SLIDE 7

Problems with Exhaustive Block-Matching Algorithm (EBMA)

Blocking artifact (discontinuity across block boundaries) in the predicted image:

Because the block-wise translation model is not accurate.

Real motion in a block may be more complicated than a pure translation (rotation, zooming, multiple objects, …).

Fix: Deformable BMA:

Uses a more sophisticated model: affine, bilinear, or perspective mapping (to describe block motion).

SLIDE 8

Problems with EBMA

There may be multiple objects with different motions in a block.

Fix: Region-based motion estimation. Mesh-based motion estimation (using adaptive meshes).

Intensity changes may be due to illumination effects:

Should compensate for the illumination effect before applying the “constant intensity assumption”.

SLIDE 9

Problems with EBMA

Motion field is somewhat chaotic:

Because MVs are estimated independently from block to block.

Fix:

Imposing smoothness constraint explicitly.
Multiresolution approach.
Mesh-based motion estimation.

Wrong MV in flat regions:

This is because motion is indeterminate when the spatial gradient is near zero.

Ideally, should use non-regular partitions. Fix: region-based motion estimation.

SLIDE 10

Problems with EBMA

Requires tremendous computation!

Fix:

Fast algorithms.
Multiresolution approaches.

SLIDE 11

Deformable Block-Matching Algorithm (DBMA)

The allowed block deformation depends on the motion model used.

SLIDE 12

Overview of DBMA

Three steps:

Partition the anchor frame into regular blocks.
Model the motion in each block by a more complex motion model.

A 2-D motion caused by a flat surface patch undergoing a rigid 3-D motion can be approximated well by a projective mapping.

Projective mapping can be approximated by: affine mapping + bilinear mapping.
Various possible mappings can be described by a node-based motion model.

SLIDE 13

Overview of DBMA

Estimate the motion parameters block by block independently.

Discontinuity problem across block boundaries still remains.

Still cannot solve the problem of multiple motions within a block or changes due to illumination effect!

SLIDE 14

Problems with DBMA

There might be motion discontinuity across block boundaries (because nodal MVs are estimated independently from block to block):

Fix: mesh-based motion estimation.

SLIDE 15

Problems with DBMA

Cannot do well on blocks with multiple moving objects or changes due to illumination effect.

Three-mode method:

First, apply EBMA to all blocks.
Blocks with small EBMA errors have translational motion.
Blocks with large EBMA errors may have non-translational motion.

Then, apply DBMA to these blocks. Blocks still having large errors are non-motion compensable.

[Ref] O. Lee and Y. Wang, Motion compensated prediction using nodal-based deformable block matching. J. Visual Communications and Image Representation (March 1995), 6:26-34.

SLIDE 16

Affine & Bilinear Model

Affine (6 parameters):

Good for mapping triangles to triangles.

\[
\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} =
\begin{bmatrix} a_0 + a_1 x + a_2 y \\ b_0 + b_1 x + b_2 y \end{bmatrix}
\]

Bilinear (8 parameters):

Good for mapping blocks to quadrangles.

\[
\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} =
\begin{bmatrix} a_0 + a_1 x + a_2 y + a_3 xy \\ b_0 + b_1 x + b_2 y + b_3 xy \end{bmatrix}
\]
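As a quick illustration of the two models on this slide, the sketch below evaluates the affine and bilinear displacement at a pixel. The function names and the tuple layout of the coefficients are assumptions made for this example; they are not taken from the textbook.

```python
def affine_displacement(x, y, a, b):
    """Displacement (dx, dy) at pixel (x, y) under the 6-parameter affine model.

    a = (a0, a1, a2) and b = (b0, b1, b2) are the model coefficients
    (this tuple layout is an assumption of the sketch).
    """
    dx = a[0] + a[1] * x + a[2] * y
    dy = b[0] + b[1] * x + b[2] * y
    return dx, dy


def bilinear_displacement(x, y, a, b):
    """Displacement (dx, dy) at pixel (x, y) under the 8-parameter bilinear model.

    a = (a0, a1, a2, a3) and b = (b0, b1, b2, b3); the extra xy term is what
    lets a rectangular block deform into a general quadrangle.
    """
    dx = a[0] + a[1] * x + a[2] * y + a[3] * x * y
    dy = b[0] + b[1] * x + b[2] * y + b[3] * x * y
    return dx, dy


# With all first-order terms set to zero, the affine model reduces to a
# pure translation by (a0, b0), independent of the pixel position.
print(affine_displacement(3, 4, (2, 0, 0), (-1, 0, 0)))
```

Setting only a0 and b0 to nonzero values recovers the block-wise translation model used by EBMA, which is why EBMA can be seen as a special case of both models.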

SLIDE 17

Difficulties in Estimating Affine & Bilinear Motion Parameters

The coefficients need floating point precision.
The coefficients have different influence on the estimated motion.

0-th order coefficients (a0, b0) represent the translation component.

Other coefficients’ influence depends on pixel coordinates.

SLIDE 18

Node-Based Motion Model

Control nodes can move freely; in this example, the control nodes are the block corners.
Motion at other points is interpolated from the nodal MVs, d_{m,k}:

\[
\mathbf{d}_m(\mathbf{x}) = \sum_{k=1}^{K} \phi_{m,k}(\mathbf{x})\, \mathbf{d}_{m,k}
\]

where d_m(x) is the displacement at any point in element m, and φ_{m,k}(x) is the “interpolation kernel” associated with node k in element m.

Control node MVs can be described with integer- or half-pel accuracy; all have the same importance.

Translation (1 node), affine (3 nodes), & bilinear (4 nodes) are special cases of this model.
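The node-based interpolation can be sketched for the 4-node case as follows. This is a minimal example that assumes an axis-aligned rectangular block whose four corners are the control nodes and that uses the standard bilinear shape functions as interpolation kernels; the function names and argument layout are hypothetical.

```python
def bilinear_kernels(x, y, x0, y0, x1, y1):
    """Interpolation kernels of the four corner nodes at point (x, y).

    The block spans [x0, x1] x [y0, y1]. Kernels are returned in the order
    top-left, top-right, bottom-left, bottom-right. Each kernel equals 1 at
    its own node and 0 at the other three, and the four kernels sum to 1.
    """
    u = (x - x0) / (x1 - x0)   # normalised horizontal position in [0, 1]
    v = (y - y0) / (y1 - y0)   # normalised vertical position in [0, 1]
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]


def interpolate_displacement(x, y, corners, nodal_mvs):
    """Displacement at (x, y) as the kernel-weighted sum of the nodal MVs.

    corners = (x0, y0, x1, y1); nodal_mvs holds four (dx, dy) pairs in the
    same node order as bilinear_kernels.
    """
    phi = bilinear_kernels(x, y, *corners)
    dx = sum(p * mv[0] for p, mv in zip(phi, nodal_mvs))
    dy = sum(p * mv[1] for p, mv in zip(phi, nodal_mvs))
    return dx, dy


# An 8x8 block whose right-hand nodes move further than its left-hand nodes.
mvs = [(1, 0), (3, 0), (1, 2), (3, 2)]
print(interpolate_displacement(4, 4, (0, 0, 8, 8), mvs))
```

At a node the interpolated displacement equals that node's MV, and at the block centre it is the average of the four nodal MVs.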

SLIDE 19

Interpolation Kernels

To guarantee continuity across element boundary:

Shape functions of standard triangular element:

Affine function.

SLIDE 20

Estimation of Nodal Motions

Shape functions of standard quadrilateral element:

Bilinear function.

Objective DFD function:

Difficult to calculate!

SLIDE 21

Estimation of Nodal Motions

Search method:

Exhaustive search:

Search K nodal MVs simultaneously in integer- or half-pel accuracy (may not be feasible in practice).

Gradient descent approach:

See textbook for the Newton-Raphson update algorithm.
Solution depends on the initial solution.

A good initial solution is the translation MV found using EBMA. One can use the average of the motion vectors of the 4 blocks attached to each node as the initial estimate of the MV for that node. It will then be updated.
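The initialization described above can be sketched as follows. The example assumes that EBMA has already produced one MV per block, stored as a 2-D list, and that the nodes sit at the block corners. Averaging only the blocks that exist at border nodes is an assumption of this sketch; the slide does not say how borders are handled.

```python
def initial_nodal_mvs(block_mvs):
    """Initial MV of every node as the mean MV of the blocks attached to it.

    block_mvs[r][c] is the (dx, dy) found by EBMA for the block in row r,
    column c. A grid of rows x cols blocks has (rows + 1) x (cols + 1)
    corner nodes. Interior nodes touch 4 blocks; border nodes touch 1 or 2.
    """
    rows, cols = len(block_mvs), len(block_mvs[0])
    nodes = []
    for i in range(rows + 1):
        node_row = []
        for j in range(cols + 1):
            # Blocks sharing node (i, j) lie up-left, up-right,
            # down-left and down-right of it.
            attached = [block_mvs[r][c]
                        for r in (i - 1, i) for c in (j - 1, j)
                        if 0 <= r < rows and 0 <= c < cols]
            n = len(attached)
            node_row.append((sum(mv[0] for mv in attached) / n,
                             sum(mv[1] for mv in attached) / n))
        nodes.append(node_row)
    return nodes


# Four blocks with different translations; the shared centre node starts
# from their average.
block_mvs = [[(0, 0), (4, 0)],
             [(0, 4), (4, 4)]]
print(initial_nodal_mvs(block_mvs)[1][1])
```

These values would only seed the gradient descent; the nodal MVs are then refined by the update algorithm.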

SLIDE 22

Mesh-Based Motion Estimation (An Overview)

[Figure: the frame is partitioned into non-overlapping polygonal elements. Panels: triangular mesh; quadrilateral mesh.]

SLIDE 23

Mesh-Based vs. Block-Based Motion Estimation

[Figure panels:]
Block-based backward ME (blocking artifacts).
Mesh-based backward ME (continuous tracking; better to have separate meshes for different objects).
Mesh-based forward ME.

SLIDE 24

Mesh-Based Motion Model

The motion in each element is interpolated from nodal MVs.

Mesh-based vs. node-based model:
Mesh-based: Each node has a single MV, which influences the motion of all four adjacent elements.
Node-based: Each node can have four different MVs, depending on which element it is considered to be in.

SLIDE 25

Mesh Generation & Motion Estimation

Two problems:

Given a mesh in the anchor frame, determine nodal positions in the target frame – Motion estimation.
Set up the mesh in the anchor frame, so that the mesh conforms with object boundaries – Mesh generation.

Backward ME: can use either a regular mesh or an object-adaptive mesh at each new frame.

Motion estimation is easier with a regular mesh, but an adaptive mesh can yield a more accurate result.

Forward ME:

Only needs to establish a mesh for the initial frame. Meshes in the following frames depend on the nodal MVs between successive frames.
To accommodate appearing/disappearing objects, the mesh geometry needs to be updated.

We only discuss the motion estimation problem here.

SLIDE 26

Estimation of Nodal Motion

Unlike DBMA, all nodal MVs should be estimated simultaneously.
Unless the anchor frame uses a regular mesh, the interpolation kernels are complicated.

To simplify, use a mapping to a master element:

[Figure: mapping of an element to the master element (coordinate u); mapping function, J(u): Jacobian.]

SLIDE 27

Estimation of Nodal Motion (cntd)

Simplification:

Update one node at a time, minimizing DFD over all adjacent elements.

Gradient descent method [Wang and Lee 1994].
Exhaustive search [Wang and Ostermann 1998].

Update order is important:
First, update those nodes where motion can be estimated accurately (near edges).

Motion of this node should be constrained not to cause excessively deformed elements.

[Figure: search range for node n, showing the theoretical limit and the practical limit.]

SLIDE 28

[Figure: Example: half-pel EBMA. Panels: anchor frame; target frame; motion field; predicted anchor frame (29.86 dB).]

SLIDE 29

[Figure: EBMA vs. mesh-based motion estimation. Panels: EBMA (29.86 dB); mesh-based method (29.72 dB).]

SLIDE 30

Estimation of Nodal Motion (cntd)

In order to handle newly appearing or disappearing objects in a scene, one should allow for the deletion of nodes corresponding to disappeared objects, and the creation of new nodes in newly appearing objects.

SLIDE 31

Global Motion Estimation

Global motion arises when the camera moves, or when the imaged scene consists of a single object undergoing a rigid 3-D motion:

Camera moving over a stationary scene.
Most projected camera motions can be captured by an affine mapping!

The scene moves in its entirety (a rare event)!

The motion at any pixel can be decomposed into a global motion (caused by camera movement) & a local motion (caused by the movement of the underlying object).

Typically, the scene can be decomposed into several major regions, each moving differently (region-based ME).

SLIDE 32

Global Motion Estimation

If there is indeed a global motion, or the region undergoing a coherent motion has been determined, we can determine the motion parameters by:

Direct ME:

Estimate global motion parameters directly by minimizing prediction errors.

Indirect ME:

First, determine MVs.
Then, use a regression method to find the global motion model that best fits the estimated motion field.

SLIDE 33

Global Motion Estimation

A pixel may not experience only a global motion.

The obtained prediction error may be large (even with correct global motion parameters).

Also, not all the pixels may experience the global motion.

To fix: use a robust estimator.

Iteratively determines the motion parameters & the pixels undergoing that motion.
Considers the pixels that are governed by the global motion as inliers, & the remaining pixels as outliers (hard/soft threshold robust estimator).

SLIDE 34

Direct Estimation

First, parameterize the DFD error in terms of the motion parameters.

Then, estimate these parameters by minimizing the DFD error (E_DFD).

The weighting coefficients w_n depend on the importance of pixel x_n.

Ex: Affine motion:

\[
\mathbf{d}(\mathbf{x}_n; \mathbf{a}) =
\begin{bmatrix} d_x(\mathbf{x}_n; \mathbf{a}) \\ d_y(\mathbf{x}_n; \mathbf{a}) \end{bmatrix} =
\begin{bmatrix} a_0 + a_1 x_n + a_2 y_n \\ b_0 + b_1 x_n + b_2 y_n \end{bmatrix},
\quad \mathbf{a} = [a_0, a_1, a_2, b_0, b_1, b_2]^T
\]

Exhaustive search or gradient descent method can be used to find the a that minimizes the E_DFD error.

SLIDE 35

Indirect Estimation

First, find the dense motion field using a pixel-based or block-based approach (e.g., EBMA).

Then, parameterize the resulting motion field using the motion model through least squares fitting:

\[
E_{\text{fit}} = \sum_n w_n \left| \mathbf{d}(\mathbf{x}_n; \mathbf{a}) - \mathbf{d}_n \right|^2
\]

The weighting coefficients w_n depend on the accuracy of the estimated motion at x_n.

Affine motion:

\[
\mathbf{d}(\mathbf{x}_n; \mathbf{a}) = [\mathbf{A}_n]\,\mathbf{a}, \quad
[\mathbf{A}_n] = \begin{bmatrix} 1 & x_n & y_n & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x_n & y_n \end{bmatrix}
\]

\[
\frac{\partial E_{\text{fit}}}{\partial \mathbf{a}} = 2 \sum_n w_n [\mathbf{A}_n]^T \left( [\mathbf{A}_n]\,\mathbf{a} - \mathbf{d}_n \right) = 0
\]

\[
\mathbf{a} = \left( \sum_n w_n [\mathbf{A}_n]^T [\mathbf{A}_n] \right)^{-1} \left( \sum_n w_n [\mathbf{A}_n]^T \mathbf{d}_n \right)
\]
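The weighted least-squares fit above can be sketched in plain Python. Because [A_n] is block-structured, the normal equations split into two 3x3 systems that share the same matrix: one for (a0, a1, a2) and one for (b0, b1, b2). The helper names `solve_3x3` and `fit_affine` are hypothetical, and the sketch assumes the sample points are not all collinear, so that the system is solvable.

```python
def solve_3x3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs by Gauss-Jordan elimination."""
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system: points may be collinear")
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(points, mvs, weights=None):
    """Weighted least-squares affine fit to a set of estimated MVs.

    points: (x_n, y_n) positions; mvs: estimated (dx_n, dy_n) at those
    positions; weights: optional w_n (defaults to 1 for every sample).
    Returns ((a0, a1, a2), (b0, b1, b2)) as two lists.
    """
    if weights is None:
        weights = [1.0] * len(points)
    s = [[0.0] * 3 for _ in range(3)]   # sum of w * [1 x y]^T [1 x y]
    tx = [0.0] * 3                      # sum of w * [1 x y]^T * dx
    ty = [0.0] * 3                      # sum of w * [1 x y]^T * dy
    for (x, y), (dx, dy), w in zip(points, mvs, weights):
        basis = (1.0, x, y)
        for i in range(3):
            tx[i] += w * basis[i] * dx
            ty[i] += w * basis[i] * dy
            for j in range(3):
                s[i][j] += w * basis[i] * basis[j]
    return solve_3x3(s, tx), solve_3x3(s, ty)


# Synthetic check: MVs generated from a known affine model are recovered.
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
mvs = [(1.0 + 0.1 * x - 0.2 * y, -0.5 + 0.3 * y) for x, y in pts]
print(fit_affine(pts, mvs))
```

In practice the weights could be lowered for blocks whose matching error is large, so that unreliable MVs influence the fit less.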

SLIDE 36

Illustration of Robust Estimator

[Figure: fitting a line to the data points by using LMS and robust estimators. Courtesy of Fatih Porikli.]

SLIDE 37

Robust Estimator

Essence: iteratively removing “outlier” pixels.

1. Set the region to include all pixels in a frame.
2. Apply the direct (or indirect) method over all pixels in the region.
3. Evaluate errors (E_DFD or E_fit) at all pixels in the region.
4. Eliminate “outlier” pixels with large errors.
5. Repeat steps 2-4 for the remaining pixels in the region.
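The five steps above can be sketched with a hard-threshold rule. To keep the example short, it makes two simplifying assumptions that are not from the slides: the global model is a pure translation fitted to already-estimated MVs (an indirect-style fit), and a sample is declared an outlier when its squared fitting error exceeds `factor` times the median squared error. An affine fit could be substituted for the mean without changing the loop.

```python
from statistics import median


def mean_mv(mvs, indices):
    """Least-squares translation for the selected samples: their mean MV."""
    n = len(indices)
    return (sum(mvs[i][0] for i in indices) / n,
            sum(mvs[i][1] for i in indices) / n)


def robust_global_translation(mvs, factor=4.0, max_iter=10):
    """Estimate a global translation while iteratively discarding outliers.

    Returns the estimated (dx, dy) and the indices of the inlier samples.
    """
    inliers = list(range(len(mvs)))               # step 1: start with all samples
    for _ in range(max_iter):
        gx, gy = mean_mv(mvs, inliers)            # step 2: fit the model
        errors = {i: (mvs[i][0] - gx) ** 2 + (mvs[i][1] - gy) ** 2
                  for i in inliers}               # step 3: evaluate fitting errors
        cutoff = factor * median(errors.values())
        kept = [i for i in inliers if errors[i] <= cutoff]   # step 4: hard threshold
        if not kept or len(kept) == len(inliers):
            break                                 # nothing left to remove
        inliers = kept                            # step 5: repeat on the remainder
    return mean_mv(mvs, inliers), inliers


# Eight samples follow the camera motion (2, 1); two belong to a moving object.
mvs = [(2, 1)] * 8 + [(10, -6)] * 2
print(robust_global_translation(mvs))
```

A soft-threshold variant would keep every sample but scale its weight down as its error grows, instead of removing it outright.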

SLIDE 38

Region-Based Motion Estimation

Assumption: the scene consists of multiple objects, with the region corresponding to each object (or sub-object) having a coherent motion.

Physically more correct than block-based, mesh-based, & global motion models.

SLIDE 39

Region-Based Motion Estimation

Method:

Region First: Segment the frame into multiple regions based on texture/edges, then estimate motion in each region using the global motion estimation method.

Motion First: Estimate a dense motion field, then segment the motion field so that motion in each region can be accurately modeled by a single set of parameters.

Joint region-segmentation & motion estimation: iterate the two processes.

SLIDE 40

Multiresolution Motion Estimation

Problems with BMA:

Unless exhaustive search is used, the solution may not be the global minimum.
Exhaustive search requires an extremely large amount of computation.
Block-wise translation motion model is not always appropriate.

SLIDE 41

Multiresolution Motion Estimation

Multiresolution approach:

Aims at solving the first two problems.
First, estimate the motion at a coarse resolution over a low-pass filtered & down-sampled image pair.

Can usually lead to a solution close to the true motion field.

Then, modify the initial solution in successively finer resolutions within a small search range.

Reduces the computational burden.

Can be applied to different motion representations, but we will focus on its application to BMA.
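The coarse-to-fine idea can be sketched for a single block as follows. This is a minimal illustration under several assumptions that are not from the slides: frames are lists of rows of gray values, the low-pass filter plus down-sampling step is a plain 2x2 average, matching uses integer-pel SAD, and the caller makes sure the anchor block lies inside the frame at every level. All function names are hypothetical.

```python
import random


def downsample(frame):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def block_sad(anchor, target, bx, by, n, dx, dy):
    """SAD between the n x n anchor block at (bx, by) and the target block
    displaced by (dx, dy); infinite if the target block leaves the frame."""
    h, w = len(target), len(target[0])
    if not (0 <= bx + dx and bx + dx + n <= w and
            0 <= by + dy and by + dy + n <= h):
        return float("inf")
    return sum(abs(anchor[by + j][bx + i] - target[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def search(anchor, target, bx, by, n, centre, radius):
    """Exhaustive integer-pel search within +/- radius around centre."""
    best_err, best_mv = float("inf"), centre
    for dy in range(centre[1] - radius, centre[1] + radius + 1):
        for dx in range(centre[0] - radius, centre[0] + radius + 1):
            err = block_sad(anchor, target, bx, by, n, dx, dy)
            if err < best_err:
                best_err, best_mv = err, (dx, dy)
    return best_mv


def hbma_block(anchor, target, bx, by, n, levels, coarse_radius, refine_radius=1):
    """Estimate the MV of one block coarse-to-fine over `levels` resolutions."""
    pyramid = [(anchor, target)]                 # index 0 is the finest level
    for _ in range(levels - 1):
        a, t = pyramid[-1]
        pyramid.append((downsample(a), downsample(t)))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):      # coarsest level first
        a, t = pyramid[level]
        scale = 2 ** level
        radius = coarse_radius if level == levels - 1 else refine_radius
        mv = search(a, t, bx // scale, by // scale, n, mv, radius)
        if level > 0:
            mv = (2 * mv[0], 2 * mv[1])          # propagate to the next finer level
    return mv


# Synthetic pair: the target is the anchor shifted cyclically by (4, 2).
rng = random.Random(0)
size = 32
anchor = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
target = [[anchor[(y - 2) % size][(x - 4) % size] for x in range(size)]
          for y in range(size)]
print(hbma_block(anchor, target, 8, 8, 8, levels=2, coarse_radius=2))
```

The coarse search only has to cover a range of 2 pixels to reach a displacement of 4 pixels at full resolution, and the fine level then checks just the 9 candidates around the propagated MV.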

SLIDE 42

Hierarchical Block Matching Algorithm (HBMA)


SLIDE 43


SLIDE 44

[Figure: Example: three-level HBMA. Predicted anchor frame (29.32 dB).]

SLIDE 45

[Figure: Example: half-pel EBMA. Panels: anchor frame; target frame; motion field; predicted anchor frame (29.86 dB).]

SLIDE 46

Computational Requirement of HBMA

Operation counts for HBMA:

Image size: M×M; Block size: N×N at every level; Levels: L.

Search range:

1st level: \( R/2^{L-1} \) (equivalent to R in the L-th level).
Other levels: \( R/2^{L-1} \) (can be smaller).
No. of blocks at the l-th level: \( \left( M / (2^{L-l} N) \right)^2 \)

Total no. of operations: \( \sum_{l=1}^{L} \left( M/2^{L-l} \right)^2 \left( 2R/2^{L-1} + 1 \right)^2 \)

Operation counts for EBMA:

Image size: M×M; Block size: N×N; Search range: R.

No. of candidate matching blocks for each block: \( (2R+1)^2 \)

Total no. of operations: \( M^2 (2R+1)^2 \)

SLIDE 47

Computational Requirement of HBMA

Operation count at the l-th level (image size: \( M/2^{L-l} \)):

\[
\left( M/2^{L-l} \right)^2 \left( 2R/2^{L-1} + 1 \right)^2
\]

Total operation count:

\[
\sum_{l=1}^{L} \left( M/2^{L-l} \right)^2 \left( 2R/2^{L-1} + 1 \right)^2 \approx \frac{1}{3}\, 4^{-(L-2)}\, 4 M^2 R^2
\]

Saving factor (relative to EBMA):

\[
3 \times 4^{(L-2)} = 3 \;(L=2); \quad 12 \;(L=3)
\]
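The saving factor can be checked numerically with the exact sums rather than the approximation. The sketch below assumes, as in the count above, a search range of R/2^(L-1) at every level; the function names and the example values M = 256 and R = 32 are chosen only for illustration.

```python
def ebma_operations(m, r):
    """Operation count of integer-pel EBMA on an M x M image, search range R."""
    return m ** 2 * (2 * r + 1) ** 2


def hbma_operations(m, r, levels):
    """Operation count of HBMA with `levels` resolutions.

    Level l (1 = coarsest) works on an image of size M / 2^(L-l) and, as
    assumed here, searches a range of R / 2^(L-1) at every level.
    """
    level_range = r / 2 ** (levels - 1)
    return sum((m / 2 ** (levels - l)) ** 2 * (2 * level_range + 1) ** 2
               for l in range(1, levels + 1))


m, r = 256, 32
for levels in (2, 3):
    saving = ebma_operations(m, r) / hbma_operations(m, r, levels)
    print(levels, round(saving, 2))
```

For these values the exact ratios come out at about 3.1 for two levels and about 11.1 for three levels, close to the approximate factors of 3 and 12; the gap comes from the "+1" terms that the approximation drops.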

SLIDE 48

Summary

Fundamentals:

Optical flow equation:
Derived from constant intensity & small motion assumptions.
Ambiguity in motion estimation.

How to represent motion:
Pixel-based, block-based, region-based, mesh-based, global, …

Estimation criterion:
DFD (constant intensity).
OF (constant intensity + small motion).
Bayesian (MAP, DFD + motion smoothness).

Search method:
Exhaustive search, gradient-descent, multiresolution.

SLIDE 49

Summary (Cntd)

Basic techniques:

Pixel-based motion estimation.
Block-based motion estimation:
EBMA, integer-pel vs. half-pel accuracy, fast algorithms.

More advanced techniques:

Deformable block matching algorithm (DBMA):
To allow more complex motion within each block.

Mesh-based motion estimation:
To enforce continuity of motion across block boundaries.

SLIDE 50

Summary (Cntd)

Global motion estimation:
Good for estimating camera motion.

Region-based motion estimation:
More physically correct: allows different motion in each sub-object region.

Multiresolution approach:
Avoids local minima, produces smooth motion fields, reduces computations.

Application in Video Coding.

SLIDE 51

Homework 5

Reading assignment:

Read Secs. 6.5-6.10.
Go through & verify the gradient descent algorithm presented for DBMA (Eqs. 6.5.2-6.5.6).
Go through the derivation of the objective function definition (Eqs. 6.6.6-6.6.8) for mesh-based motion estimation carefully, & verify the gradient function given in Eq. 6.6.9.

Assignment:

Prob. 6.9, 6.10, 6.16, 6.15 (computer assignment).

SLIDE 52

Homework 5

Optional computer assignment:

Assuming the motion between two frames can be approximated by an affine mapping, determine the affine parameters using the indirect method. First apply the HBMA (or EBMA) algorithm you implemented to determine a block-wise motion field between two frames. Then determine the affine parameters using the weighted least squares method (Eq. 6.7.3). Show the predicted image based on the affine parameters and the associated prediction error (in terms of PSNR). Compare them to those obtained with the original block-based motion estimation. Note: You should apply your algorithm to two video frames experiencing predominantly camera motion. To test the accuracy of your algorithm, you may want to artificially generate a pair of frames, where one frame is the affine mapping of another.

Implement the direct method (Prob. 6.17), & compare the results.

SLIDE 53