In the name of Allah,
the compassionate, the merciful
Digital Video Processing
S. Kasaei
Room: CE 307
Department of Computer Engineering
Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei
Lab. Website: http://ipl.ce.sharif.edu
Acknowledgment
Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book: Video Processing & Communications, written by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].
Chapter 6
2-D Motion Estimation
Part II: Advanced Techniques
Outline
Problems with EBMA
Deformable block matching algorithm (DBMA):
- Node-based motion model
Mesh-based motion estimation:
- Mesh-based motion representation
- Mesh-based motion estimation
Global motion estimation:
- Direct method
- Indirect method
Region-based motion estimation
Multiresolution motion estimation:
- Hierarchical block matching algorithm (HBMA)
Summary
Kasaei 6
Problems with Exhaustive Block-Matching Algorithm (EBMA)
Blocking artifacts (discontinuity across block boundaries) in the predicted image:
- Because the block-wise translation model is not accurate.
- Real motion in a block may be more complicated than a pure translation (rotation, zooming, multiple objects, …).
- Fix: Deformable BMA. Uses a more sophisticated model (affine, bilinear, or perspective mapping) to describe block motion.
Problems with EBMA
There may be multiple objects with different motions in a block.
- Fix: Region-based motion estimation, or mesh-based motion estimation (using adaptive meshes).
Intensity changes may be due to illumination effects:
- Should compensate for the illumination effect before applying the “constant intensity assumption”.
Problems with EBMA
Motion field is somewhat chaotic:
- Because MVs are estimated independently from block to block.
- Fix: Imposing a smoothness constraint explicitly.
- Fix: Multiresolution approach.
- Fix: Mesh-based motion estimation.
Wrong MVs in flat regions:
- This is because motion is indeterminate when the spatial gradient is near zero.
- Ideally, should use non-regular partitions.
- Fix: Region-based motion estimation.
Problems with EBMA
Requires tremendous computation!
- Fix: Fast algorithms.
- Fix: Multiresolution approaches.
Deformable Block-Matching Algorithm (DBMA)
Allowed block deformation depends on the used motion model.
Overview of DBMA
Three steps:
Partition the anchor frame into regular blocks.
Model the motion in each block by a more complex motion model:
- A 2-D motion caused by a flat surface patch undergoing a rigid 3-D motion can be approximated well by a projective mapping.
- Projective mapping can be approximated by affine mapping and bilinear mapping.
- Various possible mappings can be described by a node-based motion model.
Overview of DBMA
Estimate the motion parameters block by block, independently:
- The discontinuity problem across block boundaries still remains.
- Still cannot solve the problem of multiple motions within a block or changes due to illumination effects!
Problems with DBMA
There might be motion discontinuity across block boundaries (because nodal MVs are estimated independently from block to block):
- Fix: Mesh-based motion estimation.
Problems with DBMA
Cannot do well on blocks with multiple moving objects or changes due to illumination effects.
Three-mode method:
- First, apply EBMA to all blocks.
- Blocks with small EBMA errors have translational motion.
- Blocks with large EBMA errors may have non-translational motion.
- Apply DBMA to these blocks.
- Blocks still having errors are non-motion compensable.
- [Ref] O. Lee and Y. Wang, "Motion compensated prediction using nodal-based deformable block matching," J. Visual Communications and Image Representation, 6:26-34, March 1995.
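The three-mode decision can be sketched as follows. This is only an illustration under stated assumptions: the per-block errors are taken as already computed, and the function name `classify_blocks` and the single `threshold` are hypothetical choices, not details taken from Lee and Wang's paper.

```python
def classify_blocks(ebma_err, dbma_err, threshold):
    """Assign one of three modes to each block from its prediction errors.

    ebma_err, dbma_err: per-block prediction errors (e.g. mean absolute DFD).
    threshold: hypothetical error level above which a prediction is rejected.
    """
    modes = []
    for e_trans, e_deform in zip(ebma_err, dbma_err):
        if e_trans <= threshold:
            modes.append("translational")    # EBMA already predicts the block well
        elif e_deform <= threshold:
            modes.append("deformable")       # DBMA is needed and sufficient
        else:
            modes.append("non-compensable")  # neither motion model fits
    return modes


modes = classify_blocks([1.0, 9.0, 9.0], [0.5, 2.0, 8.0], threshold=3.0)
```

In practice DBMA would only need to be run on the blocks that fail the EBMA test; the sketch takes both error lists as inputs to keep the decision logic visible.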
Affine & Bilinear Model
Affine (6 parameters):
- Good for mapping triangles to triangles.
$$\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y \\ b_0 + b_1 x + b_2 y \end{bmatrix}$$
Bilinear (8 parameters):
- Good for mapping blocks to quadrangles.
$$\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y + a_3 xy \\ b_0 + b_1 x + b_2 y + b_3 xy \end{bmatrix}$$
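As a small illustration, the affine and bilinear motion models can be evaluated at a pixel as below. The function names and the tuple layout of the coefficients are assumptions made for this sketch.

```python
def affine_mv(a, b, x, y):
    """Affine model (6 parameters): a = (a0, a1, a2), b = (b0, b1, b2)."""
    dx = a[0] + a[1] * x + a[2] * y
    dy = b[0] + b[1] * x + b[2] * y
    return dx, dy


def bilinear_mv(a, b, x, y):
    """Bilinear model (8 parameters): adds the cross term x*y to each component."""
    dx = a[0] + a[1] * x + a[2] * y + a[3] * x * y
    dy = b[0] + b[1] * x + b[2] * y + b[3] * x * y
    return dx, dy


# Pure translation is the special case where only a0 and b0 are non-zero.
translation = affine_mv((3.0, 0.0, 0.0), (-1.0, 0.0, 0.0), x=10, y=20)
```

Note that the displacement caused by the higher-order coefficients grows with the pixel coordinates, while a0 and b0 shift every pixel equally.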
Difficulties in Estimating Affine & Bilinear Motion Parameters
The coefficients need floating-point precision.
The coefficients have different influence on the estimated motion:
- 0-th order coefficients (a0, b0) represent the translation component.
- Other coefficients’ influence depends on pixel coordinates.
Node-Based Motion Model
Control nodes can move freely; in this example: block corners.
Motion at other points is interpolated from the nodal MVs, $\mathbf{d}_{m,k}$:
$$\mathbf{d}_m(\mathbf{x}) = \sum_{k} \phi_{m,k}(\mathbf{x})\,\mathbf{d}_{m,k}$$
where $\mathbf{d}_m(\mathbf{x})$ is the displacement at any point in element m, and $\phi_{m,k}(\mathbf{x})$ is the “interpolation kernel” associated with node k in element m.
Control node MVs can be described with integer- or half-pel accuracy; all have the same importance.
Translation (1 node), affine (3 nodes), & bilinear (4 nodes) are special cases of this model.
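A minimal sketch of the node-based model for one rectangular block with four corner nodes. It assumes the standard bilinear shape functions as interpolation kernels; the function name, the argument order, and the corner ordering are choices made for this example only.

```python
def interpolate_mv(x, y, x0, y0, x1, y1, nodal_mvs):
    """Interpolate the MV at (x, y) inside the block [x0, x1] x [y0, y1].

    nodal_mvs: corner MVs (dx, dy), ordered
    top-left, top-right, bottom-left, bottom-right (assumed ordering).
    """
    u = (x - x0) / (x1 - x0)   # normalized horizontal position in [0, 1]
    v = (y - y0) / (y1 - y0)   # normalized vertical position in [0, 1]
    # Bilinear kernels: each equals 1 at its own node and 0 at the other three.
    kernels = [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]
    dx = sum(k * mv[0] for k, mv in zip(kernels, nodal_mvs))
    dy = sum(k * mv[1] for k, mv in zip(kernels, nodal_mvs))
    return dx, dy


corners = [(0, 0), (4, 0), (0, 4), (4, 4)]
center_mv = interpolate_mv(4, 4, 0, 0, 8, 8, corners)
```

Because each kernel is 1 at its own node and 0 at the others, the interpolated field reproduces the nodal MVs exactly at the corners; giving all four nodes the same MV reduces the model to pure translation.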
Interpolation Kernels
To guarantee continuity across element boundaries:
Shape functions of standard triangular element:
- Affine function.
Estimation of Nodal Motions
Shape functions of standard quadrilateral element:
- Bilinear function.
Objective DFD function:
- Difficult to calculate!
Estimation of Nodal Motions
Search method:
Exhaustive search:
- Search K nodal MVs simultaneously in integer- or half-pel accuracy (may not be feasible in practice).
Gradient descent approach:
- See textbook for the Newton-Raphson update algorithm.
- Solution depends on the initial solution.
- A good initial solution is the translation MV found using EBMA.
- One can use the average of the motion vectors of the 4 blocks attached to each node as the initial estimate of the MV for that node. It will then be updated.
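The averaging rule for the initial nodal MVs can be sketched as below. The function name and the grid layout are assumptions; the handling of border nodes (averaging only the blocks that actually exist) is also an assumption, since the slide only covers the interior case of 4 attached blocks.

```python
def initial_nodal_mvs(block_mvs):
    """block_mvs[r][c] is the EBMA MV (dx, dy) of block (r, c).

    Returns a (rows + 1) x (cols + 1) grid of nodal MVs. Each node averages
    the MVs of the blocks touching it: 4 for interior nodes, 2 on frame
    edges, 1 at frame corners (border handling is an assumption).
    """
    rows, cols = len(block_mvs), len(block_mvs[0])
    nodes = []
    for r in range(rows + 1):
        row = []
        for c in range(cols + 1):
            touching = [block_mvs[i][j]
                        for i in (r - 1, r) for j in (c - 1, c)
                        if 0 <= i < rows and 0 <= j < cols]
            row.append((sum(m[0] for m in touching) / len(touching),
                        sum(m[1] for m in touching) / len(touching)))
        nodes.append(row)
    return nodes


nodes = initial_nodal_mvs([[(0, 0), (2, 0)],
                           [(0, 2), (2, 2)]])
```

These values only serve as a starting point; the gradient descent iterations then refine them.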
Mesh-Based Motion Estimation (An Overview)
Non-overlapping polygonal elements.
[Figure: triangular mesh; quadrilateral mesh.]
Mesh-Based vs. Block-Based Motion Estimation
[Figure: block-based backward ME (blocking artifacts); mesh-based backward ME (continuous tracking, better to have separate meshes for different objects); mesh-based forward ME.]
Mesh-Based Motion Model
- The motion in each element is interpolated from nodal MVs.
- Mesh-based vs. node-based model:
- Mesh-based: Each node has a single MV, which influences the motion of all four adjacent elements.
- Node-based: Each node can have four different MVs, depending on within which element it is considered to be.
Mesh Generation & Motion Estimation
Two problems:
- Given a mesh in the anchor frame, determine nodal positions in the target frame (motion estimation).
- Set up the mesh in the anchor frame, so that the mesh conforms with object boundaries (mesh generation).
Backward ME: can use either a regular mesh or an object-adaptive mesh at each new frame.
- Motion estimation is easier with a regular mesh, but an adaptive mesh can yield more accurate results.
Forward ME:
- Only needs to establish a mesh for the initial frame. Meshes in the following frames depend on the nodal MVs between successive frames.
- To accommodate appearing/disappearing objects, the mesh geometry needs to be updated.
We only discuss the motion estimation problem here.
Estimation of Nodal Motion
- Unlike DBMA, all nodal MVs should be estimated simultaneously.
- Unless the anchor frame uses a regular mesh, the interpolation kernels are complicated.
- To simplify, use a mapping to a master element.
[Figure: mapping function from an element to the master element (coordinates u); J(u): Jacobian.]
Estimation of Nodal Motion (cntd)
- Simplification:
- Update one node at a time, minimizing DFD over all adjacent elements.
- Gradient descent method [Wang and Lee 1994].
- Exhaustive search [Wang and Ostermann 1998].
- Update order is important:
- First, update those nodes where motion can be estimated accurately (near edges).
- Motion of each node should be constrained not to cause excessively deformed elements.
[Figure: search range for node n (theoretical limit; practical limit).]
[Figure: anchor frame, target frame, motion field, and predicted anchor frame (29.86 dB). Example: Half-pel EBMA.]
[Figure: EBMA (29.86 dB) and mesh-based method (29.72 dB). EBMA vs. Mesh-based Motion Estimation.]
Estimation of Nodal Motion (cntd)
In order to handle newly appearing or disappearing objects in a scene, one should allow for the deletion of nodes corresponding to disappeared objects, and the creation of new nodes in newly appearing objects.
Global Motion Estimation
Global motion is caused by camera motion, or arises if the imaged scene consists of a single object undergoing a rigid 3-D motion:
- Camera moving over a stationary scene. Most projected camera motions can be captured by an affine mapping!
- The scene moves in its entirety (a rare event)!
The motion at any pixel can be decomposed into a global motion (caused by camera movement) & a local motion (because of the movement of the underlying object).
Typically, the scene can be decomposed into several major regions, each moving differently (region-based ME).
Global Motion Estimation
If there is indeed a global motion, or the region undergoing a coherent motion has been determined, we can determine the motion parameters by:
Direct ME:
- Estimate global motion parameters directly by minimizing prediction errors.
Indirect ME:
- First, determine MVs.
- Then, use a regression method to find the global motion model that best fits the estimated motion field.
Global Motion Estimation
A pixel may not experience only a global motion.
- The obtained prediction error may be large (even with correct global motion parameters).
Also, not all the pixels may experience the global motion.
To fix: use a robust estimator.
- Iteratively determines the motion parameters & the pixels undergoing that motion.
- Considers the pixels that are governed by the global motion as inliers, & the remaining pixels as outliers (hard/soft threshold robust estimator).
Direct Estimation
First, parameterize the DFD error in terms of the motion parameters.
Then, estimate these parameters by minimizing the DFD error, $E_{\mathrm{DFD}}$.
- Weighting coefficients $w_n$ depend on the importance of pixel $\mathbf{x}_n$.
Ex: Affine motion:
$$\mathbf{d}(\mathbf{x}_n;\mathbf{a}) = \begin{bmatrix} d_x(\mathbf{x}_n;\mathbf{a}) \\ d_y(\mathbf{x}_n;\mathbf{a}) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x_n + a_2 y_n \\ b_0 + b_1 x_n + b_2 y_n \end{bmatrix}, \quad \mathbf{a} = [a_0, a_1, a_2, b_0, b_1, b_2]^T$$
Exhaustive search or gradient descent method can be used to find $\mathbf{a}$ that minimizes the $E_{\mathrm{DFD}}$ error.
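A minimal sketch of evaluating the DFD error for one candidate affine parameter vector. It assumes the error is the weighted sum, over anchor pixels, of the p-th power of the absolute difference between the target frame sampled at the displaced position and the anchor frame. Nearest-neighbour sampling and the skipping of pixels displaced outside the frame are simplifications chosen here; `dfd_error` is a hypothetical name.

```python
def dfd_error(anchor, target, a, weights=None, p=1):
    """Weighted DFD error for affine parameters a = [a0, a1, a2, b0, b1, b2].

    anchor, target: frames as lists of rows (indexed [y][x]).
    Pixels whose displaced position leaves the frame are skipped, so errors
    of candidates with very different coverage are not directly comparable
    unless normalized (simplification of this sketch).
    """
    a0, a1, a2, b0, b1, b2 = a
    h, w = len(anchor), len(anchor[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            tx = int(round(x + a0 + a1 * x + a2 * y))   # displaced column
            ty = int(round(y + b0 + b1 * x + b2 * y))   # displaced row
            if 0 <= tx < w and 0 <= ty < h:
                wn = 1.0 if weights is None else weights[y][x]
                total += wn * abs(target[ty][tx] - anchor[y][x]) ** p
    return total


# Synthetic pair: the target is the anchor translated by (dx, dy) = (2, 1).
anchor = [[(7 * x + 13 * y) % 17 for x in range(8)] for y in range(8)]
target = [[anchor[(y - 1) % 8][(x - 2) % 8] for x in range(8)] for y in range(8)]
err_true = dfd_error(anchor, target, [2, 0, 0, 1, 0, 0])
err_zero = dfd_error(anchor, target, [0, 0, 0, 0, 0, 0])
```

An exhaustive search would call such a function for every candidate parameter vector and keep the minimizer; a gradient descent method would instead need the derivatives of the error with respect to the parameters.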
Indirect Estimation
First, find the dense motion field using a pixel-based or block-based approach (e.g., EBMA).
Then, parameterize the resulting motion field using the motion model through least squares fitting:
$$E_{\mathrm{fit}} = \sum_n w_n \left\| \mathbf{d}(\mathbf{x}_n;\mathbf{a}) - \mathbf{d}_n \right\|^2$$
- Weighting coefficients $w_n$ depend on the accuracy of the estimated motion at $\mathbf{x}_n$.
Affine motion:
$$\mathbf{d}(\mathbf{x}_n;\mathbf{a}) = [\mathbf{A}_n]\,\mathbf{a}, \quad [\mathbf{A}_n] = \begin{bmatrix} 1 & x_n & y_n & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x_n & y_n \end{bmatrix}$$
$$\frac{\partial E_{\mathrm{fit}}}{\partial \mathbf{a}} = 2\sum_n w_n [\mathbf{A}_n]^T \left( [\mathbf{A}_n]\,\mathbf{a} - \mathbf{d}_n \right) = 0$$
$$\mathbf{a} = \left( \sum_n w_n [\mathbf{A}_n]^T [\mathbf{A}_n] \right)^{-1} \left( \sum_n w_n [\mathbf{A}_n]^T \mathbf{d}_n \right)$$
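The weighted least-squares fit of the affine parameters can be sketched in plain Python. Because the x and y components of the affine model use separate coefficients, the 6x6 normal equations split into two 3x3 systems sharing the same matrix; the sketch exploits this. The names `solve3` and `fit_affine` are hypothetical, and the sample points are assumed not to be collinear (otherwise the system is singular).

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * 3
    for r in (2, 1, 0):                      # back substitution
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = (aug[r][3] - s) / aug[r][r]
    return sol


def fit_affine(points, mvs, weights=None):
    """Return [a0, a1, a2, b0, b1, b2] minimizing the weighted fitting error.

    points: pixel (or block-centre) coordinates (x, y).
    mvs:    estimated MVs (dx, dy) at those points.
    """
    if weights is None:
        weights = [1.0] * len(points)
    s = [[0.0] * 3 for _ in range(3)]        # sum of w * basis * basis^T
    tx, ty = [0.0] * 3, [0.0] * 3            # sum of w * basis * d_x (d_y)
    for (x, y), (dx, dy), w in zip(points, mvs, weights):
        basis = (1.0, x, y)
        for i in range(3):
            tx[i] += w * basis[i] * dx
            ty[i] += w * basis[i] * dy
            for j in range(3):
                s[i][j] += w * basis[i] * basis[j]
    return solve3(s, tx) + solve3(s, ty)


true_a = [1.0, 0.1, 0.0, -2.0, 0.0, 0.05]
pts = [(0, 0), (8, 0), (0, 8), (8, 8), (4, 4)]
field = [(true_a[0] + true_a[1] * x + true_a[2] * y,
          true_a[3] + true_a[4] * x + true_a[5] * y) for x, y in pts]
est_a = fit_affine(pts, field)
```

With a noise-free affine field, the fit recovers the generating parameters up to rounding; with real block MVs, low weights can be given to blocks whose MVs are unreliable (for example, flat regions).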
Illustration of Robust Estimator
Fitting a line to the data points by using LMS and robust estimators [Courtesy of Fatih Porikli].
Robust Estimator
Essence: iteratively removing “outlier” pixels.
1. Set the region to include all pixels in a frame.
2. Apply the direct (or indirect) method over all pixels in the region.
3. Evaluate errors (EDFD or Efit) at all pixels in the region.
4. Eliminate “outlier” pixels with large errors.
5. Repeat steps 2-4 for the remaining pixels in the region.
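The five steps can be illustrated on the simpler line-fitting problem used in the robust-estimator illustration. In this sketch an ordinary least-squares line fit stands in for the direct or indirect motion estimator, and a fixed hard threshold on the residual decides which samples are outliers; both are assumptions of the example, as are the function names.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = m * x + c (needs at least 2 distinct x)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c


def robust_fit(points, threshold, max_iter=10):
    """Iteratively refit after discarding samples with large residuals."""
    region = list(points)                                  # step 1
    for _ in range(max_iter):
        m, c = fit_line(region)                            # step 2
        inliers = [(x, y) for x, y in region
                   if abs(y - (m * x + c)) <= threshold]   # steps 3 and 4
        if len(inliers) == len(region) or len(inliers) < 2:
            break                                          # nothing left to remove
        region = inliers                                   # step 5
    return fit_line(region), region


samples = [(x, 2 * x + 1) for x in range(10)] + [(5, 40)]  # one gross outlier
(slope, intercept), kept = robust_fit(samples, threshold=5.0)
```

The first fit is pulled toward the outlier; after that sample is eliminated, the second fit matches the remaining points and the iteration stops. A soft-threshold variant would down-weight suspicious samples instead of discarding them.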
Region-Based Motion Estimation
Assumption: the scene consists of multiple objects, with the region corresponding to each object (or sub-object) having a coherent motion.
- Physically more correct than block-based, mesh-based, & global motion models.
Region-Based Motion Estimation
Method:
- Region First: Segment the frame into multiple regions based on texture/edges, then estimate motion in each region using the global motion estimation method.
- Motion First: Estimate a dense motion field, then segment the motion field so that motion in each region can be accurately modeled by a single set of parameters.
- Joint region-segmentation & motion estimation: iterate the two processes.
Multiresolution Motion Estimation
Problems with BMA:
- Unless exhaustive search is used, the solution may not be the global minimum.
- Exhaustive search requires an extremely large amount of computation.
- Block-wise translational motion model is not always appropriate.
Multiresolution Motion Estimation
Multiresolution approach:
- Aims at solving the first two problems.
- First, estimate the motion in a coarse resolution over a low-pass filtered & down-sampled image pair.
- Can usually lead to a solution close to the true motion field.
- Then, modify the initial solution in successively finer resolutions within a small search range.
- Reduces the computational burden.
- Can be applied on different motion representations, but we will focus on its application to BMA.
Hierarchical Block Matching Algorithm (HBMA)
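A compact sketch of the hierarchical idea, under several simplifying assumptions: 2x2 averaging serves as the low-pass filter and down-sampler, the same block size is used at every level, matching uses integer-pel SAD, and the frame dimensions are multiples of the block size times 2^(levels-1). The function names, parameters, and defaults are all hypothetical.

```python
import random


def downsample(img):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def sad(anchor, target, bx, by, n, dx, dy):
    """Sum of absolute differences for one candidate MV; None if out of frame."""
    h, w = len(target), len(target[0])
    if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
        return None
    return sum(abs(target[by + dy + j][bx + dx + i] - anchor[by + j][bx + i])
               for j in range(n) for i in range(n))


def search(anchor, target, bx, by, n, init, r):
    """Exhaustive search within +/- r pixels around the initial MV."""
    best_mv, best_err = (0, 0), float("inf")
    for dy in range(init[1] - r, init[1] + r + 1):
        for dx in range(init[0] - r, init[0] + r + 1):
            err = sad(anchor, target, bx, by, n, dx, dy)
            if err is not None and err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv


def hbma(anchor, target, n=4, levels=2, coarse_range=2, refine_range=1):
    """Return {(bx, by): (dx, dy)} for the N x N blocks of the finest level."""
    pyramid = [(anchor, target)]
    for _ in range(levels - 1):
        a, t = pyramid[-1]
        pyramid.append((downsample(a), downsample(t)))
    mvs = None
    for a, t in reversed(pyramid):               # coarsest level first
        new = {}
        for by in range(0, len(a) - n + 1, n):
            for bx in range(0, len(a[0]) - n + 1, n):
                if mvs is None:
                    init, r = (0, 0), coarse_range
                else:
                    # Parent block at the coarser level covers pixel (bx//2, by//2).
                    parent = ((bx // 2) // n * n, (by // 2) // n * n)
                    init = (2 * mvs[parent][0], 2 * mvs[parent][1])
                    r = refine_range
                new[(bx, by)] = search(a, t, bx, by, n, init, r)
        mvs = new
    return mvs


# Synthetic test pair: random texture translated by (dx, dy) = (4, -2).
rng = random.Random(0)
anchor = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
target = [[anchor[(y + 2) % 32][(x - 4) % 32] for x in range(32)] for y in range(32)]
mv_field = hbma(anchor, target)
```

The coarse level only searches +/- 2 pixels, yet the refined MV of an interior block reaches the full displacement, because each coarse pixel step corresponds to two fine pixels.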
[Figure: predicted anchor frame (29.32 dB). Example: Three-level HBMA.]
[Figure: anchor frame, target frame, motion field, and predicted anchor frame (29.86 dB). Example: Half-pel EBMA.]
Computational Requirement of HBMA
Operation counts for HBMA:
- Image size: M×M; block size: N×N at every level; levels: L.
- Search range: 1st level: $R/2^{L-1}$ (equivalent to R in the L-th level); other levels: $R/2^{L-1}$ (can be smaller).
- No. of blocks at the l-th level: $(M/2^{L-l})^2 / N^2$.
- Total no. of operations: $\sum_{l=1}^{L} (M/2^{L-l})^2 \, (2R/2^{L-1}+1)^2$.
Operation counts for EBMA:
- Image size: M×M; block size: N×N; search range: R.
- No. of candidate matching blocks for each block: $(2R+1)^2$.
- Total no. of operations: $M^2 (2R+1)^2$.
Computational Requirement of HBMA (cntd)
Operation count at the l-th level (image size: $M/2^{L-l} \times M/2^{L-l}$):
$$\left(M/2^{L-l}\right)^2 \left(2R/2^{L-1}+1\right)^2$$
Total operation count:
$$\sum_{l=1}^{L} \left(M/2^{L-l}\right)^2 \left(2R/2^{L-1}+1\right)^2 \approx \frac{1}{3}\,4^{-(L-2)}\,4M^2R^2$$
Saving factor:
$$3 \times 4^{(L-2)} = 3\ (L=2);\quad 12\ (L=3)$$
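The operation counts can be checked numerically. The sketch below evaluates the exact sums (without the large-R approximation) and compares the resulting saving factor with the approximate value 3 × 4^(L-2); the function names are chosen for this example.

```python
def ebma_ops(m, r):
    """EBMA: M^2 pixels, each compared against (2R + 1)^2 candidates."""
    return m * m * (2 * r + 1) ** 2


def hbma_ops(m, r, levels):
    """HBMA with search range R / 2^(L-1) at every level."""
    r_level = r / 2 ** (levels - 1)
    return sum((m / 2 ** (levels - l)) ** 2 * (2 * r_level + 1) ** 2
               for l in range(1, levels + 1))


def saving_factor(m, r, levels):
    """Exact ratio of EBMA to HBMA operation counts."""
    return ebma_ops(m, r) / hbma_ops(m, r, levels)


def approx_saving_factor(levels):
    """Large-R approximation: 3 * 4^(L - 2)."""
    return 3 * 4 ** (levels - 2)


s2 = saving_factor(512, 32, 2)
s3 = saving_factor(512, 32, 3)
```

For M = 512 and R = 32, the exact saving factors come out at roughly 3.1 (L = 2) and 11.1 (L = 3), close to the approximate values 3 and 12; the gap comes from the "+1" terms that the approximation drops.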
Summary
Fundamentals:
Optical flow equation:
- Derived from constant intensity & small motion assumptions.
- Ambiguity in motion estimation.
How to represent motion:
- Pixel-based, block-based, region-based, mesh-based, global, …
Estimation criterion:
- DFD (constant intensity).
- OF (constant intensity + small motion).
- Bayesian (MAP, DFD + motion smoothness).
Search method:
- Exhaustive search, gradient-descent, multiresolution.
Summary (Cntd)
Basic techniques:
Pixel-based motion estimation.
Block-based motion estimation:
- EBMA, integer-pel vs. half-pel accuracy, fast algorithms.
More advanced techniques:
Deformable block matching algorithm (DBMA):
- To allow more complex motion within each block.
Mesh-based motion estimation:
- To enforce continuity of motion across block boundaries.
Summary (Cntd)
Global motion estimation:
- Good for estimating camera motion.
Region-based motion estimation:
- More physically correct: allows different motion in each sub-object region.
Multiresolution approach:
- Avoids local minima, produces smooth motion fields, reduces computations.
Application in Video Coding.
Homework 5
Reading assignment:
- Read Secs. 6.5-6.10.
- Go through & verify the gradient descent algorithm presented for DBMA (Eqs. 6.5.2-6.5.6).
- Go through the derivation of the objective function definition (Eqs. 6.6.6-6.6.8) for mesh-based motion estimation carefully, & verify the gradient function given in Eq. 6.6.9.
Assignment:
- Prob. 6.9, 6.10, 6.16, 6.15 (computer assignment).
Homework 5
Optional computer assignment:
- Assuming the motion between two frames can be approximated by an affine mapping, determine the affine parameters using the indirect method. First, apply the HBMA (or EBMA) algorithm you implemented to determine a block-wise motion field between two frames. Then, determine the affine parameters using the weighted least squares method (Eq. 6.7.3). Show the predicted image based on the affine parameters and the associated prediction error (in terms of PSNR). Compare them to those obtained with the original block-based motion estimation. Note: You should apply your algorithm to two video frames experiencing predominantly camera motion. To test the accuracy of your algorithm, you may want to artificially generate a pair of frames, where one frame is the affine mapping of another.
- Implement the direct method (Prob. 6.17), & compare the results.