[PPT] - Image Segmentation Perceptual and Sensory Augmented Computing Luc PowerPoint Presentation

SLIDE 1

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09

Image Segmentation

Luc Van Gool, ETH Zurich

With important contributions by

Vittorio Ferrari, Un. of Edinburgh

Slide credits: K. Grauman, B. Leibe, S. Lazebnik, S. Seitz, Y. Boykov, W. Freeman, P. Kohli

SLIDE 2

Topics of This Lecture

Introduction
  • Gestalt principles
  • Image segmentation

Segmentation as clustering
  • k-Means
  • Feature spaces
  • Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with path search

SLIDE 3


Grouping in Vision

Slide credit: Kristen Grauman

Fast, bottom-up mechanisms to determine regions that belong together…
… a stepping stone between pixels/retina cell responses and scene interpretation.
Psychophysics has listed features that seem to provoke perceptual grouping.

SLIDE 4


Similarity in appearance

http://chicagoist.com/attachments/chicagoist_alicia/GEESE.jpg, http://wwwdelivery.superstock.com/WI/223/1532/PreviewComp/SuperStock_1532R-0831.jpg

Slide adapted from Kristen Grauman

SLIDE 5


Symmetry

http://seedmagazine.com/news/2006/10/beauty_is_in_the_processingtim.php

Slide credit: Kristen Grauman

SLIDE 6


Common Fate

Image credit: Arthus-Bertrand (via F. Durand)

Slide credit: Kristen Grauman

SLIDE 7


Proximity

http://www.capital.edu/Resources/Images/outside6_035.jpg

Slide credit: Kristen Grauman

SLIDE 8


The Gestalt School

Grouping is key to visual perception
‘Belonging together’ is inferred from relationships

  • “The whole is greater than the sum of its parts”

[Figure panels: Illusory/subjective contours; Occlusion; Familiar configuration]
http://en.wikipedia.org/wiki/Gestalt_psychology

Slide credit: Svetlana Lazebnik

Image source: Steve Lehar

SLIDE 9


Gestalt Theory

Gestalt: whole or group

Ø Whole is greater than sum of its parts Ø Relationships among parts can yield new properties/features

Psychologists identified series of factors that predispose

set of elements to be grouped (by human visual system)

Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923 http://psy.ed.asu.edu/~classics/Wertheimer/Forms/forms.htm

“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees.”

Max Wertheimer

(1880-1943)

Slide credit: B. Leibe

SLIDE 10


Gestalt Factors

These factors make intuitive sense, but are very difficult to translate into algorithms.

Image source: Forsyth & Ponce

Slide credit: B. Leibe

SLIDE 11


Figure-Ground Discrimination

Slide credit: B. Leibe

SLIDE 12


The Ultimate Gestalt test

Slide adapted from B. Leibe

SLIDE 13


Image Segmentation

Goal: identify groups of pixels that go together

Slide credit: Steve Seitz, Kristen Grauman

SLIDE 14

The Goals of Segmentation

Separate image into objects

[Figure panels: Image; Human segmentation]

Slide credit: Svetlana Lazebnik

SLIDE 15

Topics of This Lecture

Introduction
  • Gestalt principles
  • Image segmentation

Segmentation as clustering
  • k-Means
  • Feature spaces
  • Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with path search

SLIDE 16

Image Segmentation: Toy Example

These intensities define the three groups.
We could label every pixel in the image according to which of these it is.
  • i.e. segment the image based on the intensity feature.
What if the image isn’t quite so simple?

[Figure: input image and its intensity histogram; groups 1, 2, 3 are the black, gray and white pixels]

Slide credit: Kristen Grauman

SLIDE 17

[Figure: two input images, each with its histogram (axes: intensity vs. pixel count)]

Slide credit: Kristen Grauman

SLIDE 18

Now how to determine the three main intensities that define our groups?

We need to cluster.

[Figure: input image and its histogram (axes: intensity vs. pixel count)]

Slide credit: Kristen Grauman

SLIDE 19

Goal: choose three “centers” as the representative intensities, and label every pixel according to which of these centers it is nearest to.

Best cluster centers are those that minimize the SSD (sum of squared differences) between all points and their nearest cluster center c_i:

$$\sum_{\text{clusters } i} \; \sum_{\text{points } p \text{ in cluster } i} \| p - c_i \|^2$$

[Figure: intensity axis (ticks 190, 255) with the three cluster centers 1, 2, 3]

Slide credit: Kristen Grauman

SLIDE 20

Clustering

With this objective, it is a “chicken and egg” problem:
  • If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
  • If we knew the group memberships, we could get the centers by computing the mean per group.

Slide credit: Kristen Grauman

SLIDE 21

K-Means Clustering

Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.

1. Randomly initialize the cluster centers, c_1, ..., c_K
2. Given cluster centers, determine points in each cluster
   – For each point p, find the closest c_i. Put p into cluster i
3. Given points in each cluster, solve for c_i
   – Set c_i to be the mean of points in cluster i
4. If c_i have changed, repeat Step 2

Properties
  • Will always converge to some solution
  • Can be a “local minimum”
    – Does not always find the global minimum of the objective function:

$$\sum_{\text{clusters } i} \; \sum_{\text{points } p \text{ in cluster } i} \| p - c_i \|^2$$

Slide credit: Steve Seitz
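The four steps above can be sketched in a few lines of Python for the 1D intensity case. This is a minimal illustration, not the lecture's implementation: the function names `kmeans_1d` and `ssd`, the optional `init` argument and the `max_iter` cap are assumptions introduced here.

```python
import random

def kmeans_1d(values, k, init=None, max_iter=100, seed=0):
    """Cluster scalar intensities with k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: initial centers (random distinct values unless `init` is given).
    centers = list(init) if init is not None else rng.sample(sorted(set(values)), k)
    centers = [float(c) for c in centers]

    def nearest(v):
        # Index of the center closest to value v (squared difference).
        return min(range(k), key=lambda i: (v - centers[i]) ** 2)

    for _ in range(max_iter):
        # Step 2: assign each point to its closest center.
        labels = [nearest(v) for v in values]
        # Step 3: recompute each center as the mean of its points.
        new_centers = []
        for i in range(k):
            members = [v for v, l in zip(values, labels) if l == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(v) for v in values]

def ssd(values, centers, labels):
    """Objective: sum of squared differences to the assigned centers."""
    return sum((v - centers[l]) ** 2 for v, l in zip(values, labels))

values = [0, 2, 4, 100, 102, 104, 250, 252, 254]
good = kmeans_1d(values, 3, init=[0, 100, 250])
bad = kmeans_1d(values, 3, init=[0, 2, 4])   # poor start, ends in a local minimum
print(good[0], ssd(values, *good))
print(bad[0], ssd(values, *bad))
```

The second call illustrates the "local minimum" property: both runs converge, but the poorly initialized one stops at a much higher SSD.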

SLIDE 22


Segmentation as Clustering

K=2 K=3

Slide credit: Kristen Grauman

SLIDE 23

Feature Space

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on intensity similarity
Feature space: intensity value (1D)

Slide credit: Kristen Grauman

SLIDE 24

Feature Space

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on color similarity
Feature space: color value (3D)

[Figure: pixels plotted in R, G, B space; example values (R=255, G=200, B=250), (R=245, G=220, B=248), (R=15, G=189, B=2), (R=3, G=12, B=2)]

Slide credit: Kristen Grauman

SLIDE 25

Segmentation as Clustering

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on texture similarity
Feature space: filter bank responses (e.g. 24D)

[Figure: filter bank of 24 filters with responses F1, F2, …, F24]

Slide credit: Kristen Grauman

SLIDE 26

Spatial coherence

Assign a cluster label per pixel → possible discontinuities

How can we ensure they are spatially smooth?

[Figure: original image, and the image labeled by cluster center’s intensity (clusters 1, 2, 3)]

Slide adapted from Kristen Grauman

SLIDE 27

Spatial coherence

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on intensity+position similarity

Way to encode both similarity and proximity.

[Figure: feature space with axes X, Y, Intensity]

Slide adapted from Kristen Grauman
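One way to build such an intensity+position feature space is sketched below. The helper names `pixel_features` and `sq_dist` and the `spatial_weight` scaling factor are assumptions made for illustration; the slides do not say how intensity and position should be balanced against each other.

```python
def pixel_features(image, spatial_weight=1.0):
    """Return one (intensity, w*x, w*y) feature tuple per pixel, in row-major order.

    image: list of rows of intensities.
    spatial_weight: hypothetical factor trading off proximity against similarity.
    """
    return [
        (float(value), spatial_weight * x, spatial_weight * y)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

image = [[10, 10],
         [200, 10]]
feats = pixel_features(image, spatial_weight=2.0)
# Top-left and bottom-right pixels have equal intensity but differ in position.
print(sq_dist(feats[0], feats[3]))
```

With `spatial_weight=0` the features reduce to plain intensity clustering; larger values favor spatially compact segments.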

SLIDE 28

Summary K-Means

Pros
  • Simple, fast to compute
  • Converges to local minimum of within-cluster squared error

Cons/issues
  • Setting k?
  • Sensitive to initial centers
  • Sensitive to outliers
  • Detects spherical clusters only

Slide credit: Kristen Grauman

SLIDE 29

Probabilistic Clustering

Basic questions
  • What’s the probability that a point x is in cluster m?
  • What’s the shape of each cluster?

K-means doesn’t answer these questions.

Basic idea
  • Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function.
  • This function is called a generative model.
  • Defined by a vector of parameters θ
  • No hard assignment (as with K-means), but a soft assignment to the different clusters, each with its probability

Slide credit: Steve Seitz

SLIDE 30

Mixture of Gaussians

One generative model is a mixture of Gaussians (MoG)

  • K Gaussian blobs with means µ_b, covariance matrices V_b, dimension d
    – Blob b defined by:

$$P(x \mid \mu_b, V_b) = \frac{1}{\sqrt{(2\pi)^d \, |V_b|}} \exp\left\{ -\tfrac{1}{2} (x - \mu_b)^T V_b^{-1} (x - \mu_b) \right\}$$

  • Blob b is selected with probability α_b, where $\sum_{b=1}^{K} \alpha_b = 1$
  • The likelihood of observing x is a weighted mixture of Gaussians

$$P(x \mid \theta) = \sum_{b=1}^{K} \alpha_b \, P(x \mid \mu_b, V_b)$$

Slide adapted from Steve Seitz

SLIDE 31

Expectation Maximization (EM)

Goal
  • Find blob parameters θ that maximize the likelihood function over all datapoints

Approach:
1. E-step: given current guess of blobs, compute probabilistic ownership of each point
2. M-step: given ownership probabilities, update blobs to maximize likelihood function
3. Repeat until convergence

Slide adapted from Steve Seitz

SLIDE 32

EM Details

E-step
  • Compute probability that point x is in blob b, given current guess of θ

M-step
  • Compute overall probability that blob b is selected
  • Mean of blob b
  • Covariance of blob b

(N data points)

Slide adapted from Steve Seitz
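The E-step and M-step can be made concrete with a small sketch for the 1D case, where each covariance matrix reduces to a scalar variance. The function names, the fixed iteration count `n_iter` and the variance floor `min_var` are assumptions added here to keep the example short and numerically safe; they are not taken from the slides.

```python
import math

def gaussian(x, mean, var):
    """1D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_1d(data, means, variances, weights, n_iter=20, min_var=1e-6):
    """EM for a 1D mixture of Gaussians; returns (means, variances, weights)."""
    k, n = len(means), len(data)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # E-step: ownership[i][b] = P(blob b | x_i, current parameters).
        ownership = []
        for x in data:
            joint = [weights[b] * gaussian(x, means[b], variances[b]) for b in range(k)]
            total = sum(joint)
            ownership.append([j / total for j in joint])
        # M-step: re-estimate each blob from its (soft) share of the data.
        for b in range(k):
            mass = sum(ownership[i][b] for i in range(n))
            weights[b] = mass / n                      # probability that blob b is selected
            means[b] = sum(ownership[i][b] * data[i] for i in range(n)) / mass
            var = sum(ownership[i][b] * (data[i] - means[b]) ** 2 for i in range(n)) / mass
            variances[b] = max(var, min_var)           # floor is an added safeguard
    return means, variances, weights

data = [0.0, 0.1, -0.1, 10.0, 10.1, 9.9]
means, variances, weights = em_1d(data, [1.0, 9.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

As noted in the summary of this method, the result depends on the initial guess; starting from the output of k-means is a common choice.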

SLIDE 33


Segmentation with EM

Image source: Serge Belongie

Slide credit: B. Leibe

K = 3

SLIDE 34

Summary: Mixtures of Gaussians, EM

Pros
  • Probabilistic interpretation
  • Soft assignments between data points and clusters
  • Generative model, can predict novel data points
  • Relatively compact storage

Cons
  • Initialization
    – often a good idea to start from output of k-means
  • Local minima
  • Need to know number of components K
    – solutions: add a cost for model complexity
  • Need to choose generative model (mathematical form of a cluster?)

Slide adapted from B. Leibe

SLIDE 35

Topics of This Lecture

Introduction
  • Gestalt principles
  • Image segmentation

Segmentation as clustering
  • k-Means
  • Feature spaces
  • Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 36

Topics of This Lecture

Introduction
  • Gestalt principles
  • Image segmentation

Segmentation as clustering
  • k-Means
  • Feature spaces
  • Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with path search

SLIDE 37

Finding Modes in a Histogram

How many modes are there?
  • Mode = local maximum of a given distribution
  • Easy to see, hard to compute

Slide adapted from Steve Seitz

SLIDE 38

Mean-Shift Segmentation

An advanced and versatile technique for clustering-based segmentation

http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html

D. Comaniciu and P. Meer, Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.

Slide credit: Svetlana Lazebnik

SLIDE 39

Mean-Shift Algorithm

Iterative Mode Search

1. Initialize random seed center and window W
2. Calculate center of gravity (the “mean”) of W
3. Shift the search window to the mean
4. Repeat steps 2+3 until convergence

Slide adapted from Steve Seitz
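A minimal 1D sketch of this iterative mode search is given below. It assumes a flat window of half-width `h` and an unweighted mean of the points inside it; the function name `mean_shift_mode` and the tolerance `tol` are illustrative choices, not part of the slides.

```python
def mean_shift_mode(points, start, h, tol=1e-6, max_iter=100):
    """Follow the mean-shift iterations from `start`; returns the mode reached."""
    center = float(start)                       # step 1: seed center, window [center-h, center+h]
    for _ in range(max_iter):
        window = [p for p in points if abs(p - center) <= h]
        if not window:                          # empty window: nothing to shift towards
            break
        mean = sum(window) / len(window)        # step 2: center of gravity of W
        if abs(mean - center) < tol:            # step 4: converged
            return mean
        center = mean                           # step 3: shift the window to the mean
    return center

points = [1, 2, 3, 10, 11, 12]
# Starting one search per data point groups the points by the mode they reach.
modes = [mean_shift_mode(points, p, h=2.5) for p in points]
print(modes)
```

Points whose searches end at the same mode form one cluster, which is the attraction-basin idea used for mean-shift clustering.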

SLIDE 40

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 41

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 42

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 43

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 44

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 45

Mean-Shift

[Figure labels: Region of interest; Center of mass; Mean Shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 46

Mean-Shift

[Figure labels: Region of interest; Center of mass]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 47

Real Modality Analysis

Tessellate the space with windows
Run the procedure in parallel

Slide by Y. Ukrainitz & B. Sarel

SLIDE 48

Real Modality Analysis

The blue data points were traversed by the windows towards the mode.

Slide by Y. Ukrainitz & B. Sarel

SLIDE 49

Mean-Shift Clustering

Cluster: all data points in the attraction basin of a mode
Attraction basin: the region for which all trajectories lead to the same mode

Slide by Y. Ukrainitz & B. Sarel

SLIDE 50

Mean-Shift Clustering/Segmentation

Choose features (color, gradients, texture, etc.)
Initialize windows at individual pixel locations
Start mean-shift from each window until convergence
Merge windows that end up near the same “peak” or mode

Slide adapted from Svetlana Lazebnik

SLIDE 51


Mean-Shift Segmentation Results

http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html

Slide credit: Svetlana Lazebnik

SLIDE 52

Summary Mean-Shift

Pros
  • General, application-independent tool
  • Model-free, does not assume any prior shape (spherical, elliptical, etc.) on data clusters
  • Just a single parameter (window size h)
    – h has a physical meaning (unlike k-means): the scale of clustering
  • Finds variable number of modes given the same h
  • Robust to outliers

Cons
  • Output depends on window size h
  • Window size (bandwidth) selection is not trivial
  • Computationally rather expensive
  • Does not scale well with dimension of feature space (sparsity problems in high-dimensional spaces…)

Slide adapted from Svetlana Lazebnik

SLIDE 53

Topics of This Lecture

Introduction
  • Gestalt principles
  • Image segmentation

Segmentation as clustering
  • k-Means
  • Feature spaces
  • Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with path search

SLIDE 54

Images as Graphs

Fully-connected graph
  • Node (vertex) for every pixel
  • Edge between every pair of pixels (p,q)
  • Affinity weight w_pq for each edge
    – w_pq measures similarity
    – Similarity is inversely proportional to difference (in color, texture, position, …)

[Figure: pixels p and q joined by an edge with weight w_pq]

Slide adapted from Steve Seitz

SLIDE 55

Measuring Affinity

Distance
$$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| x - y \|^2 \right\}$$

Intensity
$$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| I(x) - I(y) \|^2 \right\}$$

Color
$$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, dist\big(c(x), c(y)\big)^2 \right\}$$
(some suitable color space distance)

Texture
$$aff(x, y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| f(x) - f(y) \|^2 \right\}$$
(vectors of filter outputs)

Source: Forsyth & Ponce
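All four affinities share the same exponential form and differ only in the feature being compared. A small sketch, assuming features are given as equal-length tuples of numbers and using the hypothetical function name `affinity`:

```python
import math

def affinity(fx, fy, sigma):
    """exp(-||fx - fy||^2 / (2 sigma^2)) for two feature tuples.

    fx, fy may hold positions, intensities, colors or filter responses;
    sigma sets how quickly the affinity decays with the difference.
    """
    sq = sum((a - b) ** 2 for a, b in zip(fx, fy))
    return math.exp(-sq / (2.0 * sigma ** 2))

print(affinity((0.0,), (0.0,), sigma=1.0))   # identical features
print(affinity((0.0,), (2.0,), sigma=1.0))   # different features, small sigma
print(affinity((0.0,), (2.0,), sigma=10.0))  # same difference, larger sigma
```

A larger sigma makes distant features look more similar, so it controls the scale at which pixels are still considered to belong together.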

SLIDE 56

Segmentation by Graph Cuts

Break Graph into Segments
  • Delete edges crossing between segments
  • Easiest to break edges with low similarity (low weight)
    – Similar pixels should be in the same segments
    – Dissimilar pixels should be in different segments

[Figure: graph with edge weights w, split into segments A, B, C]

Slide adapted from Steve Seitz

SLIDE 57

Graph Cut (GC)

GC = edges whose removal partitions a graph in two

Cost of a cut
  • Sum of weights of cut edges:

$$cut(A, B) = \sum_{p \in A,\; q \in B} w_{p,q}$$

A graph cut gives us a segmentation
  • What is a “good” graph cut and how do we find one?

[Figure: graph partitioned into A and B]

Slide adapted from Steve Seitz

SLIDE 58

Minimum Cut

We can do segmentation by finding the minimum cut in a graph
  • Efficient algorithms exist for doing this

Drawback:
  • Weight of cut proportional to number of edges in the cut
  • Minimum cut tends to cut off very small, isolated components

[Figure: the ideal cut, and cuts with lesser weight than the ideal cut]

Slide credit: Khurram Hassan-Shafique

SLIDE 59

Normalized Cut (NCut)

Min-cut has bias toward partitioning out small segments
This can be fixed by normalizing for size of segments
The normalized cut cost is:

$$Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}$$

assoc(A,V) = sum of weights from A to all nodes in the graph

The exact solution is NP-hard but an approximation can be computed by solving a generalized eigenvalue problem.

J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000

Slide adapted from Svetlana Lazebnik
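To see how the normalization changes which partition looks good, the cut and normalized cut costs can be evaluated directly on a toy graph. This sketch only scores a given partition; it does not perform the eigenvector approximation. The dictionary-of-edge-weights representation and the helper names are assumptions made for illustration.

```python
def symmetric(edges):
    """Expand {(p, q): w} into a symmetric weight lookup."""
    w = {}
    for (p, q), weight in edges.items():
        w[(p, q)] = w[(q, p)] = float(weight)
    return w

def cut(w, A, B):
    """Sum of weights of edges going from A to B."""
    return sum(w.get((p, q), 0.0) for p in A for q in B)

def assoc(w, A, V):
    """Sum of weights from nodes in A to all nodes in the graph."""
    return cut(w, A, V)

def ncut(w, A, B):
    """Normalized cut cost of the partition (A, B)."""
    V = set(A) | set(B)
    c = cut(w, A, B)
    return c / assoc(w, A, V) + c / assoc(w, B, V)

# Chain a - b - c - d with one weak link in the middle.
w = symmetric({("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 5})
print(ncut(w, {"a", "b"}, {"c", "d"}))   # split at the weak link
print(ncut(w, {"a"}, {"b", "c", "d"}))   # isolate a single node
```

Cutting off a single node leaves that side with a small assoc value, so its Ncut term is large; this is how the normalization penalizes small isolated segments.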

SLIDE 60

Interpretation as a Dynamical System

Treat the edges as springs and ‘shake’ the system
  • Elasticity proportional to cost
  • Vibration “modes” correspond to segments
    – Can compute these by solving a generalized eigenvector problem

Slide adapted from Steve Seitz

SLIDE 61


NCuts Example

Image source: Shi & Malik

NCuts segments

Slide credit: B. Leibe

SLIDE 62


Color Image Segmentation with NCuts

Image Source: Shi & Malik

Slide credit: Steve Seitz

SLIDE 63


Results with Color & Texture

SLIDE 64

Summary: Normalized Cuts

Pros:
  • Generic framework, flexible to choice of function that computes weights (“affinities”) between nodes
  • Does not require any model of the data distribution

Cons:
  • Time and memory complexity can be high
    – Dense, highly connected graphs ⇒ many affinity computations
    – Solving eigenvalue problem
  • Preference for balanced partitions
    – If a region is uniform, NCuts will find the modes of vibration of the image dimensions

Slide credit: Kristen Grauman

SLIDE 65

Markov Random Fields

Allow rich probabilistic models for images
But built in a local, modular way
  • Learn local effects, get global effects out

[Figure: graphical model with observed evidence, hidden “true states”, and neighborhood relations]

Slide credit: William Freeman

SLIDE 66

MRF Nodes as Pixels (or Patches)

[Figure: image pixels y_i linked to states x_i (e.g. foreground/background) through Φ(x_i, y_i); neighboring states linked through Ψ(x_i, x_j)]

Slide adapted from William Freeman

SLIDE 67

Network Joint Probability

$$P(x, y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$

  x: states, y: image
  Φ(x_i, y_i): image-state compatibility function (local observations)
  Ψ(x_i, x_j): state-state compatibility function (neighboring nodes)

Slide adapted from William Freeman

SLIDE 68

Energy Formulation

Joint probability

$$P(x, y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$

Maximizing the joint probability is the same as minimizing the −log

$$-\log P(x, y) = -\sum_i \log \Phi(x_i, y_i) - \sum_{i,j} \log \Psi(x_i, x_j)$$

$$E(x, y) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$

This is similar to free-energy problems in statistical mechanics (spin glass theory). We therefore draw the analogy and call E an energy function.

ϕ and ψ are called potentials.

Slide credit: B. Leibe

SLIDE 69

Energy Formulation

Energy function

$$E(x, y) = \underbrace{\sum_i \phi(x_i, y_i)}_{\text{unary potentials}} + \underbrace{\sum_{i,j} \psi(x_i, x_j)}_{\text{pairwise potentials}}$$

Unary potentials ϕ
  • Encode local information about the given pixel/patch
  • How likely is a pixel/patch to be in a certain state (e.g. foreground/background)?

Pairwise potentials ψ
  • Encode neighborhood information
  • How different is a pixel/patch’s label from that of its neighbor? (e.g. here independent of image data, but later based on intensity/color/texture difference)

Slide adapted from B. Leibe
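A short sketch may help to see how the unary and pairwise terms combine. It assumes a Potts-style pairwise potential (a constant penalty whenever two neighbors take different labels), which is one common choice and not necessarily the one used in the lecture; the function name and the tiny 4-node chain are made up for illustration.

```python
from itertools import product

def energy(labels, unary, neighbors, pairwise_weight=1.0):
    """E = sum_i unary[i][label_i] + sum over neighbor pairs of a Potts penalty."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += sum(pairwise_weight for i, j in neighbors if labels[i] != labels[j])
    return e

# unary[i][l]: cost of giving node i the label l (0 = background, 1 = foreground).
unary = [[0, 3], [0, 3], [2, 1], [3, 0]]
neighbors = [(0, 1), (1, 2), (2, 3)]           # a 4-node chain

# Brute-force minimization, feasible only for toy problems.
best = min(product([0, 1], repeat=4), key=lambda l: energy(l, unary, neighbors))
print(best, energy(best, unary, neighbors))
```

Exhaustive search grows exponentially with the number of nodes, which is why dedicated inference algorithms are needed for real images.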

SLIDE 70

Energy Minimization

Goal:
  • Infer the optimal labeling of the MRF.

Many inference algorithms are available, e.g.
  • Gibbs sampling, simulated annealing
  • Iterated conditional modes (ICM)
  • Variational methods
  • Belief propagation
  • Graph cuts

Recently, Graph Cuts have become a popular tool
  • Only suitable for a certain class of energy functions
  • But the solution can be obtained very fast for typical vision problems (~1MPixel/sec).

[Figure: MRF with unary potentials ϕ(x_i, y_i) and pairwise potentials ψ(x_i, x_j)]

Slide credit: B. Leibe

SLIDE 71

Graph Cuts for Optimal Boundary Detection

Idea: convert MRF into source-sink graph

[Figure: pixel grid connected by n-links, terminals s and t attached through hard constraints, and a cut]

Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms)

[Boykov & Jolly, ICCV’01] Slide adapted from Yuri Boykov

SLIDE 72

Adding Regional Properties

[Figure: source-sink graph with n-links (weights w_pq), t-links D_p(s) and D_p(t) to the terminals s and t, and a cut]

Regional bias example

Suppose we are given “expected” intensities I^s and I^t of object and background

$$D_p(s) \propto \exp\left( -\| I_p - I^s \|^2 / 2\sigma^2 \right)$$

$$D_p(t) \propto \exp\left( -\| I_p - I^t \|^2 / 2\sigma^2 \right)$$

[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov

SLIDE 73

Adding Regional Properties

More generally, regional bias can be based on any intensity models of object and background

$$D_p(L_p) = -\log \Pr(I_p \mid L_p)$$

given object and background intensity histograms

[Figure: a cut with t-links D_p(s) and D_p(t) to terminals s and t; intensity histograms Pr(I_p | s) and Pr(I_p | t) over intensity I]

[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov
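The histogram-based regional term can be sketched as follows. The representation of a histogram as a dictionary from intensity bin to count, the function name `regional_costs` and the small constant `eps` that avoids log(0) are assumptions made for this example.

```python
import math

def regional_costs(intensity, obj_hist, bkg_hist, eps=1e-9):
    """Return D_p(L_p) = -log Pr(I_p | L_p) for L_p in {'s' (object), 't' (background)}."""
    def prob(hist):
        # Relative frequency of this intensity bin in the given histogram.
        return hist.get(intensity, 0) / sum(hist.values())
    return {
        "s": -math.log(prob(obj_hist) + eps),
        "t": -math.log(prob(bkg_hist) + eps),
    }

# Hypothetical histograms: the object is mostly bright, the background mostly dark.
obj_hist = {200: 8, 50: 2}
bkg_hist = {200: 1, 50: 9}
print(regional_costs(200, obj_hist, bkg_hist))
```

A bright pixel gets a low cost for the object label and a high cost for the background label, so the cut is biased towards labeling it as object.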

SLIDE 74

How Does it Work? The s-t-Mincut Problem

Graph (V, E, C)
  Vertices V = {v1, v2 ... vn}
  Edges E = {(v1, v2) ....}
  Costs C = {c(1, 2) ....}

[Figure: flow network with Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1]

Slide credit: Pushmeet Kohli

SLIDE 75

The s-t-Mincut Problem

What is an st-cut?
  An st-cut (S,T) divides the nodes between source and sink.

What is the cost of an st-cut?
  Sum of cost of all edges going from S to T

[Figure: flow network with Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1; example cut with cost 5 + 2 + 9 = 16]

Slide credit: Pushmeet Kohli

SLIDE 76

The s-t-Mincut Problem

What is an st-cut?
  An st-cut (S,T) divides the nodes between source and sink.

What is the cost of an st-cut?
  Sum of cost of all edges going from S to T

What is the st-mincut?
  st-cut with the minimum cost

[Figure: flow network with Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1; minimum cut with cost 2 + 1 + 4 = 7]

Slide credit: Pushmeet Kohli
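The two cut costs quoted on these slides can be reproduced by enumerating all st-cuts of the small example graph. The edge directions below are an assumption: the figure itself is not available in this transcript, so they were inferred so that they agree with the sums 5 + 2 + 9 = 16 and 2 + 1 + 4 = 7. The names `CAPACITY`, `st_cut_cost` and `st_mincut` are illustrative.

```python
from itertools import combinations

# Assumed edge costs (directions inferred from the arithmetic on the slides).
CAPACITY = {
    ("s", "v1"): 2, ("s", "v2"): 9,
    ("v1", "t"): 5, ("v2", "t"): 4,
    ("v1", "v2"): 2, ("v2", "v1"): 1,
}

def st_cut_cost(capacity, S):
    """Sum of the costs of all edges going from S to T (the complement of S)."""
    return sum(c for (u, v), c in capacity.items() if u in S and v not in S)

def st_mincut(capacity, inner_nodes):
    """Brute force over all node subsets; only sensible for tiny graphs."""
    best = None
    for r in range(len(inner_nodes) + 1):
        for subset in combinations(inner_nodes, r):
            S = {"s"} | set(subset)
            cost = st_cut_cost(capacity, S)
            if best is None or cost < best[0]:
                best = (cost, S)
    return best

print(st_cut_cost(CAPACITY, {"s", "v1"}))   # the first example cut
print(st_mincut(CAPACITY, ["v1", "v2"]))    # the minimum cut
```

Enumeration needs 2^n evaluations for n inner nodes, which motivates computing the mincut through maximum flow instead.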

SLIDE 77

How to Compute the s-t-Mincut?

Solve the dual maximum flow problem
  Compute the maximum flow between Source and Sink

Constraints
  Edges: Flow ≤ Capacity
  Nodes: Flow in = Flow out

Min-cut/Max-flow Theorem
  In every network, the maximum flow equals the cost of the st-mincut

[Figure: flow network with Source, Sink, v1, v2 and edge capacities 2, 5, 9, 4, 2, 1]

Slide credit: Pushmeet Kohli

SLIDE 78

Maxflow Algorithms

Augmenting Path Based Algorithms

1. Find path from source to sink with positive capacity
2. Push maximum possible flow through this path
3. Repeat until no path can be found

Algorithms assume non-negative capacity

[Figure: flow network with Source, Sink, v1, v2 and edge capacities 2, 5, 9, 4, 2, 1; Flow = 0]

Slide credit: Pushmeet Kohli
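The three steps can be sketched with a breadth-first search for the augmenting path (the Edmonds-Karp variant, chosen here for simplicity; the slides do not name a specific path-finding rule). The edge directions of the example graph are an assumption inferred from the flow values shown on the following slides, and the names `CAPACITY` and `max_flow` are illustrative.

```python
from collections import deque

# Assumed capacities of the slide example (directions inferred, not given).
CAPACITY = {
    ("s", "v1"): 2, ("s", "v2"): 9,
    ("v1", "t"): 5, ("v2", "t"): 4,
    ("v1", "v2"): 2, ("v2", "v1"): 1,
}

def max_flow(capacity, source, sink):
    """Augmenting-path max-flow; capacities must be non-negative."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # 1. Find a path from source to sink with positive residual capacity (BFS).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                       # 3. no path left: flow is maximal
        # 2. Push the maximum possible flow (the bottleneck) through this path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

print(max_flow(CAPACITY, "s", "t"))
```

By the min-cut/max-flow theorem, the returned value equals the cost of the st-mincut of the same graph.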

SLIDE 79

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 0]

Slide credit: Pushmeet Kohli

SLIDE 80

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; capacities 5 and 2 on the path reduced to 5-2 and 2-2; Flow = 0 + 2]

Slide credit: Pushmeet Kohli

SLIDE 81

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 82

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 83

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 84

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 2 + 4]

Slide credit: Pushmeet Kohli

SLIDE 85

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 6]

Slide credit: Pushmeet Kohli

SLIDE 86

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 6]

Slide credit: Pushmeet Kohli

SLIDE 87

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 6 + 1]

Slide credit: Pushmeet Kohli

SLIDE 88

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; Flow = 7]

Slide credit: Pushmeet Kohli

SLIDE 89

Maxflow Algorithms

Augmenting Path Based Algorithms (continued)

[Figure: flow network with Source, Sink, v1, v2; no augmenting path remains; Flow = 7]

Slide credit: Pushmeet Kohli

SLIDE 90

Dealing with Non-Binary Cases

For image segmentation, the limitation to binary energies is a nuisance.
  • Binary segmentation only

We would like to solve also multi-label problems.
  • NP-hard problem with 3 or more labels

There exist some approximation algorithms which extend graph cuts to the multi-label case
  • E.g. α-Expansion

They are no longer guaranteed to return the globally optimal result.
  • But α-Expansion has a guaranteed approximation quality and converges in a few iterations.

Slide credit: B. Leibe

SLIDE 91

Summary: Graph Cuts Segmentation

Pros
  • Powerful technique, based on probabilistic model (MRF).
  • Applicable for a wide range of problems.
  • Very efficient algorithms available for vision problems.
  • Becoming a de-facto standard for many segmentation tasks.

Cons/Issues
  • Graph cuts can only solve a limited class of models
    – Submodular energy functions
    – Can capture only part of the expressiveness of MRFs
  • Only approximate algorithms available for multi-label case

Slide credit: B. Leibe

SLIDE 92

Segmentation: Caveats

We’ve looked at bottom-up ways to segment an image into regions, yet finding meaningful segments is intertwined with the recognition problem.
Often want to avoid making hard decisions too soon
Difficult to evaluate; when is a segmentation successful?

Slide credit: Kristen Grauman

SLIDE 93

Speeding up 1: start from ‘superpixels’

Start from an over-segmentation in which similar-looking pixels have been grouped together quickly. This requires object boundaries to be preserved as part of the superpixel edges!

X. Ren and J. Malik. Learning a classification model for segmentation. ICCV 2003.

“superpixels”

Slide credit: Svetlana Lazebnik

SLIDE 94

Speeding up 2: objectness

Trying to draw bounding boxes around objects, without knowing what they are

Figure 7: yellow: bounding box by computer / blue: by human

Focus on regions that an ‘objectness’ score indicates as probably containing an object

SLIDE 95

Topics of This Lecture

Introduction

Ø Gestalt principles Ø Image segmentation

Segmentation as clustering

Ø k-Means Ø Feature spaces Ø Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with path search

SLIDE 96

Dynamic path search: principle

Guided by a user-supplied cost function that expresses expectations (e.g. good edges contain pixels with high gradients, edges are smooth), find the optimal path through the image:

1. having lowest cost
2. satisfying constraints (e.g. given endpoints)

Useful in interactive applications (e.g. medical), or when the environment is constrained

SLIDE 97

Dynamic path search : nomenclature

A graph consists of nodes (pixels) connected by arcs (steps)

Nodes connected by steps are parents and successors

Identifying a node’s successors is expansion of that node

A tree is a graph with 1 parent for the nodes (our arcs are undirected)

SLIDE 98

Dynamic path search : nomenclature cont’d

Often the arcs are assigned a cost

A sequence of nodes n1, n2, …, nk (ni = successor of ni-1) is a path of length k

Usually path cost = Σ arc costs
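The nomenclature can be made concrete with a short sketch. It assumes pixels are nodes `(row, col)` on an 8-connected grid; the names `successors` and `path_cost` and the unit arc cost in the usage lines are hypothetical choices for illustration.

```python
def successors(node, height, width):
    """Expansion of a node: its 8-connected neighbours inside the image."""
    i, j = node
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
            and 0 <= i + di < height and 0 <= j + dj < width]

def path_cost(path, arc_cost):
    """Path cost as the sum of the arc costs between consecutive nodes."""
    return sum(arc_cost(a, b) for a, b in zip(path, path[1:]))

# A corner pixel has 3 successors, an interior pixel has 8.
print(len(successors((0, 0), 3, 3)), len(successors((1, 1), 3, 3)))

# A path of k = 3 nodes has 2 arcs; with unit arc costs its cost is 2.
print(path_cost([(0, 0), (0, 1), (1, 2)], lambda a, b: 1))
```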

SLIDE 99

Dynamic path search : cost functions

Cost function incorporates problem-specific information

- e.g. penalize changes in edge direction
- e.g. penalize the inclusion of pixels with low intensity gradient

Problem is one of optimization :

1. gradient descent
2. path array methods
3. best-first search

SLIDE 100

Gradient descent

Always choose the next pixel that adds the smallest cost

Example :

SLIDE 101

Gradient descent : example

angiogram image :

SLIDE 102

Gradient descent : example cont’d

Cost : $\left( \sum_{3 \times 3} \nabla i + d \right)^{-1}$

SLIDE 103

Gradient descent : problem

Gradient descent never looks back :

SLIDE 104

Gradient descent : remarks

Gradient descent used for several purposes

Fast, but no guarantee that the optimal path is found
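A minimal sketch of this greedy strategy, assuming the path advances one column per step and may move to one of the three neighbouring rows. The function name `greedy_path` and the small cost image are hypothetical; the image is built so that the locally cheapest first step leads into an expensive region.

```python
def greedy_path(cost, start_row):
    """Follow the locally cheapest step, column by column, never looking back."""
    height, width = len(cost), len(cost[0])
    row = start_row
    path = [(row, 0)]
    total = cost[row][0]
    for col in range(1, width):
        candidates = [r for r in (row - 1, row, row + 1) if 0 <= r < height]
        row = min(candidates, key=lambda r: cost[r][col])
        path.append((row, col))
        total += cost[row][col]
    return path, total

cost = [
    [1, 1, 9, 9],
    [1, 2, 9, 9],
    [1, 9, 2, 1],
]
path, total = greedy_path(cost, start_row=1)
print(path, total)
```

Starting in row 1, the greedy choice moves up to the cheap pixel in row 0 and then has to pay 9 twice (total 20), while the path staying in row 1 and then dropping to row 2 costs only 1 + 2 + 2 + 1 = 6.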

SLIDE 105

Path array techniques : principle

Illustration of one-pass example : the angiogram

Select cheapest step to a node

Remember by putting a pointer back

SLIDE 106

Path array techniques : principle

Upon reaching the last column…

Select cheapest node in last column

BACKTRACK

SLIDE 107

Path array techniques : principle

Summary

From left to right :

- determine cheapest path + cost for each pixel
- set pointer back

Determine pixel with lowest value in right column

“backtrack”
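The one-pass procedure summarized above can be sketched as follows. The sketch assumes that a step goes from one column to the next and may change the row by at most one, and that the cost of a step is the cost of the pixel entered; the name `path_array` and the example image are hypothetical.

```python
def path_array(cost):
    """One left-to-right pass: cheapest cost to every pixel plus back pointers."""
    height, width = len(cost), len(cost[0])
    P = [[cost[r][0]] + [0] * (width - 1) for r in range(height)]
    back = [[None] * width for _ in range(height)]
    for col in range(1, width):
        for r in range(height):
            prev = [p for p in (r - 1, r, r + 1) if 0 <= p < height]
            best = min(prev, key=lambda p: P[p][col - 1])
            P[r][col] = P[best][col - 1] + cost[r][col]
            back[r][col] = best          # pointer back to the cheapest parent
    # cheapest node in the last column, then backtrack along the pointers
    row = min(range(height), key=lambda r: P[r][width - 1])
    total = P[row][width - 1]
    path = [(row, width - 1)]
    for col in range(width - 1, 0, -1):
        row = back[row][col]
        path.append((row, col - 1))
    return path[::-1], total

cost = [
    [1, 1, 9, 9],
    [1, 2, 9, 9],
    [1, 9, 2, 1],
]
path, total = path_array(cost)
print(path, total)
```

On this image the pass finds a path of cost 6 that ends in the bottom-right pixel, avoiding the expensive upper-right region.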

SLIDE 108

Path array techniques : example

Optimal paths are found :

SLIDE 109

Path array techniques : remarks

- guaranteed to find the optimal path
- solves many optimisation problems simultaneously : from each point the cheapest path can be backtracked
- suboptimal in that sense
- program is simple
- cannot handle meandering paths, so far

SLIDE 110

Path array techniques : principle of F*

Solution to meandering problem via multi-pass algorithm : F*

Path array is constructed iteratively until no changes

F* finds optimal paths

Notation :

P( i , j ) = cost up to point ( i , j )
C( i’, j’, i, j ) = cost for step from ( i’, j’) to ( i, j )

SLIDE 111

Path array techniques : F* algorithm

l 1. P initialized to

∞, starting node = 0

l 2. Top to bottom, row by row updating of P : l 3. Alternating bottom to top and top to bottom l 4. Stops when no changes l 5. Optimal path is backtracked

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎧ + + + = ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ ⎛ − + − + − + + − − + − − − + − − = 1 , , , 1 , , , min : , : left right to from then, 1 , , , 1 , , 1 , 1 , , 1 , 1 , , 1 , , , 1 , 1 , 1 , , 1 , 1 , , min : , : right left to from first, j i P j i j i C j i P j i P j i P j i j i C j i P j i j i C j i P j i j i C j i P j i j i C j i P j i P
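The multi-pass scheme can be sketched as below. This is an illustration under two assumptions that are not part of the algorithm statement: the step cost depends only on the pixel entered, C(i’, j’, i, j) = cost[i][j], and all costs are non-negative. The names `f_star`, `relax` and `sweep` and the example image are hypothetical.

```python
import math

def f_star(cost, start):
    """Multi-pass path array (F*): sweep until P stops changing."""
    height, width = len(cost), len(cost[0])
    P = [[math.inf] * width for _ in range(height)]
    P[start[0]][start[1]] = 0

    def relax(i, j, sources):
        # P(i, j) := min(P(i, j), P(source) + step cost) over the given sources
        changed = False
        for si, sj in sources:
            if 0 <= si < height and 0 <= sj < width:
                candidate = P[si][sj] + cost[i][j]
                if candidate < P[i][j]:
                    P[i][j] = candidate
                    changed = True
        return changed

    def sweep(rows, di):
        # di = -1 uses the row above (top to bottom), di = +1 the row below
        changed = False
        for i in rows:
            for j in range(width):             # first, from left to right
                changed |= relax(i, j, [(i + di, j - 1), (i + di, j),
                                        (i + di, j + 1), (i, j - 1)])
            for j in reversed(range(width)):   # then, from right to left
                changed |= relax(i, j, [(i, j + 1)])
        return changed

    changed = True
    while changed:
        changed = sweep(range(height), -1)             # top to bottom
        changed |= sweep(reversed(range(height)), +1)  # bottom to top
    return P

# The cheap route goes down, back up, and down again: a meandering path.
cost = [
    [1, 9, 1, 1, 1],
    [1, 9, 1, 9, 1],
    [1, 1, 1, 9, 1],
]
P = f_star(cost, start=(0, 0))
print(P[2][4])
```

In this example the cheapest route from the top-left to the bottom-right pixel has six steps of cost 1 each, and it can only be found because later sweeps propagate costs upward again.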

SLIDE 112

Path array techniques : F* example

Road detection in aerial image

Endpoints are given

SLIDE 113

Path array techniques : F* example

Cost : $C(i, j) = (255 - F(i, j))^{a}$

F ( i, j ) is scaled output of matched convolution filter :
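As a small illustration of how the parameter a shapes this cost, the sketch below assumes that the trailing a is an exponent and that F is scaled to the range 0..255. The function name `road_cost` and the sample responses are hypothetical.

```python
def road_cost(filter_response, a):
    """C(i, j) = (255 - F(i, j)) ** a for a response image scaled to 0..255."""
    return [[(255 - f) ** a for f in row] for row in filter_response]

# A strong response (255), a medium one (155) and a weak one (55).
F = [[255, 155, 55]]
print(road_cost(F, 1))
print(road_cost(F, 2))
```

With a = 1 the weak response costs twice as much as the medium one (200 versus 100); with a = 2 it costs four times as much (40000 versus 10000), so a larger exponent pushes the path more strongly onto pixels with a high filter response.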

SLIDE 114

Path array techniques : F* example

Result for a = 1

Result for a = 2.4

SLIDE 115

Path array techniques : F* remarks

Choice of cost function is crucial but not trivial !

Note special structure of cost function in the ex.:

C ( i’, j’, i , j ) = C ( i , j )

⇓

backtracking via cheapest neighbour instead of pointers

F* yields the optimal solution

Invests much effort in finding irrelevant solutions as before

SLIDE 116

Best - first search : principle

Strike a balance between the two previous ones

1. Uses problem-specific information to guide the process selectively
2. Returns the optimal solution if applied properly

New data structures:

- list OPEN: end nodes of paths
- list CLOSED: nodes already passed through
- furthermore: evaluation function f(n)

SLIDE 117

Best - first : the algorithm BF

1. Start node s on OPEN
2. OPEN = empty ⇒ exit with failure
3. Take from OPEN the node n with minimal f, put it on CLOSED
4. Expand n
5. A successor = a goal node ⇒ solved
6. For every successor n’ :
   - Calculate f (n’)
   - If n’ neither on OPEN nor CLOSED : put n’ on OPEN, set ptr
   - If n’ already on OPEN or CLOSED : replace f (n’) and redirect ptr if new f (n’) lower; if n’ on CLOSED, back to OPEN
7. Go to step 2.

SLIDE 119

Best - first : BF*

Risk of missing the optimal solution

In step 5 : quits if successor is a goal node

We should include the cost of the last arc

SLIDE 120

Best - first : BF*

BF* algorithm

- wait until new arc is part of f (n)
- exit when node n is a goal node
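The difference between the two exit rules can be sketched in one function. This is an illustration only: it assumes f(n) is the sum of arc costs along the cheapest path found so far (the slides leave f general), keeps OPEN in a binary heap, and uses a hypothetical function name `best_first`, flag `stop_on_generation`, and toy graph.

```python
import heapq

def best_first(graph, start, goal, stop_on_generation):
    """Best-first search with f(n) = cost of the cheapest known path to n.

    stop_on_generation=True  : BF, exits as soon as a successor is a goal node.
    stop_on_generation=False : BF*, exits only when the goal is taken from OPEN.
    """
    f = {start: 0}
    parent = {start: None}
    open_heap = [(0, start)]            # OPEN, ordered by f
    closed = set()                      # CLOSED
    while open_heap:
        f_n, n = heapq.heappop(open_heap)
        if f_n > f[n]:
            continue                    # outdated heap entry, a cheaper one exists
        if not stop_on_generation and n == goal:
            return f[n], parent
        closed.add(n)
        for succ, arc in graph.get(n, {}).items():
            new_f = f_n + arc
            if succ not in f or new_f < f[succ]:
                f[succ] = new_f
                parent[succ] = n        # redirect the pointer
                closed.discard(succ)    # back to OPEN if it was on CLOSED
                heapq.heappush(open_heap, (new_f, succ))
            if stop_on_generation and succ == goal:
                return f[succ], parent
    return None                         # OPEN empty: failure

# Direct arc s -> g is expensive, the detour via a is cheap.
graph = {"s": {"g": 10, "a": 1}, "a": {"g": 1}}
print(best_first(graph, "s", "g", stop_on_generation=True)[0])
print(best_first(graph, "s", "g", stop_on_generation=False)[0])
```

BF stops with the direct arc of cost 10 because the goal appears among the successors of s, whereas BF* waits until the goal itself has the lowest f on OPEN and returns the detour of cost 2.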

SLIDE 121

Best - first : cost functions

The BF and BF* algorithms advance slowly. For a typical cost function, short paths will have lower costs and need to be developed before the actual solution path can reach a goal node. We now try to do something about that.

SLIDE 122

Best - first : cost functions

C (n) = cost for portion from n to the END of the path

C (n) is recursive if for immediate successor ns

C (n) = F (E (n) , C (ns ))

where E (n) is a function of local properties only

If such a rollback function F exists, it is possible to evaluate the cost of a path knowing the optimal remaining cost and the costs of steps taken so far

Examples of rollback functions are the maximum or the sum of the arc costs.
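A brief sketch of a recursive cost, assuming E(n) is simply the cost of the arc leaving n and that the cost at the end of the path is 0 (which suits non-negative arc costs). The name `rolled_back_cost` and the sample arc costs are hypothetical.

```python
def rolled_back_cost(arc_costs, rollback, terminal=0):
    """Evaluate C(n) = F(E(n), C(n_s)) backwards from the end of the path."""
    cost = terminal
    for local in reversed(arc_costs):
        cost = rollback(local, cost)
    return cost

arcs = [3, 1, 4]                                    # arc costs along one path
total = rolled_back_cost(arcs, lambda e, c: e + c)  # rollback function: sum
worst = rolled_back_cost(arcs, max)                 # rollback function: maximum
print(total, worst)
```

Both rollback functions only need the local term and the cost of the remainder, which is what allows a search to replace the unknown remainder by an estimate.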

SLIDE 123

Best - first : crystal balls…?

BUT we don’t know the path to the goal...

SLIDE 124

Best - first : crystal balls…?

We introduce an estimate h (n)

Recursiveness of the roll-back function allows us to effectively use such an estimate!

- this version of BF is the Z algorithm
- this version of BF* is the Z* algorithm

SLIDE 125

Best - first: Z

Examples of non-recursive cost functions:

1. $C_{nr}(n') = c(n, n') + C_{nr}(n)$, with $C_{nr}(n')$ the cost from start node to n’
2. Largest - smallest node cost

Suppose n-1 is a start node, n+2 is a goal node

$$C(n) = c(n, n+1) + c(n+1, n+2)$$

$$C(n-1) = c(n-1, n) + c(n, n+1) + c(n+1, n+2)$$

Cannot be retrieved from c(n-1,n) and …

SLIDE 126

Best - first : Z*

Z* finds the optimal path if

1. h (n) underestimates the remaining cost
2. for n’ successor of n :

   $$h(n') \ge h'(n') \;\Rightarrow\; F(E(n), h(n')) \ge F(E(n), h'(n'))$$

3. for parents n1 and n2 of a common successor n :

   $$F(E(n_1), h(n)) \ge F(E(n_2), h(n)) \;\Rightarrow\; F(E(n_1), h'(n)) \ge F(E(n_2), h'(n))$$

Sum and maximum cost are acceptable functions

SLIDE 127

Best - first : A*

A* uses a sum : C(n) = c( n,ns ) + C(ns )

f(n) = g(n) + h(n), with g(n) sum of arc costs up to the current point

SLIDE 128

Best - first : the A* algorithm

1. Start node on OPEN
2. OPEN = empty ⇒ exit, failure
3. Take n with min. f from OPEN, put it on CLOSED
4. n = goal node ⇒ solved
5. Expand n; for every successor n’ :
   - n’ not on OPEN or CLOSED : estimate h(n’), calculate f (n’), set ptrs.
   - n’ on OPEN or CLOSED : direct ptr along path with lowest g (n’)
   - if n’ obtained new ptr + on CLOSED, then put on OPEN
6. Go to step 2.

SLIDE 129

Best-first : consistency assumption

if

$$h(m) - h(n) \le c_{\min}(m, n)$$

then reconsidering CLOSED is unnecessary

explanation : If the assumption holds, costs always increase over time. Hence returning to a node with lower cost is impossible

remark : with h(n) = 0 , the assumption normally applies
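The assumption can be checked numerically for a given estimate. The sketch below reads the condition as h(m) - h(n) ≤ c(m, n) for every single step from m to n on an 8-connected grid, with the step cost taken as the cost of the pixel entered; checking single steps is sufficient because the inequality then adds up along any longer path. The name `is_consistent` and the test estimates are hypothetical.

```python
def is_consistent(cost, h):
    """Check h(m) - h(n) <= c(m, n) for every step m -> n on an 8-connected grid."""
    height, width = len(cost), len(cost[0])
    for i in range(height):
        for j in range(width):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) == (0, 0):
                        continue
                    if 0 <= ni < height and 0 <= nj < width:
                        if h((i, j)) - h((ni, nj)) > cost[ni][nj]:
                            return False
    return True

cost = [[1, 1, 1], [1, 2, 1]]
goal = (0, 2)
steps_left = lambda n: max(abs(n[0] - goal[0]), abs(n[1] - goal[1]))

print(is_consistent(cost, lambda n: 0))                   # h = 0
print(is_consistent(cost, steps_left))                    # remaining steps
print(is_consistent(cost, lambda n: 5 * steps_left(n)))   # overly optimistic scaling
```

With non-negative costs the zero estimate passes, as does the remaining-step count when every pixel costs at least 1; scaling that count by 5 violates the condition, and CLOSED nodes may then need to be reopened.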

SLIDE 130

Best-first : Z, A remarks

Increase efficiency by penalizing path incompleteness

Spend less time on needlessly growing inferior paths

Nevertheless, e.g. F* can be superior if the optimum is not outstanding (due to bookkeeping overhead) or if there can be changes in the goal