SLIDE 1

Fast Training of Pairwise or Higher-order CRFs

Nikos Komodakis (University of Crete)

SLIDE 2

Introduction

SLIDE 3

Conditional Random Fields (CRFs)

  • Ubiquitous in computer vision: segmentation, stereo matching, optical flow, image restoration, image completion, object detection/localization, …
  • … and beyond: medical imaging, computer graphics, digital communications, physics, …
  • Really powerful formulation
SLIDE 4

Conditional Random Fields (CRFs)

  • Extensive research for more than 20 years
  • Key task: inference/optimization for CRFs/MRFs
  • Lots of progress
  • Graph-cut based algorithms
  • Message-passing methods
  • LP relaxations
  • Dual Decomposition
  • …
  • Many state-of-the-art methods
SLIDE 5

MAP inference for CRFs/MRFs

  • Hypergraph

– Nodes
– Hyperedges/cliques

  • High-order MRF energy minimization problem:

$$\min_x \; E(x) = \sum_{p \in \text{nodes}} u_p(x_p) \;+\; \sum_{c \in \text{hyperedges}} h_c(x_c)$$

(one unary potential $u_p$ per node, one high-order potential $h_c$ per clique, with $x_c$ the labels of the clique's nodes)

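For concreteness, a minimal sketch of evaluating such an energy in Python (the data structures are illustrative, not from the talk):

```python
def mrf_energy(x, unaries, cliques):
    """E(x) = sum_p u_p(x_p) + sum_c h_c(x_c) on a hypergraph.

    x:        labeling, x[p] = label of node p
    unaries:  unaries[p][l] = unary potential of node p for label l
    cliques:  list of (nodes, h) pairs; h maps the tuple of labels of
              `nodes` to the clique's high-order potential value
    """
    energy = sum(unaries[p][x[p]] for p in range(len(x)))
    energy += sum(h(tuple(x[p] for p in nodes)) for nodes, h in cliques)
    return energy
```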
SLIDE 6

CRF training

  • But how do we choose the CRF potentials?
  • Through training
  • Parameterize potentials by w
  • Use training data to learn correct w
  • Characteristic example of structured output learning [Taskar], [Tsochantaridis, Joachims]

$f : Z \to X$

Z can contain any kind of data; X = the CRF variables (a structured object). How do we determine f?

SLIDE 7

CRF training

  • Stereo matching:
  • Z: left, right image
  • X: disparity map

$f(z) = \arg\min_{x \in X} E(x, z; w)$, parameterized by w

SLIDE 8

CRF training

  • Denoising:
  • Z: noisy input image
  • X: denoised output image

$f(z) = \arg\min_{x \in X} E(x, z; w)$, parameterized by w

SLIDE 9

CRF training

  • Object detection:
  • Z: input image
  • X: position of object parts

$f(z) = \arg\min_{x \in X} E(x, z; w)$, parameterized by w

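In code, the predictor f is nothing but energy minimization; a toy brute-force sketch (illustrative only, since exhaustive search is exponential in the number of variables):

```python
import itertools

def predict(z, w, energy, num_vars, num_labels):
    """f(z) = argmin_x E(x, z; w), here by exhaustive search (toy scale)."""
    return min(itertools.product(range(num_labels), repeat=num_vars),
               key=lambda x: energy(x, z, w))
```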
SLIDE 10

CRF training

  • Equally, if not more, important than MAP inference
  • Better to optimize the correct energy (even approximately) than to optimize the wrong energy exactly
  • Becomes even more important as we move towards:
  • complex models
  • high-order potentials
  • lots of parameters
  • lots of training data

SLIDE 11

Contributions of this work

SLIDES 12-16

CRF Training via Dual Decomposition

  • A very efficient max-margin learning framework for general CRFs
  • Key issue: how to properly exploit CRF structure during learning?
  • Existing max-margin methods:
  • use MAP inference of an equally complex CRF as a subroutine
  • have to call this subroutine many times during learning
  • Suboptimal:
  • computational efficiency?
  • accuracy?
  • theoretical properties?

SLIDES 17-21

CRF Training via Dual Decomposition

  • Reduces training of a complex CRF to parallel training of a series of easy-to-handle slave CRFs
  • Handles arbitrary pairwise or higher-order CRFs
  • Uses a very efficient projected subgradient learning scheme
  • Allows a hierarchy of structured prediction learning algorithms of increasing accuracy
  • Extremely flexible and adaptable
  • Easily adjusted to fully exploit additional structure in any class of CRFs (no matter if they contain very high-order cliques)

SLIDE 22

Dual Decomposition for CRF MAP Inference (brief review)

SLIDES 23-26

MRF Optimization via Dual Decomposition

  • Very general framework for MAP inference [Komodakis et al. ICCV07, PAMI11]
  • Master = coordinator (has global view); slaves = subproblems (have only local view)
  • Master = the original problem: (MAP-MRF on hypergraph G) = $\min_x E(x)$
  • Set of slaves = MRFs on sub-hypergraphs $G_i$ whose union covers G
  • Many other choices possible as well
  • Optimization proceeds in an iterative fashion via master-slave coordination

SLIDE 27

MRF Optimization via Dual Decomposition

  • For each choice of slaves, the master solves a (possibly different) convex dual relaxation
  • Sum of slave energies = lower bound on the MRF optimum
  • Dual relaxation = maximum such bound

SLIDE 28

MRF Optimization via Dual Decomposition

  • Choosing more difficult slaves → tighter lower bounds → tighter dual relaxations

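In standard dual-decomposition notation (a reconstruction; the slides' formulas were lost in extraction), splitting the potentials $\theta$ into slave potentials $\theta^i$ yields a lower bound, and the dual relaxation is its maximum:

$$\sum_i \min_{x} E(\theta^i, x) \;\le\; \min_{x} E(\theta, x), \qquad \text{dual: } \max_{\{\theta^i\}:\,\sum_i \theta^i = \theta} \;\sum_i \min_{x} E(\theta^i, x)$$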
SLIDE 29

CRF Training via Dual Decomposition

SLIDES 30-31

Max-margin Learning via Dual Decomposition

  • Input:
  • training set of K samples; k-th sample: a CRF on hypergraph $G^k$
  • feature vectors for the unary and clique potentials
  • constraints: the desired labeling $\bar{x}^k$ must beat every other labeling x by a margin, $E^k(x; w) \ge E^k(\bar{x}^k; w) + \Delta(x, \bar{x}^k)$, where $\Delta$ = dissimilarity function ($\Delta(x, x) = 0$)

SLIDES 32-36

Max-margin Learning via Dual Decomposition

  • Regularized hinge loss functional (standard max-margin form):

$$\min_w \; \frac{\mu}{2}\|w\|^2 + \sum_{k} \Big( E^k(\bar{x}^k; w) - \min_x \big[ E^k(x; w) - \Delta(x, \bar{x}^k) \big] \Big)$$

  • Problem: the learning objective is intractable due to the $\min_x$ term (a MAP inference over the full, complex CRF)
  • Solution: approximate it with the dual relaxation from decomposition

SLIDES 37-40

Max-margin Learning via Dual Decomposition

  • Regularized hinge loss functional: the intractable $\min_x$ over the full CRF (before) is replaced by the dual relaxation, i.e. a sum of independent slave minimizations (now)
  • Training of a complex CRF was decomposed into parallel training of easy-to-handle slave CRFs!

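Written out (a reconstruction following the standard decomposition, assuming the dissimilarity also decomposes across slaves as $\Delta = \sum_i \Delta_i$), the intractable term becomes a sum of tractable slave problems:

$$\min_x \big[ E^k(x; w) - \Delta(x, \bar{x}^k) \big] \;\longrightarrow\; \sum_i \min_{x} \big[ E^k_i(x; w) - \Delta_i(x, \bar{x}^k) \big]$$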
SLIDES 41-46

Max-margin Learning via Dual Decomposition

  • Global optimum via projected subgradient learning algorithm
  • Input:
  • hypergraphs $G^k$
  • training samples
  • feature vectors
  • Per iteration: solve each slave MRF to obtain its minimizer $\hat{x}^k_i$; the subgradient is fully specified from the $\hat{x}^k_i$ (and the feature vectors); update w along it and project so as to satisfy the relaxation's constraints

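A minimal sketch of the resulting learning loop (all helper names — solve_slave, features — are illustrative placeholders; the projection step is problem-specific and omitted):

```python
import numpy as np

def train(samples, slaves, solve_slave, features, dim, mu=1.0, iters=100):
    """Max-margin CRF training via projected subgradient, sketched.

    samples:     list of (x_bar, z) pairs (ground truth, observation)
    slaves[k]:   list of slave sub-hypergraphs for sample k
    solve_slave: (slave, w, x_bar, z) -> loss-augmented minimizer of one slave
    features:    (slave, x, z) -> feature vector restricted to one slave
    """
    w = np.zeros(dim)
    for t in range(iters):
        step = 1.0 / (mu * (t + 1))            # diminishing step size
        grad = mu * w                          # gradient of the regularizer
        for k, (x_bar, z) in enumerate(samples):
            for slave in slaves[k]:
                x_hat = solve_slave(slave, w, x_bar, z)   # easy subproblem
                # hinge-loss subgradient: ground-truth features minus
                # features of the slave's minimizer
                grad += features(slave, x_bar, z) - features(slave, x_hat, z)
        w -= step * grad
        # projection onto the dual-feasibility set would go here
    return w
```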
SLIDE 47

Max-margin Learning via Dual Decomposition

  • Incremental subgradient version:
  • same as before, but considers only a subset of slaves per iteration
  • further improves computational efficiency
  • same optimality guarantees & theoretical properties
  • Subset chosen:
  • deterministically, or
  • randomly (stochastic subgradient)

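The stochastic variant only changes which slaves enter each iteration of the loop sketched above, e.g. (illustrative):

```python
import random

def sample_slaves(all_slaves, fraction=0.1):
    """Pick a random subset of slaves for one stochastic-subgradient step."""
    k = max(1, int(fraction * len(all_slaves)))
    return random.sample(all_slaves, k)
```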
SLIDE 48

Max-margin Learning via Dual Decomposition

  • Resulting learning scheme:
  • slave problems freely chosen by the user
  • easily adaptable to further exploit special structure of any class of CRFs
  • very efficient and very flexible
  • requires from the user only an optimizer for the slave MRFs

SLIDE 49

Choice of decompositions

  • True loss (intractable) vs. loss from the decomposition:
  • different decompositions yield a hierarchy of learning algorithms
  • the decomposition loss upper-bounds the true loss (upper bound property)

SLIDE 50
Choice of decompositions

  • The per-clique decomposition:

– one slave per clique c
– corresponding sub-hypergraph: the nodes of c plus the single hyperedge c

  • Resulting slaves are often easy (or even trivial) to solve even if the global problem is complex and NP-hard

– leads to a widely applicable learning algorithm

  • The corresponding dual relaxation is an LP

– generalizes the well-known LP relaxation for pairwise MRFs (at the core of most state-of-the-art methods)

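Concretely, one common way to form these slaves (a sketch consistent with the decomposition above; splitting each unary evenly among the cliques containing its node is an assumption) is

$$E_c(x_c) = h_c(x_c) + \sum_{p \in c} \frac{u_p(x_p)}{d_p}, \qquad d_p = |\{c' : p \in c'\}|,$$

so that summing over all slaves recovers the original energy.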
SLIDE 51
Choice of decompositions

  • But we can do better if CRFs have special structure…
  • Structure means:
  • a more efficient optimizer for the slaves (speed), or
  • an optimizer that handles more complex slaves (accuracy)

(almost all known examples fall into one of the above two cases)

  • We adapt the decomposition to the problem at hand to exploit its structure

SLIDE 52
Choice of decompositions

  • But we can do better if CRFs have special structure…
  • E.g., pattern-based high-order potentials (for a clique c) [Komodakis & Paragios CVPR09]
  • We only assume:

– a subset P of the clique's labelings (its vectors are called patterns)
– the set P is sparse
– pattern costs never exceed the common cost shared by all remaining labelings
– no other restriction

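The standard form of these potentials from [Komodakis & Paragios CVPR09] (reconstructed here, as the slide's formula was lost in extraction) is

$$h_c(x_c) = \begin{cases} \gamma_{x_c}, & x_c \in P \\ \gamma_{\max}, & x_c \notin P \end{cases} \qquad \text{with } \gamma_{x_c} \le \gamma_{\max} \text{ for all } x_c \in P.$$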
SLIDE 53

Experimental results

SLIDE 54

Image denoising

  • Piecewise constant images
  • Potentials:

$$u^k_p(x_p) = |x_p - z_p| \qquad h^k_{pq}(x_p, x_q) = V(x_p - x_q)$$

  • Goal: learn the pairwise potential V

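A minimal sketch of this denoising energy on a 4-connected grid (V as a learned lookup table over label differences; names illustrative):

```python
import numpy as np

def denoise_energy(x, z, V):
    """E(x) = sum_p |x_p - z_p| + sum_{pq} V(x_p - x_q) on a 4-connected grid.

    x, z: 2-D integer label arrays (estimate, noisy input)
    V:    1-D NumPy array, V[d] = learned penalty for label difference d
    """
    e = np.abs(x - z).sum()                     # unary data terms
    e += V[np.abs(x[:, 1:] - x[:, :-1])].sum()  # horizontal pairwise terms
    e += V[np.abs(x[1:, :] - x[:-1, :])].sum()  # vertical pairwise terms
    return e
```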
SLIDE 55

Image denoising

SLIDE 56

Stereo matching

  • Potentials:

$$u^k_p(x_p) = \big|I_{\text{left}}(p) - I_{\text{right}}(p - x_p)\big| \qquad h^k_{pq}(x_p, x_q) = f\big(|\nabla I_{\text{left}}(p)|\big)\,[x_p \neq x_q]$$

  • Goal: learn the function f(·) for the gradient-modulated Potts model

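A sketch of these stereo potentials (assuming grayscale NumPy images and a learned scalar function f of the gradient magnitude; all names illustrative):

```python
import numpy as np

def stereo_potentials(I_left, I_right, f):
    """Return closures u(p, d) and h(p, d_p, d_q) for pixel p = (row, col)."""
    gy, gx = np.gradient(I_left.astype(float))
    grad_mag = np.hypot(gy, gx)               # |grad I_left| per pixel

    def u(p, d):                              # data term at disparity d
        r, c = p
        return abs(float(I_left[r, c]) - float(I_right[r, c - d]))

    def h(p, d_p, d_q):                       # gradient-modulated Potts term
        r, c = p
        return 0.0 if d_p == d_q else f(grad_mag[r, c])

    return u, h
```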
SLIDE 57

Stereo matching

“Venus” disparity using f(·) as estimated at different iterations of the learning algorithm

SLIDE 58

Stereo matching

Error rates: Sawtooth 4.9%, Poster 3.7%, Bull 2.8%

SLIDE 60

High-order Pn Potts model

  • Goal: learn a high-order CRF with potentials given by the Pn Potts model [Kohli et al. CVPR07]
  • Cost for optimizing a slave CRF: O(|L|)
  • Setup:
  • 100 training samples
  • 50x50 grid
  • clique size 3x3
  • 5 labels (|L| = 5)
  • Fast training

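The cheap slave cost comes from the structure of the Pn Potts potential ($\gamma_l$ if all clique nodes share label l, $\gamma_{\max}$ otherwise). A sketch of exactly minimizing one slave, i.e. a single clique plus its unaries (assuming $\gamma_l \le \gamma_{\max}$; names illustrative):

```python
def minimize_pn_potts_slave(unaries, gamma, gamma_max):
    """Exactly minimize one P^n Potts slave (one clique + its unaries).

    unaries:   per-node lists of per-label costs
    gamma[l]:  clique cost when all nodes take label l
    gamma_max: clique cost for any non-uniform labeling (gamma[l] <= gamma_max)
    """
    L = len(gamma)
    # uniform candidates: all nodes share label l
    best_cost, best_x = min(
        (gamma[l] + sum(u[l] for u in unaries), [l] * len(unaries))
        for l in range(L))
    # non-uniform candidate: every node takes its individually best label
    free_x = [min(range(L), key=lambda l: u[l]) for u in unaries]
    free_cost = gamma_max + sum(u[l] for u, l in zip(unaries, free_x))
    # if free_x happens to be uniform with label l, the uniform candidate
    # for l is at least as good (gamma[l] <= gamma_max), so skip it then
    if len(set(free_x)) > 1 and free_cost < best_cost:
        return free_x, free_cost
    return best_x, best_cost
```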
SLIDE 61

Clustering

  • Goal: distance learning for clustering [ICCV’11]
  • In this case cliques are of very high order: they contain all variables
  • On top of that, there exist unobserved (latent) variables during training
  • Novel discriminative formulation
  • Significant extension: dual decomposition for training high-order CRFs with latent variables