Deep Epitomic Nets and Scale/Position Search for Image Classification



SLIDE 1

TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Deep Epitomic Nets and Scale/Position Search for Image Classification

TTIC_ECP team

George Papandreou

Toyota Technological Institute at Chicago

Iasonas Kokkinos

Ecole Centrale Paris/INRIA

SLIDE 2

TTIC_ECP entry in a nutshell

Goal: Invariance in Deep CNNs
Part 1: Deep epitomic nets: local translation (deformation)
Part 2: Global scaling and translation

Top-5 error. All DCNNs have 6 convolutional and 2 fully-connected layers.

(0) Baseline: max-pooled net    13.0%
(1) Epitomic DCNN               11.9%   (~1% gain)
(2) Epitomic DCNN + search      10.56%  (~1.5% gain)
Fusion (1)+(2)                  10.22%

SLIDE 3

LeCun et al.: Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998
Krizhevsky et al.: ImageNet Classification with Deep CNNs, NIPS 2012

Cascade of convolution + max-pooling blocks (deformation-invariant template matching)

Deep Convolutional Neural Networks (DCNNs)

Our work: different blocks (P1) & different architecture (P2)

[Figure: DCNN with convolutional layers followed by fully connected layers]

SLIDE 4

Part 1: Deep epitomic nets

SLIDE 5

Epitomes: translation-invariant patch models

Patch Templates

Jojic, Frey, Kannan: Epitomic analysis of appearance and shape, ICCV 2003

EM-based training

Benoit, Mairal, Bach, Ponce: Sparse image representation with epitomes, CVPR 2011
Grosse, Raina, Kwong, Ng: Shift-invariant sparse coding, UAI 2007

Epitomes: a lot more for just a bit more

Separate modeling: more data & less power per parameter

SLIDE 6

Papandreou, Chen, Yuille: Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR 2014

Mini-epitomes for image classification

[Figure: dictionary of mini-epitomes vs. dictionary of patches (K-means)]

Gains in (flat) BoW classification

SLIDE 7

From flat to deep: Epitomic convolution

Max-pooling: max over image positions (k = 1, 2, ...)
Epitomic convolution: max over epitome positions (k = 1, 2, ...)

  • G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
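
The contrast on this slide can be illustrated with a minimal 1-D sketch. It is not the authors' implementation: the function names, the toy signal, and the pooling and stride values are assumptions chosen for illustration. Max-pooling keeps one filter fixed and takes the max over neighbouring image positions, while epitomic convolution keeps the image patch fixed and takes the max over all filters cut from a larger epitome.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def conv_max_pool(signal, filt, pool):
    """Convolve with one filter, then take the max over `pool` neighbouring
    image positions (non-overlapping pooling windows)."""
    w = len(filt)
    responses = [dot(signal[i:i + w], filt) for i in range(len(signal) - w + 1)]
    return [max(responses[i:i + pool])
            for i in range(0, len(responses) - pool + 1, pool)]


def epitomic_conv(signal, epitome, w, stride):
    """For each image patch on a strided grid, take the max response over
    all length-w filters extracted from the epitome."""
    filters = [epitome[p:p + w] for p in range(len(epitome) - w + 1)]
    return [max(dot(signal[i:i + w], f) for f in filters)
            for i in range(0, len(signal) - w + 1, stride)]


# Toy data (hypothetical): an 8-sample signal, a 3-tap filter, a 5-tap epitome.
signal = [0, 1, 2, 1, 0, 0, 1, 0]
pooled = conv_max_pool(signal, [1, 2, 1], pool=2)
epitomic = epitomic_conv(signal, [0, 1, 2, 1, 0], w=3, stride=2)
```

In this toy setting both variants produce one output per two input positions; the epitome holds three shifted filters that share parameters.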

SLIDE 8

Deep Epitomic Convolutional Nets

[Figure: convolution + max-pooling vs. epitomic convolution]

Supervised dictionary learning by back-propagation

  • G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

SLIDE 9

Deep Epitomic Convolutional Nets

Parameter sharing: faster and more reliable model learning

(0) Baseline: max-pooled net    13.0%
(1) Epitomic DCNN               11.9%   (~1% gain)

Consistent improvements

SLIDE 10

Part 2: Global scaling and translation

SLIDE 11

Scale Invariance challenge

Scale-dependent (area) Category-dependent (ear detector)

Dogs

SLIDE 12

Scale Invariance challenge

Scale-dependent Category-dependent (ear detector)

Dogs Skyscrapers

SLIDE 13

Scale Invariance challenge

Scale-dependent Category-dependent (ear detector)

Dogs Skyscrapers Training set

SLIDE 14

Scale Invariance challenge

Scale-dependent Category-dependent (ear detector)

Dogs Skyscrapers

Rule: Large skyscrapers have ears, large dogs don’t

SLIDE 15

Scale Invariant classification

Scale-dependent Category-dependent

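Image to multi-scale set: $x \to \{x_{s_1}, \ldots, x_{s_K}\}$

Response to per-scale responses: $F(x) \to \{F(x_{s_1}), \ldots, F(x_{s_K})\}$

Averaging over scales: $F'(x) = \frac{1}{K} \sum_{k=1}^{K} F(x_{s_k})$

Max over scales: $F'(x) = \max_{k} F(x_{s_k})$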

  • A. Howard. Some improvements on deep convolutional neural network based image classification, 2013.
  • T. Dietterich et al. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.

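This work: multiple-instance learning (MIL) with end-to-end training! The set of per-scale responses is treated as a 'bag' of features rather than a single feature.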
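
The two ways of combining per-scale responses on this slide, averaging over the K scales and taking the max over them, can be sketched as follows. The function names and the toy scores are assumptions for illustration; in practice each row would hold the class scores F computed on one rescaled copy of the image.

```python
def average_over_scales(scores_per_scale):
    """F'(x) = (1/K) * sum_k F(x_{s_k}), computed per class."""
    k = len(scores_per_scale)
    n_classes = len(scores_per_scale[0])
    return [sum(s[c] for s in scores_per_scale) / k for c in range(n_classes)]


def max_over_scales(scores_per_scale):
    """F'(x) = max_k F(x_{s_k}), computed per class (MIL-style pooling)."""
    n_classes = len(scores_per_scale[0])
    return [max(s[c] for s in scores_per_scale) for c in range(n_classes)]


# Hypothetical scores for K = 3 scales and 2 classes.
scores = [[1.0, 4.0],
          [8.0, 3.0],
          [0.0, 5.0]]
avg = average_over_scales(scores)
mx = max_over_scales(scores)
```

With these toy numbers the two rules disagree: averaging prefers class 1, while the max prefers class 0 because one scale responds strongly to it.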

SLIDE 16

Step 1: Efficient multi-scale convolutional features

[Figure: I(x,y) → pyramid → I(x,y,s) → stitch → Patchwork(x,y) → GPU → C(x,y) → unstitch → C(x,y,s) (multi-scale convolutional features)]

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat, ICLR 2014
Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors. ECCV 2012
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet. arXiv 2014

Network input 220x220x3 → convolutional features 5x5x512
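
The stitch and unstitch operations can be sketched on toy data as below. This is only an illustration under stated assumptions: the helper names and the offsets are hypothetical, and the convolutional network that would run between the two steps is omitted, so unstitching is applied directly to the mosaic rather than to a feature map.

```python
def stitch(images, positions, size, fill=0):
    """Paste each square image at its (row, col) offset into one
    size x size mosaic, so a single pass can process all scales."""
    mosaic = [[fill] * size for _ in range(size)]
    for img, (r0, c0) in zip(images, positions):
        for r, row in enumerate(img):
            mosaic[r0 + r][c0:c0 + len(row)] = row
    return mosaic


def unstitch(mosaic, positions, sides):
    """Cut the per-scale squares back out of the mosaic."""
    return [[row[c0:c0 + s] for row in mosaic[r0:r0 + s]]
            for (r0, c0), s in zip(positions, sides)]


# Toy pyramid (hypothetical): a 2x2 level and a 1x1 level in a 4x4 mosaic.
images = [[[1, 1], [1, 1]], [[2]]]
positions = [(0, 0), (0, 3)]
mosaic = stitch(images, positions, size=4)
recovered = unstitch(mosaic, positions, sides=[2, 1])
```

Because the offsets are known in advance, the per-scale outputs can be recovered by simple slicing.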

SLIDE 17

Step 2: From fully connected to fully convolutional

convolutional fully connected

SLIDE 18

Step 2: From fully connected to fully convolutional

convolutional

SLIDE 19

Step 2: From fully connected to fully convolutional

[Figure: I(x,y) → pyramid → I(x,y,s) → stitch → Patchwork(x,y) → GPU (fully convolutional net) → F(x,y)]

Fully connected layer applied as a convolution: 220x220x3 input → 1x1x4096 output
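
A minimal sketch of why a fully connected layer can be run convolutionally: its weights form a k x k x C kernel per output unit, and sliding that kernel over a larger feature map gives one fully connected output per window. The function names and the tiny 3x3x1 map are assumptions for illustration, not the layer sizes used in the talk.

```python
def fc_apply(block, weights, bias):
    """Fully connected layer on one k x k x C block:
    one dot product per output unit."""
    return [b + sum(block[r][c][ch] * w[r][c][ch]
                    for r in range(len(block))
                    for c in range(len(block[0]))
                    for ch in range(len(block[0][0])))
            for w, b in zip(weights, bias)]


def fc_as_conv(fmap, weights, bias, k):
    """Slide the same weights over an H x W x C map
    (valid convolution, stride 1)."""
    height, width = len(fmap), len(fmap[0])
    return [[fc_apply([row[c0:c0 + k] for row in fmap[r0:r0 + k]],
                      weights, bias)
             for c0 in range(width - k + 1)]
            for r0 in range(height - k + 1)]


# Toy data (hypothetical): 3x3x1 feature map, one output unit, 2x2x1 kernel.
fmap = [[[1], [2], [3]],
        [[4], [5], [6]],
        [[7], [8], [9]]]
weights = [[[[1], [0]],
            [[0], [1]]]]
bias = [0]
conv_out = fc_as_conv(fmap, weights, bias, k=2)
```

When the input is exactly k x k, the output map is 1x1 and equals the ordinary fully connected output; on a larger input, such as a stitched mosaic, the same weights yield a dense map of scores.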

SLIDE 20

Step 3: Global max-pooling

[Figure: I(x,y) → pyramid → I(x,y,s) → stitch → Patchwork(x,y) → GPU → F_c(x,y)]

Global max-pooling with a learned class-specific bias $w_c(x, y)$:

$G_c = \max_{x,y} \left( F_c(x, y) + w_c(x, y) \right)$

For free: argmax yields 48% localization error

(0) Baseline: max-pooled net    13.0%
(1) Epitomic DCNN               11.9%   (~1% gain)
(2) Epitomic DCNN + search      10.56%  (~1.5% gain)
Fusion (1)+(2)                  10.22%

Consistent, explicit position and scale search during training and testing
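
The global max-pooling step, which adds a class-specific bias map to the class score map and takes the max over all mosaic positions, can be sketched as below. The function name and the toy values are assumptions; the point is that the argmax comes along with the max, which is what gives a location (and, in a mosaic, a scale) at no extra cost.

```python
def global_max_pool(score_map, bias_map):
    """Return the max over (x, y) of score + bias, and the (x, y) position
    where it is attained. Both maps are indexed as map[y][x]."""
    best, best_pos = None, None
    for y, (score_row, bias_row) in enumerate(zip(score_map, bias_map)):
        for x, (s, b) in enumerate(zip(score_row, bias_row)):
            if best is None or s + b > best:
                best, best_pos = s + b, (x, y)
    return best, best_pos


# Toy maps for one class (hypothetical values).
F_c = [[1, 5],
       [3, 2]]
w_c = [[0, -3],
       [0, 0]]
G_c, location = global_max_pool(F_c, w_c)
```

Here the raw score peaks at (x=1, y=0), but the bias shifts the winner to (x=0, y=1).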

SLIDE 21

DCNN: 6 Convolutional + 2 Fully Connected layers

(0) Baseline: max-pooled net    13.0%
(1) Epitomic DCNN               11.9%   (~1% gain)
(2) search                      10.56%  (~1.5% gain)
Fusion (1)+(2)                  10.22%

The Deeper the Better: stay tuned!

Deep Epitomic Nets and Scale/Position Search for Image Classification

Goal: Invariance in Deep CNNs

?

SLIDE 22

Epitomic implementation details

n Architecture of our deep epitomic net (11.94%) n Training took 3 weeks on a singe Titan (60 epochs) n Standard choices for learning rate, momentum, etc.

SLIDE 23

Pyramidal search implementation details

n Image warp to square image. Position in mosaic is fixed n Scales: 400, 300, 220, 160, 120, 90 pixels à Mosaic: 720 pixels