SLIDE 1

Learning Music, Images and Physics with Deep Neural Networks

Joan Bruna, Matthew Hirn, Stéphane Mallat, Vincent Lostanlen, Edouard Oyallon, Nicolas Poilvert, Laurent Sifre, Irène Waldspurger. École Normale Supérieure

www.di.ens.fr/data

SLIDE 2

High Dimensional Learning

• High-dimensional x = (x(1), ..., x(d)) ∈ R^d
• Classification: estimate a class label f(x), given n sample values {xi, yi = f(xi)}i≤n

Image classification, d = 10^6. Example classes: anchor, Joshua tree, beaver, lotus water lily. Huge variability inside classes.

SLIDE 3

High Dimensional Learning

• High-dimensional x = (x(1), ..., x(d)) ∈ R^d
• Classification: estimate a class label f(x), given n sample values {xi, yi = f(xi)}i≤n

Audio: instrument recognition. Huge variability inside classes.

SLIDE 4

High Dimensional Learning

• High-dimensional x = (x(1), ..., x(d)) ∈ R^d
• Regression: approximate a functional f(x), given n sample values {xi, yi = f(xi) ∈ R}i≤n

Examples: astronomy; quantum chemistry; physics, where f(x) is the energy of a state vector x.

SLIDE 5

Curse of Dimensionality

• f(x) can be approximated from examples {xi, f(xi)}i by local interpolation if f is regular and there are close examples.
• But covering [0, 1]^d at a Euclidean distance ε requires ε^(−d) points, so in high dimension ‖x − xi‖ is always large: there are no close examples. Huge variability inside classes.
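The curse of dimensionality on this slide can be checked numerically. The sketch below (an illustration, not from the slides) measures the average nearest-neighbor distance between n random points in [0, 1]^d: it is tiny in d = 2 but large in d = 100, so the "close examples" needed for local interpolation never exist.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_nn_distance(d, n=1000):
    # n uniform points in [0, 1]^d; pairwise squared distances via the
    # identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    X = rng.uniform(size=(n, d))
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                     # exclude self-distances
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0)).mean()

low_d, high_d = mean_nn_distance(2), mean_nn_distance(100)
print(low_d, high_d)   # nearest neighbors are far apart when d = 100
```

With 1000 samples the nearest neighbor in d = 2 is a small fraction of the cube's side, while in d = 100 it is several units away: interpolation from neighbors is hopeless.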

SLIDE 6

Learning by Euclidean Embedding

• Data x ∈ R^d: the Euclidean distance ‖x − x′‖ is non-informative.
• How to define a representation Φ : x ↦ Φx ∈ H such that the "similarity" metric ∆(x, x′) becomes an equivalent Euclidean metric:
  C1 ‖Φx − Φx′‖ ≤ ∆(x, x′) ≤ C2 ‖Φx − Φx′‖ ?
• If the classes become Gaussian and separated in H, a linear classifier suffices.

SLIDE 7

Deep Convolution Networks

• Cascade of linear convolutions Lk and non-linear scalar "neuron" operations ρ(u) = |u|:
  Φ(x) = ρ Lk ... ρ L2 ρ L1 x, followed by a linear classification.
• Optimize the Lk with support constraints: over 10^9 parameters.
• Exceptional results for images, speech, bio-data classification. Products by Facebook, IBM, Google, Microsoft, Yahoo...
• The revival of an old (1950s) idea: Y. LeCun, G. Hinton.
• Why does it work so well?
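The cascade on this slide can be sketched in a few lines. This is a minimal illustration, not a trained network: the filters are random placeholders where a real deep network would optimize the Lk from data, and the non-linearity is the slide's ρ(u) = |u|.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                    # input signal

def layer(h, f):
    # one layer: linear convolution L_k, then the scalar rho(u) = |u|
    return np.abs(np.convolve(h, f, mode="same"))

phi = x
for k in range(3):                             # three layers L1, L2, L3
    f = rng.standard_normal(5) / 5.0           # placeholder filter for L_k
    phi = layer(phi, f)

# phi plays the role of Phi(x), the representation fed to a linear classifier
print(phi.shape)
```

The scattering construction of the following slides replaces the learned Lk by fixed wavelet transforms.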
SLIDE 8

Overview

  • Deep multiscale networks: invariant and stable metrics on groups
  • Image classification
  • Models of audio and image textures: information theory
  • Learning physics: quantum chemistry energy regression
SLIDE 9

Image Metrics

• Low-dimensional "geometric shapes" (Grenander): x(u) and x′(u) are compared up to deformations.
• Diffeomorphism action: Dτx(u) = x(u − τ(u)), as in classical mechanics.
• Deformation metric, invariant to translations:
  ∆(x, x′) ∼ min_τ ‖Dτx − x′‖ + ‖∇τ‖∞ ‖x‖,
  where ‖∇τ‖∞ ‖x‖ measures the diffeomorphism amplitude.

SLIDE 10

Image Metrics

• High-dimensional textures: ergodic stationary processes, e.g. 2D turbulence; highly non-Gaussian processes X(u).
• Can we find Φ so that Φ(X) is nearly Gaussian, without losing information?
• A Euclidean metric is a maximum likelihood on Gaussian models.
SLIDE 11

Euclidean Metric Embedding

• Stability to additive perturbations: ‖Φx − Φx′‖ ≤ C ‖x − x′‖.
• Invariance to translations: xc(u) = x(u − c) ⇒ Φ(xc) = Φ(x).
• Stability to deformations: xτ(u) = x(u − τ(u)) ⇒ ‖Φx − Φxτ‖ ≤ C ‖∇τ‖∞ ‖x‖.
• Fourier and classic invariants fail the deformation stability.

SLIDE 12

Wavelet Transform

• Dilated wavelets: Q-constant band-pass filters ψ_λ(t) = 2^(−j/Q) ψ(2^(−j/Q) t) with λ = 2^(−j/Q).
  (Figure: squared spectra |ψ̂_λ(ω)|² tiling the frequency axis, with the low-pass |φ̂(ω)|² around ω = 0.)
• Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ≤2^J}: an average plus the higher frequencies.
• Preserves the norm: ‖Wx‖² = ‖x‖².
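A constant-Q filter bank like the one on this slide can be built by dilating a single mother wavelet. The sketch below uses a Morlet-type wavelet as a stand-in for the slides' wavelets (the values of Q, ξ and σ are illustrative choices) and verifies the constant-Q property: center frequencies halve once per octave of Q filters.

```python
import numpy as np

Q = 8
N = 1024
t = np.arange(N) - N // 2

def psi(t, xi=0.1, sigma=30.0):
    # complex band-pass wavelet: Gaussian envelope times an oscillation
    # at xi radians per sample
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * xi * t)

# psi_lambda(t) = 2^{-j/Q} psi(2^{-j/Q} t), over two octaves
bank = [2.0 ** (-j / Q) * psi(2.0 ** (-j / Q) * t) for j in range(2 * Q + 1)]

# peak of each filter's spectrum: drops by a factor 2 per octave
peaks = [int(np.abs(np.fft.fft(w)).argmax()) for w in bank]
print(peaks[0], peaks[Q], peaks[2 * Q])
```

Q filters per octave give a logarithmic frequency coverage, which is what makes the transform stable to dilations and small deformations.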

SLIDE 13

Scale Separation with Wavelets

• Complex 2D wavelet: ψ(t) = g(t) exp(iξ·t), t = (t1, t2), rotated and dilated:
  ψ_λ(t) = 2^(−j) ψ(2^(−j) rθ t) with λ = (2^j, θ).
  (Figure: real and imaginary parts of the rotated wavelets; spectra |ψ̂_λ(ω)|² covering the (ω1, ω2) plane.)
• Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ≤2^J}: an average plus the higher frequencies.
• Preserves the norm: ‖Wx‖² = ‖x‖².

SLIDE 14

Fast Wavelet Transform

(Figure: first wavelet modulus |W1|: the coefficients |x ⋆ ψ_{2^1,θ}| across scales 2^0 to 2^J.)

SLIDE 15

Wavelet Transform

(Figure: |W1| coefficients |x ⋆ ψ_{2^j,θ}| at scales 2^1, 2^2, 2^3, ..., 2^J.)

• x ⋆ φ_{2^J}: locally invariant to translations.
• How to make everything invariant to translation? Depth.

SLIDE 16

Wavelet Translation Invariance

• First wavelet transform: W1 x = ( x ⋆ φ_{2^J}, x ⋆ ψ_{λ1} )_{λ1}.
• The complex coefficients x ⋆ ψ_{λ1}(t) = x ⋆ ψ^a_{λ1}(t) + i x ⋆ ψ^b_{λ1}(t) oscillate; the modulus removes the oscillation:
  |x ⋆ ψ_{λ1}(t)| = ( |x ⋆ ψ^a_{λ1}(t)|² + |x ⋆ ψ^b_{λ1}(t)|² )^{1/2}.
• Modulus improves invariance: |W1| x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| )_{λ1}.
• The average |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t) is locally translation invariant at scale 2^J, and fully translation invariant when 2^J = ∞; |x ⋆ ψ_{λ1}| itself is not invariant but covariant to translations.
• Second wavelet transform modulus: |W2| |x ⋆ ψ_{λ1}| = ( |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t), ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)| )_{λ2}.
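The contrast between raw wavelet coefficients (covariant) and averaged moduli (nearly invariant) is easy to check numerically. The sketch below uses an illustrative complex wavelet and Gaussian low-pass (not the slides' exact filters) with circular convolutions, and compares both quantities under a small translation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N)
t = np.arange(N) - N // 2
psi = np.exp(-t**2 / (2 * 4.0**2)) * np.exp(1j * 0.8 * t)   # band-pass wavelet
phi = np.exp(-t**2 / (2 * 32.0**2)); phi /= phi.sum()       # low-pass phi_{2^J}

def conv(a, b):
    # circular convolution via the FFT
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def S1(sig):
    # |x * psi| * phi : wavelet modulus followed by averaging
    return np.real(conv(np.abs(conv(sig, psi)), phi))

xc = np.roll(x, 4)                                          # translated input
raw_gap = np.linalg.norm(conv(xc, psi) - conv(x, psi))      # covariant only
inv_gap = np.linalg.norm(S1(xc) - S1(x))                    # nearly invariant
print(raw_gap, inv_gap)
```

The raw coefficients shift with the signal (their difference is large because they oscillate), while the averaged modulus barely moves: this is exactly why the modulus plus low-pass builds local translation invariance.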

SLIDE 17

Scattering Transform

(Figure: |W1| maps x to the average x ⋆ φ_{2^J} and the wavelet moduli |x ⋆ ψ_{λ1}(t)|, |x ⋆ ψ_{λ′1}(t)|, |x ⋆ ψ_{λ″1}(t)|, ...)

SLIDE 18

Scattering Transform

(Figure: cascading |W1| and |W2|: x ↦ x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)|.)
SLIDE 19

Scattering Neural Network

(Figure: adding |W3|: x ↦ |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|.)

SLIDE 20

Scattering Properties

S_J x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}, ... )_{λ1,λ2,λ3,...} = ... |W3| |W2| |W1| x.

Theorem: For appropriate wavelets, a scattering
• preserves norms: ‖S_J x‖ = ‖x‖;
• is contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability), because each Wk is unitary, so |Wk| is contractive;
• is translation invariant and deformation stable: if xτ(u) = x(u − τ(u)), then lim_{J→∞} ‖S_J xτ − S_J x‖ ≤ C ‖∇τ‖∞ ‖x‖.
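The full cascade above is just the one-layer construction iterated. The sketch below implements a two-order 1D scattering with a toy Morlet bank of four octaves; it is illustrative (the toy bank is not unitary, so the theorem's norm identities are not expected to hold exactly here), but the path structure matches the slide: 1 order-zero, 4 order-one and 16 order-two low-passed paths.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
t = np.arange(N) - N // 2

def morlet(xi, sigma):
    # unit-norm complex wavelet: Gaussian envelope, oscillation at xi
    w = np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * xi * t)
    return w / np.linalg.norm(w)

psis = [morlet(np.pi / 2**j, 2.0**j) for j in range(1, 5)]   # one per octave
phi = np.exp(-t**2 / (2 * 64.0**2)); phi /= phi.sum()        # low-pass

def conv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def scattering(x, depth=2):
    paths, out = [x.astype(complex)], []
    for _ in range(depth + 1):
        out += [np.real(conv(p, phi)) for p in paths]             # p * phi_{2^J}
        paths = [np.abs(conv(p, w)) for p in paths for w in psis]  # |p * psi|
    return np.concatenate(out)

S = scattering(rng.standard_normal(N))
print(S.shape)   # (1 + 4 + 16) paths, each of length N
```

Each application of "wavelet transform then modulus" adds one order; the low-pass of every path is what the classifier sees.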

SLIDE 21

Digit Classification: MNIST (Joan Bruna)

Pipeline: x → S_J x → linear classifier → y = f(x).

Classification errors (convolutional network results from LeCun et al.):

Training size 50000: Conv. Net. 0.5%, Scattering 0.4%.

SLIDE 22

Classification of Textures (J. Bruna, scattering moments)

CUReT database, 61 classes. Pipeline: x → S_J x → linear classifier → y = f(x), with 2^J = image size.

Classification errors, 46 training samples per class: Fourier spectrum 1%, histogram features 1%, scattering 0.2%.

SLIDE 23

Scattering Moments of Processes

The scattering transform of a stationary process X(t):

S_J X = ( X, |X ⋆ ψ_{λ1}|, ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|, ... ) ⋆ φ_{2^J},

which is nearly Gaussian for 2^J large if X is ergodic. As J → ∞ it converges to the expected scattering moments

E(S X) = ( E(X), E(|X ⋆ ψ_{λ1}|), E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|), ... )_{λ1,λ2,λ3,...}.
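For an ergodic process these expectations can be estimated by time-averaging a single realization. The sketch below (toy Morlet wavelets, white-noise input; both are illustrative choices, not the slides' setup) computes first-order scattering moments of two independent realizations and shows that the estimates nearly coincide, the concentration behind the "nearly Gaussian" claim.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2**14
t = np.arange(N) - N // 2

def psi(j):
    # toy Morlet wavelet at octave j, L1-normalized
    w = np.exp(-t**2 / (2 * (2.0**j)**2)) * np.exp(1j * np.pi * t / 2**j)
    return w / np.abs(w).sum()

def first_order_moments(x, jmax=5):
    # estimate E(|X * psi_j|) by averaging over time (ergodicity)
    xhat = np.fft.fft(x)
    return np.array([np.abs(np.fft.ifft(xhat * np.fft.fft(psi(j)))).mean()
                     for j in range(1, jmax)])

m1 = first_order_moments(rng.standard_normal(N))   # realization 1 of X
m2 = first_order_moments(rng.standard_normal(N))   # realization 2 of X
print(np.abs(m1 - m2) / m1)    # small relative gaps: averages concentrate
```

Two independent draws of the process give nearly identical moment vectors, so a single texture image or audio clip suffices to estimate E(S X).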

SLIDE 24

Representation of Random Processes

Write the expected scattering moments as E(S X) = ( E(U0 X), E(U1 X), E(U2 X), E(U3 X), ... )_{λ1,λ2,λ3,...}, with U0 X = X, U1 X = |X ⋆ ψ_{λ1}|, U2 X = ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, U3 X = |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|.

Theorem (Boltzmann): among the distributions satisfying ∫_{R^N} Um x p(x) dx = E(Um X), the one with maximum entropy Hmax = −∫ p(x) log p(x) dx is p(x) = Z^(−1) exp( −Σm λm · Um x ), and Hmax ≥ H(X) (the entropy of X).

Little loss of information when Hmax ≈ H(X).

SLIDE 25

Ergodic Texture Reconstructions (Joan Bruna)

(Figure: original textures, including 2D turbulence; reconstructions from a Gaussian process model with the same second-order moments; reconstructions from the first- and second-order scattering moments E(|x ⋆ ψ_{λ1}|), E(||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), which amount to only O(log² N) moments.)

slide-26
SLIDE 26

Original Paper Cocktail Party

Representation of Audio Textures

Joan Bruna

60 20 40 60 20 40 60

Applauds Gaussian in time Gaussian in scattering

t

ω

SLIDE 27

Failures: Harmonic Sounds (V. Lostanlen)

(Figure: speech, bird and cello examples.)

Harmonic sounds require expressing frequency-channel interactions in the time-frequency image.

SLIDE 28

Harmonic Spiral (V. Lostanlen)

(Figure: time-frequency image with the log-frequency axis λ unrolled on a spiral indexed by octave j, from 1 to 5, and angle θ.)

• Variations are more regular along (θ, j) than along λ.
• The harmonics align in two main groups.
• Need to capture frequency variability and structures.

SLIDE 29

Rotation and Scaling Invariance (Laurent Sifre)

UIUC database, 25 classes. Scattering classification errors with 20 training samples per class: translation scattering 20%.

SLIDE 30

Extension to Rigid Movements (Laurent Sifre)

• Group of rigid displacements: translations and rotations.
• Action on the wavelet coefficients xj(u, θ) = |x ⋆ ψ_{2^j,θ}(u)|: a rotation and translation of the image, x(rα(u − c)), becomes xj(rα(u − c), θ − α), i.e. a joint translation in space and in angle.
• The average ∫ x(u) du is invariant, but one still needs to capture the variability of spatial directions.

SLIDE 31

Extension to Rigid Movements (Laurent Sifre)

• To build invariants, apply a second wavelet transform on L²(G), with wavelets ψ_{λ2}(u, θ) on the rigid-movement group: convolutions of xj(u, θ),
  xj ⊛ ψ_{λ2}(u, θ) = ∫_{R²} ∫_0^{2π} xj(v, α) ψ_{λ2}(u − v, θ − α) dv dα.
• Scattering on rigid movements: x(u) → |W1| → xj(u, θ) → |W2| → |xj ⊛ ψ_{λ2}(u, θ)| → |W3| → ..., with invariant outputs ∫ x(u) du, ∫ xj(u, θ) du dθ, ∫ |xj ⊛ ψ_{λ2}(u, θ)| du dθ.
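With sampled, periodic variables (u, θ), the joint convolution above is a 2D circular convolution. The sketch below uses random placeholder arrays for xj and ψ_{λ2} (illustrative only) and checks the key covariance property: rotating the input, i.e. rolling the θ axis, rolls the output the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
xj = rng.standard_normal((32, 8))    # samples of x_j(u, theta), 8 angles
w = rng.standard_normal((32, 8))     # placeholder for psi_{lambda_2}(u, theta)

def gconv(x, h):
    # circular convolution jointly in u (axis 0) and theta (axis 1)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h)))

y = gconv(xj, w)
# covariance: a rotation of the input (roll in theta) rolls the output
y_rot = gconv(np.roll(xj, 2, axis=1), w)
print(np.allclose(y_rot, np.roll(y, 2, axis=1)))   # True
```

Because the convolution is covariant to the group action, averaging its modulus over (u, θ) produces the rotation-and-translation invariants of the slide.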

SLIDE 32

Rotation and Scaling Invariance (Laurent Sifre)

UIUC database, 25 classes. Scattering classification errors with 20 training samples per class: translation scattering 20%; rigid-movement scattering 0.6%.

SLIDE 33

Complex Image Classification (Edouard Oyallon)

Pipeline: x → S_J x, computing rigid-movement invariants → variable selection → linear classifier → y.

CalTech-101 database (classes such as boat, water lily, metronome, beaver, Joshua tree, anchor).

Classification accuracy, state-of-the-art unsupervised deep network vs second-order scattering:
CalTech-101: Deep-Net 85%, Scat.-2 80%. CIFAR-10: Deep-Net 90%, Scat.-2 80%.

SLIDE 34

Learning Physics: N-Body Problem (Matthew Hirn, N. Poilvert)

• Energy of d interacting bodies: can we learn the interaction energy f(x) of a system whose state x is given by n positions and values?
• Examples: astronomy, quantum chemistry.
SLIDE 35

Multiscale Interactions

• A system of d particles involves d² interactions.
• Multiscale separation reduces them to O(log2 d) interactions.
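The counting argument can be illustrated numerically: d particles give d(d−1)/2 pairwise interactions, but binning the pairs by the octave (log2) of their distance leaves only on the order of log2 d distinct interaction scales, the idea behind multiscale (fast-multipole-like) summation. The 1D random positions below are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512
pos = rng.uniform(0.0, d, size=d)                 # particle positions in 1D
i, j = np.triu_indices(d, k=1)
dist = np.abs(pos[i] - pos[j])                    # all pairwise distances
octaves = np.unique(np.floor(np.log2(dist)).astype(int))

n_pairs = dist.size       # grows like d^2
n_scales = octaves.size   # grows like log2(d)
print(n_pairs, n_scales)
```

Grouping interactions per octave is exactly the scale separation that wavelets perform, which is why a scattering representation fits multiscale physics.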
SLIDE 36

Quantum Chemistry

Organic molecules built from hydrogen, carbon, nitrogen, oxygen, sulfur and chlorine. The electronic density ρx(u) is computed by solving the Schrödinger equation.

SLIDE 37

Density Functional Theory

Kohn-Sham model for the molecular energy:

E(ρ) = T(ρ) + ∫ ρ(u) V(u) du + (1/2) ∫∫ ρ(u) ρ(v) / |u − v| du dv + Exc(ρ),

with T(ρ) the kinetic energy, ∫ ρ V the electron-nuclei attraction, the double integral the electron-electron Coulomb repulsion, and Exc(ρ) the exchange-correlation energy.

At equilibrium, f(x) = E(ρx) = min_ρ E(ρ); f(x) is invariant to isometries and deformation stable.

SLIDE 38

Atomization Density

(Figure: electronic density ρx(u) and approximate density ρ̃x(u).)

SLIDE 39

Quantum Regression (Matthew Hirn, N. Poilvert)

• Representation invariant to the action of isometries in R³: scattering coefficients and their squares, Fourier modulus coefficients and their squares.
• Sparse regression computed over the representation Φx = {φn(ρ̃x)}n:

  f_M(x) = Σ_{k=1}^{M} wk φ_{nk}(ρ̃x), with M the number of selected variables.

• Partial least squares regression on the training set.
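The sparse regression above can be sketched with a greedy feature-selection loop: pick M features φ_{nk} most correlated with the residual and refit the weights wk by least squares. This is an orthogonal-matching-pursuit stand-in for the slides' partial least squares, on synthetic data rather than quantum-chemistry densities.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, M = 200, 50, 3
Phi = rng.standard_normal((n, p))                  # feature matrix {phi_n(x_i)}
w_true = np.zeros(p); w_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
f = Phi @ w_true                                   # noiseless target energies

selected, residual = [], f.copy()
for _ in range(M):
    k = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated feature
    selected.append(k)
    sub = Phi[:, selected]
    w, *_ = np.linalg.lstsq(sub, f, rcond=None)    # refit on the selected set
    residual = f - sub @ w

print(sorted(selected), np.linalg.norm(residual))
```

With noiseless synthetic data the greedy loop recovers exactly the three active features; on real scattering coefficients, M trades off bias against variance, which is the horizontal axis of the next slide.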

SLIDE 40

Scattering Regression

(Figure: regression error as a function of the model complexity log2 M, for Fourier, wavelet scattering and Coulomb representations; errors of 5.8, 14.2 and 16.7 kcal/mol, against a state of the art of 2.7 kcal/mol.)

SLIDE 41

Conclusion

• A major challenge of data analysis is to find Euclidean embeddings of metrics, which is equivalent to building Gaussian models.
• Continuity to the action of diffeomorphisms ⇒ wavelets.
• Known geometry ⇒ no need to learn; unknown geometry: learn wavelets on appropriate groups.
• Physics can be learned from priors on geometry and invariants.
• Applications to images, audio and natural languages.

www.di.ens.fr/data/scattering