Learning Music, Images and Physics with Deep Neural Networks
Joan Bruna, Matthew Hirn, Stéphane Mallat
Vincent Lostanlen, Edouard Oyallon, Nicolas Poilvert, Laurent Sifre, Irène Waldspurger
École Normale Supérieure, www.di.ens.fr/data
SLIDE 1
SLIDE 2
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Classification: estimate a class label f(x), given n sample values {x_i, y_i = f(x_i)}_{i≤n}
Image Classification: d = 10^6
[Figure: example classes — Anchor, Joshua Tree, Beaver, Lotus, Water Lily]
Huge variability inside classes.
SLIDE 3
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Classification: estimate a class label f(x), given n sample values {x_i, y_i = f(x_i)}_{i≤n}
Audio: instrument recognition. Huge variability inside classes.
SLIDE 4
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Regression: approximate a functional f(x), given n sample values {x_i, y_i = f(x_i) ∈ R}_{i≤n}
Physics: energy f(x) of a state vector x (astronomy, quantum chemistry).
SLIDE 5
Curse of Dimensionality
- f(x) can be approximated from examples {x_i, f(x_i)}_i by local interpolation, if f is regular and there are close examples.
- Need ε^{−d} points to cover [0,1]^d at a Euclidean distance ε
  ⇒ ‖x − x_i‖ is always large: huge variability inside classes.
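The covering argument above can be checked numerically. This is an illustrative numpy sketch (the sample sizes and dimensions are my own choices, not from the slides): with a fixed budget of samples in [0,1]^d, the distance to the nearest example explodes as d grows, so local interpolation of f becomes hopeless.

```python
# Sketch: nearest-neighbour distances blow up with the dimension d.
# Draw n random points in [0,1]^d and measure the distance from a
# random query point to its nearest sample.
import numpy as np

def mean_nn_distance(n, d, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    dists = []
    for _ in range(trials):
        samples = rng.random((n, d))   # n points in [0,1]^d
        query = rng.random(d)
        dists.append(np.min(np.linalg.norm(samples - query, axis=1)))
    return float(np.mean(dists))

low = mean_nn_distance(n=1000, d=2)     # close examples exist
high = mean_nn_distance(n=1000, d=100)  # nearest example is far away
print(low, high)
```

With the same 1000 samples, the nearest neighbour is at a tiny distance in d = 2 but at a large one in d = 100, which is exactly the ‖x − x_i‖ blow-up the slide describes.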
SLIDE 6
Learning by Euclidean Embedding
- Data x ∈ R^d: the raw metric ‖x − x'‖ is non-informative.
- "Similarity" metric: ∆(x, x'). How to define Φ?
- Representation Φ: x ↦ Φx ∈ H, followed by a linear classifier.
- Equivalent Euclidean metric: C_1 ‖Φx − Φx'‖ ≤ ∆(x, x') ≤ C_2 ‖Φx − Φx'‖, so that classes become Gaussian and separated for ‖Φx − Φx'‖.
SLIDE 7
x
Deep Convolution Networks
- Cascade of linear convolutions L_k with a non-linear scalar "neuron" operator ρ, e.g. ρ(u) = |u|:
  x → L_1 → ρ → L_2 → ρ → ... → Φ(x) → linear classification
- Optimize the L_k with support constraints: over 10^9 parameters.
- Exceptional results for images, speech, bio-data classification. Products by Facebook, IBM, Google, Microsoft, Yahoo...
- The revival of an old (1950) idea: Y. LeCun, G. Hinton.
Why does it work so well?
SLIDE 8
Overview
- Deep multiscale networks: invariant and stable metrics on groups
- Image classification
- Models of audio and image textures: information theory
- Learning physics: quantum chemistry energy regression
SLIDE 9
Image Metrics
- Low-dimensional "geometric shapes" (Grenander, classic mechanics): compare x(u) and x'(u), invariant to translations.
- Diffeomorphism action: D_τ x(u) = x(u − τ(u))
- Deformation metric: ∆(x, x') ∼ min_τ ( ‖D_τ x − x'‖ + ‖∇τ‖_∞ ‖x‖ )
  (diffeomorphism term + amplitude term)
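The diffeomorphism action D_τ x(u) = x(u − τ(u)) is easy to implement by resampling. A minimal 1-D sketch (the signal and the deformation τ are my own illustrative choices): a smooth τ with small ‖∇τ‖_∞ produces a warped signal that stays close to x.

```python
# Sketch: diffeomorphism action D_tau x(u) = x(u - tau(u)) on a
# periodic 1-D signal, implemented by linear resampling.
import numpy as np

u = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 4 * u)                  # a simple test signal

tau = 0.02 * np.sin(2 * np.pi * u)             # smooth, small deformation
Dtau_x = np.interp(u - tau, u, x, period=1.0)  # x(u - tau(u))

grad_tau = np.gradient(tau, u)                 # deformation amplitude
rel = np.linalg.norm(Dtau_x - x) / np.linalg.norm(x)
print(np.max(np.abs(grad_tau)), rel)
```

Here ‖∇τ‖_∞ ≈ 0.13, and the relative distortion ‖D_τ x − x‖ / ‖x‖ stays moderate, which is the kind of stability bound the representation Φ should preserve.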
SLIDE 10
Image Metrics
- High-dimensional textures: ergodic stationary processes, e.g. 2D turbulence.
- Highly non-Gaussian processes X(u).
- Can we find Φ so that Φ(X) is nearly Gaussian, without losing information?
- A Euclidean metric is a Maximum Likelihood on Gaussian models.
SLIDE 11
Euclidean Metric Embedding
- Stability to additive perturbations: ‖Φx − Φx'‖ ≤ C ‖x − x'‖
- Invariance to translations: x_c(u) = x(u − c) ⇒ Φ(x_c) = Φ(x)
- Stability to deformations: x_τ(u) = x(u − τ(u)) ⇒ ‖Φx − Φx_τ‖ ≤ C ‖∇τ‖_∞ ‖x‖
- Fourier and classic invariants fail these requirements.
SLIDE 12
Wavelet Transform
- Dilated wavelets: Q-constant band-pass filters ψ_λ(t) = 2^{−j/Q} ψ(2^{−j/Q} t) with λ = 2^{−j/Q}
- Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ ≤ 2^J}
  (φ_{2^J}: average; ψ_λ: higher frequencies)
- Preserves the norm: ‖Wx‖² = ‖x‖².
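A Q-constant filter bank of dilated wavelets can be sketched in a few lines. This is an illustrative Gabor construction (the mother wavelet, N, J and Q values are my own choices, not the filters used in the talk): each dilation by 2^{1/Q} lowers the centre frequency geometrically, giving Q filters per octave.

```python
# Sketch: constant-Q filter bank psi_lambda(t) = 2^{-j/Q} psi(2^{-j/Q} t)
# built from a Gabor mother wavelet psi(t) = g(t) exp(i xi t).
import numpy as np

def gabor(t, xi=np.pi / 2, sigma=4.0):
    # mother wavelet: Gaussian envelope times a complex oscillation
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * xi * t)

def filter_bank(N=1024, J=4, Q=8):
    t = np.arange(-N // 2, N // 2)
    psis = []
    for jq in range(J * Q):
        s = 2.0 ** (jq / Q)            # dilation factor 2^{j/Q}
        psis.append(gabor(t / s) / s)  # dilated, L1-normalized wavelet
    return np.array(psis)

psis = filter_bank()
# Centre frequencies decrease geometrically: filter jq peaks near
# frequency bin 256 / 2^{jq/Q}.
peaks = [int(np.argmax(np.abs(np.fft.fft(p)))) for p in psis]
```

Filters that are exactly one octave apart (jq and jq + Q) peak at frequency bins in a 2:1 ratio, the constant-Q geometry of the slide.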
SLIDE 13
Scale separation with Wavelets
- Complex wavelet: ψ(t) = g(t) exp(iξt), t = (t1, t2), rotated and dilated:
  ψ_λ(t) = 2^{−j} ψ(2^{−j} r_θ t) with λ = (2^j, θ)
- Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ ≤ 2^J}
  (φ_{2^J}: average; ψ_λ: higher frequencies)
- Preserves the norm: ‖Wx‖² = ‖x‖².
[Figure: real and imaginary parts of the rotated wavelets, and |ψ̂_λ(ω)|² tiling the (ω1, ω2) frequency plane]
SLIDE 14
Fast Wavelet Transform
[Figure: |W1| computes |x ⋆ ψ_{2^j,θ}| across scales 2^0, 2^1, ..., 2^J]
SLIDE 15
Wavelet Transform
[Figure: |W1| computes |x ⋆ ψ_{2^j,θ}| for scales 2^1, 2^2, 2^3, ..., 2^J]
- x ⋆ φ_{2^J}: locally invariant by translation.
- How to make everything invariant to translation? Depth.
SLIDE 16
Wavelet Translation Invariance
- First wavelet transform: W_1 x = ( x ⋆ φ_{2^J}, x ⋆ ψ_{λ1} )_{λ1}
- Modulus improves invariance: |W_1| x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| )_{λ1}
- With ψ_{λ1} = ψ^a_{λ1} + i ψ^b_{λ1}:
  x ⋆ ψ_{λ1}(t) = x ⋆ ψ^a_{λ1}(t) + i x ⋆ ψ^b_{λ1}(t)
  |x ⋆ ψ_{λ1}(t)| = ( |x ⋆ ψ^a_{λ1}(t)|² + |x ⋆ ψ^b_{λ1}(t)|² )^{1/2}
- |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t): local translation invariance at scale 2^J; full translation invariance when 2^J = ∞, but covariant otherwise.
- Second wavelet transform modulus: |W_2| |x ⋆ ψ_{λ1}| = ( |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t), ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)| )_{λ2}
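The claim that the modulus followed by a low-pass average is nearly translation invariant can be verified on a toy signal. This is a minimal 1-D sketch (the bump signal, the single Gabor wavelet and the Gaussian φ are my own illustrative choices):

```python
# Sketch: |x * psi| * phi is nearly invariant to a translation of x,
# while |x * psi| alone is only covariant.
import numpy as np

N = 512
t = np.arange(N)
x = np.exp(-(t - 200.0)**2 / 50.0)           # a localized bump
x_shift = np.roll(x, 8)                      # translated copy

psi = np.exp(-(t - N/2)**2 / 32.0) * np.exp(1j * 0.5 * (t - N/2))
phi = np.exp(-(t - N/2)**2 / (2 * 64.0**2))  # wide low-pass average
phi /= phi.sum()

def conv(a, b):                              # circular convolution
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

u1, u1s = np.abs(conv(x, psi)), np.abs(conv(x_shift, psi))
s1, s1s = np.real(conv(u1, phi)), np.real(conv(u1s, phi))

raw = np.linalg.norm(u1 - u1s) / np.linalg.norm(u1)   # covariant layer
avg = np.linalg.norm(s1 - s1s) / np.linalg.norm(s1)   # averaged layer
print(raw, avg)
```

The relative change of |x ⋆ ψ| under the shift is large, while after averaging by φ it drops by roughly the ratio of the shift to the averaging scale 2^J.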
SLIDE 17
Scattering Transform
[Figure: |W1| maps x to x ⋆ φ_{2^J} and to the channels |x ⋆ ψ_{λ1}(t)|, |x ⋆ ψ_{λ'1}(t)|, |x ⋆ ψ_{λ''1}(t)|, |x ⋆ ψ_{λ'''1}(t)|]
SLIDE 18
Scattering Transform
[Figure: |W1| then |W2| map x to x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, and ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)|]
SLIDE 19
Scattering Neural Network
[Figure: cascade |W1|, |W2|, |W3| producing |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, and |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|]
SLIDE 20
Scattering Properties
S_J x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}, ... )_{λ1, λ2, λ3, ...}
Theorem: For appropriate wavelets, a scattering
- preserves norms: ‖S_J x‖ = ‖x‖
- is contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability), since W_k is unitary ⇒ |W_k| is contractive
- gives translation invariance and deformation stability: if x_τ(u) = x(u − τ(u)) then lim_{J→∞} ‖S_J x_τ − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖
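The contraction property can be checked numerically on a toy two-layer scattering. This is a minimal sketch, not the filters of the talk: Gaussian band-pass profiles defined directly in the Fourier domain, normalized so the Littlewood-Paley sum |φ̂|² + Σ|ψ̂_j|² is at most 1, which makes each W_k, and hence the whole cascade, contractive.

```python
# Sketch: two-layer scattering S_J x, with a numerical check of
# ||S_J x - S_J y|| <= ||x - y||.
import numpy as np

def make_filters(N=256, J=5):
    om = np.fft.fftfreq(N) * 2 * np.pi
    psis = []
    for j in range(J):
        xi = np.pi * 2.0 ** (-j)         # centre frequency of psi_j
        psis.append(np.exp(-(np.abs(om) - xi)**2 / (2 * (xi / 4)**2)))
    phi = np.exp(-om**2 / (2 * (np.pi * 2.0 ** (-J))**2))
    # Normalize so the Littlewood-Paley sum is <= 1: W is contractive.
    c = np.sqrt((np.abs(phi)**2 + sum(np.abs(p)**2 for p in psis)).max())
    return [p / c for p in psis], phi / c

def scattering(x, psis, phi):
    lowpass = lambda u: np.real(np.fft.ifft(np.fft.fft(u) * phi))
    prop = lambda u, p: np.abs(np.fft.ifft(np.fft.fft(u) * p))
    out = [lowpass(x)]                       # order 0: x * phi
    for p1 in psis:
        u1 = prop(x, p1)                     # |x * psi_1|
        out.append(lowpass(u1))              # order 1
        for p2 in psis:
            out.append(lowpass(prop(u1, p2)))  # order 2
    return np.concatenate(out)

rng = np.random.default_rng(0)
psis, phi = make_filters()
x, y = rng.standard_normal(256), rng.standard_normal(256)
d_in = np.linalg.norm(x - y)
d_out = np.linalg.norm(scattering(x, psis, phi) - scattering(y, psis, phi))
```

Each linear stage is a contraction by the frame bound, the pointwise modulus is a contraction, and dropping the deepest propagated channels only decreases the distance, so d_out ≤ d_in always holds.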
SLIDE 21
Digit Classification: MNIST (Joan Bruna)
Linear classifier applied to S_J x to estimate y = f(x).
Classification errors:
Training size | Conv. Net. (LeCun et al.) | Scattering
50000         | 0.5%                      | 0.4%
SLIDE 22
Classification of Textures (J. Bruna)
CUReT database, 61 classes. Linear classifier on scattering moments S_J x, with 2^J = image size.
Classification errors:
Training per class | Fourier Spectr. | Histogr. Features | Scattering
46                 | 1%              | 1%                | 0.2%
SLIDE 23
Scattering Moments of Processes
The scattering transform of a stationary process X(t):
S_J X = ( X, |X ⋆ ψ_{λ1}|, ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|, ... ) ⋆ φ_{2^J}
is Gaussian for 2^J large, and if X is ergodic it converges as J → ∞ to the expected scattering moments:
E(S X) = ( E(X), E(|X ⋆ ψ_{λ1}|), E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|), ... )_{λ1, λ2, λ3, ...}
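For an ergodic process, the expected moments E(|X ⋆ ψ_λ|) can be estimated by spatial averages over a single realization, and the estimate concentrates as the window grows. A toy sketch (white noise and a Haar-like wavelet of my own choosing, only to illustrate the concentration):

```python
# Sketch: estimating a first-order scattering moment E(|X * psi|)
# of a stationary ergodic process by spatial averaging.
import numpy as np

rng = np.random.default_rng(0)

def first_order_moment(x, scale_j):
    # Haar-like wavelet at scale 2^j (an illustrative choice)
    n = 2 ** scale_j
    psi = np.concatenate([np.ones(n), -np.ones(n)]) / (2 * n)
    return np.mean(np.abs(np.convolve(x, psi, mode='valid')))

# Same expectation, estimated on short and on long realizations.
short = [first_order_moment(rng.standard_normal(2**8), 3) for _ in range(50)]
long_ = [first_order_moment(rng.standard_normal(2**14), 3) for _ in range(50)]
print(np.std(short), np.std(long_))
```

Both estimators target the same E(|X ⋆ ψ|), but the variance shrinks with the averaging window, which is why S_J X becomes nearly deterministic (and Gaussian) for 2^J large.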
SLIDE 24
Representation of Random Processes
Write E(S X) = ( E(U_0 X), E(U_1 X), E(U_2 X), E(U_3 X), ... )_{λ1, λ2, λ3, ...} with
U_0 X = X, U_1 X = |X ⋆ ψ_{λ1}|, U_2 X = ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, U_3 X = |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|, ...
Theorem (Boltzmann): The distribution p(x) which satisfies
∫_{R^N} U_m x p(x) dx = E(U_m X)
with maximum entropy H_max = −∫ p(x) log p(x) dx is the Gibbs model
p(x) = Z^{−1} exp( Σ_{m} λ_m · U_m x ),
and H_max ≥ H(X) (the entropy of X).
Little loss of information when H_max ≈ H(X).
SLIDE 25
Ergodic Texture Reconstructions (Joan Bruna)
[Figure: original textures (2D turbulence) vs. a Gaussian process model with the same second-order moments vs. reconstructions from the scattering moments E(|x ⋆ ψ_{λ1}|), E(||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|)]
Second-order Gaussian vs. Scattering: O(log² N) moments.
SLIDE 26
Representation of Audio Textures (Joan Bruna)
[Figure: time-frequency spectrograms (ω vs. t) of Original, Paper, Cocktail Party and Applauds sounds: Gaussian model in time vs. Gaussian model in scattering]
SLIDE 27
Failures: Harmonic Sounds (V. Lostanlen)
[Figure: spectrograms of Speech, Bird, Cello]
Need to express frequency channel interactions: time-frequency image.
SLIDE 28
Harmonic Spiral (V. Lostanlen)
[Figure: time-frequency image (t, λ) with the log-frequency axis λ unrolled on a spiral: octaves 1 to 5, octave index j and angle θ, λ ∈ R ≃ Z × R]
- More regular variations along (θ, j) than along λ.
- Alignment of harmonics in two main groups.
Need to capture frequency variability and structures.
SLIDE 29
Rotation and Scaling Invariance (Laurent Sifre)
UIUC database: 25 classes. Scattering classification errors:
Training per class | Scat. Translation
20                 | 20%
SLIDE 30
Extension to Rigid Movements (Laurent Sifre)
- Group of rigid displacements: translations and rotations.
- Action on the wavelet coefficients x_j(u, θ) = |x ⋆ ψ_{2^j,θ}(u)|:
  a rotation and translation of the image, x(r_α(u − c)), produces a rotation and translation in u together with an angle translation in θ: x_j(r_α(u − c), θ − α).
- Need to capture the variability of spatial directions.
SLIDE 31
Extension to Rigid Movements (Laurent Sifre)
- Scattering on rigid movements: cascade wavelets on translations (|W1|), then wavelets on the rigid motion group (|W2|, |W3|), and integrate: ∫ x(u) du, ∫ x_j(u, θ) du dθ, ∫ |x_j ⊛ ψ_{λ2}(u, θ)| du dθ.
- To build invariants: second wavelet transform on L²(G), with wavelets ψ_{λ2}(u, θ) and group convolutions of x_j(u, θ):
  x_j ⊛ ψ_{λ2}(u, θ) = ∫_{R²} ∫_0^{2π} x_j(v, α) ψ_{λ2}(u − v, θ − α) dv dα
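The key structural fact, that a rotation of the image permutes the orientation channels x_j(u, θ), so integrating over u and θ yields a rigid-motion invariant, can be checked exactly on the grid for a 90° rotation. This is an illustrative sketch with a tiny oriented Gabor filter of my own choosing:

```python
# Sketch: rotating the image permutes the orientation channels
# x_j(u, theta) = |x * psi_theta(u)|, so the sum over u and theta
# is invariant. A 90-degree rotation (np.rot90) keeps this exact.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))

def oriented_filter(theta, size=7, xi=1.5, sigma=1.5):
    r = np.arange(size) - size // 2
    u, v = np.meshgrid(r, r, indexing='ij')
    g = np.exp(-(u**2 + v**2) / (2 * sigma**2))   # isotropic envelope
    return g * np.exp(1j * xi * (np.cos(theta) * u + np.sin(theta) * v))

def channel_sum(img, theta):
    # sum over u of |x * psi_theta|, via circular FFT convolution
    k = np.zeros(img.shape, complex)
    k[:7, :7] = oriented_filter(theta)
    return float(np.sum(np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k)))))

angles = [0.0, np.pi / 2]
f_x = sum(channel_sum(x, a) for a in angles)
f_rx = sum(channel_sum(np.rot90(x), a) for a in angles)
```

Rotating x by 90° maps the θ = 0 channel to the θ = π/2 channel (up to a harmless conjugation absorbed by the modulus), so f_x and f_rx agree to numerical precision.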
SLIDE 32
Rotation and Scaling Invariance (Laurent Sifre)
UIUC database: 25 classes. Scattering classification errors:
Training per class | Scat. Translation | Scat. Rigid Movt.
20                 | 20%               | 0.6%
SLIDE 33
Complex Image Classification (Edouard Oyallon)
CalTech-101 data basis. [Figure: classes such as Boat, Water Lily, Metronome, Beaver, Joshua Tree, Anchor]
Pipeline: S_J x on rigid movements computes invariants → variable selection: 2000 → linear classifier ŷ.
Classification accuracy:
Data basis  | Deep-Net | Scat.-2
CalTech-101 | 85%      | 80%
CIFAR-10    | 90%      | 80%
Scat.-2 is state of the art among unsupervised representations.
SLIDE 34
Learning Physics: N-Body Problem (Matthew Hirn, N. Poilvert)
- Energy of d interacting bodies (astronomy, quantum chemistry):
  can we learn the interaction energy f(x) of a system, with x given by the positions and values of the bodies?
SLIDE 35
Multiscale Interactions
- A system of d particles involves d² pairwise interactions.
- Multiscale separation into O(log₂ d) scales of interactions.
SLIDE 36
Quantum Chemistry
Organic molecules with Hydrogen, Carbon, Nitrogen, Oxygen, Sulfur, Chlorine.
Electronic density ρ_x(u): computed by solving the Schrödinger equation.
SLIDE 37
Density Functional Theory
Kohn-Sham model for the molecular energy:
E(ρ) = T(ρ) + ∫ ρ(u) V(u) du + (1/2) ∫∫ ρ(u) ρ(v) / |u − v| du dv + E_xc(ρ)
(kinetic energy + electron-nuclei attraction + electron-electron Coulomb repulsion + exchange-correlation energy)
At equilibrium: f(x) = E(ρ_x) = min_ρ E(ρ)
- f(x) is invariant to isometries and is deformation stable.
SLIDE 38
Atomization Density
[Figure: electronic density ρ_x(u) vs. approximate density ρ̃_x(u)]
SLIDE 39
Quantum Regression (Matthew Hirn, N. Poilvert)
- Sparse regression computed over a representation Φx = {φ_n(ρ̃_x)}_n invariant to the action of isometries in R³: scattering coefficients and Fourier modulus coefficients, together with their squares.
- Partial Least Square regression on the training set:
  f_M(x) = Σ_{k=1}^M w_k φ_{n_k}(ρ̃_x), with M the number of selected variables.
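The sparse selection of M descriptors in f_M(x) = Σ_k w_k φ_{n_k} can be sketched with a plain greedy orthogonal matching pursuit on synthetic data. This is not the authors' fitting procedure (they use Partial Least Squares); the dictionary and coefficients below are entirely synthetic, only to show the selection mechanism:

```python
# Sketch: greedy sparse regression selecting M dictionary columns,
# in the spirit of f_M(x) = sum_k w_k phi_{n_k} (orthogonal matching
# pursuit on a synthetic dictionary, not the talk's PLS regression).
import numpy as np

def omp(Phi, y, M):
    # Phi: (n_samples, n_features) matrix of candidate descriptors
    residual, selected, w = y.copy(), [], None
    for _ in range(M):
        corr = np.abs(Phi.T @ residual)
        corr[selected] = -np.inf                 # select without replacement
        selected.append(int(np.argmax(corr)))
        w, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ w      # re-fit, update residual
    return selected, w

rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 50))             # synthetic descriptors
true_w = np.zeros(50)
true_w[[3, 17, 41]] = [1.0, -2.0, 0.5]           # 3 informative variables
y = Phi @ true_w + 0.01 * rng.standard_normal(200)
sel, w = omp(Phi, y, M=3)
```

With well-separated informative columns the greedy pass recovers exactly the three planted variables, mirroring how M controls the model complexity in the energy regression.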
SLIDE 40
Scattering Regression
[Figure: regression error (kcal/mol) vs. model complexity log₂(M), comparing Fourier, Wavelet Scattering and Coulomb features; errors of 5.8, 14.2 and 16.7 kcal/mol; 2.7 kcal/mol: state of the art]
SLIDE 41
Conclusion
- A major challenge of data analysis is to find Euclidean embeddings of metrics ⇔ build Gaussian models.
- Continuity to the action of diffeomorphisms ⇒ wavelets.
- Known geometry ⇒ no need to learn. Unknown geometry: learn wavelets on appropriate groups.
- Can learn physics from priors on geometry and invariants.
- Applications to images, audio and natural languages.