Learning Music, Images and Physics with Deep Neural Networks
Joan Bruna, Matthew Hirn, Stéphane Mallat
Vincent Lostanlen, Edouard Oyallon, Nicolas Poilvert, Laurent Sifre, Irène Waldspurger
École Normale Supérieure, www.di.ens.fr/data
SLIDE 1
SLIDE 2
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Classification: estimate a class label f(x), given n sample values {x_i, y_i = f(x_i)}_{i≤n}
Image Classification: d = 10^6
[Figure: example classes — Anchor, Joshua Tree, Beaver, Lotus, Water Lily]
Huge variability inside classes.
SLIDE 3
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Classification: estimate a class label f(x), given n sample values {x_i, y_i = f(x_i)}_{i≤n}
Audio: instrument recognition. Huge variability inside classes.
SLIDE 4
High Dimensional Learning
- High-dimensional x = (x(1), ..., x(d)) ∈ R^d
- Regression: approximate a functional f(x), given n sample values {x_i, y_i = f(x_i) ∈ R}_{i≤n}
Physics: energy f(x) of a state vector x (astronomy, quantum chemistry).
SLIDE 5
Curse of Dimensionality
- f(x) can be approximated from examples {x_i, f(x_i)}_i by local interpolation, if f is regular and there are close examples.
- Need ε^{−d} points to cover [0,1]^d at a Euclidean distance ε
  ⇒ ‖x − x_i‖ is always large: huge variability inside classes.
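The covering argument above can be checked numerically. This is an illustrative numpy sketch (the sample sizes and dimensions are my own choices, not from the slides): with a fixed budget of samples in [0,1]^d, the distance to the nearest example explodes as d grows, so local interpolation of f becomes hopeless.

```python
# Sketch: nearest-neighbour distances blow up with the dimension d.
# Draw n random points in [0,1]^d and measure the distance from a
# random query point to its nearest sample.
import numpy as np

def mean_nn_distance(n, d, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    dists = []
    for _ in range(trials):
        samples = rng.random((n, d))   # n points in [0,1]^d
        query = rng.random(d)
        dists.append(np.min(np.linalg.norm(samples - query, axis=1)))
    return float(np.mean(dists))

low = mean_nn_distance(n=1000, d=2)     # close examples exist
high = mean_nn_distance(n=1000, d=100)  # nearest example is far away
print(low, high)
```

With the same 1000 samples, the nearest neighbour is at a tiny distance in d = 2 but at a large one in d = 100, which is exactly the ‖x − x_i‖ blow-up the slide describes.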
SLIDE 6
Learning by Euclidean Embedding
- Data x ∈ R^d: the raw metric ‖x − x'‖ is non-informative.
- "Similarity" metric: ∆(x, x'). How to define Φ?
- Representation Φ: x ↦ Φx ∈ H, followed by a linear classifier.
- Equivalent Euclidean metric: C_1 ‖Φx − Φx'‖ ≤ ∆(x, x') ≤ C_2 ‖Φx − Φx'‖, so that classes become Gaussian and separated for ‖Φx − Φx'‖.
SLIDE 7
x
Deep Convolution Networks
- Cascade of linear convolutions L_k with a non-linear scalar "neuron" operator ρ, e.g. ρ(u) = |u|:
  x → L_1 → ρ → L_2 → ρ → ... → Φ(x) → linear classification
- Optimize the L_k with support constraints: over 10^9 parameters.
- Exceptional results for images, speech, bio-data classification. Products by Facebook, IBM, Google, Microsoft, Yahoo...
- The revival of an old (1950) idea: Y. LeCun, G. Hinton.
Why does it work so well?
SLIDE 8
Overview
- Deep multiscale networks: invariant and stable metrics on groups
- Image classification
- Models of audio and image textures: information theory
- Learning physics: quantum chemistry energy regression
SLIDE 9
Image Metrics
- Low-dimensional "geometric shapes" (Grenander, classic mechanics): compare x(u) and x'(u), invariant to translations.
- Diffeomorphism action: D_τ x(u) = x(u − τ(u))
- Deformation metric: ∆(x, x') ∼ min_τ ( ‖D_τ x − x'‖ + ‖∇τ‖_∞ ‖x‖ )
  (diffeomorphism term + amplitude term)
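The diffeomorphism action D_τ x(u) = x(u − τ(u)) is easy to implement by resampling. A minimal 1-D sketch (the signal and the deformation τ are my own illustrative choices): a smooth τ with small ‖∇τ‖_∞ produces a warped signal that stays close to x.

```python
# Sketch: diffeomorphism action D_tau x(u) = x(u - tau(u)) on a
# periodic 1-D signal, implemented by linear resampling.
import numpy as np

u = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 4 * u)                  # a simple test signal

tau = 0.02 * np.sin(2 * np.pi * u)             # smooth, small deformation
Dtau_x = np.interp(u - tau, u, x, period=1.0)  # x(u - tau(u))

grad_tau = np.gradient(tau, u)                 # deformation amplitude
rel = np.linalg.norm(Dtau_x - x) / np.linalg.norm(x)
print(np.max(np.abs(grad_tau)), rel)
```

Here ‖∇τ‖_∞ ≈ 0.13, and the relative distortion ‖D_τ x − x‖ / ‖x‖ stays moderate, which is the kind of stability bound the representation Φ should preserve.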
SLIDE 10
Image Metrics
- High-dimensional textures: ergodic stationary processes, e.g. 2D turbulence.
- Highly non-Gaussian processes X(u).
- Can we find Φ so that Φ(X) is nearly Gaussian, without losing information?
- A Euclidean metric is a Maximum Likelihood on Gaussian models.
SLIDE 11
Euclidean Metric Embedding
- Stability to additive perturbations: ‖Φx − Φx'‖ ≤ C ‖x − x'‖
- Invariance to translations: x_c(u) = x(u − c) ⇒ Φ(x_c) = Φ(x)
- Stability to deformations: x_τ(u) = x(u − τ(u)) ⇒ ‖Φx − Φx_τ‖ ≤ C ‖∇τ‖_∞ ‖x‖
- Fourier and classic invariants fail these requirements.
SLIDE 12
Wavelet Transform
- Dilated wavelets: Q-constant band-pass filters ψ_λ(t) = 2^{−j/Q} ψ(2^{−j/Q} t) with λ = 2^{−j/Q}
- Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ ≤ 2^J}
  (φ_{2^J}: average; ψ_λ: higher frequencies)
- Preserves the norm: ‖Wx‖² = ‖x‖².
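A Q-constant filter bank of dilated wavelets can be sketched in a few lines. This is an illustrative Gabor construction (the mother wavelet, N, J and Q values are my own choices, not the filters used in the talk): each dilation by 2^{1/Q} lowers the centre frequency geometrically, giving Q filters per octave.

```python
# Sketch: constant-Q filter bank psi_lambda(t) = 2^{-j/Q} psi(2^{-j/Q} t)
# built from a Gabor mother wavelet psi(t) = g(t) exp(i xi t).
import numpy as np

def gabor(t, xi=np.pi / 2, sigma=4.0):
    # mother wavelet: Gaussian envelope times a complex oscillation
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * xi * t)

def filter_bank(N=1024, J=4, Q=8):
    t = np.arange(-N // 2, N // 2)
    psis = []
    for jq in range(J * Q):
        s = 2.0 ** (jq / Q)            # dilation factor 2^{j/Q}
        psis.append(gabor(t / s) / s)  # dilated, L1-normalized wavelet
    return np.array(psis)

psis = filter_bank()
# Centre frequencies decrease geometrically: filter jq peaks near
# frequency bin 256 / 2^{jq/Q}.
peaks = [int(np.argmax(np.abs(np.fft.fft(p)))) for p in psis]
```

Filters that are exactly one octave apart (jq and jq + Q) peak at frequency bins in a 2:1 ratio, the constant-Q geometry of the slide.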
SLIDE 13
Scale separation with Wavelets
- Complex wavelet: ψ(t) = g(t) exp(iξt), t = (t1, t2), rotated and dilated:
  ψ_λ(t) = 2^{−j} ψ(2^{−j} r_θ t) with λ = (2^j, θ)
- Wavelet transform: Wx = ( x ⋆ φ_{2^J}(t), x ⋆ ψ_λ(t) )_{λ ≤ 2^J}
  (φ_{2^J}: average; ψ_λ: higher frequencies)
- Preserves the norm: ‖Wx‖² = ‖x‖².
[Figure: real and imaginary parts of the rotated wavelets, and |ψ̂_λ(ω)|² tiling the (ω1, ω2) frequency plane]
SLIDE 14
Fast Wavelet Transform
[Figure: |W1| computes |x ⋆ ψ_{2^j,θ}| across scales 2^0, 2^1, ..., 2^J]
SLIDE 15
Wavelet Transform
[Figure: |W1| computes |x ⋆ ψ_{2^j,θ}| for scales 2^1, 2^2, 2^3, ..., 2^J]
- x ⋆ φ_{2^J}: locally invariant by translation.
- How to make everything invariant to translation? Depth.
SLIDE 16
Wavelet Translation Invariance
- First wavelet transform: W_1 x = ( x ⋆ φ_{2^J}, x ⋆ ψ_{λ1} )_{λ1}
- Modulus improves invariance: |W_1| x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| )_{λ1}
- With ψ_{λ1} = ψ^a_{λ1} + i ψ^b_{λ1}:
  x ⋆ ψ_{λ1}(t) = x ⋆ ψ^a_{λ1}(t) + i x ⋆ ψ^b_{λ1}(t)
  |x ⋆ ψ_{λ1}(t)| = ( |x ⋆ ψ^a_{λ1}(t)|² + |x ⋆ ψ^b_{λ1}(t)|² )^{1/2}
- |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t): local translation invariance at scale 2^J; full translation invariance when 2^J = ∞, but covariant otherwise.
- Second wavelet transform modulus: |W_2| |x ⋆ ψ_{λ1}| = ( |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t), ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)| )_{λ2}
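The claim that the modulus followed by a low-pass average is nearly translation invariant can be verified on a toy signal. This is a minimal 1-D sketch (the bump signal, the single Gabor wavelet and the Gaussian φ are my own illustrative choices):

```python
# Sketch: |x * psi| * phi is nearly invariant to a translation of x,
# while |x * psi| alone is only covariant.
import numpy as np

N = 512
t = np.arange(N)
x = np.exp(-(t - 200.0)**2 / 50.0)           # a localized bump
x_shift = np.roll(x, 8)                      # translated copy

psi = np.exp(-(t - N/2)**2 / 32.0) * np.exp(1j * 0.5 * (t - N/2))
phi = np.exp(-(t - N/2)**2 / (2 * 64.0**2))  # wide low-pass average
phi /= phi.sum()

def conv(a, b):                              # circular convolution
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

u1, u1s = np.abs(conv(x, psi)), np.abs(conv(x_shift, psi))
s1, s1s = np.real(conv(u1, phi)), np.real(conv(u1s, phi))

raw = np.linalg.norm(u1 - u1s) / np.linalg.norm(u1)   # covariant layer
avg = np.linalg.norm(s1 - s1s) / np.linalg.norm(s1)   # averaged layer
print(raw, avg)
```

The relative change of |x ⋆ ψ| under the shift is large, while after averaging by φ it drops by roughly the ratio of the shift to the averaging scale 2^J.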
SLIDE 17
Scattering Transform
[Figure: |W1| maps x to x ⋆ φ_{2^J} and to the channels |x ⋆ ψ_{λ1}(t)|, |x ⋆ ψ_{λ'1}(t)|, |x ⋆ ψ_{λ''1}(t)|, |x ⋆ ψ_{λ'''1}(t)|]
SLIDE 18
Scattering Transform
[Figure: |W1| then |W2| map x to x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, and ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}(t)|]
SLIDE 19
Scattering Neural Network
[Figure: cascade |W1|, |W2|, |W3| producing |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, and |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|]
SLIDE 20
Scattering Properties
S_J x = ( x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}, ... )_{λ1, λ2, λ3, ...}
Theorem: For appropriate wavelets, a scattering
- preserves norms: ‖S_J x‖ = ‖x‖
- is contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability), since W_k is unitary ⇒ |W_k| is contractive
- gives translation invariance and deformation stability: if x_τ(u) = x(u − τ(u)) then lim_{J→∞} ‖S_J x_τ − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖
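The contraction property can be checked numerically on a toy two-layer scattering. This is a minimal sketch, not the filters of the talk: Gaussian band-pass profiles defined directly in the Fourier domain, normalized so the Littlewood-Paley sum |φ̂|² + Σ|ψ̂_j|² is at most 1, which makes each W_k, and hence the whole cascade, contractive.

```python
# Sketch: two-layer scattering S_J x, with a numerical check of
# ||S_J x - S_J y|| <= ||x - y||.
import numpy as np

def make_filters(N=256, J=5):
    om = np.fft.fftfreq(N) * 2 * np.pi
    psis = []
    for j in range(J):
        xi = np.pi * 2.0 ** (-j)         # centre frequency of psi_j
        psis.append(np.exp(-(np.abs(om) - xi)**2 / (2 * (xi / 4)**2)))
    phi = np.exp(-om**2 / (2 * (np.pi * 2.0 ** (-J))**2))
    # Normalize so the Littlewood-Paley sum is <= 1: W is contractive.
    c = np.sqrt((np.abs(phi)**2 + sum(np.abs(p)**2 for p in psis)).max())
    return [p / c for p in psis], phi / c

def scattering(x, psis, phi):
    lowpass = lambda u: np.real(np.fft.ifft(np.fft.fft(u) * phi))
    prop = lambda u, p: np.abs(np.fft.ifft(np.fft.fft(u) * p))
    out = [lowpass(x)]                       # order 0: x * phi
    for p1 in psis:
        u1 = prop(x, p1)                     # |x * psi_1|
        out.append(lowpass(u1))              # order 1
        for p2 in psis:
            out.append(lowpass(prop(u1, p2)))  # order 2
    return np.concatenate(out)

rng = np.random.default_rng(0)
psis, phi = make_filters()
x, y = rng.standard_normal(256), rng.standard_normal(256)
d_in = np.linalg.norm(x - y)
d_out = np.linalg.norm(scattering(x, psis, phi) - scattering(y, psis, phi))
```

Each linear stage is a contraction by the frame bound, the pointwise modulus is a contraction, and dropping the deepest propagated channels only decreases the distance, so d_out ≤ d_in always holds.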
SLIDE 21
Digit Classification: MNIST (Joan Bruna)
Linear classifier applied to S_J x to estimate y = f(x).
Classification errors:
Training size | Conv. Net. (LeCun et al.) | Scattering
50000         | 0.5%                      | 0.4%
SLIDE 22
Classification of Textures (J. Bruna)
CUReT database, 61 classes. Linear classifier on scattering moments S_J x, with 2^J = image size.
Classification errors:
Training per class | Fourier Spectr. | Histogr. Features | Scattering
46                 | 1%              | 1%                | 0.2%
SLIDE 23
Scattering Moments of Processes
The scattering transform of a stationary process X(t):
S_J X = ( X, |X ⋆ ψ_{λ1}|, ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|, ... ) ⋆ φ_{2^J}
is Gaussian for 2^J large, and if X is ergodic it converges as J → ∞ to the expected scattering moments:
E(S X) = ( E(X), E(|X ⋆ ψ_{λ1}|), E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|), ... )_{λ1, λ2, λ3, ...}
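For an ergodic process, the expected moments E(|X ⋆ ψ_λ|) can be estimated by spatial averages over a single realization, and the estimate concentrates as the window grows. A toy sketch (white noise and a Haar-like wavelet of my own choosing, only to illustrate the concentration):

```python
# Sketch: estimating a first-order scattering moment E(|X * psi|)
# of a stationary ergodic process by spatial averaging.
import numpy as np

rng = np.random.default_rng(0)

def first_order_moment(x, scale_j):
    # Haar-like wavelet at scale 2^j (an illustrative choice)
    n = 2 ** scale_j
    psi = np.concatenate([np.ones(n), -np.ones(n)]) / (2 * n)
    return np.mean(np.abs(np.convolve(x, psi, mode='valid')))

# Same expectation, estimated on short and on long realizations.
short = [first_order_moment(rng.standard_normal(2**8), 3) for _ in range(50)]
long_ = [first_order_moment(rng.standard_normal(2**14), 3) for _ in range(50)]
print(np.std(short), np.std(long_))
```

Both estimators target the same E(|X ⋆ ψ|), but the variance shrinks with the averaging window, which is why S_J X becomes nearly deterministic (and Gaussian) for 2^J large.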
SLIDE 24
Representation of Random Processes
Write E(S X) = ( E(U_0 X), E(U_1 X), E(U_2 X), E(U_3 X), ... )_{λ1, λ2, λ3, ...} with
U_0 X = X, U_1 X = |X ⋆ ψ_{λ1}|, U_2 X = ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|, U_3 X = |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|, ...
Theorem (Boltzmann): The distribution p(x) which satisfies
∫_{R^N} U_m x p(x) dx = E(U_m X)
with maximum entropy H_max = −∫ p(x) log p(x) dx is the Gibbs model
p(x) = Z^{−1} exp( Σ_{m} λ_m · U_m x ),
and H_max ≥ H(X) (the entropy of X).
Little loss of information when H_max ≈ H(X).
SLIDE 25
Ergodic Texture Reconstructions (Joan Bruna)
[Figure: original textures (2D turbulence) vs. a Gaussian process model with the same second-order moments vs. reconstructions from the scattering moments E(|x ⋆ ψ_{λ1}|), E(||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|)]
Second-order Gaussian vs. Scattering: O(log² N) moments.
SLIDE 26
Representation of Audio Textures (Joan Bruna)
[Figure: time-frequency spectrograms (ω vs. t) of Original, Paper, Cocktail Party and Applauds sounds: Gaussian model in time vs. Gaussian model in scattering]
SLIDE 27
Failures: Harmonic Sounds (V. Lostanlen)
[Figure: spectrograms of Speech, Bird, Cello]
Need to express frequency channel interactions: time-frequency image.
SLIDE 28
Harmonic Spiral (V. Lostanlen)
[Figure: time-frequency image (t, λ) with the log-frequency axis λ unrolled on a spiral: octaves 1 to 5, octave index j and angle θ, λ ∈ R ≃ Z × R]
- More regular variations along (θ, j) than along λ.
- Alignment of harmonics in two main groups.
Need to capture frequency variability and structures.
SLIDE 29
Rotation and Scaling Invariance (Laurent Sifre)
UIUC database: 25 classes. Scattering classification errors:
Training per class | Scat. Translation
20                 | 20%
SLIDE 30
Extension to Rigid Movements (Laurent Sifre)
- Group of rigid displacements: translations and rotations.
- Action on the wavelet coefficients x_j(u, θ) = |x ⋆ ψ_{2^j,θ}(u)|:
  a rotation and translation of the image, x(r_α(u − c)), produces a rotation and translation in u together with an angle translation in θ: x_j(r_α(u − c), θ − α).
- Need to capture the variability of spatial directions.
SLIDE 31
Extension to Rigid Movements (Laurent Sifre)
- Scattering on rigid movements: cascade wavelets on translations (|W1|), then wavelets on the rigid motion group (|W2|, |W3|), and integrate: ∫ x(u) du, ∫ x_j(u, θ) du dθ, ∫ |x_j ⊛ ψ_{λ2}(u, θ)| du dθ.
- To build invariants: second wavelet transform on L²(G), with wavelets ψ_{λ2}(u, θ) and group convolutions of x_j(u, θ):
  x_j ⊛ ψ_{λ2}(u, θ) = ∫_{R²} ∫_0^{2π} x_j(v, α) ψ_{λ2}(u − v, θ − α) dv dα
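The key structural fact, that a rotation of the image permutes the orientation channels x_j(u, θ), so integrating over u and θ yields a rigid-motion invariant, can be checked exactly on the grid for a 90° rotation. This is an illustrative sketch with a tiny oriented Gabor filter of my own choosing:

```python
# Sketch: rotating the image permutes the orientation channels
# x_j(u, theta) = |x * psi_theta(u)|, so the sum over u and theta
# is invariant. A 90-degree rotation (np.rot90) keeps this exact.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))

def oriented_filter(theta, size=7, xi=1.5, sigma=1.5):
    r = np.arange(size) - size // 2
    u, v = np.meshgrid(r, r, indexing='ij')
    g = np.exp(-(u**2 + v**2) / (2 * sigma**2))   # isotropic envelope
    return g * np.exp(1j * xi * (np.cos(theta) * u + np.sin(theta) * v))

def channel_sum(img, theta):
    # sum over u of |x * psi_theta|, via circular FFT convolution
    k = np.zeros(img.shape, complex)
    k[:7, :7] = oriented_filter(theta)
    return float(np.sum(np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k)))))

angles = [0.0, np.pi / 2]
f_x = sum(channel_sum(x, a) for a in angles)
f_rx = sum(channel_sum(np.rot90(x), a) for a in angles)
```

Rotating x by 90° maps the θ = 0 channel to the θ = π/2 channel (up to a harmless conjugation absorbed by the modulus), so f_x and f_rx agree to numerical precision.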
SLIDE 32
Rotation and Scaling Invariance (Laurent Sifre)
UIUC database: 25 classes. Scattering classification errors:
Training per class | Scat. Translation | Scat. Rigid Movt.
20                 | 20%               | 0.6%
SLIDE 33
Complex Image Classification (Edouard Oyallon)
CalTech-101 data basis. [Figure: classes such as Boat, Water Lily, Metronome, Beaver, Joshua Tree, Anchor]
Pipeline: S_J x on rigid movements computes invariants → variable selection: 2000 → linear classifier ŷ.
Classification accuracy:
Data basis  | Deep-Net | Scat.-2
CalTech-101 | 85%      | 80%
CIFAR-10    | 90%      | 80%
Scat.-2 is state of the art among unsupervised representations.
SLIDE 34
Learning Physics: N-Body Problem (Matthew Hirn, N. Poilvert)
- Energy of d interacting bodies (astronomy, quantum chemistry):
  can we learn the interaction energy f(x) of a system, with x given by the positions and values of the bodies?
SLIDE 35
Multiscale Interactions
- A system of d particles involves d² pairwise interactions.
- Multiscale separation into O(log₂ d) scales of interactions.
SLIDE 36
Quantum Chemistry
Organic molecules with Hydrogen, Carbon, Nitrogen, Oxygen, Sulfur, Chlorine.
Electronic density ρ_x(u): computed by solving the Schrödinger equation.
SLIDE 37
Density Functional Theory
Kohn-Sham model for the molecular energy:
E(ρ) = T(ρ) + ∫ ρ(u) V(u) du + (1/2) ∫∫ ρ(u) ρ(v) / |u − v| du dv + E_xc(ρ)
(kinetic energy + electron-nuclei attraction + electron-electron Coulomb repulsion + exchange-correlation energy)
At equilibrium: f(x) = E(ρ_x) = min_ρ E(ρ)
- f(x) is invariant to isometries and is deformation stable.
SLIDE 38
Atomization Density
[Figure: electronic density ρ_x(u) vs. approximate density ρ̃_x(u)]
SLIDE 39
Quantum Regression (Matthew Hirn, N. Poilvert)
- Sparse regression computed over a representation Φx = {φ_n(ρ̃_x)}_n invariant to the action of isometries in R³: scattering coefficients and Fourier modulus coefficients, together with their squares.
- Partial Least Square regression on the training set:
  f_M(x) = Σ_{k=1}^M w_k φ_{n_k}(ρ̃_x), with M the number of selected variables.
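The sparse selection of M descriptors in f_M(x) = Σ_k w_k φ_{n_k} can be sketched with a plain greedy orthogonal matching pursuit on synthetic data. This is not the authors' fitting procedure (they use Partial Least Squares); the dictionary and coefficients below are entirely synthetic, only to show the selection mechanism:

```python
# Sketch: greedy sparse regression selecting M dictionary columns,
# in the spirit of f_M(x) = sum_k w_k phi_{n_k} (orthogonal matching
# pursuit on a synthetic dictionary, not the talk's PLS regression).
import numpy as np

def omp(Phi, y, M):
    # Phi: (n_samples, n_features) matrix of candidate descriptors
    residual, selected, w = y.copy(), [], None
    for _ in range(M):
        corr = np.abs(Phi.T @ residual)
        corr[selected] = -np.inf                 # select without replacement
        selected.append(int(np.argmax(corr)))
        w, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ w      # re-fit, update residual
    return selected, w

rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 50))             # synthetic descriptors
true_w = np.zeros(50)
true_w[[3, 17, 41]] = [1.0, -2.0, 0.5]           # 3 informative variables
y = Phi @ true_w + 0.01 * rng.standard_normal(200)
sel, w = omp(Phi, y, M=3)
```

With well-separated informative columns the greedy pass recovers exactly the three planted variables, mirroring how M controls the model complexity in the energy regression.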
SLIDE 40
Scattering Regression
[Figure: regression error (kcal/mol) vs. model complexity log₂(M), comparing Fourier, Wavelet Scattering and Coulomb features; errors of 5.8, 14.2 and 16.7 kcal/mol; 2.7 kcal/mol: state of the art]
SLIDE 41
Conclusion
- A major challenge of data analysis is to find Euclidean embeddings of metrics ⇔ build Gaussian models.
- Continuity to the action of diffeomorphisms ⇒ wavelets.
- Known geometry ⇒ no need to learn. Unknown geometry: learn wavelets on appropriate groups.
- Can learn physics from priors on geometry and invariants.
- Applications to images, audio and natural languages.