SLIDE 1

End-to-End Probabilistic Inference for Nonstationary Audio Analysis

(or how to apply Spectral Mixture GPs to audio)

William Wilkinson, Michael Riis Andersen, Josh Reiss, Dan Stowell, Arno Solin

June 12, 2019

Queen Mary University of London / Aalto University / Technical University of Denmark

SLIDE 2

Probabilistic time-frequency analysis

We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data.

[Figure: filter response (dB) against frequency (Hz) for a standard filter bank and for a probabilistic / adaptive filter bank]

SLIDE 3

Probabilistic time-frequency analysis

We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data.

[Prior] $f(t) \sim \mathcal{GP}\left(0, \sum_{d=1}^{D} \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\right)$,

[Likelihood] $y_k = f(t_k) + \sigma_y \varepsilon_k$.
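One way to see the filter-bank interpretation: each component of the prior above has a power spectral density that is a pair of Lorentzian (Cauchy-shaped) bumps centred at $\pm\omega_d$, i.e. a band-pass response with centre frequency $\omega_d$ and a bandwidth set by $\ell_d$. The sketch below evaluates these responses in dB. It assumes the Fourier convention $S(\omega) = \int k(\tau) e^{-i\omega\tau}\,d\tau$, $\omega$ in rad/s and $\ell_d$ in seconds; the function names and parameter values are hypothetical, not taken from the talk.

```python
import numpy as np


def component_psd(omega, sigma2, lengthscale, omega_d):
    """Spectral density of k(tau) = sigma2 * exp(-|tau| / l) * cos(omega_d * tau).

    With S(w) = int k(tau) exp(-1j * w * tau) dtau this is a pair of
    Lorentzians centred at +omega_d and -omega_d (omega in rad/s, l in seconds).
    """
    l = lengthscale
    return sigma2 * (l / (1.0 + (l * (omega - omega_d)) ** 2)
                     + l / (1.0 + (l * (omega + omega_d)) ** 2))


def filter_bank_response_db(freqs_hz, sigma2, lengthscales, centre_freqs_hz):
    """Per-component responses in dB, shape (D, len(freqs_hz))."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    psd = np.array([
        component_psd(omega, s2, l, 2.0 * np.pi * fc)
        for s2, l, fc in zip(sigma2, lengthscales, centre_freqs_hz)
    ])
    return 10.0 * np.log10(psd)


# Hypothetical 3-component filter bank evaluated on a 1 Hz grid up to 8 kHz.
freqs = np.arange(0, 8001)
resp_db = filter_bank_response_db(
    freqs, sigma2=[1.0, 1.0, 1.0], lengthscales=[0.01, 0.01, 0.005],
    centre_freqs_hz=[500, 1500, 3000])
peak_freqs = freqs[np.argmax(resp_db, axis=1)]  # one peak per component
```

Each row peaks at its centre frequency; a shorter lengthscale widens the band, which is how the GP hyperparameters play the role of filter design parameters.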

SLIDE 4

End-to-End probabilistic time-frequency analysis

The next step in the signal processing chain is often to analyse the dependencies in the spectrogram, e.g. with non-negative matrix factorisation (NMF).
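For reference, a conventional (non-probabilistic) version of this step factorises a non-negative spectrogram $V$ into spectral templates $W$ and activations $H$. The sketch below uses the standard Lee and Seung multiplicative updates for the squared Euclidean cost; it shows plain NMF only, not how NMF is embedded in the probabilistic model, and the data are synthetic.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factorise a non-negative matrix V (frequency x time) as V ~ W @ H.

    Lee & Seung multiplicative updates for ||V - W H||_F^2. Each update
    multiplies by a non-negative ratio, so W and H stay non-negative;
    eps guards against division by zero.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps  # spectral templates
    H = rng.random((rank, n_time)) + eps  # time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius reconstruction error relative to ||V||_F."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Hypothetical example: a synthetic rank-2 "spectrogram" (20 bins x 30 frames).
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2, n_iter=2000)
err = relative_error(V, W, H)
```

The updates never increase the Euclidean cost, so running longer from the same initialisation can only keep or lower `err`.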


SLIDE 7

End-to-End probabilistic time-frequency analysis

Audio signal $y_k$ (sampled at 16 kHz) = GP carrier subbands $f_d(t)$ × GP spectrogram (frequency in Hz)

GP spectrogram = NMF weights ($\mathbf{W}$) × positive modulator GPs ($g_n(t)$)


SLIDE 10

The model

GP prior:

$f_d(t) \sim \mathcal{GP}\left(0, \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\right), \quad d = 1, 2, \ldots, D,$

$g_n(t) \sim \mathcal{GP}\left(0, \kappa_g^{(n)}(t, t')\right), \quad n = 1, 2, \ldots, N.$

Likelihood model:

$y_k = \sum_{d} a_d(t_k) f_d(t_k) + \sigma_y \varepsilon_k,$

where the squared amplitudes (the magnitude spectrogram) are

$a_d^2(t_k) = \sum_{n} W_{d,n} \operatorname{softplus}(g_n(t_k)).$

This is a nonstationary spectral mixture GP.
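To make the generative model concrete, the sketch below draws one sample from it on a 16 kHz grid. Each carrier $f_d$ is simulated exactly with a 2-D rotating Ornstein-Uhlenbeck recursion, whose first coordinate has covariance $\sigma_d^2 \exp(-|\tau|/\ell_d)\cos(\omega_d\tau)$. The slide does not specify $\kappa_g^{(n)}$, so a Matérn-1/2 kernel is assumed for the modulators $g_n$; all function names and parameter values are hypothetical.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def sample_quasi_periodic(T, dt, sigma2, lengthscale, omega, rng):
    """Exact sample of a GP with k(tau) = sigma2 * exp(-|tau|/l) * cos(omega*tau)
    on the grid t_k = k * dt, via a 2-D rotating Ornstein-Uhlenbeck state.

    omega = 0 reduces to a plain Matern-1/2 (OU) process.
    """
    decay = np.exp(-dt / lengthscale)
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = decay * np.array([[c, -s], [s, c]])
    q_std = np.sqrt(sigma2 * (1.0 - decay ** 2))
    x = rng.normal(0.0, np.sqrt(sigma2), size=2)  # draw from the stationary prior
    out = np.empty(T)
    for k in range(T):
        out[k] = x[0]
        x = A @ x + q_std * rng.normal(size=2)
    return out


def sample_model(T, dt, sub_params, mod_params, W, sigma_y, seed=0):
    """Draw y_k = sum_d a_d(t_k) f_d(t_k) + sigma_y * eps_k.

    sub_params: list of (sigma2_d, l_d, omega_d) for the carriers f_d.
    mod_params: list of (sigma2_n, l_n) for the modulators g_n; the
        Matern-1/2 kernel used for them is an ASSUMPTION.
    W: non-negative (D, N) weight matrix, so a_d^2 >= 0.
    """
    rng = np.random.default_rng(seed)
    f = np.stack([sample_quasi_periodic(T, dt, s2, l, w, rng)
                  for s2, l, w in sub_params])   # (D, T)
    g = np.stack([sample_quasi_periodic(T, dt, s2, l, 0.0, rng)
                  for s2, l in mod_params])      # (N, T)
    a_sq = W @ softplus(g)                       # a_d^2(t_k), shape (D, T)
    a = np.sqrt(a_sq)
    y = np.sum(a * f, axis=0) + sigma_y * rng.normal(size=T)
    return y, f, g, a


# Hypothetical settings: 16 kHz audio, D = 3 carriers, N = 2 modulators.
dt = 1.0 / 16000
sub_params = [(1.0, 0.01, 2 * np.pi * 440.0),
              (1.0, 0.01, 2 * np.pi * 880.0),
              (1.0, 0.01, 2 * np.pi * 1320.0)]
mod_params = [(1.0, 0.1), (1.0, 0.05)]
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
y, f, g, a = sample_model(4000, dt, sub_params, mod_params, W, sigma_y=0.01)
```

The slowly varying $g_n$ (long lengthscales) shape the amplitude envelopes, while the fast carriers $f_d$ supply the oscillation, which is what makes the overall process nonstationary.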


SLIDE 12

Inference

We show how to write the model as a stochastic differential equation,

$\frac{\mathrm{d}\tilde{\mathbf{f}}(t)}{\mathrm{d}t} = \mathbf{F}\tilde{\mathbf{f}}(t) + \mathbf{L}\mathbf{w}(t), \qquad y_k = H(\tilde{\mathbf{f}}(t_k)) + \sigma_y \varepsilon_k,$

such that inference can proceed via Kalman filtering and smoothing. Usually the nonlinear $H(\cdot)$ is dealt with via linearisation (EKF), but we implement full Expectation Propagation (EP) in the Kalman smoother, as well as the infinite-horizon solution, which scales as $\mathcal{O}(M^2 T)$.
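The infinite-horizon idea can be illustrated in the linear-Gaussian case: on a regular grid with no missing data, the Kalman gain converges to a fixed value, so it can be computed once from the discrete-time Riccati recursion and reused at every step, leaving only matrix-vector work per step (O(M²) per step, O(M²T) overall, assuming M is the state dimension and T the number of time steps). The sketch below uses the exact discretisation of one quasi-periodic Matérn-1/2 component; it does not handle the nonlinear $H(\cdot)$ or EP, and the helper names and parameters are hypothetical.

```python
import numpy as np


def discretise_component(dt, sigma2, lengthscale, omega):
    """Exact discretisation of one component's SDE.

    Continuous time: F = [[-1/l, -omega], [omega, -1/l]] with stationary
    covariance sigma2 * I. Over a step dt this gives
    A = expm(F dt) = exp(-dt/l) R(omega dt) and Q = sigma2 (1 - exp(-2 dt/l)) I.
    """
    decay = np.exp(-dt / lengthscale)
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = decay * np.array([[c, -s], [s, c]])
    Q = sigma2 * (1.0 - decay ** 2) * np.eye(2)
    return A, Q


def block_diag(blocks):
    """Stack square blocks along the diagonal."""
    n = sum(b.shape[0] for b in blocks)
    out, i = np.zeros((n, n)), 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def steady_state_gain(A, Q, H, r, P0, tol=1e-12, max_iter=100000):
    """Iterate the Riccati recursion to its fixed point (the infinite-horizon
    predictive covariance P) and return the stationary gain K and P."""
    P = P0.copy()
    for _ in range(max_iter):
        K = P @ H / (H @ P @ H + r)
        P_new = A @ (P - np.outer(K, H @ P)) @ A.T + Q
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    return P @ H / (H @ P @ H + r), P


def steady_state_filter(y, A, H, K):
    """Kalman filter with a fixed gain: matrix-vector work only per step."""
    m = np.zeros(A.shape[0])       # predictive mean for the first step
    f_mean = np.empty(len(y))
    for k, yk in enumerate(y):
        m = m + K * (yk - H @ m)   # update with the stationary gain
        f_mean[k] = H @ m
        m = A @ m                  # predict the next step
    return f_mean


# Hypothetical two-component spectral mixture GP at 16 kHz (linear case:
# H sums the first state of each component).
dt = 1.0 / 16000
comps = [discretise_component(dt, 1.0, 0.01, 2 * np.pi * 440.0),
         discretise_component(dt, 0.5, 0.01, 2 * np.pi * 660.0)]
A = block_diag([c[0] for c in comps])
Q = block_diag([c[1] for c in comps])
P0 = block_diag([1.0 * np.eye(2), 0.5 * np.eye(2)])  # stationary prior covariance
H = np.array([1.0, 0.0, 1.0, 0.0])
K, P = steady_state_gain(A, Q, H, r=0.01, P0=P0)
```

The Riccati iteration costs O(M³) per iteration but is independent of T, so for long audio the per-sample filtering cost dominates.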


SLIDE 17

Applications and Results

The fully probabilistic model can, without modification, be applied to:

Missing Data Synthesis

[Figure: signal and missing-data reconstructions from EP, IHGP and EKF; time in ms]
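In the stationary special case where the amplitudes are fixed (the plain spectral mixture GP prior), the observation model is linear and a gap can be filled with a standard Kalman filter and Rauch-Tung-Striebel smoother, simply skipping the update step wherever a sample is missing. The sketch below covers only that linear case; it is not the EP, IHGP or EKF inference compared in the plot, and the helper names, parameters and test signal are hypothetical.

```python
import numpy as np


def spectral_mixture_ss(dt, sigma2, lengthscales, omegas):
    """Discrete-time state space model for the stationary spectral mixture GP
    k(tau) = sum_d sigma2_d exp(-|tau|/l_d) cos(omega_d tau), grid step dt."""
    M = 2 * len(sigma2)
    A, Q, P_inf, H = np.zeros((M, M)), np.zeros((M, M)), np.zeros((M, M)), np.zeros(M)
    for d, (s2, l, w) in enumerate(zip(sigma2, lengthscales, omegas)):
        i = slice(2 * d, 2 * d + 2)
        decay = np.exp(-dt / l)
        c, s = np.cos(w * dt), np.sin(w * dt)
        A[i, i] = decay * np.array([[c, -s], [s, c]])
        Q[i, i] = s2 * (1.0 - decay ** 2) * np.eye(2)
        P_inf[i, i] = s2 * np.eye(2)
        H[2 * d] = 1.0  # f(t) sums the first state of every component
    return A, Q, H, P_inf


def kalman_rts(y, A, Q, H, P_inf, noise_var):
    """Kalman filter + RTS smoother (linear case). NaNs in y are treated as
    missing: the update is skipped there and the smoother fills the gap."""
    T, M = len(y), A.shape[0]
    m_pred, P_pred = np.zeros((T, M)), np.zeros((T, M, M))
    m_filt, P_filt = np.zeros((T, M)), np.zeros((T, M, M))
    m, P = np.zeros(M), P_inf.copy()
    for k in range(T):
        if k > 0:  # predict
            m, P = A @ m, A @ P @ A.T + Q
        m_pred[k], P_pred[k] = m, P
        if not np.isnan(y[k]):  # update
            K = P @ H / (H @ P @ H + noise_var)
            m = m + K * (y[k] - H @ m)
            P = P - np.outer(K, H @ P)
        m_filt[k], P_filt[k] = m, P
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for k in range(T - 2, -1, -1):  # backward (smoothing) pass
        G = np.linalg.solve(P_pred[k + 1], A @ P_filt[k]).T
        m_s[k] = m_filt[k] + G @ (m_s[k + 1] - m_pred[k + 1])
        P_s[k] = P_filt[k] + G @ (P_s[k + 1] - P_pred[k + 1]) @ G.T
    return m_s @ H, np.einsum("i,tij,j->t", H, P_s, H)


# Hypothetical example: a noisy 440 Hz tone at 16 kHz with a 50-sample gap.
dt = 1.0 / 16000
t = np.arange(400) * dt
clean = np.sin(2 * np.pi * 440.0 * t)
y = clean + 0.05 * np.random.default_rng(0).normal(size=t.size)
y[150:200] = np.nan
A, Q, H, P_inf = spectral_mixture_ss(dt, [1.0], [0.01], [2 * np.pi * 440.0])
f_mean, f_var = kalman_rts(y, A, Q, H, P_inf, noise_var=0.05 ** 2)
```

The posterior variance `f_var` grows inside the gap, giving the uncertainty band that a purely deterministic interpolator would not provide.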

Denoising

[Figure: SNR (dB) against corrupting noise variance for EP 1, EP 20, IHGP 1, IHGP 20, EKF 1, EKF 20 and SpecSub]
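The denoising comparison is reported as SNR in dB against the variance of the added noise. A common way to set up such an experiment is sketched below; the exact SNR definition used for the plot is not stated on the slide, so the formula $10\log_{10}\left(\sum s^2 / \sum (s - \hat{s})^2\right)$ and the helper names are assumptions.

```python
import numpy as np


def snr_db(clean, estimate):
    """ASSUMED definition: 10 log10(sum clean^2 / sum (clean - estimate)^2)."""
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))


def corrupt(clean, noise_var, seed=0):
    """Add white Gaussian noise with the given variance."""
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean, dtype=float)
    return clean + np.sqrt(noise_var) * rng.normal(size=clean.shape)


# Hypothetical usage: the input SNR of a noisy tone, before any denoising.
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = corrupt(clean, noise_var=0.1)
input_snr = snr_db(clean, noisy)
```

A denoiser is then scored by `snr_db(clean, denoised)`; for a unit-amplitude sinusoid (power 0.5) and noise variance 0.1, the input SNR is close to $10\log_{10}5 \approx 7$ dB.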

Source Separation

[Figure: input audio $y$ and three separated sources (piano notes C, E and G); time in seconds]

Thanks for listening!

Poster: 6:30pm Wednesday, Pacific Ballroom #217

Contact: william.wilkinson@aalto.fi
