End-to-End Probabilistic Inference for Nonstationary Audio Analysis



SLIDE 1

End-to-End Probabilistic Inference for Nonstationary Audio Analysis

(or how to apply Spectral Mixture GPs to audio)

William Wilkinson, Michael Riis Andersen, Josh Reiss, Dan Stowell, Arno Solin June 12, 2019

Queen Mary University of London / Aalto University / Technical University of Denmark

SLIDE 2

Probabilistic time-frequency analysis

We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data.

[Figure: filter response (dB) against frequency (Hz), for a standard filter bank and for a probabilistic / adaptive filter bank]

SLIDE 3

Probabilistic time-frequency analysis

We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data.

[Prior] $f(t) \sim \mathcal{GP}\big(0, \sum_{d=1}^{D} \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\big)$

[Likelihood] $y_k = f(t_k) + \sigma_{y_k} \varepsilon_k$
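The prior and likelihood on this slide can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it builds the spectral mixture covariance as a sum of exponentially damped cosines, draws one sample of f(t), and adds observation noise. All parameter values (two components at 440 Hz and 1320 Hz, the lengthscales, the noise level) and the function name `spectral_mixture_kernel` are assumptions chosen for illustration, and the noise scale is taken to be constant over time.

```python
import numpy as np

def spectral_mixture_kernel(t1, t2, variances, lengthscales, omegas):
    """Sum over d of sigma_d^2 * exp(-|tau| / ell_d) * cos(omega_d * tau)."""
    tau = t1[:, None] - t2[None, :]
    K = np.zeros_like(tau, dtype=float)
    for var, ell, om in zip(variances, lengthscales, omegas):
        K += var * np.exp(-np.abs(tau) / ell) * np.cos(om * tau)
    return K

# Illustrative (assumed) settings: 200 samples at 16 kHz, two components.
fs = 16000.0
t = np.arange(200) / fs
variances = np.array([1.0, 0.5])                    # sigma_d^2
lengthscales = np.array([0.005, 0.002])             # ell_d, in seconds
omegas = 2.0 * np.pi * np.array([440.0, 1320.0])    # omega_d, in rad/s

K = spectral_mixture_kernel(t, t, variances, lengthscales, omegas)

# Draw f ~ GP(0, K) via a Cholesky factor (small jitter for stability).
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(K + 1e-8 * np.eye(len(t)))
f = chol @ rng.standard_normal(len(t))

# Likelihood: y_k = f(t_k) + sigma_y * eps_k, with an assumed constant sigma_y.
sigma_y = 0.1
y = f + sigma_y * rng.standard_normal(len(t))
```

Each component acts like one band of the filter bank: `omegas` sets the centre frequency and `lengthscales` sets the bandwidth, so learning these hyperparameters is what lets the filter bank adapt to the signal.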

SLIDE 4

End-to-End probabilistic time-frequency analysis

The next step in the signal processing chain is often to analyse the dependencies in the spectrogram, with e.g. non-negative matrix factorisation (NMF).
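For reference, the conventional two-stage approach mentioned here can be sketched with a generic NMF routine. The code below uses the standard multiplicative updates for the squared-error objective on a synthetic non-negative matrix V (frequency channels by time frames); it is a textbook baseline under assumed sizes and iteration counts, not the method proposed in this talk.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Factorise a non-negative matrix V (D x T) as W (D x N) times H (N x T)."""
    rng = np.random.default_rng(seed)
    D, T = V.shape
    W = rng.random((D, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "spectrogram": 8 frequency channels, 50 frames, 2 underlying components.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 50))

W, H = nmf(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In this two-stage pipeline the spectrogram is computed first and factorised afterwards, so uncertainty from the time-frequency analysis does not reach the factorisation.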

SLIDE 5

End-to-End probabilistic time-frequency analysis

[Figure: audio signal $y_k$ against time (sampled at 16 kHz)]

SLIDE 6

End-to-End probabilistic time-frequency analysis

[Figure: audio signal $y_k$ (time sampled at 16 kHz) = GP carrier subbands $f_d(t)$ × GP spectrogram (frequency in Hz)]

SLIDE 7

End-to-End probabilistic time-frequency analysis

[Figure: audio signal $y_k$ (time sampled at 16 kHz) = GP carrier subbands $f_d(t)$ × GP spectrogram (frequency in Hz)]

GP spectrogram = NMF weights ($W$) × positive modulator GPs ($g_n(t)$)

SLIDE 8

The model

GP prior:

$f_d(t) \sim \mathcal{GP}\big(0, \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\big)$, $d = 1, 2, \ldots, D$,

$g_n(t) \sim \mathcal{GP}\big(0, \kappa_g^{(n)}(t, t')\big)$, $n = 1, 2, \ldots, N$.

SLIDE 9

The model

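GP prior:

$f_d(t) \sim \mathcal{GP}\big(0, \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\big)$, $d = 1, 2, \ldots, D$,

$g_n(t) \sim \mathcal{GP}\big(0, \kappa_g^{(n)}(t, t')\big)$, $n = 1, 2, \ldots, N$.

Likelihood model:

$y_k = \sum_{d=1}^{D} a_d(t_k) f_d(t_k) + \sigma_y \varepsilon_k$,

for square amplitudes (the magnitude spectrogram):

$a_d^2(t_k) = \sum_{n=1}^{N} W_{d,n} \, \mathrm{softplus}(g_n(t_k))$.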

SLIDE 10

The model

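GP prior:

$f_d(t) \sim \mathcal{GP}\big(0, \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos(\omega_d (t - t'))\big)$, $d = 1, 2, \ldots, D$,

$g_n(t) \sim \mathcal{GP}\big(0, \kappa_g^{(n)}(t, t')\big)$, $n = 1, 2, \ldots, N$.

Likelihood model:

$y_k = \sum_{d=1}^{D} a_d(t_k) f_d(t_k) + \sigma_y \varepsilon_k$,

for square amplitudes (the magnitude spectrogram):

$a_d^2(t_k) = \sum_{n=1}^{N} W_{d,n} \, \mathrm{softplus}(g_n(t_k))$.

This is a nonstationary spectral mixture GP.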
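To make the generative model concrete, here is a minimal sampling sketch. It draws D carrier GPs and N modulator GPs, forms the squared amplitudes through the softplus-transformed NMF mixture, and sums the amplitude-modulated subbands. The slides do not specify the modulator kernel, so an exponential kernel with a longer lengthscale is assumed here; the sizes, frequencies, weights and noise level are likewise illustrative assumptions.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real value to a positive one."""
    return np.logaddexp(0.0, x)

def sample_gp(K, rng, jitter=1e-8):
    """Draw one zero-mean GP sample with covariance matrix K."""
    chol = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    return chol @ rng.standard_normal(K.shape[0])

rng = np.random.default_rng(0)
fs, T = 16000.0, 400
t = np.arange(T) / fs
tau = t[:, None] - t[None, :]

D, N = 3, 2                                            # subbands, NMF components
sigma2 = np.array([1.0, 0.7, 0.4])                     # sigma_d^2 (assumed)
ell = np.array([0.005, 0.004, 0.003])                  # ell_d in seconds (assumed)
omega = 2.0 * np.pi * np.array([220.0, 440.0, 880.0])  # omega_d in rad/s (assumed)

# Carrier subbands f_d(t): damped-cosine kernels, one GP per subband. Shape (D, T).
f = np.stack([
    sample_gp(sigma2[d] * np.exp(-np.abs(tau) / ell[d]) * np.cos(omega[d] * tau), rng)
    for d in range(D)
])

# Modulators g_n(t): kernel assumed exponential, slowly varying. Shape (N, T).
K_g = np.exp(-np.abs(tau) / 0.01)
g = np.stack([sample_gp(K_g, rng) for _ in range(N)])

# Squared amplitudes a_d^2(t_k) = sum_n W[d, n] * softplus(g_n(t_k)). Shape (D, T).
W = rng.random((D, N))                                 # non-negative NMF weights
a_sq = W @ softplus(g)
a = np.sqrt(a_sq)

# Observations y_k = sum_d a_d(t_k) f_d(t_k) + sigma_y * eps_k.
sigma_y = 0.05
y = np.sum(a * f, axis=0) + sigma_y * rng.standard_normal(T)
```

The softplus keeps each modulator contribution positive, so `a_sq` plays the role of a non-negative spectrogram even though the underlying GPs `g` are unconstrained. Because the amplitudes vary over time, the resulting process for `y` is nonstationary.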

SLIDE 11

Inference

We show how to write the model as a stochastic differential equation:

$\frac{d\tilde{f}(t)}{dt} = F \tilde{f}(t) + L w(t)$, $y_k = H(\tilde{f}(t_k)) + \sigma_y \varepsilon_k$,

such that inference can proceed via Kalman filtering & smoothing.
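As a hedged illustration of the state-space idea, consider a single carrier component with the damped-cosine kernel. A standard construction (assumed here, and restricted to one linear component rather than the full nonlinear model) uses a two-dimensional state that rotates at rate omega and decays at rate 1/ell. The sketch checks that the stationary covariance of this model reproduces the kernel at a range of lags. The function name and parameter values are illustrative.

```python
import numpy as np

def carrier_state_space(sigma2, ell, omega, dt):
    """State-space form of k(tau) = sigma2 * exp(-|tau|/ell) * cos(omega * tau)."""
    # Continuous-time drift matrix: decay plus rotation.
    F = np.array([[-1.0 / ell, -omega],
                  [omega, -1.0 / ell]])
    # Closed-form matrix exponential expm(F * dt): scaled rotation.
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = np.exp(-dt / ell) * np.array([[c, -s], [s, c]])
    Pinf = sigma2 * np.eye(2)             # stationary state covariance
    Q = Pinf - A @ Pinf @ A.T             # discrete-time process noise covariance
    Qc = (2.0 * sigma2 / ell) * np.eye(2) # continuous-time diffusion, L q L^T
    H = np.array([[1.0, 0.0]])            # observe the first state component
    return F, A, Q, Qc, Pinf, H

sigma2, ell, omega, dt = 1.3, 0.005, 2.0 * np.pi * 440.0, 1.0 / 16000.0
F, A, Q, Qc, Pinf, H = carrier_state_space(sigma2, ell, omega, dt)

# Covariance at lag k*dt implied by the state-space model: H A^k Pinf H^T.
lags = np.arange(50)
cov_ss = np.array([
    (H @ np.linalg.matrix_power(A, int(k)) @ Pinf @ H.T).item() for k in lags
])
cov_kernel = sigma2 * np.exp(-lags * dt / ell) * np.cos(omega * lags * dt)
max_abs_diff = np.max(np.abs(cov_ss - cov_kernel))
```

Stacking one such block per carrier, together with blocks for the modulators, gives the joint state; the measurement function H(·) of the full model is then nonlinear because it multiplies amplitudes by carriers.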

SLIDE 12

Inference

We show how to write the model as a stochastic differential equation:

$\frac{d\tilde{f}(t)}{dt} = F \tilde{f}(t) + L w(t)$, $y_k = H(\tilde{f}(t_k)) + \sigma_y \varepsilon_k$,

such that inference can proceed via Kalman filtering & smoothing.

Usually the nonlinear $H(\cdot)$ is dealt with via linearisation (EKF), but we implement full Expectation Propagation (EP) in the Kalman smoother, and the infinite-horizon solution, which scales as $O(M^2 T)$.
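The benefit of the state-space form is easiest to see in the linear-Gaussian special case (constant amplitudes, linear measurement), where a plain Kalman filter gives the exact GP marginal likelihood in a single pass over the data. The sketch below is that simplified case only: it does not implement EP, the EKF, or the infinite-horizon approximation, and the model sizes and parameters are assumed. It compares the sequential result with the batch GP computation.

```python
import numpy as np

def build_model(sigma2, ell, omega, dt):
    """Block-diagonal discrete-time model for a sum of damped-cosine components."""
    D = len(sigma2)
    A = np.zeros((2 * D, 2 * D))
    Pinf = np.zeros((2 * D, 2 * D))
    H = np.zeros((1, 2 * D))
    for d in range(D):
        i = 2 * d
        c, s = np.cos(omega[d] * dt), np.sin(omega[d] * dt)
        A[i:i + 2, i:i + 2] = np.exp(-dt / ell[d]) * np.array([[c, -s], [s, c]])
        Pinf[i:i + 2, i:i + 2] = sigma2[d] * np.eye(2)
        H[0, i] = 1.0
    Q = Pinf - A @ Pinf @ A.T
    return A, Q, Pinf, H

def kalman_loglik(y, A, Q, Pinf, H, noise_var):
    """Log marginal likelihood via Kalman filtering; cost is linear in len(y)."""
    m, P, ll = np.zeros(A.shape[0]), Pinf.copy(), 0.0
    for yk in y:
        m, P = A @ m, A @ P @ A.T + Q              # predict (leaves Pinf unchanged)
        v = yk - (H @ m).item()                    # innovation
        S = (H @ P @ H.T).item() + noise_var       # innovation variance
        gain = (P @ H.T).ravel() / S
        m, P = m + gain * v, P - np.outer(gain, gain) * S   # update
        ll += -0.5 * (np.log(2.0 * np.pi * S) + v * v / S)
    return ll

# Assumed illustrative settings.
fs, T = 16000.0, 300
t = np.arange(T) / fs
sigma2 = np.array([1.0, 0.5])
ell = np.array([0.005, 0.002])
omega = 2.0 * np.pi * np.array([440.0, 1320.0])
noise_var = 0.01

rng = np.random.default_rng(0)
y = np.sin(omega[0] * t) + 0.1 * rng.standard_normal(T)

A, Q, Pinf, H = build_model(sigma2, ell, omega, 1.0 / fs)
ll_kalman = kalman_loglik(y, A, Q, Pinf, H, noise_var)

# Batch GP reference: cubic in T, shown only to confirm the two agree.
tau = t[:, None] - t[None, :]
K = sum(s2 * np.exp(-np.abs(tau) / l) * np.cos(w * tau)
        for s2, l, w in zip(sigma2, ell, omega))
C = K + noise_var * np.eye(T)
ll_batch = -0.5 * (y @ np.linalg.solve(C, y)
                   + np.linalg.slogdet(C)[1] + T * np.log(2.0 * np.pi))
```

Here the state dimension is 2D = 4, and the filter touches each sample once, whereas the batch computation factorises a T × T matrix. In the full model the same recursion is kept, but the Gaussian update step is replaced by an approximate one (EKF linearisation or EP) to handle the nonlinear H(·).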

SLIDE 13

Applications and Results

The fully probabilistic model can, without modification, be applied to:

SLIDE 14

Applications and Results

The fully probabilistic model can, without modification, be applied to:

Missing Data Synthesis

[Figure: missing data synthesis, signal against Time [ms]; legend: Signal, EP, IHGP, EKF]

SLIDE 15

Applications and Results

The fully probabilistic model can, without modification, be applied to:

Missing Data Synthesis

[Figure: missing data synthesis, signal against Time [ms]; legend: Signal, EP, IHGP, EKF]

Denoising

[Figure: denoising, SNR [dB] against corrupting noise variance; legend: EP 1, EP 20, IHGP 1, IHGP 20, EKF 1, EKF 20, SpecSub]

SLIDE 16

Applications and Results

The fully probabilistic model can, without modification, be applied to:

Missing Data Synthesis

[Figure: missing data synthesis, signal against Time [ms]; legend: Signal, EP, IHGP, EKF]

Denoising

[Figure: denoising, SNR [dB] against corrupting noise variance; legend: EP 1, EP 20, IHGP 1, IHGP 20, EKF 1, EKF 20, SpecSub]

Source Separation

[Figure: source separation against Time [secs]; panels: input audio $y$; source one: piano note C; source two: piano note E; source three: piano note G]

SLIDE 17

Applications and Results

The fully probabilistic model can, without modification, be applied to:

Missing Data Synthesis

[Figure: missing data synthesis, signal against Time [ms]; legend: Signal, EP, IHGP, EKF]

Denoising

[Figure: denoising, SNR [dB] against corrupting noise variance; legend: EP 1, EP 20, IHGP 1, IHGP 20, EKF 1, EKF 20, SpecSub]

Source Separation

[Figure: source separation against Time [secs]; panels: input audio $y$; source one: piano note C; source two: piano note E; source three: piano note G]


Thanks for listening! Poster: 6:30pm Weds, Pacific Ballroom #217 Contact: william.wilkinson@aalto.fi