slide-1
SLIDE 1

Fast Robustness Quantification with Variational Bayes

Tamara Broderick
ITT Career Development Assistant Professor, MIT

With: Ryan Giordano, Rachael Meager, Jonathan Huggins, Michael I. Jordan

slide-22
SLIDE 22
Robustness quantification

  • Bayesian inference
  • Complex, modular models; posterior distribution
  • Have to express prior beliefs in a distribution: challenges
  • Time-consuming; subjective; complex models

Bayes Theorem: $p(\theta \mid x) \propto_\theta p(x \mid \theta)\, p(\theta)$

  • Robustness
  • Global & local
  • Rarely used
  • Approximation, MCMC
  • Our solution: linear response variational Bayes
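
As a concrete illustration of Bayes' theorem and of why the choice of prior matters, the sketch below computes a posterior on a grid for a coin-flip model under two different priors. The Bernoulli likelihood, the data (7 successes in 10 trials), and both Beta-shaped priors are hypothetical choices made for illustration; none of them come from the talk.

```python
import math

def grid_posterior(log_lik, log_prior, grid):
    """Normalize p(x|theta) * p(theta) over a grid of theta values."""
    log_post = [log_lik(t) + log_prior(t) for t in grid]
    shift = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean(post, grid):
    return sum(p * t for p, t in zip(post, grid))

# Hypothetical data: 7 successes in 10 Bernoulli trials.
heads, flips = 7, 10
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints in (0, 1)

def log_lik(t):
    return heads * math.log(t) + (flips - heads) * math.log(1 - t)

def beta_log_prior(a, b):
    # Unnormalized Beta(a, b) log density; the constant cancels on normalization.
    return lambda t: (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)

mean_flat = posterior_mean(
    grid_posterior(log_lik, beta_log_prior(1, 1), grid), grid)
mean_informative = posterior_mean(
    grid_posterior(log_lik, beta_log_prior(10, 10), grid), grid)

# Robustness question: how far does the posterior mean move when the prior changes?
print(mean_flat, mean_informative, mean_flat - mean_informative)
```

In this conjugate toy case the exact posterior means are 8/12 and 17/30, so the grid result can be checked by hand. In complex models no such closed form exists, which is the situation robustness quantification targets.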

slide-23
SLIDE 23
Robustness quantification

  • Variational Bayes as an alternative to MCMC
  • Challenges of VB
  • Accurate uncertainties from VB
  • Accurate robustness quantification from VB
  • Big idea: derivatives/perturbations are easy in VB

slide-42
SLIDE 42
Variational Bayes

  • Variational Bayes (VB)
  • Approximation $q^*(\theta)$ for posterior $p(\theta \mid x)$
  • Minimize Kullback-Leibler (KL) divergence: $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
  • VB practical success
  • point estimates and prediction
  • fast, streaming, distributed

[Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]
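
To make "minimize KL divergence" concrete, here is a toy sketch that searches a coarse grid of univariate normal candidates q for the one closest in KL(q || p) to a fixed normal target p. The target N(1, 2^2) and the candidate grid are hypothetical. Real VB differs in two ways this sketch does not show: the posterior is usually outside the candidate family, and its normalizing constant is unknown, so one optimizes an equivalent objective rather than the KL itself.

```python
import math

def kl_gauss(m_q, s_q, m_p, s_p):
    """KL(q || p) for univariate normals q = N(m_q, s_q^2), p = N(m_p, s_p^2)."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2 * s_p ** 2)
            - 0.5)

# Hypothetical "posterior" p = N(1.0, 2.0^2).
m_p, s_p = 1.0, 2.0

# Candidate (mean, std) pairs on a coarse grid: means in [-3, 3], stds in [0.1, 5].
candidates = [(m / 10, s / 10) for m in range(-30, 31) for s in range(1, 51)]

# q* is the candidate with the smallest KL(q || p).
best = min(candidates, key=lambda ms: kl_gauss(ms[0], ms[1], m_p, s_p))
print(best, kl_gauss(best[0], best[1], m_p, s_p))
```

Because the target lies on the grid, the search recovers it exactly and the minimum KL is zero. When the family is restricted, for example to factorized distributions, the minimizer can only approximate the target.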

slide-52
SLIDE 52
What about uncertainty?

  • Variational Bayes:
    $\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$
  • Mean-field variational Bayes (MFVB):
    $q(\theta) = \prod_{j=1}^{J} q(\theta_j)$
  • Underestimates variance (sometimes severely)
  • No covariance estimates

[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]

[Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$ over axes $\theta_1$, $\theta_2$]
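
The variance underestimation can be seen in closed form for a Gaussian target. The sketch below assumes a hypothetical zero-mean bivariate normal posterior with correlation 0.9, and uses the standard result (Bishop 2006, Section 10.1.2) that the optimal mean-field factor for coordinate j has variance 1 / Λ_jj, where Λ is the posterior precision matrix.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical target posterior: bivariate normal with correlation 0.9.
rho = 0.9
sigma = [[1.0, rho], [rho, 1.0]]  # true posterior covariance
lam = inv2(sigma)                 # posterior precision matrix

# Optimal mean-field factors for a Gaussian target: q*(theta_j) has
# variance 1 / lam[j][j], and q* carries no covariance between coordinates.
mfvb_var = [1.0 / lam[0][0], 1.0 / lam[1][1]]

print("true marginal variances:", sigma[0][0], sigma[1][1])
print("MFVB variances:         ", mfvb_var)
```

Here each MFVB variance equals 1 − ρ², so the stronger the posterior correlation, the more severe the underestimate.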

slide-53
SLIDE 53
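Linear response

  • Cumulant-generating function:
    $C(t) := \log \mathbb{E}\, e^{t^T \theta}$, with $\text{mean} = \frac{d}{dt} C(t) \big|_{t=0}$
  • True posterior covariance vs MFVB covariance:
    $\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t) \big|_{t=0}$
    $V := \frac{d^2}{dt^T dt} C_{q^*}(t) \big|_{t=0}$
  • “Linear response”:
    $\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t)$, with MFVB approximation $q_t^*$
  • The LRVB approximation:
    $\Sigma = \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \big|_{t=0} \approx \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0} =: \hat{\Sigma}$

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]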
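
The role of the cumulant-generating function can be checked numerically: its first derivative at t = 0 is the mean and its second derivative is the variance. The sketch below does this with central finite differences for a hypothetical three-point distribution over a scalar θ; the values, probabilities, and step size are illustrative assumptions.

```python
import math

# Hypothetical discrete distribution over a scalar theta.
thetas = [0.0, 1.0, 3.0]
probs = [0.5, 0.3, 0.2]

def cgf(t):
    """C(t) = log E[exp(t * theta)]."""
    return math.log(sum(p * math.exp(t * th) for p, th in zip(probs, thetas)))

step = 1e-3
d1 = (cgf(step) - cgf(-step)) / (2 * step)                  # approximates C'(0)
d2 = (cgf(step) - 2 * cgf(0.0) + cgf(-step)) / step ** 2    # approximates C''(0)

# Direct computation of the mean and variance for comparison.
mean = sum(p * th for p, th in zip(probs, thetas))
var = sum(p * (th - mean) ** 2 for p, th in zip(probs, thetas))

print(d1, mean)
print(d2, var)
```

The same identity underlies the linear response idea: differentiating a perturbed expectation with respect to t yields a covariance, so a covariance estimate can be obtained from how the approximate mean responds to a small perturbation.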

slide-83
SLIDE 83
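LRVB estimator

  • LRVB covariance estimate:
    $\hat{\Sigma} := \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0}$
  • Suppose $q_t$ is an exponential family with mean parametrization $m_t$:
    $\hat{\Sigma} = (I - V H)^{-1} V$
    $\hat{\Sigma} = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \Big|_{m = m^*} \right)^{-1}$
  • Symmetric and positive definite at local min of KL
  • The LRVB assumption: $\mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q_t^*} \theta$
  • LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]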
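
The exactness claim for the multivariate normal can be checked on a small case. The sketch below evaluates (I − V H)^{-1} V for a hypothetical bivariate normal posterior with correlation 0.9. Two assumptions are made here that the slide does not spell out: H is taken to be the Hessian of the expected log posterior with respect to the variational means, and only the first-moment block of the mean parameters is used. For a Gaussian target that Hessian equals minus the precision matrix off the diagonal and zero on it.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

rho = 0.9
sigma = [[1.0, rho], [rho, 1.0]]  # true posterior covariance (hypothetical)
lam = inv2(sigma)                 # posterior precision

# V: MFVB covariance of theta (diagonal, with variances that are too small).
v = [[1.0 / lam[0][0], 0.0], [0.0, 1.0 / lam[1][1]]]

# H (assumed definition): Hessian of E_q[log p(theta|x)] in the means of q.
hess = [[0.0, -lam[0][1]], [-lam[1][0], 0.0]]

vh = matmul(v, hess)
i_minus_vh = [[1.0 - vh[0][0], -vh[0][1]], [-vh[1][0], 1.0 - vh[1][1]]]
sigma_hat = matmul(inv2(i_minus_vh), v)  # (I - V H)^{-1} V

print(sigma_hat)
```

Under these assumptions the corrected estimate matches the true covariance, including the off-diagonal term that MFVB alone reports as zero.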

slide-100
SLIDE 100

Microcredit Experiment

  • Simplified from Meager (2015)
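  • K microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
  • Nk businesses in kth site (~900 to ~17K)
  • Profit of nth business at kth site:

$$y_{kn} \overset{\text{indep}}{\sim} N(\mu_k + T_{kn}\tau_k,\ \sigma_k^2)$$

where $y_{kn}$ is the profit and $T_{kn} = 1$ if microcredit.

  • Priors and hyperpriors:

$$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix},\ C \right)$$

$$\begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix},\ \Lambda^{-1} \right)$$

$$\sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b)$$

$$C \sim \text{Sep\&LKJ}(\eta, c, d)$$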
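To make the generative structure of the model above concrete, here is a minimal simulation sketch in Python with NumPy. It is not the authors' code. Everything numeric is an assumption chosen for illustration: the hyperparameter values, the site sizes, the coin-flip treatment assignment, and the reading of Γ(a, b) as shape a and rate b. The covariance C is held fixed here rather than drawn from the Sep&LKJ prior, which keeps the sketch short.

```python
import numpy as np


def simulate_microcredit(K=7, site_sizes=None, seed=0):
    """Draw one synthetic data set from the hierarchical model (illustrative values only)."""
    rng = np.random.default_rng(seed)
    if site_sizes is None:
        # Assumed site sizes, in the ~900 to ~17K range quoted on the slide.
        site_sizes = rng.integers(900, 17001, size=K)

    # Hypothetical hyperparameters; none of these come from the talk
    # except that Lambda[0, 0] = 0.03 matches the value perturbed later.
    mu0_tau0 = np.array([0.0, 0.0])
    Lambda = np.array([[0.03, 0.0], [0.0, 0.02]])  # prior precision of (mu, tau)
    a, b = 2.0, 2.0                                # assumed shape and rate of the Gamma prior
    C = np.array([[4.0, 1.0], [1.0, 2.0]])         # fixed here, NOT sampled from Sep&LKJ

    # (mu, tau) ~ N((mu0, tau0), Lambda^{-1})
    mu_tau = rng.multivariate_normal(mu0_tau0, np.linalg.inv(Lambda))
    # (mu_k, tau_k) ~ N((mu, tau), C), one row per site
    site_effects = rng.multivariate_normal(mu_tau, C, size=K)
    # sigma_k^{-2} ~ Gamma(a, b); NumPy's gamma takes a scale, so scale = 1 / rate
    precisions = rng.gamma(shape=a, scale=1.0 / b, size=K)
    sigma = 1.0 / np.sqrt(precisions)

    data = []
    for k in range(K):
        n_k = int(site_sizes[k])
        T = rng.integers(0, 2, size=n_k)            # T_kn = 1 if microcredit (coin flip, assumed)
        mu_k, tau_k = site_effects[k]
        y = rng.normal(mu_k + T * tau_k, sigma[k])  # y_kn ~ N(mu_k + T_kn tau_k, sigma_k^2)
        data.append((T, y))
    return mu_tau, site_effects, sigma, data
```

Calling `simulate_microcredit()` returns the global effect pair, the per-site effects, the per-site noise scales, and a list of `(T, y)` arrays, one per site.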

slide-107
SLIDE 107

Microcredit Experiment

  • One set of 2500 MCMC draws: 45 minutes
  • All of MFVB optimization, LRVB uncertainties, all sensitivity measures: 58 seconds!
  • Many other models and data sets: mixture models, generalized linear mixed models, etc.

[Figure: panels labeled "MFVB" and "LRVB, MFVB"]

slide-108
SLIDE 108

Robustness quantification

  • Variational Bayes as an alternative to MCMC
  • Challenges of VB
  • Accurate uncertainties from VB
  • Accurate robustness quantification from VB
  • Big idea: derivatives/perturbations are easy in VB


slide-110
SLIDE 110

Robustness quantification

  • Bayes Theorem

p(θ|x) ∝θ p(x|θ)p(θ)

slide-123
SLIDE 123

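Robustness quantification

  • Bayes Theorem

$$p_\alpha(\theta) := p(\theta \mid x, \alpha) \propto_\theta p(x \mid \theta)\, p(\theta \mid \alpha)$$

  • Sensitivity

$$S := \left.\frac{d\,\mathbb{E}_{p_\alpha}[g(\theta)]}{d\alpha}\right|_{\alpha} \Delta\alpha \;\approx\; \left.\frac{d\,\mathbb{E}_{q^*_\alpha}[g(\theta)]}{d\alpha}\right|_{\alpha} \Delta\alpha =: \hat{S} \quad \text{(LRVB estimator)}$$

  • When $q^*_\alpha$ in exponential family

$$\hat{S} = A \left( \left.\frac{\partial^2 \mathrm{KL}}{\partial m\, \partial m^T}\right|_{m = m^*} \right)^{-1} B$$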
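The sensitivity defined above, a derivative of a posterior expectation with respect to a prior hyperparameter, multiplied by a perturbation Δα, can be checked by hand in a toy model. The sketch below is not the LRVB estimator and does not use the microcredit model. It assumes a conjugate setting chosen only for illustration: data x_i ~ N(θ, σ²) with known σ², prior θ ~ N(μ0, 1/λ0), g(θ) = θ, and α = λ0. All numeric values are hypothetical. In this setting the posterior mean is available in closed form, so the derivative can be written out exactly and compared with a finite difference and with a full refit.

```python
import numpy as np


def posterior_mean(xbar, n, sigma2, mu0, lam0):
    """Posterior mean of theta for a N(theta, sigma2) likelihood and N(mu0, 1/lam0) prior."""
    like_prec = n / sigma2
    return (lam0 * mu0 + like_prec * xbar) / (lam0 + like_prec)


def sensitivity_to_prior_precision(xbar, n, sigma2, mu0, lam0):
    """Exact d E[theta | x] / d lam0, obtained by differentiating posterior_mean."""
    m = posterior_mean(xbar, n, sigma2, mu0, lam0)
    return (mu0 - m) / (lam0 + n / sigma2)


# Hypothetical toy values, not taken from the talk's data.
xbar, n, sigma2, mu0 = 2.0, 50, 4.0, 0.0
lam0, delta = 0.03, 0.01

derivative = sensitivity_to_prior_precision(xbar, n, sigma2, mu0, lam0)

# Linear extrapolation: E[theta] at lam0 + delta is roughly E[theta] at lam0 + derivative * delta.
predicted = posterior_mean(xbar, n, sigma2, mu0, lam0) + derivative * delta
refit = posterior_mean(xbar, n, sigma2, mu0, lam0 + delta)

# Central finite difference as an independent check of the derivative.
eps = 1e-6
finite_diff = (
    posterior_mean(xbar, n, sigma2, mu0, lam0 + eps)
    - posterior_mean(xbar, n, sigma2, mu0, lam0 - eps)
) / (2 * eps)

print(f"derivative={derivative:.6f}, finite difference={finite_diff:.6f}")
print(f"linear prediction={predicted:.6f}, exact refit={refit:.6f}")
```

The point of the comparison is the same as on the slide: once the derivative is in hand, the effect of a small hyperparameter change can be estimated without refitting.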

slide-129
SLIDE 129

Microcredit Experiment

  • Perturb Λ11: 0.03 ➔ 0.04

[Figure: sensitivity, panels labeled "MFVB" and "LRVB"]

slide-140
SLIDE 140
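Microcredit Experiment

  • Sensitivity of the expected microcredit effect (τ)
  • Normalized to be on scale of standard deviations in τ
  • E.g. (USD PPP):

$$\mathrm{StdDev}_q\,\tau = 1.8, \qquad \mathbb{E}_q\,\tau = 3.7 = 2.06 \cdot \mathrm{StdDev}_q\,\tau$$

$$\Lambda_{12} \mathrel{+}= 0.03 \;\Rightarrow\; \mathbb{E}_q\,\tau < 1.0 \cdot \mathrm{StdDev}_q\,\tau$$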
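The arithmetic behind this example can be reproduced in a few lines. The sketch below uses only the three numbers quoted on the slide (1.8, 3.7, and the 0.03 shift in Λ12). The last quantity, a lower bound on the magnitude of the normalized sensitivity, is an inference from those numbers rather than a value stated in the talk, and it assumes the final inequality comes from the linear extrapolation.

```python
# Numbers quoted on the slide (USD PPP).
std_tau = 1.8          # StdDev_q(tau)
mean_tau = 3.7         # E_q(tau)
delta_lambda12 = 0.03  # size of the perturbation to Lambda_12

# Posterior mean expressed in units of posterior standard deviations.
normalized_mean = mean_tau / std_tau

# The slide reports that after the perturbation the mean falls below 1.0 standard deviations.
threshold_in_sd = 1.0
min_shift_in_sd = normalized_mean - threshold_in_sd

# Implied lower bound on |normalized sensitivity| per unit of Lambda_12
# (an inference under a linear approximation, not a figure from the talk).
min_abs_normalized_sensitivity = min_shift_in_sd / delta_lambda12

print(f"E_q(tau) = {normalized_mean:.2f} standard deviations")
print(f"shift of more than {min_shift_in_sd:.2f} standard deviations")
print(f"|normalized sensitivity| > {min_abs_normalized_sensitivity:.1f} per unit of Lambda_12")
```

Read this way, a prior change of 0.03 in one hyperparameter is enough to move the estimated effect from about two standard deviations above zero to less than one.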

slide-141
SLIDE 141

Conclusion

  • We provide linear response variational Bayes: supplements MFVB for fast & accurate covariance estimate
  • More from LRVB: fast & accurate robustness quantification
  • Interested in your data and models:
  • Sensitivity to prior perturbations
  • Sensitivity to likelihood, data perturbations

slide-143
SLIDE 143

References

T Broderick, N Boyd, A Wibisono, AC Wilson, and MI Jordan. Streaming variational Bayes. NIPS, 2013.

R Giordano, T Broderick, and MI Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS, 2015.

R Giordano, T Broderick, R Meager, J Huggins, and MI Jordan. Fast robustness quantification with variational Bayes. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016. ArXiv:1606.07153.

J Huggins, T Campbell, and T Broderick. Core sets for scalable Bayesian logistic regression. Under review. ArXiv:1605.06423.

Special Thanks to Dan Cross

slide-144
SLIDE 144

References

R Bardenet, A Doucet, and C Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.

CM Bishop. Pattern Recognition and Machine Learning, 2006.

D Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.

B Fosdick. Modeling Heterogeneity within and between Matrices and Arrays, Chapter 4.7. PhD Thesis, University of Washington, 2013.

DJC MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

R Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomised experiments. ArXiv:1506.06669, 2015.

RE Turner and M Sahani. Two problems with variational expectation maximisation for time-series models. In D Barber, AT Cemgil, and S Chiappa, editors, Bayesian Time Series Models, 2011.

B Wang and M Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.