SLIDE 1

Fast Robustness Quantification with Variational Bayes

ITT Career Development Assistant Professor, MIT

Tamara Broderick

With: Ryan Giordano, Rachael Meager, Jonathan Huggins, Michael I. Jordan

SLIDE 22

Robustness quantification

Bayes Theorem: $p(\theta \mid x) \propto_{\theta} p(x \mid \theta)\, p(\theta)$

Bayesian inference
Complex, modular models; posterior distribution
Have to express prior beliefs in a distribution: challenges
Time-consuming; subjective; complex models

Robustness
Global & local
Rarely used
Approximation, MCMC
Our solution: linear response variational Bayes

SLIDE 23

Robustness quantification

Variational Bayes as an alternative to MCMC
Challenges of VB
Accurate uncertainties from VB
Accurate robustness quantification from VB
Big idea: derivatives/perturbations are easy in VB

SLIDE 42

Variational Bayes

Variational Bayes (VB)
Approximation $q^*(\theta)$ for posterior $p(\theta \mid x)$
Minimize Kullback-Leibler (KL) divergence: $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
VB practical success
point estimates and prediction
fast, streaming, distributed [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]
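
To make the KL objective concrete, here is a minimal sketch for univariate normals. It is an illustration only: the target N(1, 2^2) is a hypothetical stand-in for the posterior, and the function names are not from the talk. The closed-form KL between two normals is checked against a Riemann sum of q log(q / p), and the divergence is zero when q matches the target.

```python
import math


def normal_logpdf(theta, mean, sd):
    """Log density of N(mean, sd^2) at theta."""
    z = (theta - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def kl_closed_form(m_q, s_q, m_p, s_p):
    """KL(q || p) for q = N(m_q, s_q^2) and p = N(m_p, s_p^2)."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2)
            - 0.5)


def kl_riemann(m_q, s_q, m_p, s_p, lo=-20.0, hi=20.0, n=40001):
    """Riemann-sum approximation of the integral of q * log(q / p)."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        theta = lo + i * step
        log_q = normal_logpdf(theta, m_q, s_q)
        log_p = normal_logpdf(theta, m_p, s_p)
        total += math.exp(log_q) * (log_q - log_p) * step
    return total


# Hypothetical stand-in for the posterior p(theta | x).
target_mean, target_sd = 1.0, 2.0

kl_mismatched = kl_closed_form(0.0, 1.0, target_mean, target_sd)
kl_check = kl_riemann(0.0, 1.0, target_mean, target_sd)
kl_matched = kl_closed_form(target_mean, target_sd, target_mean, target_sd)
print(kl_mismatched, kl_check, kl_matched)
```

In practice the posterior is not available in closed form, so VB minimizes the same divergence through an equivalent objective rather than by direct integration.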

SLIDE 52

What about uncertainty?

Variational Bayes: $\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_{\theta} q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$
Mean-field variational Bayes (MFVB): $q(\theta) = \prod_{j=1}^{J} q(\theta_j)$
Underestimates variance (sometimes severely)
No covariance estimates
[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$ over axes $\theta_1$, $\theta_2$]
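
The variance underestimation can be seen in a small worked case. The sketch below assumes a bivariate normal target with correlation 0.9 (hypothetical numbers, not from the talk) and uses the standard result that, for a Gaussian target, the optimal mean-field factors are normal with variance 1 / Lambda_jj, where Lambda is the target precision matrix. The coordinate updates recover the true mean, but each factor variance is 1 - rho^2 rather than 1, and there is no covariance term at all.

```python
def invert_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]


def mfvb_gaussian(mu, precision, n_iter=200):
    """Coordinate-ascent MFVB for a bivariate normal target.

    q(theta) = q(theta_1) q(theta_2); each factor is normal.
    Returns the factor means and the factor variances.
    """
    m = [0.0, 0.0]
    for _ in range(n_iter):
        m[0] = mu[0] - precision[0][1] / precision[0][0] * (m[1] - mu[1])
        m[1] = mu[1] - precision[1][0] / precision[1][1] * (m[0] - mu[0])
    variances = [1.0 / precision[0][0], 1.0 / precision[1][1]]
    return m, variances


# Hypothetical target: unit marginal variances, correlation 0.9.
rho = 0.9
true_mean = [1.0, -2.0]
true_cov = [[1.0, rho], [rho, 1.0]]

mfvb_mean, mfvb_var = mfvb_gaussian(true_mean, invert_2x2(true_cov))
print("MFVB means:", mfvb_mean)
print("MFVB variances:", mfvb_var, "true variances:", [1.0, 1.0])
```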

SLIDE 73

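Linear response

Cumulant-generating function: $C(t) := \log \mathbb{E}\, e^{t^T \theta}$, with mean $= \frac{d}{dt} C(t) \big|_{t=0}$
True posterior covariance vs MFVB covariance: $\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t) \big|_{t=0}$, $V := \frac{d^2}{dt^T dt} C_{q^*}(t) \big|_{t=0}$
"Linear response": $\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t)$, with MFVB approximation $q_t^*$
The LRVB approximation: $\Sigma = \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \big|_{t=0} \approx \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0} =: \hat{\Sigma}$
[Bishop 2006]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]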
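
The identities on this slide can be checked numerically on a toy distribution. The sketch below assumes a hypothetical four-point discrete distribution in place of the posterior, tilts it by exp(t * theta), and uses central finite differences. It illustrates that the first derivative of C at t = 0 is the mean, that the second derivative is the variance, and that the derivative of the tilted mean at t = 0 is the same variance.

```python
import math

# Hypothetical discrete stand-in for p(theta | x).
support = [-1.0, 0.0, 2.0, 3.5]
probs = [0.1, 0.4, 0.3, 0.2]


def cgf(t):
    """Cumulant-generating function C(t) = log E[exp(t * theta)]."""
    return math.log(sum(p * math.exp(t * th) for p, th in zip(probs, support)))


def tilted_mean(t):
    """Mean of theta under p_t, where log p_t = log p + t * theta - C(t)."""
    weights = [p * math.exp(t * th) for p, th in zip(probs, support)]
    total = sum(weights)
    return sum(w * th for w, th in zip(weights, support)) / total


mean = sum(p * th for p, th in zip(probs, support))
variance = sum(p * (th - mean) ** 2 for p, th in zip(probs, support))

h = 1e-4  # finite-difference step
first_derivative = (cgf(h) - cgf(-h)) / (2.0 * h)
second_derivative = (cgf(h) - 2.0 * cgf(0.0) + cgf(-h)) / h ** 2
mean_response = (tilted_mean(h) - tilted_mean(-h)) / (2.0 * h)

print(mean, first_derivative)
print(variance, second_derivative, mean_response)
```

LRVB applies the last identity to the MFVB approximation of the tilted posterior, which is where the approximation enters.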

SLIDE 83

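LRVB covariance estimate
Suppose $q_t$ is in an exponential family with mean parametrization $m_t$
LRVB estimator: $\hat{\Sigma} := \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0} = (I - V H)^{-1} V = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \Big|_{m = m^*} \right)^{-1}$
Symmetric and positive definite at local min of KL
The LRVB assumption: $\mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q_t^*} \theta$
LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)
[Bishop 2006]

[Figure: $p(\theta \mid x)$ and $q^*(\theta)$]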
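
The multivariate normal case can be worked through by hand for two dimensions. The sketch below rests on assumptions that the slides do not spell out: H is read as the Hessian of the expected log posterior with respect to the mean parameters, restricted to the block for theta, which for a normal target with precision Lambda is minus the off-diagonal part of Lambda; V is the MFVB covariance diag(1 / Lambda_jj). Under that reading, (I - V H)^{-1} V recovers the true covariance, including the off-diagonal term that MFVB drops.

```python
def invert_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]


def matmul_2x2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical normal "posterior": unit variances, correlation 0.9.
rho = 0.9
true_cov = [[1.0, rho], [rho, 1.0]]
precision = invert_2x2(true_cov)

# MFVB covariance: diagonal, with variances 1 / Lambda_jj.
V = [[1.0 / precision[0][0], 0.0], [0.0, 1.0 / precision[1][1]]]

# Assumed H: minus the off-diagonal part of the precision matrix.
H = [[0.0, -precision[0][1]], [-precision[1][0], 0.0]]

VH = matmul_2x2(V, H)
identity_minus_VH = [[1.0 - VH[0][0], -VH[0][1]],
                     [-VH[1][0], 1.0 - VH[1][1]]]
lrvb_cov = matmul_2x2(invert_2x2(identity_minus_VH), V)

print("MFVB covariance:", V)
print("LRVB covariance:", lrvb_cov)
print("True covariance:", true_cov)
```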

SLIDE 100

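Microcredit Experiment

Simplified from Meager (2015)
K microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
$N_k$ businesses in kth site (~900 to ~17K)
Profit of nth business at kth site: $y_{kn} \overset{indep}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is profit and $T_{kn}$ is 1 if microcredit
Priors and hyperpriors:
$(\mu_k, \tau_k)^T \overset{iid}{\sim} \mathcal{N}\left( (\mu, \tau)^T, C \right)$
$(\mu, \tau)^T \overset{iid}{\sim} \mathcal{N}\left( (\mu_0, \tau_0)^T, \Lambda^{-1} \right)$
$\sigma_k^{-2} \overset{iid}{\sim} \Gamma(a, b)$
$C \sim \mathrm{Sep\&LKJ}(\eta, c, d)$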
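
One way to read the model is as a recipe for simulating data. The sketch below is a forward simulation under several assumptions that are not in the slides: every numeric value is hypothetical, the Gamma distribution is taken in shape/rate form, treatment is assigned by a fair coin, and C is fixed by hand rather than drawn from its Sep&LKJ prior.

```python
import math
import random

# Hypothetical site sizes in the ~900 to ~17K range (one per trial, K = 7).
SITE_SIZES = [900, 1200, 2500, 5000, 8000, 12000, 17000]


def simulate_microcredit(site_sizes, mu, tau, C, a, b, seed=0):
    """Draw site-level parameters and business profits from the model."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance C of (mu_k, tau_k).
    l11 = math.sqrt(C[0][0])
    l21 = C[1][0] / l11
    l22 = math.sqrt(C[1][1] - l21 * l21)
    sites = []
    for n_k in site_sizes:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mu_k = mu + l11 * z1
        tau_k = tau + l21 * z1 + l22 * z2
        # gammavariate takes (shape, scale); rate b is an assumed convention.
        precision_k = rng.gammavariate(a, 1.0 / b)
        sigma_k = 1.0 / math.sqrt(precision_k)
        treatment = [rng.randint(0, 1) for _ in range(n_k)]
        profit = [rng.gauss(mu_k + t * tau_k, sigma_k) for t in treatment]
        sites.append({"mu_k": mu_k, "tau_k": tau_k, "sigma_k": sigma_k,
                      "treatment": treatment, "profit": profit})
    return sites


sites = simulate_microcredit(SITE_SIZES, mu=10.0, tau=1.0,
                             C=[[4.0, 1.0], [1.0, 2.0]], a=2.0, b=1.0)
for site in sites:
    print(len(site["profit"]), round(site["mu_k"], 2), round(site["tau_k"], 2))
```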

SLIDE 107

Microcredit Experiment

One set of 2500 MCMC draws: 45 minutes
All of MFVB optimization, LRVB uncertainties, all sensitivity measures: 58 seconds!
Many other models and data sets: mixture models, generalized linear mixed models, etc.

[Figure: plots labeled MFVB and LRVB]

SLIDE 108

Robustness quantification

Variational Bayes as an alternative to MCMC
Challenges of VB
Accurate uncertainties from VB
Accurate robustness quantification from VB
Big idea: derivatives/perturbations are easy in VB

SLIDE 110

Robustness quantification

Bayes Theorem: $p(\theta \mid x) \propto_{\theta} p(x \mid \theta)\, p(\theta)$

SLIDE 123

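Robustness quantification

Bayes Theorem

$p_\alpha(\theta) := p(\theta \mid x, \alpha) \propto_\theta p(x \mid \theta)\, p(\theta \mid \alpha)$

Sensitivity

$S := \left. \frac{d\, \mathbb{E}_{p_\alpha}[g(\theta)]}{d\alpha} \right|_{\alpha} \Delta\alpha \approx \left. \frac{d\, \mathbb{E}_{q^*_\alpha}[g(\theta)]}{d\alpha} \right|_{\alpha} \Delta\alpha =: \hat{S}$ (LRVB estimator)

When $q^*_\alpha$ is in the exponential family:

$\hat{S} = A \left( \left. \frac{\partial^2 \mathrm{KL}}{\partial m\, \partial m^T} \right|_{m = m^*} \right)^{-1} B$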
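The structure of the LRVB estimator above can be illustrated on a toy conjugate model, where the answer is checkable by refitting. This is a sketch under stated assumptions, not the authors' implementation. The model is x_i ~ N(θ, 1) with prior θ ~ N(μ0, 1/α), the approximation q is N(m, s²), and g(θ) = θ. For that model the KL objective, up to constants, is ½ Σ((x_i − m)² + s²) + ½ α((m − μ0)² + s²) − ½ log s², so the m and s² blocks decouple and only m matters here. A and B are not defined on the slide; here they are assumed to be A = ∂E_q[g]/∂m and B = −(∂²KL/∂m∂α) Δα, which is what the implicit function theorem gives at the optimum m*. All function names are hypothetical.

```python
def posterior_mean(xs, mu0, alpha):
    """Optimal m*, which for this conjugate model is the exact posterior mean."""
    n = len(xs)
    return (sum(xs) + alpha * mu0) / (n + alpha)

def lrvb_style_sensitivity(xs, mu0, alpha, delta_alpha):
    """S_hat = A * H^{-1} * B for the toy model (scalar case)."""
    n = len(xs)
    m_star = posterior_mean(xs, mu0, alpha)
    hessian = n + alpha            # d^2 KL / dm^2 evaluated at m*
    cross = m_star - mu0           # d^2 KL / (dm d alpha) evaluated at m*
    a = 1.0                        # assumed A: dE_q[theta]/dm = 1
    b = -cross * delta_alpha       # assumed B: minus the cross term times delta
    return a * (1.0 / hessian) * b

xs = [1.2, 0.7, 2.1, 1.5]                  # hypothetical observations
mu0, alpha, delta_alpha = 0.0, 0.03, 0.01  # illustrative prior precision change

s_hat = lrvb_style_sensitivity(xs, mu0, alpha, delta_alpha)
# Brute-force check: refit at the perturbed hyperparameter and take the difference.
exact_change = (posterior_mean(xs, mu0, alpha + delta_alpha)
                - posterior_mean(xs, mu0, alpha))
```

The point of the comparison is that `s_hat` needs only derivatives at the current optimum, while `exact_change` requires a second fit; for small Δα the two agree to first order.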

SLIDE 124

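Microcredit Experiment

Simplified from Meager (2015)
K microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
$N_k$ businesses in $k$th site (~900 to ~17K)
Profit of $n$th business at $k$th site:

$y_{kn} \overset{indep}{\sim} N(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit

Priors and hyperpriors:

$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right)$

$\begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$

$\sigma_k^{-2} \overset{iid}{\sim} \Gamma(a, b)$

$C \sim \mathrm{Sep\&LKJ}(\eta, c, d)$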
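The generative structure of this model can be sketched as a small forward simulation. Several choices below are assumptions for illustration only: C is held fixed instead of being drawn from the Sep&LKJ prior, b in Γ(a, b) is treated as a rate, treatment is assigned by a fair coin, and all numeric values except Λ11 = 0.03 are made up. The function names are hypothetical.

```python
import math
import random

def sample_bivariate_normal(rng, mean, cov):
    """Draw from a 2-D normal using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2

def invert_2x2(mat):
    """Inverse of a 2x2 matrix, used to turn the precision Lambda into a covariance."""
    det = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]
    return [[mat[1][1] / det, -mat[0][1] / det],
            [-mat[1][0] / det, mat[0][0] / det]]

def simulate_microcredit(rng, site_sizes, mu0, tau0, lam, cov_c, a, b):
    """Forward-simulate profits y_kn from the hierarchical model (C held fixed)."""
    # Top level: (mu, tau) ~ N((mu0, tau0), Lambda^{-1})
    mu, tau = sample_bivariate_normal(rng, (mu0, tau0), invert_2x2(lam))
    sites = []
    for n_k in site_sizes:
        # Site level: (mu_k, tau_k) ~ N((mu, tau), C)
        mu_k, tau_k = sample_bivariate_normal(rng, (mu, tau), cov_c)
        # sigma_k^{-2} ~ Gamma(a, b); b is assumed to be a rate, so scale = 1/b
        precision_k = rng.gammavariate(a, 1.0 / b)
        sigma_k = 1.0 / math.sqrt(precision_k)
        # T_kn = 1 if the business received microcredit (fair coin, an assumption)
        treated = [rng.randint(0, 1) for _ in range(n_k)]
        profits = [rng.gauss(mu_k + t * tau_k, sigma_k) for t in treated]
        sites.append({"mu_k": mu_k, "tau_k": tau_k, "sigma_k": sigma_k,
                      "T": treated, "y": profits})
    return (mu, tau), sites

rng = random.Random(0)
(mu, tau), sites = simulate_microcredit(
    rng, site_sizes=[50, 80, 120], mu0=0.0, tau0=0.0,
    lam=[[0.03, 0.0], [0.0, 0.02]],        # Lambda_11 = 0.03; the rest is made up
    cov_c=[[1.0, 0.2], [0.2, 0.5]], a=2.0, b=2.0)
```

The site sizes here are far smaller than the ~900 to ~17K businesses per site in the real trials; they are kept small so the sketch runs instantly.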

SLIDE 128

Microcredit Experiment

Perturb Λ11: 0.03 → 0.04

[Figure: Sensitivity; labels: MFVB, LRVB]

SLIDE 140

Microcredit Experiment

Sensitivity of the expected microcredit effect (τ)
Normalized to be on scale of standard deviations in τ
E.g. (USD PPP): $\mathrm{StdDev}_q[\tau] = 1.8$, $\mathbb{E}_q[\tau] = 3.7 = 2.06 \cdot \mathrm{StdDev}_q[\tau]$
After $\Lambda_{12}$ += 0.03: $\mathbb{E}_q[\tau] < 1.0 \cdot \mathrm{StdDev}_q[\tau]$
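The arithmetic behind this example can be checked in a few lines. The three input numbers are from the slide; the last step is a back-of-the-envelope bound that assumes the response to the perturbation is exactly linear and that the standard deviation used for normalization stays at 1.8, neither of which the slide states.

```python
std_dev_tau = 1.8       # StdDev_q[tau] from the slide (USD PPP)
mean_tau = 3.7          # E_q[tau] from the slide (USD PPP)
delta_lambda12 = 0.03   # perturbation applied to Lambda_12 on the slide

# Posterior mean expressed in units of posterior standard deviations.
normalized_mean = mean_tau / std_dev_tau

# The slide reports that the perturbed mean falls below 1.0 standard deviations,
# so the mean dropped by at least this many standard deviations.
min_drop_in_sds = normalized_mean - 1.0

# Under the linearity assumption, the normalized sensitivity per unit of
# Lambda_12 has at least this magnitude.
implied_slope_magnitude = min_drop_in_sds / delta_lambda12
```

Read this way, a change of 0.03 in one prior hyperparameter is enough to move the estimated effect from about two standard deviations away from zero to less than one.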

SLIDE 141

Conclusion

We provide linear response variational Bayes (LRVB): supplements MFVB for fast & accurate covariance estimate
More from LRVB: fast & accurate robustness quantification
Interested in your data and models:
Sensitivity to prior perturbations
Sensitivity to likelihood, data perturbations

SLIDE 143

References

T Broderick, N Boyd, A Wibisono, AC Wilson, and MI Jordan. Streaming variational Bayes. NIPS, 2013.
R Giordano, T Broderick, and MI Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS, 2015.
R Giordano, T Broderick, R Meager, J Huggins, and MI Jordan. Fast robustness quantification with variational Bayes. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016. ArXiv:1606.07153.
J Huggins, T Campbell, and T Broderick. Core sets for scalable Bayesian logistic regression. Under review. ArXiv:1605.06423.

Special thanks to Dan Cross

SLIDE 144

References

R Bardenet, A Doucet, and C Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.
CM Bishop. Pattern Recognition and Machine Learning, 2006.
D Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.
B Fosdick. Modeling Heterogeneity within and between Matrices and Arrays, Chapter 4.7. PhD Thesis, University of Washington, 2013.
DJC MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
R Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomised experiments. ArXiv:1506.06669, 2015.
RE Turner and M Sahani. Two problems with variational expectation maximisation for time-series models. In D Barber, AT Cemgil, and S Chiappa, editors, Bayesian Time Series Models, 2011.
B Wang and M Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.