An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science - PowerPoint PPT Presentation



SLIDE 1

An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science

Kirthevasan Kandasamy
Machine Learning Dept, CMU
Electrochemical Energy Symposium, Pittsburgh, PA, November 2017

SLIDE 2

Designing Electrolytes in Batteries


SLIDE 3

Black-box Optimisation in Computational Astrophysics

[Diagram: a cosmological simulator maps input parameters (e.g. Hubble constant, baryonic density) to an observation; a likelihood computation then produces a likelihood score.]

SLIDE 4

Black-box Optimisation

Expensive black-box function

Other Examples:

  • Pre-clinical Drug Discovery
  • Optimal policy in Autonomous Driving
  • Synthetic gene design


SLIDES 5-7

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only via noisy evaluations. Let x⋆ = argmax_x f(x).

[Figure: f plotted against x, with the maximiser x⋆ and value f(x⋆) marked.]

SLIDE 8

Outline

◮ Part I: Bayesian Optimisation
  ◮ Bayesian Models for f
  ◮ Two algorithms: upper confidence bounds & Thompson sampling

◮ Part II: Some Modern Challenges
  ◮ Multi-fidelity Optimisation
  ◮ Parallelisation

SLIDES 9-14

Bayesian Models for f

e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R.

[Figure sequence: sample functions with no observations; the prior GP; the observations; the posterior GP given the observations.]

After t observations, f(x) ∼ N(µt(x), σt²(x)).
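The posterior mean µt(x) and standard deviation σt(x) above can be computed in closed form. The following is a minimal numerical sketch (not from the talk): a zero-mean GP with a squared-exponential kernel, where the kernel choice, lengthscale, and noise level are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel k(a, b) = exp(-(a-b)^2 / (2 ell^2)) on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3, ell=0.3):
    """Posterior mean mu_t(x) and std sigma_t(x) of a zero-mean GP after t observations."""
    K = rbf(x_obs, x_obs, ell) + noise * np.eye(len(x_obs))
    Ks = rbf(x_query, x_obs, ell)
    Kss = rbf(x_query, x_query, ell)
    # Solve via Cholesky for numerical stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))
```

Near the observations the posterior mean tracks the data and σt shrinks; far away, the posterior reverts to the prior (mean 0, std 1 for this kernel).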

SLIDES 15-19

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP. Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010):

1) Construct the posterior GP.
2) ϕt(x) = µt−1(x) + βt^(1/2) σt−1(x) is a UCB.
3) Choose xt = argmax_x ϕt(x).
4) Evaluate f at xt.

[Figure: the posterior GP, the UCB ϕt, and the chosen point xt.]
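The four steps above can be sketched as a short loop. This is a toy illustration only: a discrete 1-D search grid, a squared-exponential kernel with fixed hyperparameters, and a constant βt, none of which come from the talk.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_ucb(f, grid, n_iters=25, beta=4.0, noise=1e-4, seed=0):
    """GP-UCB on a discrete grid: repeatedly evaluate f at the maximiser of
    mu_{t-1}(x) + sqrt(beta) * sigma_{t-1}(x) (constant beta for simplicity)."""
    rng = np.random.default_rng(seed)
    X = [grid[len(grid) // 2]]                      # arbitrary first query
    Y = [f(X[0]) + 1e-3 * rng.standard_normal()]    # noisy evaluation
    for _ in range(n_iters):
        xo, yo = np.array(X), np.array(Y)
        K = rbf(xo, xo) + noise * np.eye(len(xo))
        Ks = rbf(grid, xo)
        mu = Ks @ np.linalg.solve(K, yo)                          # step 1: posterior mean
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        ucb = mu + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))  # step 2: UCB
        x_next = grid[int(np.argmax(ucb))]                        # step 3: maximise UCB
        X.append(x_next)                                          # step 4: evaluate f
        Y.append(f(x_next) + 1e-3 * rng.standard_normal())
    return X[int(np.argmax(Y))]
```

The exploration term √β·σ drives queries toward unexplored regions early on; as σ shrinks, the mean term takes over and queries concentrate near the maximum.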

SLIDES 20-29

GP-UCB (Srinivas et al. 2010)

[Figure sequence: GP-UCB running on a 1-D example after t = 1, 2, 3, 4, 5, 6, 7, 11, and 25 evaluations.]

SLIDES 30-34

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933):

1) Construct the posterior GP.
2) Draw a sample g from the posterior.
3) Choose xt = argmax_x g(x).
4) Evaluate f at xt.

[Figure: a posterior sample g and the chosen point xt.]
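Steps 2 and 3 can be sketched on a discrete grid: draw one joint sample of the posterior at all grid points, then take its argmax. Again a minimal illustration with an assumed kernel and hyperparameters, not the talk's implementation.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def thompson_step(x_obs, y_obs, grid, noise=1e-4, seed=0):
    """One TS iteration: sample g from the posterior GP (restricted to the
    grid) and return x_t = argmax_x g(x)."""
    rng = np.random.default_rng(seed)
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(grid, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)                     # posterior mean
    cov = rbf(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)   # posterior covariance
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(grid))    # symmetrise + jitter
    g = rng.multivariate_normal(mu, cov)                    # one posterior sample
    return grid[int(np.argmax(g))]
```

Because the sample is random, TS explores where the posterior is uncertain and exploits where the mean is high, without an explicit exploration bonus.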

SLIDES 35-37

More on Bayesian Optimisation

Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.

Other criteria for selecting xt:
◮ Expected improvement (Jones et al. 1998)
◮ Probability of improvement (Kushner et al. 1964)
◮ Predictive entropy search (Hernández-Lobato et al. 2014)
◮ Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f:
◮ Neural networks (Snoek et al. 2015)
◮ Random forests (Hutter 2009)
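As an example of one of these criteria: expected improvement has a closed form under a Gaussian posterior. A small sketch for maximisation, with y_best the best observation so far (notation here is mine, not the slides'):

```python
import numpy as np
from math import erf, sqrt

def expected_improvement(mu, sigma, y_best):
    """EI(x) = E[max(0, f(x) - y_best)] for f(x) ~ N(mu(x), sigma(x)^2):
    EI = (mu - y_best) * Phi(z) + sigma * phi(z), with z = (mu - y_best) / sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid division by 0
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))  # standard normal CDF
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)             # standard normal PDF
    return (mu - y_best) * Phi + sigma * phi
```

The next query is then xt = argmax_x EI(x): points with either a high posterior mean or a high posterior uncertainty score well.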

SLIDE 38

Some Modern Challenges/Opportunities

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. Arxiv 2017)

SLIDES 39-41

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

The desired function f is very expensive, but we have access to cheap approximations f1, f2, f3 ≈ f, which are cheaper to evaluate.

E.g. f: a real-world battery experiment; f2: a lab experiment; f1: a computer simulation.

[Figure: f and its approximations f1, f2, f3, with the maximiser x⋆ marked.]

SLIDES 42-44

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound. With 2 fidelities (1 approximation):

[Figure: the approximation f^(1) and the target f^(2) at t = 14, with x⋆ and xt marked.]

Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB on f^(2). Can be extended to multiple approximations and to continuous approximations.
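The multi-fidelity idea can be illustrated with a deliberately simple two-fidelity strategy. This is only a toy sketch of the idea, not MF-GP-UCB itself: spend a large cheap budget on the approximation f1 to shortlist candidates, then spend the small expensive budget on f only within that shortlist.

```python
import numpy as np

def two_fidelity_search(f1, f, grid, cheap_budget=50, expensive_budget=5):
    """Toy two-fidelity maximiser: shortlist by the cheap approximation f1,
    then evaluate the expensive f only on the shortlist."""
    step = max(1, len(grid) // cheap_budget)
    coarse = grid[::step]
    cheap_vals = np.array([f1(x) for x in coarse])               # many cheap queries
    shortlist = coarse[np.argsort(cheap_vals)[-expensive_budget:]]
    exp_vals = np.array([f(x) for x in shortlist])               # few expensive queries
    return shortlist[int(np.argmax(exp_vals))]
```

If f1 is biased but roughly tracks f, most of the search cost is paid at the cheap fidelity. MF-GP-UCB makes this trade-off adaptively, using confidence bounds at each fidelity, rather than a fixed two-stage split like this sketch.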

SLIDE 45

Experiment: Cosmological Maximum Likelihood Inference

◮ Type Ia supernovae data
◮ Maximum likelihood inference for 3 cosmological parameters:
  ◮ Hubble constant H0
  ◮ Dark energy fraction ΩΛ
  ◮ Dark matter fraction ΩM
◮ Likelihood: Robertson-Walker metric (Robertson 1936); requires numerical integration for each point in the dataset.

SLIDE 46

Experiment: Cosmological Maximum Likelihood Inference

3 cosmological parameters (d = 3). Fidelities: integration on grids of size (10^2, 10^4, 10^6) (M = 3).

[Plot: optimisation results.]

SLIDE 47

Experiment: Hartmann-3D

2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is the cheapest.

[Plot: query frequencies for Hartmann-3D f^(3)(x) at fidelities m = 1, 2, 3.]

SLIDES 48-51

2. Parallelising function evaluations

Parallelisation with M workers: we can evaluate f at M different points at the same time. E.g.: test M different battery solvents at the same time.

[Figure: evaluation timelines for sequential evaluations with one worker, parallel evaluations with M workers (asynchronous), and parallel evaluations with M workers (synchronous).]

SLIDES 52-53

Parallel Thompson Sampling (Kandasamy et al. Arxiv 2017)

Asynchronous: asyTS. At any given time,
1. (x′, y′) ← wait for a worker to finish.
2. Compute the posterior GP.
3. Draw a sample g ∼ GP.
4. Re-deploy the worker at argmax g.

Synchronous: synTS. At any given time,
1. {(x′m, y′m)} for m = 1, . . . , M ← wait for all workers to finish.
2. Compute the posterior GP.
3. Draw M samples gm ∼ GP, ∀m.
4. Re-deploy worker m at argmax gm, ∀m.
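The asyTS loop above can be sketched with a thread pool. Everything here is illustrative: a toy kernel, grid search instead of continuous optimisation, and threads standing in for real experimental workers.

```python
import numpy as np
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def ts_next(x_obs, y_obs, grid, rng, noise=1e-4):
    """Sample g from the posterior GP on the grid; return argmax g."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(grid, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    cov = rbf(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(grid))  # symmetrise + jitter
    g = rng.multivariate_normal(mu, cov)
    return grid[int(np.argmax(g))]

def asy_ts(f, grid, n_workers=4, n_evals=20, seed=0):
    """asyTS: whenever any worker finishes, fold its (x', y') into the
    posterior and immediately re-deploy that worker at the argmax of a
    fresh posterior sample, while the other workers keep running."""
    rng = np.random.default_rng(seed)
    X, Y = [], []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Seed the pool with random points.
        pending = {pool.submit(f, x): x for x in rng.choice(grid, size=n_workers)}
        while len(Y) < n_evals:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)  # step 1
            for fut in done:
                X.append(pending.pop(fut))
                Y.append(fut.result())
                if len(Y) + len(pending) < n_evals:
                    # Steps 2-4: posterior, sample, re-deploy.
                    x_next = ts_next(np.array(X), np.array(Y), grid, rng)
                    pending[pool.submit(f, x_next)] = x_next
    return X[int(np.argmax(Y))]
```

synTS differs only in waiting for all M workers before drawing M fresh samples; asyTS keeps every worker busy, which matters when evaluation times vary.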

SLIDES 54-56

Experiment: Branin-2D, M = 4

Evaluation time sampled from a uniform distribution.

[Plot: performance of synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, and asyTS.]

SLIDE 57

Experiment: Hartmann-18D, M = 25

Evaluation time sampled from an exponential distribution.

[Plot: performance of synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, and asyTS.]

SLIDES 58-59

Summary

◮ Black-box optimisation methods are used in several scientific and engineering applications.
◮ Bayesian Optimisation: a method for black-box optimisation which uses Bayesian uncertainty estimates for f.
◮ Some modern challenges:
  ◮ Multi-fidelity optimisation
  ◮ Parallel evaluations
  ◮ and several more . . .

Thank you.

Slides are up on my website: www.cs.cmu.edu/~kkandasa