Time-Bounded Sequential Parameter Optimization

Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, Kevin P. Murphy

Department of Computer Science, University of British Columbia, Canada
{hutter, hoos, kevinlb, murphyk}@cs.ubc.ca

Automated Parameter Optimization

Most algorithms have parameters
◮ Decisions that are left open during algorithm design
◮ Instantiate to optimize empirical performance
◮ E.g., local search
  – neighbourhoods, restarts, types of perturbations, tabu length (or a range for it), etc.
◮ E.g., tree search
  – branching heuristics, no-good learning, restarts, pre-processing, etc.

Automatically find a good instantiation of the parameters
◮ Eliminate the most tedious part of algorithm design and end use
◮ Save development time & improve performance

Parameter Optimization Methods

◮ Lots of work on numerical parameters, e.g.
  – CALIBRA [Adenso-Diaz & Laguna, '06]
  – Population-based methods, e.g. CMA-ES [Hansen et al., '95-present]
◮ Categorical parameters
  – Racing algorithms, F-Race [Birattari et al., '02-present]
  – Iterated local search, ParamILS [Hutter et al., AAAI '07 & JAIR '09]
◮ Successes of parameter optimization
  – Many parameters (e.g., CPLEX with 63 parameters)
  – Large speedups (sometimes orders of magnitude!)
  – For many problems: SAT, MIP, time-tabling, protein folding, ...

Limitations of Model-Free Parameter Optimization

Model-free methods only return the best parameter setting
◮ Often that is all you need
  – E.g., an end user can customize the algorithm
◮ But sometimes we would like to know more
  – How important is each of the parameters?
  – Which parameters interact?
  – For which types of instances is a parameter setting good?
  Such answers inform the algorithm designer.

Response surface models can help
◮ Predictive models of algorithm performance under given parameter settings

Sequential Parameter Optimization (SPO)

◮ Original SPO [Bartz-Beielstein et al., '05-present]
  – SPO toolbox: a set of interactive tools for parameter optimization
◮ Studied SPO components [Hutter et al., GECCO-09]
◮ Want a completely automated tool
  → More robust version: SPO+
◮ This work: TB-SPO, which reduces the computational overheads
◮ Ongoing work: extend TB-SPO to handle
  – Categorical parameters
  – Multiple benchmark instances
  – Very promising results for both

Outline

1. Sequential Model-Based Optimization
2. Reducing the Computational Overhead Due To Models
   – Do more algorithm runs to bound model overhead
   – Use a cheaper (and better!) model
3. Conclusions

Sequential Model-Based Optimization (SMBO)

Blackbox function optimization, where the function is algorithm performance:

  • 0. Run algorithm with initial parameter settings
  • 1. Fit a model to the data
  • 2. Use model to pick promising parameter setting
  • 3. Perform an algorithm run with that parameter setting
◮ Repeat steps 1-3 until time is up

[Figure: two iterations of SMBO on a 1-D example. Each panel plots response y over parameter x and shows the true function, the function evaluations, the DACE mean prediction ± 2·stddev, and the scaled EI; panels labelled "First step" and "Second step".]
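The loop above translates directly into code. Below is a minimal SMBO sketch in Python; the toy run_algorithm target, the scikit-learn GP, and the expected-improvement helper are illustrative assumptions standing in for the DACE model and the scaled EI shown in the figure, not the authors' implementation.

```python
# Minimal SMBO sketch (illustrative only, not the authors' code).
# Assumes a 1-D continuous parameter, a noisy blackbox run_algorithm,
# and expected improvement (EI) as the selection criterion.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def run_algorithm(x):
    """Stand-in for one timed run of the target algorithm at setting x."""
    return (x - 0.3) ** 2 * 30 + rng.normal(scale=1.0)  # toy response

def expected_improvement(mu, sigma, best):
    """EI for minimization; larger is more promising."""
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# 0. run the algorithm with a few initial parameter settings
X = rng.uniform(0, 1, size=(3, 1))
y = np.array([run_algorithm(x[0]) for x in X])

for _ in range(20):                      # until the time budget is spent
    # 1. fit a model to the data gathered so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1.0).fit(X, y)
    # 2. pick a promising setting by maximizing EI over random candidates
    cand = rng.uniform(0, 1, size=(1000, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    # 3. perform an algorithm run with that setting and record the result
    X = np.vstack([X, x_next])
    y = np.append(y, run_algorithm(x_next[0]))

print("best setting:", X[np.argmin(y)][0], "best response:", y.min())
```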

Computational Overhead due to Models: Example

Example times:

  • 0. Run algorithm with initial parameter settings: 1000s
  • 1. Fit a model to the data: 50s
  • 2. Use model to pick promising parameter setting: 20s
  • 3. Perform an algorithm run with that parameter setting: 10s
◮ Repeat steps 1-3 until time is up

With these example times, only 10s of each 80s loop iteration (12.5%) goes to actual algorithm runs; the rest is model overhead.


Removing the Costly Initial Design (Phase 0)

◮ How many parameter settings should the initial design contain?
  – Too many: evaluating all of the settings takes too long
  – Too few: poor first model, from which the search might not recover
◮ Our solution: simply drop the initial design
  – Instead, interleave random settings during the search
  – Much better anytime performance

Overhead due to Models

Central SMBO algorithm loop, with example times:

◮ Repeat:
  • 1. Fit model using performance data gathered so far: 50s
  • 2. Use model to select promising parameter setting: 20s
  • 3. Perform algorithm run(s) with that parameter setting: 10s

Only a small fraction of the time is spent actually running algorithms.

Solution 1
◮ Do more algorithm runs to bound the model overhead (see the sketch after this slide)
  – Select not one but many promising points (little extra overhead)
  – Perform runs for at least as long as phases 1 and 2 took
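Read literally, that bound is simple to implement. A minimal sketch, assuming two illustrative callables (fit_and_select and run_algorithm, neither from the original code):

```python
import time

def time_bounded_iteration(fit_and_select, run_algorithm):
    """One time-bounded SMBO iteration (illustrative sketch, not the
    authors' code). fit_and_select fits the model and returns a list of
    promising parameter settings; run_algorithm performs one timed run
    and returns the CPU time it consumed."""
    t0 = time.time()
    candidates = fit_and_select()          # phases 1+2: model overhead
    overhead = time.time() - t0

    spent_running = 0.0
    for theta in candidates:               # phase 3: real algorithm runs
        spent_running += run_algorithm(theta)
        if spent_running >= overhead:      # run at least as long as
            break                          # fitting + selection took
    return spent_running, overhead
```

This guarantees that at most about half of the wall-clock time per iteration is model overhead, whatever the model costs.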

Which Setting to Perform How Many Runs for?

Heuristic mechanism
◮ Compare one configuration θ at a time to the incumbent θ_inc, using the mechanism from SPO+:
  – Incrementally perform runs for θ until either
    + the empirical performance of θ is worse than that of θ_inc → drop θ, or
    + θ has as many runs as θ_inc → θ becomes the new θ_inc
◮ Stop once the time bound is reached
(A sketch of this comparison mechanism follows after this slide.)

Algorithms
◮ TB-SPO
  – Get an ordered list of promising parameter settings using the model
  – Interleave random settings: every 2nd, 4th, etc. candidate is random
  – Compare one parameter setting at a time to the incumbent
  – Nice side effect: additional runs on good random settings
◮ "Strawman" algorithm: TB-Random
  – Uses only random settings
  – Compares one parameter setting at a time to the incumbent
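A minimal sketch of this SPO+-style comparison, under my own simplifying assumptions: a runs table of observed costs per configuration and a perform_run helper (both hypothetical), with the incumbent assumed to have at least one recorded run.

```python
def challenge_incumbent(theta, incumbent, runs, perform_run, clock, deadline):
    """Compare challenger theta against the incumbent (illustrative sketch,
    not the original code). runs[c] is the list of observed costs for
    configuration c; perform_run(c) executes one run of c and appends its
    cost to runs[c]; clock() returns the current time."""
    while clock() < deadline:
        perform_run(theta)                     # one more run for theta
        n = len(runs[theta])
        k = min(n, len(runs[incumbent]))       # compare on matched #runs
        mean_theta = sum(runs[theta]) / n
        mean_inc = sum(runs[incumbent][:k]) / k
        if mean_theta > mean_inc:              # empirically worse than
            return incumbent                   # incumbent -> drop theta
        if n >= len(runs[incumbent]):          # as many runs as incumbent
            return theta                       # -> theta is new incumbent
    return incumbent                           # time bound reached
```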

Experimental Validation: Setup

◮ Optimizing the SLS algorithm SAPS
  – A prominent SAT solver with 4 continuous parameters
  – Previously used to evaluate parameter optimization approaches
◮ Seven SAT instances
  – 1 quasigroups-with-holes (QWH) instance used previously
  – 3 quasigroup completion (QCP) instances
  – 3 graph colouring instances based on small-world graphs (SWGCP)

Experimental Validation: Results

[Figure: performance p_t vs. CPU time t spent for configuration [s] on the SAPS-QWH instance, axes log-scaled from 10^1 to 10^5. Left panel: SPO+ vs. TB-SPO (w/ LHD), both methods with the same LHD. Right panel: the same plus TB-SPO with an empty LHD.]

| Scenario | SPO+ | TB-SPO | TB-Random | pval1 | pval2 |
|---|---|---|---|---|---|
| Saps-QCP-med [·10^-2] | 4.50 ± 0.31 | 4.32 ± 0.21 | 4.23 ± 0.15 | 4·10^-3 | 0.17 |
| Saps-QCP-q075 | 3.77 ± 9.72 | 0.19 ± 0.02 | 0.19 ± 0.01 | 2·10^-6 | 0.78 |
| Saps-QCP-q095 | 49.91 ± 0.00 | 2.20 ± 1.17 | 2.64 ± 1.24 | 1·10^-10 | 0.12 |
| Saps-QWH [·10^3] | 10.7 ± 0.76 | 10.1 ± 0.58 | 9.88 ± 0.41 | 6·10^-3 | 0.14 |
| Saps-SWGCP-med | 49.95 ± 0.00 | 0.18 ± 0.03 | 0.17 ± 0.02 | 1·10^-10 | 0.37 |
| Saps-SWGCP-q075 | 50 ± 0 | 0.24 ± 0.04 | 0.22 ± 0.03 | 1·10^-10 | 0.08 |
| Saps-SWGCP-q095 | 50 ± 0 | 0.25 ± 0.05 | 0.28 ± 0.10 | 1·10^-10 | 0.89 |

(pval1 compares SPO+ against TB-SPO; pval2 compares TB-SPO against TB-Random.)


2 Different GP Models for Noisy Optimization

◮ Model I
  – Fit a standard GP, assuming Gaussian observation noise
◮ Model II (used in SPO, SPO+, and TB-SPO)
  – Compute the empirical mean of the responses at each parameter setting
  – Fit a noise-free GP to those means
  – But this assumes the empirical means are perfect (even when based on just 1 run!)
  – Cheaper: here 11 means vs. 110 raw data points

[Figure: Model I, a noisy GP fit of the original responses, vs. Model II, a noise-free (DACE) fit of the empirical means; each panel shows mean prediction ± 2·stddev, the true function, the function evaluations, and scaled EI over parameter x.]
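The contrast between the two models is easy to reproduce in a sketch. The code below uses scikit-learn GPs as a stand-in for the DACE/Kriging models in the slides; the toy response function and noise levels are assumptions.

```python
# Contrast of Model I vs. Model II (illustrative sketch; the original
# slides use DACE-style Kriging, not scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
settings = np.linspace(0, 1, 11)                # 11 parameter settings
X = np.repeat(settings, 10).reshape(-1, 1)      # 10 runs each -> 110 points
y = (X.ravel() - 0.3) ** 2 * 30 + rng.normal(scale=2.0, size=len(X))

# Model I: noisy GP on all 110 raw responses (alpha = noise variance)
model1 = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=4.0).fit(X, y)

# Model II: noise-free GP on the 11 empirical means
means = y.reshape(11, 10).mean(axis=1)
model2 = GaussianProcessRegressor(
    kernel=Matern(nu=2.5), alpha=1e-10).fit(settings.reshape(-1, 1), means)

# Model II is cheaper (11 vs. 110 points) but treats each mean as exact,
# so its predictive uncertainty collapses at the observed settings.
mu1, s1 = model1.predict([[0.3]], return_std=True)
mu2, s2 = model2.predict([[0.3]], return_std=True)
print(f"Model I : {mu1[0]:.2f} +/- {s1[0]:.2f}")
print(f"Model II: {mu2[0]:.2f} +/- {s2[0]:.2f}")
```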

How Much Faster is the Approximate Gaussian Process?

Complexity of Gaussian process regression (GPR)
◮ n data points
◮ Basic GPR equations: inverting an n × n matrix
◮ Numerical optimization of the hyper-parameters: h steps
◮ O(h · n³) for model fitting
◮ O(n²) for each model prediction

Complexity of the projected process (PP) approximation
◮ Active set of p data points: only invert a p × p matrix
◮ Throughout: p = 300
◮ O(n · p² + h · p³) for model fitting
◮ O(p²) for each model prediction
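For reference, a sketch of the PP predictive equations as given in Rasmussen & Williams (2006, Sec. 8.3), transcribed to numpy under my own choice of kernel; this illustrates the complexity argument and is not the authors' implementation.

```python
# Projected process (PP) predictive equations (sketch following
# Rasmussen & Williams 2006, Sec. 8.3; my own numpy transcription).
import numpy as np

def rbf(A, B, ell=0.2):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def pp_predict(X, y, Xm, Xstar, noise_var=1.0):
    """PP approximation with active set Xm (p points) out of X (n points).
    Only p x p systems are solved, never n x n."""
    Kmm = rbf(Xm, Xm) + 1e-8 * np.eye(len(Xm))    # p x p
    Kmn = rbf(Xm, X)                              # p x n
    A = noise_var * Kmm + Kmn @ Kmn.T             # p x p system only
    alpha = np.linalg.solve(A, Kmn @ y)
    ks = rbf(Xm, Xstar)                           # p x n_star
    mean = ks.T @ alpha
    var = (rbf(Xstar, Xstar).diagonal()
           - np.einsum('ij,ji->i', ks.T, np.linalg.solve(Kmm, ks))
           + noise_var * np.einsum('ij,ji->i', ks.T, np.linalg.solve(A, ks)))
    return mean, var
```

With n = 1000 and p = 300, the dominant fitting cost is the n·p² product Kmn @ Kmn.T plus p × p solves, in line with the O(n · p² + h · p³) bound above.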

Empirical Evaluation of the Model

Empirical time performance (1,000 data points)

[Figure: log10 of CPU time (in seconds) to fit the projected process (PP) vs. the noise-free (NF) model, per scenario: QCP-med, QCP-q075, QCP-q095, QWH, SWGCP-med, SWGCP-q075, SWGCP-q095.]

Empirical model quality
◮ Measures the correlation between
  – how promising the model judges a parameter setting to be, and
  – the true performance of that parameter setting (evaluated offline)

[Figure: that correlation (high is good, 1 is optimal) for PP vs. NF on the same seven scenarios.]
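One plausible way to compute such a quality measure is a rank correlation between model predictions and offline-measured performance; the slide does not name the statistic, so Spearman's rho below is an assumption.

```python
# Sketch of a model-quality measure: rank correlation between the model's
# predicted performance and the true (offline) performance of test settings.
# Spearman's rho is one standard choice; the paper's exact statistic may differ.
from scipy.stats import spearmanr

def model_quality(model, test_settings, true_performance):
    predicted = model.predict(test_settings)
    rho, _ = spearmanr(predicted, true_performance)
    return rho   # high is good, 1 is optimal
```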

Final Evaluation

◮ Comparing:
  – R: TB-Random
  – S: TB-SPO
  – P: TB-SPO(PP)
  – F: FocusedILS (a variant of ParamILS; limited by discretization)

| Scenario | TB-Random | TB-SPO | TB-SPO(PP) | FocusedILS |
|---|---|---|---|---|
| Saps-QCP-med [·10^-2] | 4.23 ± 0.15 | 4.32 ± 0.21 | 4.13 ± 0.14 | 5.12 ± 0.41 |
| Saps-QCP-q075 | 0.19 ± 0.01 | 0.19 ± 0.02 | 0.18 ± 0.01 | 0.24 ± 0.02 |
| Saps-QCP-q095 | 2.64 ± 1.24 | 2.20 ± 1.17 | 1.44 ± 0.53 | 2.99 ± 3.20 |
| Saps-QWH [·10^3] | 9.88 ± 0.41 | 10.1 ± 0.58 | 9.42 ± 0.32 | 10.6 ± 0.49 |
| Saps-SWGCP-med | 0.17 ± 0.02 | 0.18 ± 0.03 | 0.16 ± 0.02 | 0.27 ± 0.12 |
| Saps-SWGCP-q075 | 0.22 ± 0.03 | 0.24 ± 0.04 | 0.21 ± 0.02 | 0.35 ± 0.08 |
| Saps-SWGCP-q095 | 0.28 ± 0.10 | 0.25 ± 0.05 | 0.23 ± 0.05 | 0.37 ± 0.16 |

◮ TB-SPO(PP) is best on all 7 instances
◮ Good models do help


Conclusions

Parameter optimization
◮ Can be performed by automated approaches
  – Sometimes much better than by human experts
  – Automation can cut development time & improve results

Sequential Parameter Optimization (SPO)
◮ Uses predictive models of algorithm performance
◮ Can inform the algorithm designer about the parameter space

Time-Bounded SPO
◮ Eliminates the computational overheads of SPO
  – No need for a costly initial design
  – Bounds the time spent building and using the model
  – Uses an efficient approximate Gaussian process model
  → Practical for parameter optimization within a time budget
◮ Clearly outperforms previous SPO versions and ParamILS

Current & Future Work

◮ Generalizations of TB-SPO to handle
  – Categorical parameters
  – Multiple benchmark instances
◮ Applications of automated parameter optimization
  – Optimization of MIP solvers [to be submitted to CP-AI-OR]
◮ Use models to gain scientific insights
  – Importance of each parameter
  – Interactions among parameters
  – Interactions between parameters and instance features
◮ Per-instance approaches
  – Build a joint model of instance features and parameters
  – Given a new, unseen instance:
    + Compute its instance features (fast)
    + Use the parameter setting predicted to be best for those features