Learning Portfolios of Automatically Tuned Planners

SLIDE 1

Learning Portfolios of Automatically Tuned Planners

Jendrik Seipp¹, Manuel Braun¹, Johannes Garimort¹, Malte Helmert²

¹ Albert-Ludwigs-Universität Freiburg, Germany

² Universität Basel, Switzerland

June 2012

SLIDE 2

IPC 2011 – Sequential Satisficing Track

Results

[Bar chart: quality scores (axis from 160 to 240) for Autotune 2, Autotune 1, Stone Soup 2, Stone Soup 1, and LAMA 2011]

SLIDE 4

Motivation

Tuned planners:
  Tune for complete benchmark set
  Commit to single planner

Portfolio planners:
  Manually select planners
  Calculate times greedily

Our approach:
  Tune one planner for each domain in training set automatically
  Evaluate multiple portfolio generation methods

SLIDE 5

Overview

Domain Tuning
Portfolio Learning

SLIDE 6

Domain Tuning

SLIDE 7

Tuning Procedure – Domains

Training set of 21 former IPC domains (1998–2006)
Tune Fast Downward with ParamILS for each domain

SLIDE 8

Tuning Procedure – Configurations

Heuristics: h^FF, h^add, h^cg, h^cea, h^LM
Searches: eager, lazy
Type of landmarks, cost-handling, preferred operators
Numerous combination options and conditional parameters
→ 2.99 · 10^13 configurations

SLIDE 9

Tuning Results – Trends

Preferred operators (19/21)
Lazy search (20x), eager search (1x)
Most configurations use one (10x) or two (9x) heuristics
h^FF (12x), h^LM (11x), h^cg (6x), h^cea (4x), h^add (1x)

SLIDE 10

Tuning Results

Coverage by domain (rows, number of tasks in parentheses) for the planners tuned for optical-t, pathways, pipes-t, tpp, ... (columns):

optical-t (48): 21 3 ...
pathways (30): 22 30 29 30 ...
pipes-t (50): 26 39 42 38 ...
tpp (30): 24 30 30 30 ...
...

SLIDE 11

Portfolio Learning

SLIDE 12

Portfolio Generators

Input: planners, results on training set, total time limit
Output: {depot: 18s, gripper: 65s, ...}
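To make the input and output concrete, here is a minimal sketch of how such a portfolio could be represented and scored. The runtime table, the problem names, and the `solved` helper are all assumptions made for illustration; the talk measures quality, whereas this sketch only counts which problems get solved.

```python
# Hypothetical training results: seconds each tuned planner needs per problem.
# None means the planner never solved the problem. All numbers are made up.
RUNTIMES = {
    "depot": {"p1": 10, "p2": 40, "p3": None},
    "gripper": {"p1": None, "p2": 5, "p3": 60},
}


def solved(portfolio, runtimes):
    """Return the problems solved by at least one planner within its time slot."""
    result = set()
    for planner, limit in portfolio.items():
        for problem, needed in runtimes[planner].items():
            if needed is not None and needed <= limit:
                result.add(problem)
    return result


# A portfolio maps each planner (named after its tuning domain) to seconds.
portfolio = {"depot": 18, "gripper": 65}
print(sorted(solved(portfolio, RUNTIMES)))
```

A generator, in this reading, is any function that takes the runtime table and a total time limit and returns such a dictionary.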

SLIDE 13

Stone Soup

Hill-climbing in the portfolio space
Start: {depot: 0, gripper: 0, ...}
Successors: {depot: g, gripper: 0, ...}, {depot: 0, gripper: g, ...}, ...
Choose best and repeat
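The hill-climbing loop can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the runtime table is made up, the score is plain coverage, ties go to the first planner, and the loop stops once another step of size g would exceed the total time limit.

```python
# Hypothetical training results (seconds to solve, None = unsolved).
RUNTIMES = {
    "depot": {"p1": 10, "p2": 40, "p3": None},
    "gripper": {"p1": None, "p2": 5, "p3": 60},
}


def score(portfolio, runtimes):
    """Assumed score: number of problems solved by at least one planner."""
    solved = set()
    for planner, limit in portfolio.items():
        for problem, needed in runtimes[planner].items():
            if needed is not None and needed <= limit:
                solved.add(problem)
    return len(solved)


def hill_climb(runtimes, total_time, granularity):
    """Start from all-zero times and repeatedly add `granularity` seconds
    to whichever planner yields the best-scoring successor."""
    portfolio = {planner: 0 for planner in runtimes}
    while sum(portfolio.values()) + granularity <= total_time:
        successors = [
            {**portfolio, planner: portfolio[planner] + granularity}
            for planner in runtimes
        ]
        # max() keeps the first successor on ties (an arbitrary choice here).
        portfolio = max(successors, key=lambda p: score(p, runtimes))
    return portfolio


print(hill_climb(RUNTIMES, total_time=30, granularity=10))
```

Accepting the best successor even when the score does not improve lets the search cross plateaus, which matters when a single step of g seconds is too short to solve anything new.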

SLIDE 14

Uniform

Run all planners for same amount of time
Result: {depot: 85, gripper: 85, ...}
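The value 85 is consistent with splitting a 30-minute limit (1800 seconds) evenly among 21 tuned planners and rounding down. A small sketch, where the function name and the placeholder planner names are assumptions:

```python
def uniform_portfolio(planners, total_time):
    """Give every planner the same whole number of seconds."""
    share = total_time // len(planners)
    return {planner: share for planner in planners}


# Placeholder names standing in for the 21 domain-tuned planners.
planners = [f"domain{i}" for i in range(21)]
portfolio = uniform_portfolio(planners, total_time=1800)
print(portfolio["domain0"], sum(portfolio.values()))
```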

SLIDE 15

Selector

Brute force
For all subset sizes {1, ..., 21} compute best portfolio with equal time shares
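A brute-force search of this kind can be sketched with `itertools.combinations`. The runtime table and the coverage-based score are assumptions for illustration. With 21 planners there are about 2 million non-empty subsets, so exhaustive enumeration remains feasible.

```python
from itertools import combinations

# Hypothetical training results (seconds to solve, None = unsolved).
RUNTIMES = {
    "depot": {"p1": 10, "p2": 40, "p3": None},
    "gripper": {"p1": None, "p2": 5, "p3": 60},
}


def score(portfolio, runtimes):
    """Assumed score: number of problems solved by at least one planner."""
    solved = set()
    for planner, limit in portfolio.items():
        for problem, needed in runtimes[planner].items():
            if needed is not None and needed <= limit:
                solved.add(problem)
    return len(solved)


def selector(runtimes, total_time):
    """For each subset size, return the best equal-share portfolio."""
    planners = sorted(runtimes)
    best_by_size = {}
    for size in range(1, len(planners) + 1):
        share = total_time // size
        best, best_score = None, -1
        for subset in combinations(planners, size):
            candidate = {planner: share for planner in subset}
            candidate_score = score(candidate, runtimes)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        best_by_size[size] = best
    return best_by_size


print(selector(RUNTIMES, total_time=60))
```

How the final portfolio is picked among the per-size winners is not stated on the slide, so the sketch simply returns all of them.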

SLIDE 16

Cluster

Find k clusters with k-means
Cluster by quality
From each cluster choose best planner
Give all planners equal time shares

SLIDE 17

Increasing Time Limit

Iteratively increase the portfolio time limit
Get problems that can be solved in that limit
Find best planner for these problems
Give it the needed time
Repeat until no more problems solvable or time limit exceeded

SLIDE 18

Domain-wise

Iteratively retrieve domain with highest improvement potential
Give the fastest improving planner the needed time
Continue until total time limit reached or no more domains can be improved

SLIDE 19

Randomized Iterative Search

Use any existing portfolio as initialization (e.g. uniform)
Successors:
  Swap time slice between planners
  Collect time from all planners and give it to single one
Commit to first successor improving score
Run until score stagnates long enough
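The search can be sketched as below. This is one possible reading of the slide, not the authors' code: the runtime table, the coverage-based score, the slice size, the 50/50 choice between the two successor types, and the `patience` stagnation counter are all assumptions.

```python
import random

# Hypothetical training results (seconds to solve, None = unsolved).
RUNTIMES = {
    "depot": {"p1": 10, "p2": 40, "p3": None},
    "gripper": {"p1": None, "p2": 5, "p3": 60},
}


def score(portfolio, runtimes):
    """Assumed score: number of problems solved by at least one planner."""
    solved = set()
    for planner, limit in portfolio.items():
        for problem, needed in runtimes[planner].items():
            if needed is not None and needed <= limit:
                solved.add(problem)
    return len(solved)


def randomized_iterative_search(runtimes, initial, slice_size=10,
                                patience=200, seed=0):
    """Sample random successors, commit to the first one that improves the
    score, and stop after `patience` consecutive non-improving samples."""
    rng = random.Random(seed)
    planners = sorted(runtimes)
    current = dict(initial)
    current_score = score(current, runtimes)
    stagnant = 0
    while stagnant < patience:
        candidate = dict(current)
        if rng.random() < 0.5:
            # Swap: move one time slice from a donor to a receiver.
            donor, receiver = rng.sample(planners, 2)
            moved = min(slice_size, candidate[donor])
            candidate[donor] -= moved
            candidate[receiver] += moved
        else:
            # Collect: take one slice from every other planner for one receiver.
            receiver = rng.choice(planners)
            for planner in planners:
                if planner != receiver:
                    moved = min(slice_size, candidate[planner])
                    candidate[planner] -= moved
                    candidate[receiver] += moved
        candidate_score = score(candidate, runtimes)
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
            stagnant = 0
        else:
            stagnant += 1
    return current


start = {"depot": 50, "gripper": 50}
print(randomized_iterative_search(RUNTIMES, start))
```

Both successor types only move time between planners, so the total time of the initial portfolio is preserved throughout the search.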

SLIDE 20

Portfolio Results

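30 minutes

[Bar chart: quality scores (axis from 160 to 240) for the IPC 2011 planners Autotune 2, Autotune 1, Stone Soup 2, Stone Soup 1, and LAMA 2011, compared with the learned portfolios Stone Soup, Uniform, Selector, Cluster, Increasing Time Limit (ITL), Domain-wise, and Randomized Iterative Search (RIS)]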

SLIDE 21

Different timeouts

1, 3, 5, 15 minutes

Uniform portfolio outperforms LAMA even in 3 min setting
Other portfolios are even better
Fewer planners in portfolio when less time is available
No portfolio dominates others for all timeouts
Cluster and Increasing Time Limit among best performers
Randomized Iterative Search prone to overfitting

SLIDE 22

Outlook

Promising initial results for optimal configurations
Adaptively select next configuration
Use more heterogeneous planners
Apply automatic portfolio diversification in other areas

SLIDE 23

Summary

Tuning for domains is effective
Tuned planners yield very good results in portfolio