Learning Portfolios of Automatically Tuned Planners



SLIDE 1

Learning Portfolios of Automatically Tuned Planners

Jendrik Seipp (1), Manuel Braun (1), Johannes Garimort (1), Malte Helmert (2)

(1) Albert-Ludwigs-Universität Freiburg, Germany

(2) Universität Basel, Switzerland

June 2012

SLIDE 2

IPC 2011 – Sequential Satisficing Track

Results

[Bar chart: Quality (axis from 160 to 240) for Autotune 2, Autotune 1, Stone Soup 2, Stone Soup 1, and LAMA 2011]

SLIDE 4

Motivation

Tuned planners:

  • Tune for complete benchmark set
  • Commit to single planner

Portfolio planners:

  • Manually select planners
  • Calculate times greedily

Our approach:

  • Tune one planner for each domain in training set automatically
  • Evaluate multiple portfolio generation methods

SLIDE 5

Overview

  • Domain Tuning
  • Portfolio Learning

SLIDE 6

Domain Tuning

SLIDE 7

Tuning Procedure – Domains

  • Training set of 21 former IPC domains (1998–2006)
  • Tune Fast Downward with ParamILS for each domain

SLIDE 8

Tuning Procedure – Configurations

  • Heuristics: h^FF, h^add, h^cg, h^cea, h^LM
  • Searches: eager, lazy
  • Type of landmarks, cost-handling, preferred operators
  • Numerous combination options and conditional parameters

→ 2.99 · 10^13 configurations

SLIDE 9

Tuning Results – Trends

  • Preferred operators (19/21)
  • Lazy search (20x), eager search (1x)
  • Most configurations use one (10x) or two (9x) heuristics
  • h^FF (12x), h^LM (11x), h^cg (6x), h^cea (4x), h^add (1x)

SLIDE 10

Tuning Results

Coverage by domain (rows, number of tasks in parentheses) and planner (columns):

Domains           optical-t   pathways   pipes-t   tpp   . . .
optical-t (48)    21 3 . . .
pathways (30)     22          30         29        30    . . .
pipes-t (50)      26          39         42        38    . . .
tpp (30)          24          30         30        30    . . .
. . .

SLIDE 11

Portfolio Learning

SLIDE 12

Portfolio Generators

  • Input: planners, results on training set, total time limit
  • Output: {depot: 18s, gripper: 65s, . . . }
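To make the input and output of a portfolio generator concrete, here is a minimal sketch of one possible representation. The names `Runtimes`, `Portfolio`, and `coverage` are hypothetical, and counting solved training tasks is a simplification: the result slides report a quality score, which this sketch does not model.

```python
from typing import Dict, Optional

# Hypothetical training data: seconds each planner needed per task
# (None means the planner did not solve the task on the training set).
Runtimes = Dict[str, Dict[str, Optional[float]]]
Portfolio = Dict[str, int]  # planner name -> allotted seconds


def coverage(portfolio: Portfolio, runtimes: Runtimes) -> int:
    """Count tasks solved by at least one planner within its time slice."""
    tasks = {task for results in runtimes.values() for task in results}
    solved = 0
    for task in tasks:
        for planner, seconds in portfolio.items():
            needed = runtimes.get(planner, {}).get(task)
            if needed is not None and needed <= seconds:
                solved += 1
                break  # one successful planner is enough for this task
    return solved


example_runtimes = {
    "depot": {"t1": 10.0, "t2": None, "t3": 70.0},
    "gripper": {"t1": None, "t2": 60.0, "t3": 20.0},
}
example_portfolio = {"depot": 18, "gripper": 65}
print(coverage(example_portfolio, example_runtimes))  # 3
```

Planners are named after the domain they were tuned for, matching the `{depot: 18s, gripper: 65s, . . . }` notation on the slide.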

SLIDE 13

Stone Soup

  • Hill-climbing in the portfolio space
  • Start: {depot: 0, gripper: 0, . . . }
  • Successors (each adds a time slice g to one planner): {depot: g, gripper: 0, . . . }, {depot: 0, gripper: g, . . . }, . . .
  • Choose best and repeat
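The hill-climbing loop can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the score is plain coverage on the training set, ties go to the first successor generated, and the search stops once another slice would exceed the time limit. The names `stone_soup`, `coverage`, and `granularity` are hypothetical.

```python
from typing import Dict, Optional

Runtimes = Dict[str, Dict[str, Optional[float]]]


def coverage(portfolio: Dict[str, int], runtimes: Runtimes) -> int:
    """Simplified score: tasks solved by some planner within its time slice."""
    tasks = {task for results in runtimes.values() for task in results}
    return sum(
        any(
            runtimes[planner].get(task) is not None
            and runtimes[planner][task] <= seconds
            for planner, seconds in portfolio.items()
        )
        for task in tasks
    )


def stone_soup(runtimes: Runtimes, total_time: int, granularity: int) -> Dict[str, int]:
    """Hill-climb from the all-zero portfolio, adding one slice per step."""
    portfolio = {planner: 0 for planner in runtimes}
    while sum(portfolio.values()) + granularity <= total_time:
        successors = []
        for planner in portfolio:
            successor = dict(portfolio)
            successor[planner] += granularity
            successors.append(successor)
        # Commit to the best successor; ties go to the first one generated.
        portfolio = max(successors, key=lambda s: coverage(s, runtimes))
    return portfolio


soup_runtimes = {
    "depot": {"t1": 8.0, "t2": 18.0, "t3": None},
    "gripper": {"t1": None, "t2": None, "t3": 9.0},
}
soup_result = stone_soup(soup_runtimes, total_time=30, granularity=10)
print(soup_result)  # {'depot': 20, 'gripper': 10}
```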

SLIDE 14

Uniform

  • Run all planners for the same amount of time
  • Result: {depot: 85, gripper: 85, . . . }
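The value 85 is consistent with splitting a 30-minute limit evenly across the 21 tuned planners and rounding down to whole seconds. That reading is an inference from the numbers on the slides; the sketch below, with the hypothetical function `uniform_portfolio` and placeholder planner names, only checks the arithmetic.

```python
from typing import Dict, List


def uniform_portfolio(planners: List[str], total_time: int) -> Dict[str, int]:
    """Give every planner the same whole number of seconds."""
    share = total_time // len(planners)
    return {planner: share for planner in planners}


# Placeholder names for the 21 domain-tuned planners.
tuned_planners = [f"planner-{i}" for i in range(1, 22)]
uniform_result = uniform_portfolio(tuned_planners, 30 * 60)
print(uniform_result["planner-1"])  # 85
```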

SLIDE 15

Selector

  • Brute force
  • For all subset sizes {1, . . . , 21} compute best portfolio with equal time shares
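A brute-force search over planner subsets might look like the sketch below. It assumes coverage on the training set as the score and keeps the first subset that reaches the best score; both choices, and the names `selector` and `coverage`, are illustrative rather than taken from the slides.

```python
from itertools import combinations
from typing import Dict, Optional

Runtimes = Dict[str, Dict[str, Optional[float]]]


def coverage(portfolio: Dict[str, int], runtimes: Runtimes) -> int:
    """Simplified score: tasks solved by some planner within its time slice."""
    tasks = {task for results in runtimes.values() for task in results}
    return sum(
        any(
            runtimes[planner].get(task) is not None
            and runtimes[planner][task] <= seconds
            for planner, seconds in portfolio.items()
        )
        for task in tasks
    )


def selector(runtimes: Runtimes, total_time: int) -> Dict[str, int]:
    """Try every non-empty planner subset with equal time shares."""
    planners = sorted(runtimes)
    best_score, best_portfolio = -1, {}
    for size in range(1, len(planners) + 1):
        share = total_time // size
        for subset in combinations(planners, size):
            portfolio = {planner: share for planner in subset}
            score = coverage(portfolio, runtimes)
            if score > best_score:  # strict: earlier (smaller) subsets win ties
                best_score, best_portfolio = score, portfolio
    return best_portfolio


selector_runtimes = {
    "a": {"t1": 20.0, "t2": None},
    "b": {"t1": None, "t2": 30.0},
    "c": {"t1": 55.0, "t2": None},
}
selector_result = selector(selector_runtimes, total_time=60)
print(selector_result)  # {'a': 30, 'b': 30}
```

With 21 planners there are 2^21 - 1 (about two million) non-empty subsets, so exhaustive enumeration stays feasible.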

SLIDE 16

Cluster

  • Find k clusters with k-means
  • Cluster by quality
  • From each cluster choose best planner
  • Give all planners equal time shares
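One way to read these steps is sketched below. It assumes each planner is described by a vector of per-task quality scores, that "best planner" means highest total quality within a cluster, and that k-means is plain Lloyd's algorithm seeded with the first k planners. All of these are assumptions for illustration; the slide does not specify them, and the function names are hypothetical.

```python
from typing import Dict, List


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(point) for point in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_portfolio(quality: Dict[str, List[float]], k: int, total_time: int) -> Dict[str, int]:
    """Pick the best planner of each cluster and split the time equally."""
    planners = list(quality)
    labels = kmeans([quality[planner] for planner in planners], k)
    chosen = []
    for c in range(k):
        members = [p for p, label in zip(planners, labels) if label == c]
        if members:
            chosen.append(max(members, key=lambda p: sum(quality[p])))
    share = total_time // len(chosen)
    return {planner: share for planner in chosen}


# Hypothetical per-task quality scores for four tuned planners.
cluster_quality = {
    "depot": [1.0, 0.9, 0.0],
    "gripper": [0.0, 0.1, 1.0],
    "logistics": [0.9, 0.8, 0.0],
    "tpp": [0.1, 0.0, 0.8],
}
cluster_result = cluster_portfolio(cluster_quality, k=2, total_time=1800)
print(cluster_result)  # {'depot': 900, 'gripper': 900}
```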

SLIDE 17

Increasing Time Limit

  • Iteratively increase the portfolio time limit
  • Get problems that can be solved in that limit
  • Find best planner for these problems
  • Give it the needed time
  • Repeat until no more problems solvable or time limit exceeded

SLIDE 18

Domain-wise

  • Iteratively retrieve domain with highest improvement potential
  • Give the fastest improving planner the needed time
  • Continue until total time limit reached or no more domains can be improved

SLIDE 19

Randomized Iterative Search

  • Use any existing portfolio as initialization (e.g. uniform)
  • Successors:
      – Swap time slice between planners
      – Collect time from all planners and give it to a single one
  • Commit to first successor improving score
  • Run until score stagnates long enough
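The search loop can be sketched as below, under several assumptions: "swap" is read as moving one time slice from one planner to another, "collect" as taking one slice from every other planner, the score is training-set coverage, and "stagnates long enough" is a fixed number of consecutive non-improving successors. The parameters `slice_seconds`, `patience`, and `seed` are hypothetical.

```python
import random
from typing import Dict, Optional

Runtimes = Dict[str, Dict[str, Optional[float]]]


def coverage(portfolio: Dict[str, int], runtimes: Runtimes) -> int:
    """Simplified score: tasks solved by some planner within its time slice."""
    tasks = {task for results in runtimes.values() for task in results}
    return sum(
        any(
            runtimes[planner].get(task) is not None
            and runtimes[planner][task] <= seconds
            for planner, seconds in portfolio.items()
        )
        for task in tasks
    )


def randomized_iterative_search(
    portfolio: Dict[str, int],
    runtimes: Runtimes,
    slice_seconds: int,
    patience: int,
    seed: int = 0,
) -> Dict[str, int]:
    """Local search that commits to the first improving random successor."""
    rng = random.Random(seed)
    current = dict(portfolio)
    current_score = coverage(current, runtimes)
    planners = list(current)  # needs at least two planners
    stagnation = 0
    while stagnation < patience:
        successor = dict(current)
        if rng.random() < 0.5:
            # Swap: move one time slice from one planner to another.
            giver, taker = rng.sample(planners, 2)
            givers = [giver]
        else:
            # Collect: take a slice from every other planner for a single one.
            taker = rng.choice(planners)
            givers = [planner for planner in planners if planner != taker]
        for giver in givers:
            moved = min(slice_seconds, successor[giver])
            successor[giver] -= moved
            successor[taker] += moved
        score = coverage(successor, runtimes)
        if score > current_score:
            current, current_score, stagnation = successor, score, 0
        else:
            stagnation += 1
    return current


ris_runtimes = {
    "depot": {"t1": 40.0, "t2": None},
    "gripper": {"t1": None, "t2": 20.0},
}
ris_start = {"depot": 30, "gripper": 30}
ris_result = randomized_iterative_search(ris_start, ris_runtimes, slice_seconds=10, patience=50)
print(ris_result, coverage(ris_result, ris_runtimes))
```

Both successor types only move time between planners, so the total time of the portfolio never changes. Because the score is measured on the training set, a long run can fit that set too closely.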

SLIDE 20

Portfolio Results

30 minutes

[Bar chart: Quality (axis from 160 to 240) for Autotune 2, Autotune 1, Stone Soup 2, Stone Soup 1, and LAMA 2011, compared with the learned portfolios Stone Soup, Uniform, Selector, Cluster, ITL, Domain-wise, and RIS]

SLIDE 21

Different timeouts

1, 3, 5, 15 minutes

  • Uniform portfolio outperforms LAMA even in the 3-minute setting
  • Other portfolios are even better
  • Fewer planners in portfolio when less time is available
  • No portfolio dominates the others for all timeouts
  • Cluster and Increasing Time Limit are among the best performers
  • Randomized Iterative Search is prone to overfitting

SLIDE 22

Outlook

  • Promising initial results for optimal configurations
  • Adaptively select next configuration
  • Use more heterogeneous planners
  • Apply automatic portfolio diversification in other areas

SLIDE 23

Summary

  • Tuning for domains is effective
  • Tuned planners yield very good results in portfolio