SLIDE 1

Planner Metrics Should Satisfy Independence of Irrelevant Alternatives

Jendrik Seipp July 12, 2019

University of Basel, Switzerland

SLIDE 2

Independence of irrelevant alternatives (IIA)

  • one of four criteria from Arrow’s impossibility theorem
  • the decision whether A > B or A < B is independent of C
  • important for planner metrics, but some violate it

1/9

SLIDE 4

IPC satisficing track

\[
\mathrm{sat}(P, \pi) =
\begin{cases}
\dfrac{\mathit{Cost}^*(\pi)}{\mathit{Cost}(P, \pi)} & \text{if solved} \\
0 & \text{if unsolved}
\end{cases}
\]

  • total score: sum of task scores
  • Cost∗(π) is the cost of a reference plan
  • if reference plans are optimal, sat satisfies IIA
  • if reference plans can come from competitors, sat does not satisfy IIA

2/9

SLIDE 9

IPC satisficing track – example

Cost   R     A     B     C
π1     2     5     4     5
π2     6     4     5     1

sat    A     B
π1     2/5   2/4
π2     4/4   4/5
∑      1.4   1.3         → A > B

sat    A     B     C
π1     2/5   2/4   2/5
π2     1/4   1/5   1/1
∑      0.65  0.7   1.4   → B > A

→ use optimal planners or domain-specific solvers to find good reference plans
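
The arithmetic in this example can be checked with a short Python sketch. The cost table is taken from the slide; the function name `sat_totals`, the task labels `pi1`/`pi2`, and the choice to take the reference cost as the cheapest plan among R and all participants are illustrative assumptions that happen to reproduce the numbers shown. All tasks are solved here, so the unsolved case is not modelled.

```python
from fractions import Fraction

# Plan costs from the example; "R" is the given reference plan.
COSTS = {
    "R": {"pi1": 2, "pi2": 6},
    "A": {"pi1": 5, "pi2": 4},
    "B": {"pi1": 4, "pi2": 5},
    "C": {"pi1": 5, "pi2": 1},
}
TASKS = ["pi1", "pi2"]


def sat_totals(planners):
    """Total sat score per planner. Assumption: the reference cost is the
    cheapest plan among R and all participating planners."""
    totals = {}
    for planner in planners:
        total = Fraction(0)
        for task in TASKS:
            reference = min(COSTS[p][task] for p in ["R", *planners])
            total += Fraction(reference, COSTS[planner][task])
        totals[planner] = total
    return totals


without_c = sat_totals(["A", "B"])
with_c = sat_totals(["A", "B", "C"])
print({p: float(s) for p, s in without_c.items()})
print({p: float(s) for p, s in with_c.items()})
```

Adding C lowers the reference cost for π2 from 4 to 1, which shrinks A's score on that task more than B's and flips their order.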

3/9

SLIDE 12

IPC agile track

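T∗(π): minimum runtime of all participating planners

\[
\mathrm{agl}_{2014}(P, \pi) =
\begin{cases}
\dfrac{1}{1 + \log_{10}\left(T(P, \pi) / T^*(\pi)\right)} & \text{if } T(P, \pi) \le 300 \\
0 & \text{otherwise}
\end{cases}
\]

\[
\mathrm{agl}_{2018}(P, \pi) =
\begin{cases}
1 & \text{if } T(P, \pi) < 1 \\
1 - \dfrac{\log(T(P, \pi))}{\log(300)} & \text{if } 1 \le T(P, \pi) \le 300 \\
0 & \text{if } T(P, \pi) > 300
\end{cases}
\]

→ use agl2018 in future agile tracks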
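
The difference between the two agile scores can be made concrete with a small Python sketch. The runtimes below are hypothetical (they do not appear in the slides), as are the helper names `totals_2014` and `totals_2018`; unsolved runs could be represented as `math.inf`, which both functions map to 0.

```python
import math

LIMIT = 300.0  # time limit in seconds, as in the formulas above


def agl2014(t, t_best):
    """Relative score: depends on the fastest participating planner."""
    if t > LIMIT:
        return 0.0
    return 1.0 / (1.0 + math.log10(t / t_best))


def agl2018(t):
    """Absolute score: depends only on the planner's own runtime."""
    if t > LIMIT:
        return 0.0
    if t < 1.0:
        return 1.0
    return 1.0 - math.log(t) / math.log(LIMIT)


# Hypothetical runtimes in seconds for two tasks (not from the slides).
RUNTIMES = {"A": [10.0, 100.0], "B": [100.0, 20.0], "C": [1.0, 200.0]}


def totals_2014(planners):
    """Sum of agl2014 scores; T* is the minimum over the given planners."""
    n_tasks = len(RUNTIMES[planners[0]])
    best = [min(RUNTIMES[p][i] for p in planners) for i in range(n_tasks)]
    return {
        p: sum(agl2014(RUNTIMES[p][i], best[i]) for i in range(n_tasks))
        for p in planners
    }


def totals_2018(planners):
    """Sum of agl2018 scores; no dependence on other planners."""
    return {p: sum(agl2018(t) for t in RUNTIMES[p]) for p in planners}


print(totals_2014(["A", "B"]))       # A ranks above B
print(totals_2014(["A", "B", "C"]))  # B ranks above A once C joins
print(totals_2018(["A", "B"]))
print(totals_2018(["A", "B", "C"]))  # A and B keep their scores
```

With these numbers, C being fast on the first task changes T∗ and reverses the agl2014 order of A and B, while the agl2018 scores of A and B are unaffected by C.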

4/9

SLIDE 13

Sparkle planning challenge

  • new planning competition in 2019
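  • “analyse the contribution of each planner to the real state of the art”
  • measure marginal contribution of each planner P to a portfolio selector over planners S

\[
\mathrm{sparkle}(P, \pi) =
\begin{cases}
\log_{10} \dfrac{\mathrm{par10}(S \setminus \{P\})}{\mathrm{par10}(S)} & \text{if } \mathrm{par10}(S \setminus \{P\}) > \mathrm{par10}(S) \\
0 & \text{otherwise}
\end{cases}
\]
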
  • focuses on coverage
  • uses runtime to break ties
  • removing which planner decreases coverage the most?

5/9

SLIDE 18

Sparkle planning challenge – example

  • 100 tasks
  • planner A solves only 1 task, π
  • planners B and C each solve the other 99 tasks but fail to solve π
  • {A, B} → B > A
  • {A, B, C} → A > B
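
The flip in this example can be reproduced with a minimal Python sketch. Several details are assumptions not given in the slides: the portfolio selector is modelled as a perfect selector (it always picks the fastest planner per task), the time limit is 300 seconds, every solved task takes 10 seconds, and PAR10 counts an unsolved task as 10 times the time limit.

```python
import math

LIMIT = 300.0       # hypothetical time limit in seconds
N_TASKS = 100
SOLVE_TIME = 10.0   # hypothetical runtime for every solved task

# Task 0 plays the role of π: only A solves it. None marks an unsolved task.
RUNTIMES = {
    "A": [SOLVE_TIME] + [None] * (N_TASKS - 1),
    "B": [None] + [SOLVE_TIME] * (N_TASKS - 1),
    "C": [None] + [SOLVE_TIME] * (N_TASKS - 1),
}


def par10(portfolio):
    """PAR10 of a perfect selector over the portfolio: unsolved tasks
    count as 10 times the time limit."""
    total = 0.0
    for task in range(N_TASKS):
        times = [
            RUNTIMES[p][task]
            for p in portfolio
            if RUNTIMES[p][task] is not None
        ]
        total += min(times, default=10 * LIMIT)
    return total / N_TASKS


def sparkle(planner, portfolio):
    """Marginal contribution of planner to the portfolio (a set)."""
    without = par10(portfolio - {planner})
    full = par10(portfolio)
    return math.log10(without / full) if without > full else 0.0


for portfolio in ({"A", "B"}, {"A", "B", "C"}):
    scores = {p: round(sparkle(p, portfolio), 3) for p in sorted(portfolio)}
    print(sorted(portfolio), scores)
```

With only A and B, removing B loses 99 tasks, so B scores far higher than A. Once C is present, removing B loses nothing because C covers the same tasks, so B's score drops to 0 and A ranks above B.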

6/9

SLIDE 19

Sparkle planning challenge – problems of the metric

  • penalizes similar planners
  • easily gameable: submit several “dummy” planners and one “real” planner (leader board, IPC planners available)
  • penalizes collaboration, favors closed-source planners
  • discourages submitting multiple planners

7/9

SLIDE 20

Sparkle planning challenge – suggestion

  • to satisfy IIA: use a fixed portfolio of baseline planners
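
One possible reading of this suggestion, sketched in Python: score each planner by how much it improves PAR10 when added to a fixed baseline portfolio. The slides do not give a formula for this variant, so `baseline_score`, the baseline planner `base`, the perfect-selector model of PAR10, and all runtimes are hypothetical.

```python
import math

LIMIT = 300.0       # hypothetical time limit in seconds
N_TASKS = 100
SOLVE_TIME = 10.0   # hypothetical runtime for every solved task


def solves(task_ids):
    """Runtime list for a planner that solves exactly the given tasks."""
    return [SOLVE_TIME if t in task_ids else None for t in range(N_TASKS)]


RUNTIMES = {
    "base": solves(range(1, 51)),  # hypothetical baseline planner
    "A": solves({0}),
    "B": solves(range(1, 100)),
    "C": solves(range(1, 100)),
}
BASELINE = frozenset({"base"})  # fixed before the competition


def par10(portfolio):
    """PAR10 of a perfect selector; unsolved tasks count as 10 * LIMIT."""
    total = 0.0
    for task in range(N_TASKS):
        times = [
            RUNTIMES[p][task]
            for p in portfolio
            if RUNTIMES[p][task] is not None
        ]
        total += min(times, default=10 * LIMIT)
    return total / N_TASKS


def baseline_score(planner):
    """Improvement in PAR10 from adding planner to the fixed baseline.
    Other competitors are not an input, so they cannot change the score."""
    before = par10(BASELINE)
    after = par10(BASELINE | {planner})
    return math.log10(before / after) if before > after else 0.0


print({p: round(baseline_score(p), 4) for p in ["A", "B", "C"]})
```

Because the score of a planner is a function of that planner and the baseline only, the relative order of A and B cannot change when C enters, and B and C are no longer penalized for being similar.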

8/9

SLIDE 21

Summary

  • IIA is critical for evaluation metrics
  • several planner metrics do not satisfy IIA
  • there are alternatives that do satisfy IIA

9/9