SLIDE 1

Planner Metrics Should Satisfy Independence of Irrelevant Alternatives

Jendrik Seipp July 12, 2019

University of Basel, Switzerland

SLIDE 2

Independence of irrelevant alternatives (IIA)

one of four criteria from Arrow’s impossibility theorem
the decision whether A > B or A < B is independent of C
important for planner metrics, but some violate it
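
Stated slightly more formally, as a sketch: assume a metric assigns each planner P a total score m_S(P) that may depend on the set S of participating planners. The notation m_S is introduced here for illustration and does not appear in the slides. Under that assumption, IIA requires that adding a planner C never changes the relative order of A and B:

```latex
\[
m_S(A) > m_S(B) \iff m_{S \cup \{C\}}(A) > m_{S \cup \{C\}}(B)
\quad \text{for all } A, B \in S \text{ and all } C \notin S.
\]
```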

SLIDE 4

IPC satisficing track

\[
\mathit{sat}(P, \pi) =
\begin{cases}
\dfrac{\mathit{Cost}^*(\pi)}{\mathit{Cost}(P, \pi)} & \text{if solved} \\
0 & \text{if unsolved}
\end{cases}
\]

total score: sum of task scores
Cost∗(π) is the cost of a reference plan
if reference plans are optimal, sat satisfies IIA
if reference plans can come from competitors, sat does not satisfy IIA

SLIDE 9

IPC satisficing track – example

Cost   R   A   B   C
π1     2   5   4   5
π2     6   4   5   1

sat    A     B
π1     2/5   2/4
π2     4/4   4/5
∑      1.4   1.3     → A > B

sat    A      B     C
π1     2/5    2/4   2/5
π2     1/4    1/5   1/1
∑      0.65   0.7   1.4     → B > A

→ use optimal planners or domain-specific solvers to find good reference plans
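
The numbers in this example can be reproduced with a short script. This is a minimal sketch, assuming that Cost∗(π) is the cheapest plan among the reference plan R and all participating planners; the names `costs` and `sat_totals` are made up for illustration. All tasks are solved here, so the "unsolved" case is not handled.

```python
# Plan costs from the example: R is the reference plan, A, B, C are planners.
costs = {
    "R": {"pi1": 2, "pi2": 6},
    "A": {"pi1": 5, "pi2": 4},
    "B": {"pi1": 4, "pi2": 5},
    "C": {"pi1": 5, "pi2": 1},
}


def sat_totals(planners):
    """Total sat score per planner.

    Assumption: Cost*(task) is the cheapest plan found by the reference R
    or by any of the participating planners.
    """
    totals = {}
    for planner in planners:
        total = 0.0
        for task in costs["R"]:
            best = min(costs[p][task] for p in ["R", *planners])
            total += best / costs[planner][task]
        totals[planner] = total
    return totals


without_c = sat_totals(["A", "B"])
with_c = sat_totals(["A", "B", "C"])
print(without_c)  # A scores higher than B
print(with_c)     # B scores higher than A once C participates
```

Adding C lowers Cost∗(π2) from 4 to 1, which costs A more than B and flips their order.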

SLIDE 12

IPC agile track

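T∗(π): minimum runtime of all participating planners

\[
\mathit{agl}_{2014}(P, \pi) =
\begin{cases}
\dfrac{1}{1 + \log_{10}\left(\frac{T(P, \pi)}{T^*(\pi)}\right)} & \text{if } T(P, \pi) \le 300 \\
0 & \text{otherwise}
\end{cases}
\]

\[
\mathit{agl}_{2018}(P, \pi) =
\begin{cases}
1 & \text{if } T(P, \pi) < 1 \\
1 - \dfrac{\log(T(P, \pi))}{\log(300)} & \text{if } 1 \le T(P, \pi) \le 300 \\
0 & \text{if } T(P, \pi) > 300
\end{cases}
\]

→ use agl2018 in future agile tracks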
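
The difference between the two agile scores can be illustrated with a small sketch. The runtimes below are hypothetical and chosen only to show the effect; the helper names `total_2014` and `total_2018` are likewise made up. The 2014 score needs the best runtime among all participants, whereas the 2018 score depends only on the planner's own runtime.

```python
import math

TIME_LIMIT = 300.0  # seconds, as in the formulas


def agl2014(runtime, best_runtime):
    """2014 agile score: relative to the fastest participating planner."""
    if runtime > TIME_LIMIT:
        return 0.0
    return 1.0 / (1.0 + math.log10(runtime / best_runtime))


def agl2018(runtime):
    """2018 agile score: depends only on the planner's own runtime."""
    if runtime > TIME_LIMIT:
        return 0.0
    if runtime < 1.0:
        return 1.0
    return 1.0 - math.log(runtime) / math.log(TIME_LIMIT)


# Hypothetical runtimes in seconds on two tasks; math.inf marks "unsolved".
runtimes = {
    "A": [10.0, 15.0],
    "B": [20.0, 10.0],
    "C": [1.0, math.inf],
}


def total_2014(planner, competitors):
    total = 0.0
    for task, runtime in enumerate(runtimes[planner]):
        best = min(runtimes[p][task] for p in competitors)
        total += agl2014(runtime, best)
    return total


def total_2018(planner):
    return sum(agl2018(t) for t in runtimes[planner])


for competitors in (["A", "B"], ["A", "B", "C"]):
    print(competitors, {p: round(total_2014(p, competitors), 3) for p in competitors})
print({p: round(total_2018(p), 3) for p in runtimes})
```

With these made-up runtimes, agl2014 ranks A above B when only A and B compete and B above A once C joins, while the agl2018 totals of A and B are the same in both settings.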

SLIDE 13

Sparkle planning challenge

new planning competition in 2019
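“analyse the contribution of each planner to the real state of the art”
measure marginal contribution of each planner P to a portfolio selector over planners S

\[
\mathit{sparkle}(P, \pi) =
\begin{cases}
\log_{10}\left(\dfrac{\mathit{par10}(S \setminus \{P\})}{\mathit{par10}(S)}\right) & \text{if } \mathit{par10}(S \setminus \{P\}) > \mathit{par10}(S) \\
0 & \text{otherwise}
\end{cases}
\]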
focuses on coverage
uses runtime to break ties
removing which planner decreases coverage the most?

SLIDE 18

Sparkle planning challenge – example

100 tasks
planner A solves 1 task π
planners B and C solve 99 tasks but fail to solve π
{A, B} → B > A
{A, B, C} → A > B
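
The flip in this example can be checked with a short sketch. Several things here are assumptions made for illustration and are not taken from the slides: the portfolio selector is idealized (it always picks the fastest planner that solves a task), par10 is the average runtime with unsolved tasks charged 10 times the time limit, the time limit is 1800 seconds, and every solved task takes 1 second.

```python
import math

TIME_LIMIT = 1800.0  # hypothetical time limit in seconds


def par10(portfolio, runtimes, tasks):
    """PAR10 of an idealized selector that always picks the fastest planner.

    Unsolved tasks are charged 10 times the time limit.
    """
    total = 0.0
    for task in tasks:
        solved = [runtimes[p][task] for p in portfolio if task in runtimes[p]]
        total += min(solved) if solved else 10 * TIME_LIMIT
    return total / len(tasks)


def sparkle(planner, portfolio, runtimes, tasks):
    """Marginal contribution of `planner` to `portfolio`."""
    without = par10(portfolio - {planner}, runtimes, tasks)
    with_all = par10(portfolio, runtimes, tasks)
    return math.log10(without / with_all) if without > with_all else 0.0


tasks = range(100)  # task 0 plays the role of π
runtimes = {
    "A": {0: 1.0},                         # solves only π
    "B": {t: 1.0 for t in range(1, 100)},  # solves everything except π
    "C": {t: 1.0 for t in range(1, 100)},  # same coverage as B
}

for portfolio in ({"A", "B"}, {"A", "B", "C"}):
    scores = {p: sparkle(p, portfolio, runtimes, tasks) for p in sorted(portfolio)}
    print(sorted(portfolio), scores)
```

Once C is present, removing B no longer hurts the portfolio, so B's marginal contribution drops to 0 while A's stays positive.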

SLIDE 19

Sparkle planning challenge – problems of the metric

penalizes similar planners
easily gameable: submit several “dummy” planners and one “real” planner (leader board, IPC planners available)
penalizes collaboration, favors closed-source planners
discourages submitting multiple planners

SLIDE 20

Sparkle planning challenge – suggestion

IIA: use fixed portfolio of baseline planners
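
One possible reading of this suggestion, sketched under loudly stated assumptions: score each planner by how much it improves a fixed baseline portfolio, so that the score never depends on the other competitors. The function `contribution`, the baseline planner `base`, the idealized selector, the 1800-second time limit, and all runtimes are hypothetical and not taken from the slides.

```python
import math

TIME_LIMIT = 1800.0  # hypothetical time limit in seconds


def par10(portfolio, runtimes, tasks):
    """PAR10 of an idealized selector; unsolved tasks cost 10x the limit."""
    total = 0.0
    for task in tasks:
        solved = [runtimes[p][task] for p in portfolio if task in runtimes[p]]
        total += min(solved) if solved else 10 * TIME_LIMIT
    return total / len(tasks)


def contribution(planner, baseline, runtimes, tasks):
    """Improvement of the fixed baseline portfolio when `planner` is added.

    The other competitors are not an input, so they cannot change the score.
    """
    before = par10(baseline, runtimes, tasks)
    after = par10(baseline | {planner}, runtimes, tasks)
    return math.log10(before / after) if before > after else 0.0


tasks = range(100)
runtimes = {
    "base": {t: 10.0 for t in range(50, 100)},  # hypothetical baseline planner
    "A": {0: 1.0},
    "B": {t: 1.0 for t in range(1, 100)},
    "C": {t: 1.0 for t in range(1, 100)},
}
baseline = frozenset({"base"})

scores = {p: contribution(p, baseline, runtimes, tasks) for p in ("A", "B", "C")}
print(scores)
```

In this sketch B and C receive the same score and both rank above A, whether or not the other one participates.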

SLIDE 21

Summary

IIA is critical for evaluation metrics
several planner metrics do not satisfy IIA
there are alternatives that do satisfy IIA