Knowledge-Based Policies for Qualitative Decentralized POMDPs

SLIDE 1

Knowledge-based programs Semantics Mathematical Properties Conclusion

Knowledge-Based Policies for Qualitative Decentralized POMDPs

Abdallah Saffidine, Bruno Zanuttini, François Schwarzentruber
May 14th, 2019

SLIDE 2

Automation of complex tasks

Building surveillance; nuclear decommissioning; intelligent farming.

SLIDE 6

Multiple robots

more robust/efficient than a single robot

Settings: cooperative agents; common goal; imperfect information; decentralized execution.

SLIDE 7

Methodology

Model + Goal → Planning → a's program, b's program, c's program

SLIDE 8

Need: understandable system

Motivation: legal issues in case of failure; interaction with humans.

SLIDE 9

Our contribution: use of knowledge-based programs

KBP for agent a

listenRadio
if a knows strike
    toStation
else
    toAirport

KBP for agent b

readNewsPaper
if b knows strike
    toStation
else
    toAirport

Operational semantics for knowledge-based programs; succinctness; (un)decidability/complexity.

Extends: [Lang, Zanuttini, ECAI2012, TARK2013]

SLIDE 10

Outline

1. Knowledge-based programs
2. Semantics
3. Mathematical Properties
4. Conclusion

SLIDE 11

Program constructions

Language constructions

turn left, stay, broadcast temperature (actions)
...; ... (sequence)
if ϕ then ... else ...
while ϕ do ...

Example (knowledge-based program for agent a)

if a knows (door 12 is locked and justobserved( )) then
    turn left;
    broadcast temperature
else
    stay
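The constructions above can be mirrored by a small abstract syntax tree. What follows is only a sketch: the class names (`Atom`, `And`, `Knows`, `Act`, `Seq`, `If`, `While`) and the helper `is_while_free` are assumptions made for illustration, not notation from the talk. The helper is included because the while-free fragment is singled out later in the deck.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# --- Formulas (hypothetical encoding) ---
@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Knows:
    agent: str
    body: "Formula"


Formula = Union[Atom, And, Knows]


# --- Programs: action, sequence, if-then-else, while ---
@dataclass(frozen=True)
class Act:
    name: str


@dataclass(frozen=True)
class Seq:
    steps: Tuple["Program", ...]


@dataclass(frozen=True)
class If:
    cond: Formula
    then: "Program"
    orelse: "Program"


@dataclass(frozen=True)
class While:
    cond: Formula
    body: "Program"


Program = Union[Act, Seq, If, While]


def is_while_free(p: Program) -> bool:
    """Return True iff the program contains no while loop."""
    if isinstance(p, While):
        return False
    if isinstance(p, Seq):
        return all(is_while_free(s) for s in p.steps)
    if isinstance(p, If):
        return is_while_free(p.then) and is_while_free(p.orelse)
    return True  # a single action


# The example program for agent a, with assumed atom names.
example = If(
    Knows("a", And(Atom("door_12_locked"), Atom("justobserved"))),
    Seq((Act("turn left"), Act("broadcast temperature"))),
    Act("stay"),
)
```

Frozen dataclasses keep programs hashable, which is convenient if they are later stored inside program counters.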

SLIDE 12

Outline

1. Knowledge-based programs
2. Semantics
   Models: QdecPOMDP
   Operational semantics of KBPs
3. Mathematical Properties
4. Conclusion

SLIDE 14

QdecPOMDP

Qualitative decentralized Partially Observable Markov Decision Processes = concurrent game structures with observations.

Transitions of the form: state1 → state2, labeled with one action per agent (a: stay, b: turn left) and one observation per agent (a: …, b: …).

A non-empty set of possible initial states; a set of goal states.
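As a rough illustration of the ingredients listed above, a QdecPOMDP could be stored explicitly as follows. This is a sketch under assumptions: the field names, the use of strings for states, actions and observations, and the toy strike instance are all hypothetical. The talk itself mentions a STRIPS-like symbolic form, which this explicit table does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

State = str
JointAction = Tuple[str, ...]  # one action per agent
JointObs = Tuple[str, ...]     # one observation per agent


@dataclass
class QdecPOMDP:
    agents: Tuple[str, ...]
    initial_states: FrozenSet[State]
    goal_states: FrozenSet[State]
    # (state, joint action) -> possible (joint observation, next state) pairs
    transitions: Dict[Tuple[State, JointAction], Set[Tuple[JointObs, State]]]

    def __post_init__(self) -> None:
        # The set of possible initial states must be non-empty.
        if not self.initial_states:
            raise ValueError("initial_states must be non-empty")

    def successors(self, state: State, joint_action: JointAction):
        """Qualitative (non-probabilistic) outcomes of a joint action."""
        return self.transitions.get((state, joint_action), set())


# Toy instance (invented for illustration): both agents learn about the strike.
listen_and_read = ("listenRadio", "readNewsPaper")
toy = QdecPOMDP(
    agents=("a", "b"),
    initial_states=frozenset({"strike", "no_strike"}),
    goal_states=frozenset({"arrived"}),
    transitions={
        ("strike", listen_and_read): {(("hear_strike", "read_strike"), "strike")},
        ("no_strike", listen_and_read): {(("hear_nothing", "read_nothing"), "no_strike")},
    },
)
```

The model is qualitative: a transition lists which outcomes are possible, with no probabilities attached.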

SLIDE 15

States

Typically, a state describes: positions of agents; battery levels; etc.

SLIDE 17

Operational semantics

One step of computation of KBPs in the QdecPOMDP

Epistemic structure

Higher-order knowledge about: the current state of the QdecPOMDP; the current program counters in KBPs.
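To make "higher-order knowledge" concrete, here is a minimal sketch of the usual possible-worlds reading of "knows": an agent knows a fact iff the fact holds in every world the agent cannot tell apart from the actual one. It simplifies heavily: worlds are bare states rather than histories with program counters, and the names `knows`, `observe`, `WORLDS` and the toy situation are assumptions for illustration only.

```python
from typing import Callable, Iterable

World = str


def knows(agent: str,
          fact: Callable[[World], bool],
          actual: World,
          worlds: Iterable[World],
          observe: Callable[[str, World], str]) -> bool:
    """True iff `fact` holds in every world giving `agent` the same observation as `actual`."""
    return all(fact(w) for w in worlds
               if observe(agent, w) == observe(agent, actual))


# Toy situation (hypothetical): a has listened to the radio, b has observed nothing yet.
WORLDS = ["strike", "no_strike"]


def observe(agent: str, world: World) -> str:
    return world if agent == "a" else "nothing"


def strike(world: World) -> bool:
    return world == "strike"


def b_does_not_know_strike(world: World) -> bool:
    # A fact about b's knowledge, so that a's knowledge of it is second-order.
    return not knows("b", strike, world, WORLDS, observe)


first_order = knows("a", strike, "strike", WORLDS, observe)
second_order = knows("a", b_does_not_know_strike, "strike", WORLDS, observe)
```

Because facts are ordinary predicates on worlds, a fact may itself call `knows`, which is how nesting such as "a knows that b does not know strike" arises.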

SLIDE 18

Assumptions

Common knowledge of: the QdecPOMDP; the KBPs; synchrony of the system.

Tests last 0 units of time; actions last 1 unit of time.

KBP for agent a

listenRadio
if a knows strike
    toStation
else
    toAirport

KBP for agent b

readNewsPaper
if b knows strike
    toStation
else
    toAirport

SLIDE 19

Epistemic structures at time T: worlds

listenRadio
if K_a strike then
    toStation
else
    toAirport

Worlds = consistent (wait a few slides) histories of the form

$s^0 \, \overrightarrow{pc}^0 \; \overrightarrow{obs}^1 s^1 \, \overrightarrow{pc}^1 \; \dots \; \overrightarrow{obs}^T s^T \, \overrightarrow{pc}^T$

where
$\overrightarrow{obs}^t$: vector of observations at time t
$s^t$: state at time t
$\overrightarrow{pc}^t$: vector of program counters at time t

SLIDE 20

Epistemic structures at time t: indistinguishability relations

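Agent a confuses two histories iff she has received the same observations.

$s^0 \, \overrightarrow{pc}^0 \; \overrightarrow{obs}^1 s^1 \, \overrightarrow{pc}^1 \; \dots \; \overrightarrow{obs}^T s^T \, \overrightarrow{pc}^T \;\sim_a\; s'^0 \, \overrightarrow{pc}'^0 \; \overrightarrow{obs}'^1 s'^1 \, \overrightarrow{pc}'^1 \; \dots \; \overrightarrow{obs}'^T s'^T \, \overrightarrow{pc}'^T$

iff for all $t \in \{1, \dots, T\}$, $\overrightarrow{obs}^t_a = \overrightarrow{obs}'^t_a$.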
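The indistinguishability condition can be checked directly by comparing an agent's observations step by step. The sketch below assumes a hypothetical encoding: a history is a list of steps for times 1..T, each step being a triple (observations per agent, state, program counters per agent). The initial pair of state and program counters carries no observation and is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical step encoding: (observations by agent, state, program counters by agent)
Step = Tuple[Dict[str, str], str, Dict[str, str]]
History = List[Step]


def indistinguishable(agent: str, h1: History, h2: History) -> bool:
    """True iff `agent` received the same observation at every time step of h1 and h2."""
    if len(h1) != len(h2):
        # Under synchrony, histories of different lengths are never confused.
        return False
    return all(obs1[agent] == obs2[agent]
               for (obs1, _, _), (obs2, _, _) in zip(h1, h2))
```

States and program counters of the two histories are deliberately ignored: only what the agent itself has observed matters.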

SLIDE 21

Program counters

Definition (Program counter): (guard, action just executed, continuation)

listenRadio
if K_a strike then
    toStation
else
    toAirport

(⊤, start, …)
(⊤, listenRadio, …)
(K_a strike, toStation, …)
(¬K_a strike, toAirport, …)

SLIDE 22

Control-flow graph

listenRadio
if K_a strike then
    toStation
else
    toAirport

(⊤, start, …) → (⊤, listenRadio, …)
(⊤, listenRadio, …) → (K_a strike, toStation, …)
(⊤, listenRadio, …) → (¬K_a strike, toAirport, …)
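One way to picture the control-flow graph is to enumerate program counters of the form (guard, action just executed, continuation) from the program text. The sketch below handles while-free programs only and relies on an assumed encoding: a program is a tuple of statements, a statement is `("act", name)` or `("if", condition, then_program, else_program)`, and guards are plain strings. None of these names come from the talk.

```python
def first_steps(program, guard="⊤"):
    """Return (guard, action, continuation) triples for the next action of `program`."""
    if not program:
        return []
    head, rest = program[0], program[1:]
    if head[0] == "act":
        return [(guard, head[1], rest)]
    if head[0] == "if":
        _, cond, then_branch, else_branch = head
        pos = cond if guard == "⊤" else f"{guard} ∧ {cond}"
        neg = f"¬{cond}" if guard == "⊤" else f"{guard} ∧ ¬{cond}"
        # Tests take no time, so look through the test to the next action.
        return (first_steps(then_branch + rest, pos)
                + first_steps(else_branch + rest, neg))
    raise ValueError(f"unsupported statement: {head!r}")


def control_flow_graph(program):
    """Return the start program counter and the edges between program counters."""
    start = ("⊤", "start", program)
    edges, todo, seen = [], [start], {start}
    while todo:
        pc = todo.pop()
        for nxt in first_steps(pc[2]):
            edges.append((pc, nxt))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, edges


# Agent a's KBP in the assumed encoding.
kbp_a = (
    ("act", "listenRadio"),
    ("if", "K_a strike", (("act", "toStation"),), (("act", "toAirport"),)),
)
start, edges = control_flow_graph(kbp_a)
```

For `kbp_a` this produces three edges, whose targets carry the guards and actions shown on the slide: (⊤, listenRadio), (K_a strike, toStation) and (¬K_a strike, toAirport).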

SLIDE 23

Consistent histories (explained with one agent)

In the QdecPOMDP:

s0 --(listenRadio, …)--> s1
s1 --(toStation, …)--> s2

KBP control-flow graph

listenRadio
if K_a strike then
    toStation
else
    toAirport

(⊤, start, …)
(⊤, listenRadio, …)
(K_a strike, toStation, …)
(¬K_a strike, toAirport, …)

Consistent history:

s0 (⊤, start, …)
s1 (⊤, listenRadio, …), and this prefix ⊨ K_a strike
s2 (K_a strike, toStation, …)

SLIDE 24

Outline

1. Knowledge-based programs
2. Semantics
3. Mathematical Properties
   Verification
   Execution Problem
   Succinctness
4. Conclusion

SLIDE 26

Verification problem

Input: a QdecPOMDP model (given in STRIPS-like symbolic form); knowledge-based programs, one for each agent.

Output: yes if all executions of the KBPs lead to a goal state; no otherwise.

SLIDE 27

Verification problem for while-free KBPs

Theorem. The verification problem for while-free KBPs is PSPACE-complete.

Proof idea. Upper bound: on-the-fly model checking. Lower bound: reduction from TQBF.

(Figure: agent 1, value of p1; agent 2, value of p2; agent 3, value of p3.)

SLIDE 31

Verification problem for general KBPs

Theorem. The verification problem for general KBPs is undecidable.

Proof idea. Reduction from the halting problem of a Turing machine on input ε.

SLIDE 33

Execution Problem

Input: an agent a; a QdecPOMDP model; policies (e.g. KBPs), one for each agent; a local view of the history for agent a.

Output: the action act that agent a should take.

SLIDE 34

Execution Problem (decision problem)

Input: an agent a; a QdecPOMDP model; policies (e.g. KBPs), one for each agent; a local view of the history for agent a; an action act.

Output: yes, if the next action of agent a is act; no otherwise.

SLIDE 35

Reactive policy representation

Definition (reactive policy representation). A class of policy representations is reactive iff its corresponding execution problem is in P.

Example (tree policies are a reactive policy representation)

if justobserved( ) then turn left else stay

Unless P = PSPACE, KBPs are not reactive. Indeed:

Proposition. The execution problem for KBPs is PSPACE-complete.
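To see why tree policies are reactive, note that executing one amounts to walking down the tree along the agent's own observations, which takes time linear in the length of the local history. The following sketch assumes a hypothetical encoding (`PolicyNode`, observation names such as `hear_strike`) and is not the representation used in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class PolicyNode:
    action: str
    # Child subtree to follow for each observation received after `action`.
    children: Dict[str, "PolicyNode"] = field(default_factory=dict)


def next_action(root: PolicyNode, observations: Sequence[str]) -> str:
    """Follow the agent's local observation history; no epistemic reasoning is needed."""
    node = root
    for obs in observations:
        node = node.children[obs]
    return node.action


# Toy tree policy for agent a (observation names are invented).
tree_a = PolicyNode("listenRadio", {
    "hear_strike": PolicyNode("toStation"),
    "hear_nothing": PolicyNode("toAirport"),
})
```

By contrast, executing a KBP requires deciding whether a knowledge test holds after the observed history, which is where the PSPACE-hardness stated above comes from.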

SLIDE 37

Modal depth

Modal depth = number of nested '... knows' operators.

Formula                 Modal depth
justobserved( )         0
a knows p               1
a knows (b knows p)     2
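The definition translates into a short recursive function. The formula classes below (`Atom`, `Not`, `And`, `Knows`) are an assumed encoding introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Knows:
    agent: str
    body: "Formula"


Formula = Union[Atom, Not, And, Knows]


def modal_depth(f: Formula) -> int:
    """Maximum number of nested 'knows' operators in the formula."""
    if isinstance(f, Atom):
        return 0
    if isinstance(f, Not):
        return modal_depth(f.body)
    if isinstance(f, And):
        return max(modal_depth(f.left), modal_depth(f.right))
    if isinstance(f, Knows):
        return 1 + modal_depth(f.body)
    raise TypeError(f"not a formula: {f!r}")
```

Only `Knows` increases the depth; Boolean connectives take the maximum over their parts, so a conjunction of two depth-1 formulas still has depth 1.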

SLIDE 39

Succinctness

Theorem ([Lang, Zanuttini, 2012] for d = 1; [AAAI2018] for d > 1)

Let d ≥ 1. There is a poly(n)-size QdecPOMDP family $(M_{n,d})_{n \in \mathbb{N}}$ for which:

1. there is a d-modal depth poly(n)-size valid KBP family;
2. there is no (d − 1)-modal depth valid KBP family;
3. assuming NP ⊄ P/poly, for any reactive policy representation, there is no poly(n)-size valid policy family.

Proof idea. $M_{n,d}$: run a poly(n)-time protocol revealing a poly(n)-size 3-CNF β; β is satisfiable iff an epistemic property holds that is expressible at modal depth d but not at modal depth d − 1.

SLIDE 41

Conclusion

(Figure: Model and Goal feed Planning, which outputs a's KBP, b's KBP, c's KBP; also shown: a's reactive policy, b's reactive policy, c's reactive policy.)

Higher-order knowledge... to get explainable policies (e.g. making cooperation visible); for concise programs.

SLIDE 42

Perspectives

Efficient implementation of the verification/execution problems; heuristics for the planning problem; more tractable fragments; decPOMDP (with probabilities); temporal properties; strategic reasoning.