Operator approach to stochastic games with varying stage duration
G. Vigeral (with S. Sorin)
CEREMADE, Université Paris Dauphine
26 January 2016, ADGO II, Santiago de Chile

Table of contents
1. Zero-sum stochastic games
2. Exact games with varying stage duration
   Finite horizon
   Discounted evaluation
3. Discretization of a continuous timed game
4. Conclusion and remarks

Zero-sum stochastic games

Zero-sum stochastic game
A zero-sum stochastic game $\Gamma$ is a 5-tuple $(\Omega, I, J, g, \rho)$ where:
$\Omega$ is the set of states.
$I$ (resp. $J$) is the action set of Player 1 (resp. Player 2).
$g : I \times J \times \Omega \to [-1, 1]$ is the payoff function (that Player 1 maximizes and Player 2 minimizes).
$\rho : I \times J \times \Omega \to \Delta(\Omega)$ is the transition probability.

How the game is played
An initial state $\omega_1$ is given, known by each player. At each stage $k \in \mathbb{N}$:
The players observe the current state $\omega_k$.
According to the past history, Player 1 (resp. Player 2) chooses a mixed action $x_k$ in $X = \Delta(I)$ (resp. $y_k$ in $Y = \Delta(J)$). The two choices are made independently.
An action $i_k$ of Player 1 (resp. $j_k$ of Player 2) is drawn according to his mixed action $x_k$ (resp. $y_k$).
This gives the payoff at stage $k$: $g_k = g(i_k, j_k, \omega_k)$.
A new state $\omega_{k+1}$ is drawn according to $\rho(i_k, j_k, \omega_k)$.

The $n$-stage game
For any stochastic game $\Gamma$, any finite horizon $n \in \mathbb{N}$, and any starting state $\omega_1$, the $n$-stage game $\Gamma_n(\omega_1)$ is the zero-sum game with payoff
\[ E\Big( \sum_{k=1}^{n} g_k \Big), \]
that Player 1 maximizes and Player 2 minimizes.
The value of $\Gamma_n(\omega_1)$ is denoted by $V_n(\omega_1)$.
Normalized value: $v_n = \frac{V_n}{n}$.

The discounted game
For any stochastic game $\Gamma$, any discount factor $\lambda \in \, ]0, 1[$, and any starting state $\omega_1$, the discounted game $\Gamma_\lambda(\omega_1)$ is the zero-sum game with payoff
\[ E\Big( \sum_{k=1}^{+\infty} (1-\lambda)^{k-1} g_k \Big), \]
that Player 1 maximizes and Player 2 minimizes.
The value of $\Gamma_\lambda(\omega_1)$ is denoted by $W_\lambda(\omega_1)$.
Normalized value: $w_\lambda = \lambda W_\lambda$.

Recursive structure
Shapley (1953) proved that the values satisfy a recursive structure:
\[ V_n(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big\{ g(x, y, \omega) + E_{\rho(x, y, \omega)}\big( V_{n-1}(\cdot) \big) \Big\} = \inf_{y \in Y} \sup_{x \in X} \Big\{ g(x, y, \omega) + E_{\rho(x, y, \omega)}\big( V_{n-1}(\cdot) \big) \Big\}, \]
\[ W_\lambda(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big\{ g(x, y, \omega) + (1-\lambda) E_{\rho(x, y, \omega)}\big( W_\lambda(\cdot) \big) \Big\} = \inf_{y \in Y} \sup_{x \in X} \Big\{ g(x, y, \omega) + (1-\lambda) E_{\rho(x, y, \omega)}\big( W_\lambda(\cdot) \big) \Big\}. \]

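Shapley operator
This can be summarized by:
\[ V_n = \Psi(V_{n-1}) = \Psi^n(0), \]
\[ W_\lambda = \Psi\big( (1-\lambda) W_\lambda \big), \]
\[ w_\lambda = \lambda \Psi\Big( \frac{1-\lambda}{\lambda} w_\lambda \Big) = \Big( \lambda \Psi\Big( \frac{1-\lambda}{\lambda} \, \cdot \Big) \Big)^{\infty}, \]
for some operator $\Psi$, where the exponent $\infty$ denotes the unique fixed point of the contracting map $f \mapsto \lambda \Psi\big( \frac{1-\lambda}{\lambda} f \big)$:
\[ \Psi(f)(\omega) = \sup_{x \in X} \inf_{y \in Y} \Big\{ g(x, y, \omega) + E_{\rho(x, y, \omega)}\big( f(\cdot) \big) \Big\} = \inf_{y \in Y} \sup_{x \in X} \Big\{ g(x, y, \omega) + E_{\rho(x, y, \omega)}\big( f(\cdot) \big) \Big\}. \]
$\Psi$ is nonexpansive for the sup norm:
\[ \| \Psi(f) - \Psi(f') \|_\infty \le \| f - f' \|_\infty. \]
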
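As an illustration of the operator $\Psi$, here is a minimal Python sketch for a finite game with two states and two actions per player. The payoff table `G`, the transition table `RHO`, and all function names are hypothetical choices made for this example, not taken from the talk. The value of each auxiliary matrix game is computed with the classical closed form for 2x2 games, which is a simplification; larger action sets would need a linear program.

```python
def matrix_game_value(a):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (p, q), (r, s) = a
    maxmin = max(min(p, q), min(r, s))   # best pure guarantee of the row player
    minmax = min(max(p, r), max(q, s))   # best pure guarantee of the column player
    if maxmin == minmax:                 # pure saddle point
        return maxmin
    # No pure saddle point: both players mix (classical 2x2 formula).
    return (p * s - q * r) / (p + s - q - r)

# Hypothetical toy game: G[w][i][j] is the payoff, RHO[w][i][j] the law of the next state.
G = [
    [[1.0, -1.0], [-1.0, 1.0]],   # state 0: matching pennies
    [[1.0, 0.5], [0.5, 0.0]],     # state 1: favourable to Player 1
]
RHO = [
    [[(0.5, 0.5), (1.0, 0.0)], [(0.0, 1.0), (0.5, 0.5)]],
    [[(0.0, 1.0), (0.5, 0.5)], [(1.0, 0.0), (0.0, 1.0)]],
]

def shapley(f):
    """Psi(f)(w): value of the matrix game g(i, j, w) + E_rho(i, j, w)[f]."""
    out = []
    for w in range(2):
        aux = [[G[w][i][j] + sum(p * fv for p, fv in zip(RHO[w][i][j], f))
                for j in range(2)] for i in range(2)]
        out.append(matrix_game_value(aux))
    return out

def n_stage_value(n):
    """V_n = Psi^n(0), the unnormalized value of the n-stage game."""
    v = [0.0, 0.0]
    for _ in range(n):
        v = shapley(v)
    return v

def discounted_value(lam, iters=2000):
    """W_lambda, obtained by iterating the contraction W -> Psi((1 - lam) W)."""
    w = [0.0, 0.0]
    for _ in range(iters):
        w = shapley([(1 - lam) * x for x in w])
    return w

if __name__ == "__main__":
    print("V_10 / 10 =", [x / 10 for x in n_stage_value(10)])
    print("w_0.1     =", [0.1 * x for x in discounted_value(0.1)])
```

The normalized values are obtained by dividing `n_stage_value(n)` by `n` and multiplying `discounted_value(lam)` by `lam`.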
Framework
This was proven by Shapley in the finite case, but it holds in a much wider framework. For example:
$\Omega$ finite, $X$ and $Y$ compact, $g$ and $\rho$ continuous.
$\Omega$, $X$ and $Y$ compact metric, $g$ and $\rho$ continuous.
See Maitra and Parthasarathy, Nowak, and Mertens, Sorin and Zamir for more general frameworks.

Exact games with varying stage duration

Definition
Definition due to Neyman (2013).
Instead of playing at times $1, 2, \dots, n, \dots$, the players play at times $t_1, t_2, \dots, t_n, \dots$
The intensity of both payoff and transition at time $t_k$ is $h_k = t_{k+1} - t_k$.
That is, $g_h = h g$ and $\rho_h = (1-h)\,\mathrm{Id} + h \rho$.
Shapley operator of the "exact game" with duration $h$:
\[ \Psi_h = (1-h)\,\mathrm{Id} + h \Psi. \]

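The formula for $\Psi_h$ can be checked directly from the definitions of $g_h$ and $\rho_h$. The short derivation below is a sketch under two assumptions that the slide leaves implicit: $0 \le h \le 1$, so that $\rho_h$ is a probability, and $\mathrm{Id}$ in $\rho_h$ is read as "stay in the current state $\omega$". The symbol $\operatorname{val}$ is shorthand introduced here for $\sup_x \inf_y = \inf_y \sup_x$.

```latex
\begin{align*}
\Psi_h(f)(\omega)
  &= \operatorname{val}_{x,y} \Big\{ h\, g(x,y,\omega)
       + (1-h)\, f(\omega) + h\, E_{\rho(x,y,\omega)}\big(f(\cdot)\big) \Big\} \\
  &= (1-h)\, f(\omega)
       + h \operatorname{val}_{x,y} \Big\{ g(x,y,\omega)
       + E_{\rho(x,y,\omega)}\big(f(\cdot)\big) \Big\} \\
  &= (1-h)\, f(\omega) + h\, \Psi(f)(\omega).
\end{align*}
```

The second equality uses that $(1-h) f(\omega)$ does not depend on $(x, y)$ and that multiplying all payoffs by $h \ge 0$ multiplies the value by $h$.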
Some natural questions
1. What happens, for a fixed horizon $t$ or discount factor $\lambda$, when the duration $h_i$ of each stage vanishes? Does the value converge, and to which limit?
2. What happens, for a fixed sequence of stage durations $h_i$, when the horizon goes to infinity or the discount factor goes to 0? Does the normalized value converge, and to which limit?
3. What happens when both $\lambda$ (or $\frac{1}{n}$) and $h_i$ go to 0?
4. What can be said of optimal strategies in games with varying duration?
Neyman answers questions 1, 3 and 4 for finite discounted games.
Here we use the operator approach to give a general answer to questions 1, 2 and 3.

Finite horizon

Game with finite horizon and varying duration
Finite horizon $t$, finite sequence of stage durations $h_1, \dots, h_n$ with $\sum_i h_i = t$.
The value $V$ of such a game satisfies $V = z_n$ with
\[ z_{i+1} = \Psi_{h_i}(z_i) = (1 - h_i) z_i + h_i \Psi(z_i), \]
that is,
\[ \frac{z_{i+1} - z_i}{h_i} = -(\mathrm{Id} - \Psi)(z_i). \]
This is the Eulerian scheme associated to $f' = -(\mathrm{Id} - \Psi)(f)$.
One can use general results on such schemes, valid for any nonexpansive operator defined on a Banach space.

Eulerian schemes in Banach spaces
For a general nonexpansive $\Psi$, let $f_t(z_0)$ denote the solution at time $t$ of $f' = -(\mathrm{Id} - \Psi)(f)$ starting from $z_0$.
Proposition (Miyadera-Oharu '70, Crandall-Liggett '71)
\[ \| f_{nh}(z_0) - \Psi_h^n(z_0) \| \le \| z_0 - \Psi(z_0) \| \, h \sqrt{n}. \]
Proposition (V. '10)
If $z_{i+1} = (1 - h_i) z_i + h_i \Psi(z_i)$, then
\[ \| f_t(z_0) - z_n \| \le \| z_0 - \Psi(z_0) \| \sqrt{\sum_{i=1}^{n} h_i^2}, \]
with $t = \sum_{i=1}^{n} h_i$.

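The Euler-scheme estimate for varying steps can be checked numerically on a case where the continuous-time flow is known in closed form. The sketch below uses a deliberately degenerate, hypothetical game with no strategic choice: state 0 pays 1 and then jumps to an absorbing state 1 that pays 0, so that $\Psi(f) = (1 + f_1, f_1)$ and, starting from $z_0 = 0$, the flow is $f_t = (1 - e^{-t}, 0)$. The starting point $z_0 = 0$ and every name in the code are assumptions made for this example.

```python
import math

def psi(f):
    """Hypothetical toy operator: state 0 pays 1 then moves to absorbing state 1 (payoff 0)."""
    return [1.0 + f[1], f[1]]

def exact_game_value(durations, z0=(0.0, 0.0)):
    """Iterate z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i) over the given stage durations."""
    z = list(z0)
    for h in durations:
        pz = psi(z)
        z = [(1 - h) * zi + h * pi for zi, pi in zip(z, pz)]
    return z

def flow(t):
    """Closed-form solution of f' = -(Id - Psi)(f), f(0) = 0, for this toy operator."""
    return [1.0 - math.exp(-t), 0.0]

def sup_dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

durations = [0.2, 0.05, 0.1, 0.15] * 5          # varying stage durations
t = sum(durations)
z = exact_game_value(durations)
error = sup_dist(z, flow(t))
# Right-hand side of the estimate: ||z0 - Psi(z0)|| * sqrt(sum of h_i^2), with z0 = 0.
bound = sup_dist([0.0, 0.0], psi([0.0, 0.0])) * math.sqrt(sum(h * h for h in durations))
print(f"t = {t:.2f}, error = {error:.4f}, bound = {bound:.4f}")
```

On this example the observed error is well below the bound, which is consistent with the estimate being a worst case over all nonexpansive operators.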
Result with $t$ fixed
Let $h = \max_i h_i$ and $t = \sum_i h_i$. Then
\[ \| V - f(t) \| \le K \sqrt{h t}. \]
Hence as the mesh $h$ goes to 0, the value of the game goes to $f(t)$.
$f(t)$ can be interpreted as the value of a game played in continuous time (Neyman '13).

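The bound in $\sqrt{ht}$ follows from the Euler-scheme estimate for varying steps in one line. The sketch below assumes that the recursion starts from $z_0 = 0$ (zero terminal payoff), which the slides do not state explicitly; under that assumption and with payoffs in $[-1, 1]$, the constant $K = 1$ is admissible.

```latex
\begin{align*}
\sum_{i=1}^{n} h_i^2 \;\le\; \Big( \max_i h_i \Big) \sum_{i=1}^{n} h_i \;=\; h\, t,
\qquad
\| z_0 - \Psi(z_0) \| = \| \Psi(0) \|_\infty \le \| g \|_\infty \le 1, \\
\text{hence} \qquad
\| V - f(t) \| \;\le\; \| z_0 - \Psi(z_0) \| \sqrt{\sum_{i=1}^{n} h_i^2} \;\le\; \sqrt{h\, t}.
\end{align*}
```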
Asymptotic results
For any $h_i$,
\[ \Big\| \frac{V - f(t)}{t} \Big\| \le \frac{K}{\sqrt{t}}. \]
All the repeated games with varying stage duration have the same (normalized) asymptotic behavior.
Same asymptotic behavior for the normalized value in continuous time $\frac{f(t)}{t}$ and for the normalized value of the original game $v_n$.

Discounted evaluation

Game with discount factor and varying duration
Discount factor $\lambda$ = weight on the payoff on $[0, 1]$ compared to $[0, +\infty[$.
Infinite sequence of stage durations $h_1, \dots, h_n, \dots$
When $h$ is constant, the normalized value $w_\lambda^h$ satisfies
\[ w_\lambda^h = \lambda \Psi_h\Big( \frac{1 - \lambda h}{\lambda} w_\lambda^h \Big). \]
In general $w$ is
\[ w = \Big( \prod_{i=1}^{+\infty} D_\lambda^{h_i} \Big)(0) \]
with
\[ D_\lambda^{h}(f) = \lambda \Psi_h\Big( \frac{1 - \lambda h}{\lambda} f \Big). \]

Result with $\lambda$ fixed and vanishing duration
For a uniform duration $h$, $w_\lambda^h = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.
For any $\lambda$ and $h_i \le h$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
\[ \| w - \hat{w}_\lambda \| \le K h \]
with $\hat{w}_\lambda := w_{\frac{\lambda}{1 + \lambda}}$.
Hence as the mesh $h$ goes to 0, the value of the game goes to $w_{\frac{\lambda}{1 + \lambda}}$. This was already known when the game is finite (Neyman 2013).
$\hat{w}_\lambda$ can be interpreted as the value of a game played in continuous time (Neyman '13).

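Both statements can be illustrated numerically. The sketch below uses a hypothetical one-player game (Player 2 has a single action) with two states and two actions, so that $\Psi$ reduces to a maximum; the payoffs, the transitions, and all function names are choices made for this example. The infinite product of the operators $D_\lambda^{h_i}$ is truncated to finitely many stages and is composed with the first stage outermost, which is an assumption about the intended ordering.

```python
def psi(f):
    """Hypothetical one-player Shapley operator on two states, payoffs in [-1, 1]."""
    return [max(1.0 + 0.5 * f[0] + 0.5 * f[1], 0.2 + f[0]),
            max(-1.0 + f[0], -0.3 + f[1])]

def psi_h(f, h):
    """Psi_h = (1 - h) Id + h Psi."""
    return [(1 - h) * a + h * b for a, b in zip(f, psi(f))]

def d_op(f, lam, h):
    """D_lam^h(f) = lam * Psi_h(((1 - lam * h) / lam) * f)."""
    c = (1 - lam * h) / lam
    return [lam * x for x in psi_h([c * a for a in f], h)]

def discounted(lam, durations):
    """Truncated product D^{h_1} o ... o D^{h_N} applied to 0 (innermost stage first)."""
    w = [0.0, 0.0]
    for h in reversed(durations):
        w = d_op(w, lam, h)
    return w

def sup_dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

lam, h = 0.2, 0.25
mu = lam / (1 + lam - lam * h)
w_uniform = discounted(lam, [h] * 2000)      # uniform duration h, discount lam
w_mu = discounted(mu, [1.0] * 2000)          # original game (h = 1), discount mu
print("uniform duration:", w_uniform, "original game at mu:", w_mu)

mesh = 0.01
small = [mesh, mesh / 2] * 10000             # varying durations, all at most mesh
w_small = discounted(lam, small)
w_hat = discounted(lam / (1 + lam), [1.0] * 2000)
print("gap to w_hat:", sup_dist(w_small, w_hat), "for mesh", mesh)
```

With duration 1 the operator `d_op` is the usual discounted operator of the original game, so `discounted(mu, [1.0] * N)` approximates $w_\mu$ by fixed-point iteration.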
Asymptotic results
Assumption: there exist nondecreasing $k : \, ]0, 1] \to \mathbb{R}_+$ and $\ell : [0, +\infty[ \, \to \mathbb{R}_+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and
\[ \| D_\lambda^1(z) - D_\mu^1(z) \| \le k(|\lambda - \mu|) \, \ell(\| z \|) \]
for all $(\lambda, \mu) \in \, ]0, 1]^2$ and $z \in Z$.
This is always true for Shapley operators of games with bounded payoff.
Then for any $\lambda$ and $h_i$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
\[ \| w - w_\lambda \| \le K \lambda. \]
All the repeated games with varying stage duration have the same (normalized) asymptotic behavior as $\lambda$ goes to 0.
Same asymptotic behavior for the normalized value in continuous time $\hat{w}_\lambda$ and for the normalized value of the original game $w_\lambda$.