MEAN FIELD GAMES WITH MAJOR AND MINOR PLAYERS
René Carmona
Department of Operations Research & Financial Engineering, PACM, Princeton University
CEMRACS, Luminy, July 17, 2017

MFG WITH MAJOR AND MINOR PLAYERS: SET-UP
Dynamics of the states:

dX^0_t = b_0(t, X^0_t, μ_t, α^0_t) dt + σ_0(t, X^0_t, μ_t, α^0_t) dW^0_t
dX_t = b(t, X_t, μ_t, X^0_t, α_t, α^0_t) dt + σ(t, X_t, μ_t, X^0_t, α_t, α^0_t) dW_t

Costs:

J^0(α^0, α) = E[ ∫_0^T f_0(t, X^0_t, μ_t, α^0_t) dt + g_0(X^0_T, μ_T) ]
J(α^0, α) = E[ ∫_0^T f(t, X_t, μ_t, X^0_t, α_t, α^0_t) dt + g(X_T, μ_T) ]

Open-loop controls:

α^0_t = φ^0(t, W^0_[0,T]),   α_t = φ(t, W^0_[0,T], W_[0,T])
THE MAJOR PLAYER BEST RESPONSE

Given the strategy φ of the representative minor player, the major player minimizes

J^0(α^0) = E[ ∫_0^T f_0(t, X^0_t, μ_t, α^0_t) dt + g_0(X^0_T, μ_T) ]

over controls of the form α^0_t = φ^0(t, W^0_[0,T]), subject to

dX^0_t = b_0(t, X^0_t, μ_t, α^0_t) dt + σ_0(t, X^0_t, μ_t, α^0_t) dW^0_t
dX_t = b(t, X_t, μ_t, X^0_t, φ(t, W^0_[0,T], W_[0,T]), α^0_t) dt + σ(t, X_t, μ_t, X^0_t, φ(t, W^0_[0,T], W_[0,T]), α^0_t) dW_t

where μ_t = L(X_t | W^0_[0,t]) is the conditional distribution of X_t given W^0_[0,t].
THE REP. MINOR PLAYER BEST RESPONSE

The system comprises:
◮ a major player
◮ a field of minor players, distinct from the representative minor player
◮ The major player uses the strategy α^0_t = φ^0(t, W^0_[0,T]).
◮ The representative of the field of minor players uses the strategy α_t = φ(t, W^0_[0,T], W_[0,T]).

The state dynamics become

dX^0_t = b_0(t, X^0_t, μ_t, φ^0(t, W^0_[0,T])) dt + σ_0(t, X^0_t, μ_t, φ^0(t, W^0_[0,T])) dW^0_t
dX_t = b(t, X_t, μ_t, X^0_t, φ(t, W^0_[0,T], W_[0,T]), φ^0(t, W^0_[0,T])) dt + σ(t, X_t, μ_t, X^0_t, φ(t, W^0_[0,T], W_[0,T]), φ^0(t, W^0_[0,T])) dW_t

where μ_t = L(X_t | W^0_[0,t]) is the conditional distribution of X_t given W^0_[0,t].
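The conditional law μ_t = L(X_t | W^0_[0,t]) can be approximated numerically by a particle system in which all particles share one realization of the common noise. The sketch below is a minimal illustration under assumed coefficients: the linear drifts `b0` and `b`, the constant volatilities, and the frozen zero controls are all hypothetical stand-ins, not the model's actual data.

```python
import numpy as np

# Hypothetical linear coefficients standing in for b_0 and b; mu_t enters
# only through its mean here, and the controls are frozen at zero.
def b0(t, x0, m, a0):
    return -x0 + 0.5 * m + a0

def b(t, x, m, x0, a, a0):
    return -(x - m) + 0.5 * x0 + a

def simulate(N=1000, T=1.0, n_steps=100, sigma0=0.3, sigma=0.5, seed=0):
    """Euler scheme for one major-player path X^0 and N minor particles X.
    All particles see the same common-noise path W^0 (through X^0), so the
    empirical distribution of the particles approximates the conditional
    law mu_t = L(X_t | W^0_[0,t])."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x0 = 0.0
    x = rng.normal(0.0, 1.0, size=N)            # i.i.d. initial minor states
    for k in range(n_steps):
        t = k * dt
        m = x.mean()                             # mean of the empirical proxy for mu_t
        dW0 = np.sqrt(dt) * rng.normal()         # common-noise increment, shared by all
        dW = np.sqrt(dt) * rng.normal(size=N)    # idiosyncratic increments
        x0 = x0 + b0(t, x0, m, 0.0) * dt + sigma0 * dW0
        x = x + b(t, x, m, x0, 0.0, 0.0) * dt + sigma * dW
    return x0, x

x0_T, x_T = simulate()
```

Conditioning on W^0 is what distinguishes this from a plain McKean-Vlasov particle approximation: averaging over repeated runs with the same W^0 path would recover the conditional law rather than the unconditional one.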
The representative minor player chooses a strategy ᾱ_t = φ̄(t, W^0_[0,T], W̄_[0,T]) to minimize

J(φ̄; φ^0, φ) = E[ ∫_0^T f(t, X̄_t, μ_t, X^0_t, ᾱ_t, φ^0(t, W^0_[0,T])) dt + g(X̄_T, μ_T) ]

subject to

dX̄_t = b(t, X̄_t, μ_t, X^0_t, ᾱ_t, φ^0(t, W^0_[0,T])) dt + σ(t, X̄_t, μ_t, X^0_t, ᾱ_t, φ^0(t, W^0_[0,T])) dW̄_t
◮ This optimization problem is NOT of McKean-Vlasov type.
◮ It is a classical optimal control problem with random coefficients.

The best response map is

φ̄*(φ^0, φ) = arg inf_{α_t = φ̄(t, W^0_[0,T], W_[0,T])} J(φ̄; φ^0, φ)
◮ Closed Loop Version

α^0_t = φ^0(t, X^0_[0,T], μ_t),   α_t = φ(t, X^0_[0,T], X_[0,T])

◮ Markovian Version

α^0_t = φ^0(t, X^0_t, μ_t),   α_t = φ(t, X^0_t, X_t)
◮ X_t: the agent's output
◮ α_t: the agent's effort (control)
◮ ν_t: the distribution of the output and effort (control) of the agents

The representative agent's expected utility is

J(α; ξ) = E[ ∫_0^T f(t, X_t, ν_t, α_t) dt + U_A(ξ) ]

◮ Given the choice of a contract ξ by the Principal,
◮ each agent in the field of exchangeable agents
  ◮ chooses an effort level α_t,
  ◮ meets his/her reservation price,
  putting the field of agents in a (mean field) Nash equilibrium.
◮ The Principal chooses the contract ξ to maximize his/her own expected utility.
Linear dynamics:

dX^0_t = (L_0 X^0_t + B_0 α^0_t + F_0 X̄_t) dt + D_0 dW^0_t
dX_t = (L X_t + B α_t + F X̄_t) dt + D dW_t

with X̄_t = E[X_t | F^0_t], where (F^0_t)_{t≥0} is the filtration generated by W^0.

Quadratic costs:

J^0(α^0, α) = E ∫_0^T [ (X^0_t − H_0 X̄_t)^⊤ Q_0 (X^0_t − H_0 X̄_t) + α^{0⊤}_t R_0 α^0_t ] dt
J(α^0, α) = E ∫_0^T [ (X_t − H_1 X̄_t)^⊤ Q_1 (X_t − H_1 X̄_t) + α^⊤_t R α_t ] dt
◮ Open Loop Version
  ◮ optimization problems + fixed point =
  ◮ affine FBSDE, solved by a large matrix Riccati equation
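The building block of such Riccati systems can be integrated numerically backward from the terminal condition. The sketch below solves the standard single-player LQR matrix Riccati ODE with explicit Euler steps; the matrices A, B, Q, R and the horizon are illustrative assumptions, not the large coupled system of the major/minor problem.

```python
import numpy as np

def solve_riccati(A, B, Q, R, QT, T=1.0, n_steps=2000):
    """Integrate the standard LQR matrix Riccati ODE backward in time,
        -dP/dt = A'P + P A - P B R^{-1} B' P + Q,   P(T) = QT,
    with explicit Euler steps, returning P(0)."""
    dt = T / n_steps
    Rinv = np.linalg.inv(R)
    P = QT.copy()
    for _ in range(n_steps):
        # one backward step: P(t - dt) = P(t) + dt * (A'P + PA - P B R^{-1} B' P + Q)
        P = P + dt * (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)
    return P

# Illustrative data: 2-d state (position/velocity), scalar control.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P0 = solve_riccati(A, B, Q, R, QT=np.eye(2))
```

In the major/minor problem the fixed-point condition couples several such equations into one larger matrix Riccati equation, but the backward-integration mechanics are the same.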
◮ Closed Loop Version
  ◮ the fixed point step is more difficult
  ◮ search limited to controls of the form

α^0_t = φ^0(t, X^0_t, X̄_t) = φ^0_0(t) + φ^0_1(t) X^0_t + φ^0_2(t) X̄_t
α_t = φ(t, X_t, X^0_t, X̄_t) = φ_0(t) + φ_1(t) X_t + φ_2(t) X^0_t + φ_3(t) X̄_t

  ◮ optimization problems + fixed point =
  ◮ affine FBSDE, solved by a large matrix Riccati equation
◮ V^{0,N}_t: value function of the major player in the N-player game
◮ V^{i,N}_t: value function of minor player i in the N-player game
◮ Linear dynamics

dX^0_t = α^0_t dt + Σ_0 dW^0_t
dX^i_t = α^i_t dt + Σ dW^i_t
◮ Minimization of quadratic costs

J^0 = E ∫_0^T [ λ_0 |X^0_t − ν_t|^2 + λ_1 |X^0_t − X̄^N_t|^2 + (1 − λ_0 − λ_1) |α^0_t|^2 ] dt,   X̄^N_t = (1/N) Σ_{i=1}^N X^i_t

◮ ν: a deterministic function [0, T] ∋ t → ν_t ∈ R^d (the leader's free will)
◮ λ_0 and λ_1 are positive real numbers satisfying λ_0 + λ_1 ≤ 1

J^i = E ∫_0^T [ l_0 |X^i_t − X^0_t|^2 + l_1 |X^i_t − X̄^N_t|^2 + (1 − l_0 − l_1) |α^i_t|^2 ] dt
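A quick way to get a feel for this leader/followers model is to simulate the linear dynamics under simple proportional-tracking controls. The sketch below is only illustrative: the gains mirror the cost weights but are NOT the Nash equilibrium feedback, and the leader target ν_t = t is a hypothetical choice.

```python
import numpy as np

def flocking_sim(N=50, T=1.0, n_steps=200, lam0=0.8, lam1=0.19,
                 l0=0.19, l1=0.8, Sig0=0.1, Sig=0.2, seed=1):
    """Euler simulation of dX^0_t = alpha^0_t dt + Sig0 dW^0_t and
    dX^i_t = alpha^i_t dt + Sig dW^i_t, with naive proportional-tracking
    controls (NOT the equilibrium controls):
        alpha^0_t = lam0 (nu_t - X^0_t) + lam1 (Xbar_t - X^0_t)
        alpha^i_t = l0 (X^0_t - X^i_t) + l1 (Xbar_t - X^i_t)
    where nu_t = t is a hypothetical leader target."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x0 = 0.0
    x = rng.normal(0.0, 1.0, size=N)
    for k in range(n_steps):
        nu = k * dt                      # leader's target path nu_t = t
        xbar = x.mean()                  # empirical mean of the followers
        a0 = lam0 * (nu - x0) + lam1 * (xbar - x0)
        a = l0 * (x0 - x) + l1 * (xbar - x)
        x0 = x0 + a0 * dt + Sig0 * np.sqrt(dt) * rng.normal()
        x = x + a * dt + Sig * np.sqrt(dt) * rng.normal(size=N)
    return x0, x

x0_T, x_T = flocking_sim()
```

Varying the weight pairs, as in the figures that follow, shows how the relative strength of leader-tracking versus mean-tracking shapes the flocking behavior.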
[Figure: equilibrium trajectories in the (x, y) plane for four parameter sets: (k_0, k_1, l_0, l_1) = (0.80, 0.19, 0.19, 0.80), (0.80, 0.19, 0.80, 0.19), (0.19, 0.80, 0.19, 0.80), and (0.19, 0.80, 0.80, 0.19)]
[Figure: simulation results for increasing numbers of players, N = 5, 10, 20, 50, 100]
◮ N computers in a network (minor players)
◮ one hacker/attacker (major player)
◮ the action of the major player affects the minor players' states (even when N ≫ 1)
◮ the major player feels only μ^N_t, the empirical distribution of the minor players' states

Each computer is in one of four states:
◮ protected & infected (DI)
◮ protected & susceptible to infection (DS)
◮ unprotected & infected (UI)
◮ unprotected & susceptible to infection (US)
The optimal control is obtained by minimizing the Hamiltonian:

α̂(t, x, μ, h) = arg inf_{α∈A} H(t, x, μ, h, α),   i.e.   inf_{α∈A} H(t, x, μ, h, α) = H(t, x, μ, h, α̂(t, x, μ, h))
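When the action set A is finite, as in this model, the minimizer α̂ can be computed by direct enumeration. The sketch below does exactly that; the quadratic Hamiltonian `H` and the action grid are hypothetical choices for illustration, not the cyber-security model's Hamiltonian.

```python
import numpy as np

def alpha_hat(H, t, x, mu, h, actions):
    """argmin over a finite action set A: return the action achieving
    inf_{alpha in A} H(t, x, mu, h, alpha)."""
    values = [H(t, x, mu, h, a) for a in actions]
    return actions[int(np.argmin(values))]

# Hypothetical Hamiltonian (illustrative only): quadratic effort cost plus
# a linear coupling between the control and the adjoint variable h.
def H(t, x, mu, h, a):
    return 0.5 * a ** 2 + a * h + mu * x

actions = [0.0, 0.25, 0.5, 0.75, 1.0]
# On this grid, with h = -0.6, the minimum of 0.5 a^2 - 0.6 a is at a = 0.5.
best = alpha_hat(H, t=0.0, x=1.0, mu=0.3, h=-0.6, actions=actions)
```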
Q-matrix of each computer over the states DI, DS, UI, US; the nonzero off-diagonal rates are:

DI → DS : q_rec   (recovery)
DS → DI : v_inf + β_DD μ({DI}) + β_UD μ({UI})   (infection)
UI → US : q_rec   (recovery)
US → UI : v_inf + β_UU μ({UI}) + β_DU μ({DI})   (infection)
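Since the infection rates depend on the current distribution μ, the limiting flow of distributions solves a nonlinear forward Kolmogorov equation dμ/dt = μ Q(μ). A minimal Euler sketch, with illustrative numerical values for q_rec, v_inf and the β's (none are taken from the lecture):

```python
import numpy as np

def q_matrix(mu, q_rec=0.4, v_inf=0.2, bDD=0.3, bUD=0.4, bUU=0.5, bDU=0.3):
    """Q-matrix of one computer for the state order (DI, DS, UI, US).
    Rates follow the table above; the numerical values are illustrative."""
    muDI, muUI = mu[0], mu[2]
    Q = np.zeros((4, 4))
    Q[0, 1] = q_rec                               # DI -> DS (recovery)
    Q[1, 0] = v_inf + bDD * muDI + bUD * muUI     # DS -> DI (infection)
    Q[2, 3] = q_rec                               # UI -> US (recovery)
    Q[3, 2] = v_inf + bUU * muUI + bDU * muDI     # US -> UI (infection)
    np.fill_diagonal(Q, -Q.sum(axis=1))           # rows of a Q-matrix sum to 0
    return Q

def evolve(mu0, T=10.0, n_steps=1000):
    """Explicit Euler scheme for the forward equation d(mu)/dt = mu Q(mu)."""
    dt = T / n_steps
    mu = np.array(mu0, dtype=float)
    for _ in range(n_steps):
        mu = mu + dt * (mu @ q_matrix(mu))
    return mu

mu_T = evolve([0.25, 0.25, 0.25, 0.25])
```

Solving this flow is the forward half of the MFG system; plots like the ones below show such trajectories of μ(t) componentwise.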
[Figure: time evolution of the state distribution μ(t), with components μ(t)[DI], μ(t)[DS], μ(t)[UI], μ(t)[US], for several scenarios]

[Figure: time evolution of the optimal feedback function φ(t), with components φ(t)[DI], φ(t)[DS], φ(t)[UI], φ(t)[US]]
◮ Zero-Sum Game between the attacker and a network manager
  ◮ compute the expected cost of protection to the network
◮ Alternatively, let the individual computer owners take care of their own security
  ◮ hope for a Nash equilibrium
  ◮ compute the expected cost of protection to the network
◮ X_t = (X^1_t, ..., X^N_t): state at time t, with X^i_t ∈ E = {e_1, ..., e_d}
◮ use distributed feedback controls, for the state to be a continuous-time Markov chain
◮ dynamics given by Q-matrices (q_t(x, x'))_{t≥0, x,x'∈E}
◮ empirical measures

μ^N_x = (1/N) Σ_{i=1}^N δ_{x^i},   a probability measure on the d states

◮ cost functionals

J^i(α) = E[ ∫_0^T f(t, X^i_t, μ^{N−1}_{X^{−i}_t}, α^i_t) dt + g(X^i_T, μ^{N−1}_{X^{−i}_T}) ]

◮ feedback controls α^i_t = φ(t, X^i_t)
When all the players use the same feedback φ, the social cost of the N-player game is

J^{(N)}_φ = (1/N) Σ_{i=1}^N E[ ∫_0^T f(t, X^i_t, μ^N_{X_t}, φ(t, X^i_t)) dt + g(X^i_T, μ^N_{X_T}) ]
         = E[ ∫_0^T ⟨ f(t, ·, μ^N_{X_t}, φ(t, ·)), μ^N_{X_t} ⟩ dt + ⟨ g(·, μ^N_{X_T}), μ^N_{X_T} ⟩ ]

As N → ∞, if the empirical measures μ^N_{X_t} converge toward a deterministic flow μ_t, the social cost becomes

SC_φ(μ) = ∫_0^T ⟨ f(t, ·, μ_t, φ(t, ·)), μ_t ⟩ dt + ⟨ g(·, μ_T), μ_T ⟩
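Given a feedback φ and the limiting deterministic flow μ_t, SC_φ(μ) is computable by solving the forward equation and integrating the bracket terms. The sketch below does this for a hypothetical two-state example; the feedback convention (φ gives the jump rate out of the current state), the costs `f` and `g`, and all numbers are illustrative assumptions.

```python
import numpy as np

# Two-state example E = {0, 1}; phi(t, x) is interpreted as the jump rate
# out of state x (a simple control-as-rate convention, illustrative only).
def phi(t, x):
    return 0.5 if x == 0 else 0.2

def f(t, x, mu, a):
    return 0.5 * a ** 2 + mu[x]     # effort cost plus congestion in state x

def g(x, mu):
    return mu[x]                     # terminal congestion cost

def social_cost(T=1.0, n_steps=1000, mu0=(1.0, 0.0)):
    """SC_phi(mu) = int_0^T <f(t,.,mu_t,phi(t,.)), mu_t> dt + <g(.,mu_T), mu_T>,
    with mu_t solving the forward equation of the controlled chain."""
    dt = T / n_steps
    mu = np.array(mu0, dtype=float)
    cost = 0.0
    for k in range(n_steps):
        t = k * dt
        rates = np.array([phi(t, 0), phi(t, 1)])   # jump rates out of 0 and 1
        Q = np.array([[-rates[0], rates[0]],
                      [rates[1], -rates[1]]])
        # running bracket <f(t,.,mu,phi(t,.)), mu> by left-endpoint quadrature
        cost += dt * sum(f(t, x, mu, rates[x]) * mu[x] for x in (0, 1))
        mu = mu + dt * (mu @ Q)                    # forward Euler step
    cost += sum(g(x, mu) * mu[x] for x in (0, 1))  # terminal bracket
    return cost, mu

sc, mu_T = social_cost()
```

The same routine, run with μ frozen (MFG) or re-optimized over φ (MFC), is one way to see numerically the distinction drawn next.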
◮ MFG: φ is the optimal feedback function for a mean field game equilibrium for which μ is the equilibrium flow of distributions
◮ MFC: φ is the feedback function (chosen by a central planner) minimizing the social cost SC_φ(μ)
The value function is defined over the probability simplex: identifying a measure μ = Σ_{i=1}^d p_i δ_{e_i} with the point (p_1, ..., p_d),

v(t, μ) = inf_{φ∈Ã} J_φ(t, μ)

Two notions of derivative of v with respect to the measure argument:
◮ δv/δμ, when v is defined on an open neighborhood of the probability simplex S_d
◮ ∂v(t, μ)/∂μ({x'}), the derivative of v with respect to the weight μ({x'})
(the simplex S_d itself has dimension d − 1)
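A small worked example (assumed here for illustration, not taken from the lecture) shows how the weight derivative is computed:

```latex
% Illustrative example: for v defined on a neighborhood of the simplex by
%   v(t, \mu) = \sum_{x \in E} c_x \, \mu(\{x\})^2 ,
% differentiating in the weight of a fixed state x' gives
v(t,\mu) = \sum_{x\in E} c_x\,\mu(\{x\})^2
\qquad\Longrightarrow\qquad
\frac{\partial v(t,\mu)}{\partial \mu(\{x'\})} = 2\,c_{x'}\,\mu(\{x'\}) .
% The function x' \mapsto 2 c_{x'} \mu(\{x'\}) is a version of
% \delta v / \delta \mu, determined only up to an additive constant since
% the weights are constrained to sum to one on the simplex.
```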
◮ Major Player = bond issuer
  ◮ the bond is callable: the Major Player (issuer) chooses a stopping time at which to
    ◮ pay off the investors
    ◮ stop coupon payments to the investors
    ◮ refinance his debt on better terms
◮ Minor Players = field of investors
  ◮ the bond is convertible: each Minor Player (investor) chooses a stopping time at which to convert the bond certificate into a fixed number (the conversion ratio) of shares of stock
    ◮ if and when owning the stock is more profitable
    ◮ conversions creating dilution of the stock