SLIDE 1

Learning and Efficiency in Games

(with Dynamic Population)

Éva Tardos

Cornell

Joint work with Thodoris Lykouris and Vasilis Syrgkanis

SLIDE 2

Large population games: traffic routing

  • Traffic subject to congestion delays
  • Cars and packets follow the shortest path
  • Congestion game: cost (delay) depends only on congestion on edges
SLIDE 3

Example 2: advertising auctions

  • Advertisers leave and join the system
  • Changes in system setup
  • Advertiser values change

SLIDE 4

Questions + Motivation

  • Repeated game: How do players behave?
  • Nash equilibrium?
  • Today: Machine Learning
  • With players (or player objectives) changing over time
  • Efficiency loss due to selfish behavior of players (Price of Anarchy)

SLIDE 5

[Figure: Braess-style network, 100 units of traffic from A to D. Route A→B→D: edge A→B has load-dependent delay x/100 hours, edge B→D takes 1 hour. Route A→C→D: edge A→C takes 1 hour, edge C→D has delay y/100 hours. A 0-minute shortcut connects B to C.]

Traffic Pattern (optimal)

Time: 1.5 hours

SLIDE 6

[Figure: same network, with the 100 units split 50/50 between routes A→B→D and A→C→D]

Not Nash equilibrium! A driver on either route can switch to the shortcut path A→B→C→D and travel for only 0.5 + 0 + 0.5 = 1 hour.

Time: 1.5 hours

Nash: stable solution: no incentive to deviate

SLIDE 7

[Figure: same network; all 100 units follow the shortcut path A→B→C→D]

Nash equilibrium

Time: 2 hours

Nash: stable solution: no incentive to deviate. But how did the players find it?
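A quick numeric check of the example, as a minimal sketch assuming the standard Braess-style network reconstructed in the figure notes above (the function name and flow splits are illustrative):

```python
# Delays (in hours) in the routing example with 100 cars from A to D.
def route_delays(n_top, n_bottom, n_shortcut):
    """n_top cars on A-B-D, n_bottom on A-C-D, n_shortcut on A-B-C-D."""
    x = n_top + n_shortcut      # load on edge A-B (delay x/100)
    y = n_bottom + n_shortcut   # load on edge C-D (delay y/100)
    return {"A-B-D": x / 100 + 1,
            "A-C-D": 1 + y / 100,
            "A-B-C-D": x / 100 + 0 + y / 100}

print(route_delays(50, 50, 0))   # optimal split: 1.5h on both routes, but the
                                 # unused shortcut path would take only 1h
print(route_delays(0, 0, 100))   # Nash: everyone on A-B-C-D, 2h for all
```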

SLIDE 8

Congestion game in Social Science

Kleinberg-Oren STOC'11

[Figure: players choosing among projects] Which project should I try?

  • Each project j has reward c_j
  • Each player i has a probability p_ij of solving project j
  • Fair credit: reward equally shared by discoverers

Uniform players and fair sharing = congestion game. Unfair sharing and/or different abilities: Vetta utility game ???

SLIDE 9

Nash as Selfish Outcome?

  • Can the players find Nash?
  • Which Nash?

Daskalakis-Goldberg-Papadimitriou'06: Nash exists, but… Finding Nash is

  • PPAD-hard in many games
  • Coordination problem (multiple Nash)
SLIDE 10

Repeated games

[Timeline: in each period t, the players simultaneously choose actions; the period's action profile (a_1^t, a_2^t, …, a_n^t) determines that period's outcome.]

  • Assume same game each period
  • Player's value/cost additive over periods
SLIDE 11

Learning outcome

[Timeline of repeated play]

Maybe here (in early periods) they don't know how to play, who the other players are, …

By here (in later periods) they have a better idea…

SLIDE 12

Nash equilibrium

Nash equilibrium: stable actions a with no regret for any alternate strategy x: cost_i(x, a_{-i}) ≥ cost_i(a)

[Timeline: from some point on, the same stable action profile a is repeated in every period; no regret in any period.]

SLIDE 13

No-regret without stability: learning

For any fixed action x (with d options): Σ_t cost_i(a^t) ≤ Σ_t cost_i(x, a^t_{-i}) + o(T) (no-regret)

Regret: R_i(x,T) = Σ_t cost_i(a^t) − Σ_t cost_i(x, a^t_{-i})

Many simple rules ensure R_i(x,T) ≈ √(T log d) for all x: MWU (Hedge), Regret Matching, etc.

[Timeline of repeated play]
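A minimal sketch of one such rule, multiplicative weights (Hedge), on an arbitrary cost sequence; the learning rate η = √(log d / T) and the random toy costs are illustrative choices, not from the talk:

```python
import numpy as np

def hedge(costs, eta):
    """Hedge / MWU: keep a weight per action, play the normalized weights
    as a mixed strategy, then scale each weight by exp(-eta * cost)."""
    T, d = costs.shape       # costs[t, j] = cost of action j at time t, in [0, 1]
    w = np.ones(d)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                  # mixed strategy this round
        total += p @ costs[t]            # expected cost incurred
        w *= np.exp(-eta * costs[t])     # penalize costly actions
    return total

rng = np.random.default_rng(0)
costs = rng.random((1000, 5))
eta = np.sqrt(np.log(5) / 1000)          # tuned rate gives O(sqrt(T log d)) regret
regret = hedge(costs, eta) - costs.sum(axis=0).min()
print(f"regret vs. best fixed action: {regret:.1f}")   # sublinear in T
```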

SLIDE 14

No-regret without stability: learning

For any fixed action x (with d options): Σ_t cost_i(a^t) ≤ (1 + ε) Σ_t cost_i(x, a^t_{-i}) + O(log d / ε) (approximate no-regret)

Approximate regret: R_i(x,T) = Σ_t cost_i(a^t) − (1 + ε) Σ_t cost_i(x, a^t_{-i})

Many simple rules ensure R_i(x,T) ≈ O(log d / ε) for all x: MWU (Hedge), Regret Matching, etc. Foster, Li, Lykouris, Sridharan, T'16

[Timeline of repeated play]

SLIDE 15

Dynamics of rock-paper-scissors

Payoff matrix (row player's payoff listed first): a win pays +1, a loss −1, and a tie costs both players 9:

       R        P        S
R   −9, −9   −1, +1   +1, −1
P   +1, −1   −9, −9   −1, +1
S   −1, +1   +1, −1   −9, −9

  • Doesn't converge
  • correlates on shared history

Nash: mix Rock, Paper, Scissors with probabilities (1/3, 1/3, 1/3); this gives expected payoff −3, since a tie occurs with probability 1/3.

[Figure: learning dynamics and payoffs/utility over time]
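A toy simulation of two Hedge learners on this game, assuming the tie-penalty payoffs reconstructed above (the learning rate, horizon, and payoff rescaling are illustrative):

```python
import numpy as np

# A[i, j] = row player's payoff when row plays i and column plays j
# (0=R, 1=P, 2=S); by symmetry the column player's payoff is A[j, i].
A = np.array([[-9., -1.,  1.],
              [ 1., -9., -1.],
              [-1.,  1., -9.]])

rng = np.random.default_rng(1)
T, eta = 20000, 0.05
w1, w2 = np.ones(3), np.ones(3)
avg = 0.0
for t in range(T):
    p, q = w1 / w1.sum(), w2 / w2.sum()
    i, j = rng.choice(3, p=p), rng.choice(3, p=q)
    avg += A[i, j] / T
    w1 *= np.exp(eta * A[:, j] / 9)   # Hedge on payoffs, rescaled to [-1, 1]
    w2 *= np.exp(eta * A[:, i] / 9)   # symmetric game: column's payoffs vs. i
print("player 1 average payoff:", avg)
# Independent mixing ties about 1/3 of the time (the Nash value is -3);
# dynamics that correlate on shared history could avoid ties and do better.
```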

SLIDE 16

Main Question

  • Efficiency loss due to selfish behavior of players (Price of Anarchy)
  • In repeated game settings
  • With players (or player objectives) changing over time

Examples: internet routing, advertising auctions

  • Advertisers leave and join the system
  • Advertiser values change
  • Traffic changes over time
SLIDE 17

Result: routing, limit for very small users

Theorem (Roughgarden-T'02): In any network with continuous, non-decreasing cost functions and small users:

cost of Nash with rates r_i for all i ≤ cost of opt with rates 2r_i for all i

Nash equilibrium: stable solution where no player has an incentive to deviate.

Price of Anarchy = (cost of worst Nash equilibrium) / ("socially optimum" cost)

SLIDE 18

Quality of Learning outcomes: Price of Total Anarchy

Bounds average welfare assuming no-regret learners [Blum, Hajiaghayi, Ligett, Roth, 2008]

Price of Total Anarchy = lim_{T→∞} [ (1/T) Σ_{t=1}^{T} cost(a^t) ] / ("socially optimum" cost)

SLIDE 19

Result 2: routing with learning players

Theorem (Blum, Even-Dar, Ligett'06; Roughgarden'09): Price of anarchy bounds developed for Nash equilibria extend to no-regret learning outcomes

[Timeline of repeated play]

Assumes a stable set of participants

SLIDE 20

Today: Dynamic Population

Classical model:

  • Game is repeated identically and nothing changes

Dynamic population model: At each step t each player i is replaced with an arbitrary new player with probability p

In a population of N players, each step, Np players replaced in expectation
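A minimal simulation of this turnover process (the values of N, p, and T are illustrative); each player's lifetime is geometric, so a learner typically has on the order of 1/p rounds to adapt:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, T = 1000, 0.01, 5000
ages = np.zeros(N, dtype=int)     # rounds since each player entered
replaced_counts = []
for t in range(T):
    replaced = rng.random(N) < p  # each player independently replaced w.p. p
    ages[replaced] = 0
    ages[~replaced] += 1
    replaced_counts.append(replaced.sum())
print("avg replaced per step:", np.mean(replaced_counts))  # ~ N*p = 10
print("avg current age:", ages.mean())                     # ~ 1/p = 100 rounds
```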

SLIDE 21

Learning players can adapt…

Goal: Bound average welfare assuming adaptive no-regret learners:

PoA = lim_{T→∞} [ Σ_{t=1}^{T} cost(a^t; w^t) ] / [ Σ_{t=1}^{T} Opt(w^t) ]

where w^t is the vector of player types at time t, even when the rate of change is high, i.e. a large fraction of the players can turn over at every step.
SLIDE 22

Need for adaptive learning

Example: routing

  • Strategy = path
  • Best "fixed" strategy in hindsight is very weak in a changing environment
  • Learners can adapt to the changing environment

[Timeline of repeated play]

SLIDE 23

Need for adaptive learning

Example 2: matching (project selection)

  • Strategy = choose a project
  • Best "fixed" strategy in hindsight is very weak in a changing environment
  • Learners can adapt to the changing environment

[Timeline of repeated play; players choose among projects]

SLIDE 24

Adaptive Learning

  • Adaptive regret [Hazan-Seshadhri'07, Luo-Schapire'15, Blum-Mansour'07, Lehrer'03]

for all players i, strategies x, and intervals [τ1, τ2]:

R_i(x, τ1, τ2) = Σ_{t=τ1}^{τ2} [ cost_i(a^t; w^t) − cost_i(x, a^t_{-i}; w^t) ] ≤ o(τ2 − τ1)

rates of ≈ √(τ2 − τ1) ⇒ regret with respect to a strategy that changes k times is at most ≈ √(kT)

[Timeline with an interval [τ1, τ2] highlighted]

SLIDE 25

Adaptive Learning

  • Adaptive regret [Foster, Li, Lykouris, Sridharan, T'16]

for all players i, strategies x, and intervals [τ1, τ2]:

R_i(x, τ1, τ2) = Σ_{t=τ1}^{τ2} [ cost_i(a^t; w^t) − (1 + ε) cost_i(x, a^t_{-i}; w^t) ] ≤ O(k log d / ε)

Regret with respect to a strategy that changes k times. Achieved using any of MWU (Hedge), Regret Matching, etc., mixed with a bit of "forgetting".

[Timeline with an interval [τ1, τ2] highlighted]
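One standard way to add such "forgetting" to MWU is the fixed-share rule of Herbster and Warmuth; a minimal sketch (this illustrates the tracking idea, not the specific (1+ε)-regret algorithm of Foster et al.'16):

```python
import numpy as np

def fixed_share(costs, eta, alpha):
    """Hedge with forgetting: after the usual multiplicative update, mix a
    small fraction alpha of the total weight back in uniformly. This lets
    the learner track a comparator that switches actions k times, with
    regret on the order of sqrt(kT log(dT)) for suitable eta and alpha."""
    T, d = costs.shape
    w = np.ones(d) / d
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        total += p @ costs[t]
        w *= np.exp(-eta * costs[t])                 # standard Hedge step
        w = (1 - alpha) * w + alpha * w.sum() / d    # forgetting / share step
    return total
```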

SLIDE 26

Result (Lykouris, Syrgkanis, T'16):

Bound average welfare close to the Price of Anarchy for Nash, assuming adaptive no-regret learners, even when the rate of change is high, p ≈ 1/log n with n players.

  • Worst-case change of player type ⇒ need for adapting to the changing environment
  • Sudden large change is unlikely

SLIDE 27

No-regret and Price of Anarchy

Low regret: for all players i,

R_i(x) = Σ_{t=1}^{T} [ cost_i(a^t; w^t) − cost_i(x, a^t_{-i}; w^t) ] ≤ o(T)

Best action varies with the choices of others… Consider the optimal solution, and let x = a_i^* be player i's choice in OPT.

No regret for all players i: Σ_t cost_i(a^t) ≤ Σ_t cost_i(a_i^*, a^t_{-i})

Players don't have to know a_i^*!
SLIDE 28

Proof Technique: Smoothness (Roughgardenโ€™09)

Consider the optimal solution: player i does action a_i^* in the optimum.

No regret: Σ_t cost_i(a^t) ≤ Σ_t cost_i(a_i^*, a^t_{-i}) (doesn't need to know a_i^*)

A game is (λ,μ)-smooth (λ > 0; μ < 1) if for all strategy vectors a:

Σ_i cost_i(a_i^*, a_{-i}) ≤ λ OPT + μ cost(a)

A Nash equilibrium a has cost_i(a) ≤ cost_i(a_i^*, a_{-i}) for every i, hence cost(a) ≤ (λ / (1−μ)) Opt

SLIDE 29

Smoothness and no-regret learning

Consider the optimal solution: player i does action a_i^* in the optimum.

No regret: Σ_t cost_i(a^t) ≤ Σ_t cost_i(a_i^*, a^t_{-i}) (doesn't need to know a_i^*)

A cost minimization game is (λ,μ)-smooth (λ > 0; μ < 1) if for all strategy vectors a:

Σ_i cost_i(a_i^*, a_{-i}) ≤ λ OPT + μ cost(a)

A no-regret sequence a^t therefore has

(1/T) Σ_t cost(a^t) ≤ (λ / (1−μ)) Opt

SLIDE 30

Smoothness Example:

Credit allocation. Monotone utility, util_i = expected credit: the game is (1,1)-smooth. With a_i^* the choice in Opt, for all action vectors a:

Σ_i util_i(a_i^*, a_{-i}) ≥ OPT − Σ_i util_i(a)

Note: Σ_i util_i(a) is the total value of successfully solved projects = Σ_{j solved} c_j

True project by project: let k_j and k_j^* be the number of players choosing project j in a and in OPT. If k_j ≥ k_j^*, then the right-hand side is non-positive. Else, players benefit more than in OPT from trying their opt project.

SLIDE 31

Examples of โ€œsmoothness boundsโ€

  • Monotone increasing congestion costs are (1,1)-smooth ⇒ Nash cost ≤ opt cost at double the traffic rate (Roughgarden-T'02)
  • Affine congestion costs are (1, 1/4)-smooth (Roughgarden-T'02) ⇒ 4/3 price of anarchy
  • Atomic games (players with >0 traffic) with linear delays are (5/3, 1/3)-smooth (Awerbuch-Azar-Epstein & Christodoulou-Koutsoupias'05) ⇒ 2.5 price of anarchy

The resulting bounds are tight.

SLIDE 32

Smoothness in utility games

  • Vetta utility games are (1,1)-smooth (Vetta FOCS'02)
  • First price auction is (1−1/e)-smooth (we have seen 1/2; see also Hassidim, Kaplan, Mansour, Nisan EC'11)
  • All-pay auction is 1/2-smooth
  • First position auction (GFP) is 1/2-smooth
  • Variants with second price (see also Christodoulou, Kovacs, Schapira ICALP'08)

Other applications include:

  • public goods
  • Fair sharing (Kelly, Johari-Tsitsiklis)
  • Walrasian Mechanism (Babaioff, Lucier, Nisan, and Paes Leme ECโ€™13)
SLIDE 33

Adapting smoothness to dynamic populations

Inequality we "wish to have":

Σ_t cost_i(a^t; w^t) ≤ Σ_t cost_i(a_i^{*t}, a^t_{-i}; w^t)

where a_i^{*t} is the optimum strategy for the player population at time t.

With a stable population this is just no regret for a_i^*.

Too much to hope for in the dynamic case:

  • the sequence a^{*t} of optimal solutions changes too much;
  • there is no hope for learners to have low regret against it!
SLIDE 34

Change in Optimum Solution

True optimum is too sensitive

  • Example using matching
  • The optimum solution
  • One person leaving
  • Can change the solution for everyone
  • Np changes each step → no time to learn! (we have p ≫ 1/N)

SLIDE 35

Theorem (high level)

If
  • a game satisfies a "smoothness property" [Roughgarden'09], and
  • the welfare optimization problem admits an approximation algorithm whose outcome a^* is stable to changes in one player's type,
then any adaptive learning outcome is approximately efficient, even when the rate of change is high.

Proof idea: use this approximate solution as a^* in the Price of Anarchy proof. With a^* not changing much, learners have time to learn not to regret following a^*. Note: the learner doesn't have to know a^*!

SLIDE 36

Do Stable Solutions Exist?

  • How close can we remain to the optimum, while being stable?
  • How much change can we manage, while being stable?

Recall: the regret of adaptive learning is bounded by ≈ √(kT) with respect to any strategy that changes k times

SLIDE 37

Stable ≈ Optimum in Matching

True optimum is too sensitive

  • Use greedy allocation: assign large values first (loses a factor of 2)
  • Use a coarse approximation of the values, e.g., powers of 2 only
  • Potential function argument: the log-value of the allocation increases by only m log(v_max) in total, and decreases only due to departures
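A minimal sketch of the greedy allocation with coarsened values (function and variable names are illustrative; the stability analysis via the potential function above is not reproduced here):

```python
import math

def greedy_coarse_matching(values):
    """values[player][item] = player's value for the item.
    Round values down to powers of 2, then greedily assign in decreasing
    order. Greedy loses a factor 2 vs. the max-weight matching, rounding
    at most another factor 2, and the coarse value classes make the
    outcome insensitive to small changes in any one player's values."""
    edges = []
    for player, item_vals in values.items():
        for item, v in item_vals.items():
            if v > 0:
                edges.append((2 ** math.floor(math.log2(v)), player, item))
    edges.sort(key=lambda e: -e[0])            # large rounded values first
    used_p, used_i, assignment = set(), set(), {}
    for v, player, item in edges:
        if player not in used_p and item not in used_i:
            assignment[player] = item
            used_p.add(player)
            used_i.add(item)
    return assignment

print(greedy_coarse_matching({"a": {"x": 5.0, "y": 1.0}, "b": {"x": 4.5}}))
# {'a': 'x'}: 5.0 and 4.5 both round down to 4, so the tie is broken within
# a coarse class rather than by tiny value differences
```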

SLIDE 38

Use Differential Privacy → Stable Solutions

Joint privacy [Kearns et al. '14, Dwork et al. '06]: a randomized algorithm is jointly differentially private if

  • when the input of player i changes,
  • the probability of any change in the solution of the players other than i is smaller than ε.

  • Turn a sequence of randomized solutions into a randomized sequence with a small number of changes using the Coupling Lemma (see the sketch below),
  • while handling the "failure probabilities" of private algorithms.
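A minimal sketch of the coupling step, assuming explicit per-round distributions over a finite solution set (a greedy maximal coupling; the talk's argument also handles the failure probabilities, which are ignored here):

```python
import numpy as np

def coupled_sequence(dists, rng):
    """Sample s^1, s^2, ... with s^t distributed per dists[t], coupled so
    that P(s^t != s^{t-1}) equals the total-variation distance between
    consecutive distributions: stable distributions => few actual changes."""
    s = rng.choice(len(dists[0]), p=dists[0])
    seq = [s]
    for prev, cur in zip(dists, dists[1:]):
        # keep the current solution with probability min(1, cur[s]/prev[s]) ...
        if prev[s] <= 0 or rng.random() >= min(1.0, cur[s] / prev[s]):
            # ... otherwise resample from the excess mass of cur over min(prev, cur)
            excess = cur - np.minimum(prev, cur)
            s = rng.choice(len(cur), p=excess / excess.sum())
        seq.append(s)
    return seq

rng = np.random.default_rng(0)
d1 = np.array([0.5, 0.5, 0.0])
d2 = np.array([0.45, 0.45, 0.1])   # TV distance 0.1 => switch w.p. ~0.1
print(coupled_sequence([d1, d2, d2, d2], rng))
```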

SLIDE 39

Application 1: Large Congestion Games

  • Using the joint differentially private algorithm of Rogers et al EC'15,
  • and the (5/3, 1/3)-smoothness of congestion games with affine cost:

Theorem. In an atomic congestion game with m edges and affine, increasing costs:

(1/T) Σ_t Cost(a^t; w^t) ≤ 2.5 (1 + ε) (1/T) Σ_t OPT(w^t)

with p = O( poly(ε) / (poly(m) polylog(n)) ),

if each player controls only a 1/n fraction of the total flow. This allows almost a constant fraction of change each step: the dependence on the number of players is only polylogarithmic.

SLIDE 40

Other Applications

Using the joint differentially private algorithm of Hsu et al '14:

Theorem 2. Matching markets with values in [σ, 1]:

(1/T) Σ_t W(a^t; w^t) ≥ [1 / (4(1+ε))] (1/T) Σ_t OPT(w^t)

with p = O( σ²ε² / polylog(n, 1/σ, 1/ε) )

Theorem 3. Large combinatorial markets with gross-substitute valuations:

(1/T) Σ_t W(a^t; w^t) ≥ [1 / (2(1+ε))] (1/T) Σ_t OPT(w^t)

with p = O( σ⁵ε⁵ m / polylog(n) ),

with each item in large supply, Ω(polylog(n) · log(1/ε, 1/σ)), and Θ(n) items.
SLIDE 41

Do players really learn?

  • Data from Microsoft: 9 frequent bid-changing advertisers

What is the value of the advertiser?

  • Nekipelov, Syrgkanis, T'15: infer the value with the smallest multiplicative regret
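A heavily simplified sketch of the inference idea (not the actual estimator of Nekipelov-Syrgkanis-T'15; the counterfactual click/payment arrays and the value grid are hypothetical inputs): for each candidate value v, measure how far the advertiser's realized utility falls short of the best fixed bid in hindsight, and report the v rationalized with the smallest regret.

```python
import numpy as np

def smallest_regret_value(clicks, pays, cf_clicks, cf_pays, v_grid):
    """clicks[t], pays[t]: realized clicks and payment in round t.
    cf_clicks[t, b], cf_pays[t, b]: counterfactual clicks/payment had the
    advertiser placed fixed bid b in round t. Returns the candidate value
    whose smallest rationalizable multiplicative regret is minimal."""
    best_eps, best_v = np.inf, None
    for v in v_grid:
        realized = np.sum(v * clicks - pays)
        best_fixed = np.max(np.sum(v * cf_clicks - cf_pays, axis=0))
        eps = (best_fixed - realized) / max(best_fixed, 1e-9)  # relative shortfall
        if eps < best_eps:
            best_eps, best_v = eps, v
    return best_v, best_eps

# Toy usage: if the realized play equals fixed bid 2, values making bid 2
# near-optimal are rationalized with regret close to zero.
T, B = 100, 5
rng = np.random.default_rng(0)
cf_c, cf_p = rng.random((T, B)), 0.3 * rng.random((T, B))
print(smallest_regret_value(cf_c[:, 2], cf_p[:, 2], cf_c, cf_p, np.linspace(0.1, 1, 10)))
```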

SLIDE 42

Distribution of smallest rationalizable multiplicative regret

[Histogram: frequency and cumulative % of the smallest rationalizable multiplicative regret across advertisers]

SLIDE 43

Distribution of smallest rationalizable multiplicative regret

[Same histogram, annotated: regret near zero suggests convergence to best response; strictly positive regret indicates a learning phase]

SLIDE 44

Conclusions

Learning in games:

  • Good way to adapt to opponents
  • No need for common prior
  • Takes advantage of an opponent playing badly.

Learning players do well even in dynamic environments

  • Stable approx. solution + good PoA bound ⇒ good efficiency with dynamic population

  • Strong connection of stable solutions with differential privacy
