Learning to Bid Without Knowing your Value (PowerPoint Presentation)

SLIDE 1

Learning to Bid Without Knowing your Value

Zhe Feng, Harvard
Joint work with Chara Podimata (Harvard) and Vasilis Syrgkanis (MSR)

19th ACM Conference on Economics and Computation, EC’18, 6/21/2018

SLIDE 2

Warm-up

Auction theory & Mechanism Design

Auction: buyer i has value $v_i$, submits bid $b_i$, and receives allocation and payment $(a_i, p_i)$

Utility to buyer i: $u_i = a_i v_i - p_i$

SLIDE 3

Motivation

Key assumption in Auction Theory & Mechanism Design:
Private valuation, but known to the bidder himself/herself

SLIDE 5

Motivation

Key assumption in Auction Theory & Mechanism Design

Small markets: bidders have time to prepare to bid (market research)
Digital economy: online advertisement auctions; no time to prepare to bid (market research)

SLIDE 6

Main question

How to design a bidding strategy for the learner in online advertisement auctions when he/she doesn’t know the value before submitting the bid?

SLIDE 12

Sponsored Search Example

Advertiser (Learner) bids
Platform (Auctioneer) generates $x_t(\cdot)$, $p_t(\cdot)$
Clicked by users: generates value $v_t$
Expected utility: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$
Reward: $v_t - p_t(\cdot)$
Learner observes (estimated) $x_t(\cdot)$, $p_t(\cdot)$

SLIDE 15

Simple Model: Single-item Auctions

At each day $t$:
Designer and competitors choose allocation rule $x_t(\cdot)$ and payment rule $p_t(\cdot)$
Learner submits $b_t \in B$ (finite set)
The learner wins item with probability $x_t(b_t)$
At the end, observes $x_t(\cdot)$, $p_t(\cdot)$
If the learner wins, observes $v_t$
Expected utility function: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$

Goal: minimize expected regret

$$R(T) = \sup_{b^*} \mathbb{E}\left[\sum_{t=1}^{T} u_t(b^*)\right] - \mathbb{E}\left[\sum_{t=1}^{T} u_t(b_t)\right]$$

First term: utility with best fixed bid in hindsight. Second term: utility with bids generated by algorithm.
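The model and the regret definition above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the round generator `second_price_round` (a deterministic second-price auction against a single competing bid) and the helper names `expected_utility` and `regret` are hypothetical and do not come from the talk, and the regret is computed for one fixed bid sequence rather than in expectation over a randomized algorithm.

```python
def expected_utility(value, bid, alloc, pay):
    """Expected utility (value - payment(bid)) * allocation(bid)."""
    return (value - pay(bid)) * alloc(bid)


def second_price_round(value, highest_other_bid):
    """Hypothetical round: the learner wins iff it outbids one competing bid."""
    def alloc(bid):
        return 1.0 if bid > highest_other_bid else 0.0

    def pay(bid):
        # In a second-price auction the winner pays the competing bid.
        return highest_other_bid

    return value, alloc, pay


def regret(rounds, chosen_bids, bid_grid):
    """Utility of the best fixed bid in hindsight minus utility of chosen_bids."""
    def total(bids):
        return sum(expected_utility(v, b, x, p)
                   for (v, x, p), b in zip(rounds, bids))

    best_fixed = max(total([b] * len(rounds)) for b in bid_grid)
    return best_fixed - total(chosen_bids)


# Two days with value 0.8 and competing bids 0.5 and 0.3 (made-up numbers).
rounds = [second_price_round(0.8, 0.5), second_price_round(0.8, 0.3)]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(regret(rounds, [0.25, 0.25], grid))
```

Bidding 0.25 on both days never wins, so the whole utility of the best fixed bid on the grid shows up as regret.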

SLIDE 16

Multi-Armed Bandit (MAB)

At each round $t = 1, \cdots, T$:

Adversary chooses reward vector $r_t = (r_{1,t}, \cdots, r_{K,t})$
Learner chooses an action $i_t \in B$
Learner gets reward $r_{i_t,t}$ and only observes $r_{i_t,t}$

EXP3 achieves regret $O(\sqrt{T|B|})$
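For reference, a minimal sketch of a textbook EXP3 variant in Python is given below. It assumes rewards in [0, 1] and uses the plain importance-weighted gain estimate with no explicit exploration term; the function name `exp3`, the callback `reward_fn`, and the step size are illustrative choices, not taken from the talk. The standard analysis of this kind of algorithm gives regret of order $\sqrt{T|B|\log|B|}$, which matches the rate quoted above up to the logarithmic factor.

```python
import math
import random


def exp3(reward_fn, num_arms, horizon, seed=0):
    """Run EXP3 with bandit feedback; reward_fn(t, arm) must return a value in [0, 1]."""
    rng = random.Random(seed)
    eta = math.sqrt(math.log(num_arms) / (horizon * num_arms))  # illustrative step size
    log_w = [0.0] * num_arms  # log-weights, kept in log space for stability

    def distribution():
        m = max(log_w)
        w = [math.exp(l - m) for l in log_w]
        z = sum(w)
        return [wi / z for wi in w]

    total_reward = 0.0
    for t in range(horizon):
        probs = distribution()
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)  # only the chosen arm's reward is observed
        total_reward += reward
        # Importance-weighted estimate: nonzero only for the chosen arm.
        log_w[arm] += eta * reward / probs[arm]
    return distribution(), total_reward


# Toy adversary (made up): arm 2 always pays 1, the other arms pay 0.
final_probs, total = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, 3, 2000)
print(final_probs, total)
```

The division by `probs[arm]` is what makes the estimate unbiased, and it is also the source of a variance that grows with the number of arms.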

SLIDE 17

Formal main question

Can we design an online learning algorithm for the learner to achieve better regret than generic MAB?

SLIDE 18

Our Results: WIN-EXP algorithm

Utilize partial feedback information from the auctions.
Partial feedback: between bandit feedback and full information feedback
Recall: EXP3 achieves $O(\sqrt{T|B|})$

Theorem 1. WIN-EXP algorithm achieves regret at most $4\sqrt{T \log|B|}$

SLIDE 19

Related Work

No regret learning in GT/MD

From auctioneer side: [Blum et al., 04], [Amin et al., 05], [Amin et al., 06], [Cesa-Bianchi et al., 15], …
From bidder side: [Dikkala & Tardos, 13], [Balseiro & Gur, 17], [Weed et al., 16]

Learning with partial feedback

Contextual Bandit: [Bubeck & Cesa-Bianchi, 12], [Agarwal et al., 14], …
Feedback graphs: [Alon et al., 13], [Alon et al., 15]

SLIDE 20

Technical Parts

SLIDE 24

The Abstraction: Win-Only Feedback

At each day $t$:

Learner chooses an action $b_t \in B$.
The adversary chooses a reward function $r_t: B \to [-1, 1]$ and allocation function $x_t(\cdot)$.
The learner wins reward $r_t(b_t)$ with probability $x_t(b_t)$.
Feedback: the learner always learns the allocation rule $x_t$; if she wins, she also learns $r_t(\cdot)$.

SLIDE 28

WIN-EXP Algorithm For Win-Only Feedback

At each round $t$:

Draw a bid $b_t \sim \pi_t$
Observe allocation rule $x_t$; if wins, observe $r_t(\cdot)$
Compute the unbiased estimator of $u_t(b) - 1$, where $u_t(b) = r_t(b) \cdot x_t(b)$:

$$\tilde{u}_t(b) = \begin{cases} \dfrac{(r_t(b) - 1) \cdot x_t(b)}{\sum_{b'} \pi_t(b')\, x_t(b')}, & \text{if the learner wins} \\[2ex] -\dfrac{1 - x_t(b)}{1 - \sum_{b'} \pi_t(b')\, x_t(b')}, & \text{if the learner doesn't win} \end{cases}$$

Update: $\pi_{t+1}(b) \propto \pi_t(b) \cdot \exp(\eta \cdot \tilde{u}_t(b))$
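The round structure above (draw a bid, observe the allocation rule, build the win/lose estimate, apply the exponential update) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `win_exp`, the representation of each round as a pair of callables `(reward_fn, alloc_fn)`, the step size, and the toy second-price instance are all assumptions made for the example.

```python
import math
import random


def _normalize(log_w):
    """Turn log-weights into a probability distribution over bids."""
    m = max(log_w.values())
    w = {b: math.exp(l - m) for b, l in log_w.items()}
    z = sum(w.values())
    return {b: wb / z for b, wb in w.items()}


def win_exp(rounds, bids, eta, seed=0):
    """rounds: list of (reward_fn, alloc_fn), each mapping a bid to a float."""
    rng = random.Random(seed)
    log_w = {b: 0.0 for b in bids}
    for reward_fn, alloc_fn in rounds:
        pi = _normalize(log_w)
        b_t = rng.choices(bids, weights=[pi[b] for b in bids])[0]
        x = {b: alloc_fn(b) for b in bids}  # allocation rule is always observed
        won = rng.random() < x[b_t]
        if won:
            # The reward function is revealed only on a win.
            win_prob = sum(pi[b] * x[b] for b in bids)
            est = {b: (reward_fn(b) - 1.0) * x[b] / win_prob for b in bids}
        else:
            lose_prob = sum(pi[b] * (1.0 - x[b]) for b in bids)
            est = {b: -(1.0 - x[b]) / lose_prob for b in bids}
        for b in bids:
            log_w[b] += eta * est[b]
    return _normalize(log_w)


# Toy instance (made up): value 1.0, competing bid 0.25, second-price payment,
# so the reward on a win is 0.75 and the learner wins iff its bid exceeds 0.25.
bids = [0.0, 0.25, 0.5, 0.75, 1.0]
horizon = 2000
eta = math.sqrt(2 * math.log(len(bids)) / (5 * horizon))  # illustrative step size
rounds = [(lambda b: 0.75, lambda b: 1.0 if b > 0.25 else 0.0)] * horizon
final = win_exp(rounds, bids, eta)
print(final)
```

Note that every bid gets an update in every round, not only the bid that was played; the normalizers are the overall win and lose probabilities, not the probability of a single bid.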

SLIDE 31

Proof Sketch of Theorem 1.

1. The regret w.r.t. $u_t(b)$ is equal to the regret w.r.t. $u_t(b) - 1$

2. $\tilde{u}_t(b)$ is the unbiased estimator of $u_t(b) - 1$ [Lemma 1]

$$R(T) \le \frac{\eta}{2} \sum_{t=1}^{T} \sum_{b \in B} \pi_t(b) \cdot \mathbb{E}\left[\tilde{u}_t(b)^2\right] + \frac{1}{\eta} \log(|B|)$$

3. Variance of the estimator:

$$\sum_{b \in B} \pi_t(b) \cdot \mathbb{E}\left[\tilde{u}_t(b)^2\right] \le 5 \qquad (1)$$

Q.E.D.

Note: in EXP3, (1) grows as # of actions.
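One way to fill in the steps of this sketch is written out below. It is a reconstruction under stated assumptions, not the authors' proof: it takes the win-only estimator with $u_t(b) = r_t(b)\,x_t(b)$ and $r_t(b) \in [-1, 1]$, writes $W_t = \sum_{b'} \pi_t(b')\,x_t(b')$ for the learner's overall win probability (assumed strictly between 0 and 1), and picks a step size $\eta$ that the slides do not state.

```latex
\begin{align*}
\mathbb{E}\big[\tilde{u}_t(b)\big]
  &= W_t \cdot \frac{(r_t(b)-1)\,x_t(b)}{W_t}
     - (1-W_t) \cdot \frac{1-x_t(b)}{1-W_t}
   = r_t(b)\,x_t(b) - 1 = u_t(b) - 1, \\
\mathbb{E}\big[\tilde{u}_t(b)^2\big]
  &= \frac{(r_t(b)-1)^2\,x_t(b)^2}{W_t} + \frac{(1-x_t(b))^2}{1-W_t}
   \le \frac{4\,x_t(b)}{W_t} + \frac{1-x_t(b)}{1-W_t}, \\
\sum_{b \in B} \pi_t(b)\,\mathbb{E}\big[\tilde{u}_t(b)^2\big]
  &\le \frac{4\,W_t}{W_t} + \frac{1-W_t}{1-W_t} = 5, \\
R(T)
  &\le \frac{5\,\eta\,T}{2} + \frac{\log|B|}{\eta}
   = \sqrt{10\,T\log|B|} \le 4\sqrt{T\log|B|}
   \quad \text{for } \eta = \sqrt{\frac{2\log|B|}{5T}}.
\end{align*}
```

The second line uses $(r_t(b)-1)^2 \le 4$, $x_t(b)^2 \le x_t(b)$ and $(1-x_t(b))^2 \le 1-x_t(b)$. The bound of 5 does not depend on $|B|$ because the normalizers are the win and lose probabilities rather than the probability of one particular bid.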

SLIDE 36

Extension 1: Outcome-based Feedback

Beyond binary outcomes: a set of outcomes $O$

Reward function $r_t: B \times O \to [-1, 1]$ and allocation $x_t: B \to \Delta(O)$
$o_t$ is chosen based on distribution $x_t(b_t)$ and learner wins reward $r_t(b_t, o_t)$.

Feedback: the learner observes $x_t$ and $r_t(\cdot, o_t)$

Theorem 2. WIN-EXP algorithm with outcome-based feedback achieves regret at most $2\sqrt{2T|O|\log|B|}$
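The slides do not show the estimator used with outcome-based feedback. A natural generalization of the win-only estimator, assumed here purely for illustration, normalizes by the overall probability of the realized outcome. The Python sketch below defines such an estimator and checks numerically that its expectation equals the expected reward minus 1 for every bid. The function name `outcome_estimates` and the toy numbers are hypothetical.

```python
def outcome_estimates(pi, alloc, reward, outcome):
    """Importance-weighted estimates for every bid, given the realized outcome.

    pi:      dict bid -> probability of playing that bid
    alloc:   dict bid -> dict outcome -> probability of that outcome
    reward:  function (bid, outcome) -> value in [-1, 1]
    """
    outcome_prob = sum(pi[b] * alloc[b][outcome] for b in pi)
    return {b: (reward(b, outcome) - 1.0) * alloc[b][outcome] / outcome_prob
            for b in pi}


# Toy instance (made up): two bids, outcomes 1 and 2 are items, 3 is "no item".
pi = {"low": 0.5, "high": 0.5}
alloc = {"low": {1: 0.1, 2: 0.3, 3: 0.6},
         "high": {1: 0.6, 2: 0.3, 3: 0.1}}
item_reward = {1: 0.9, 2: 0.5, 3: 0.0}


def reward(bid, outcome):
    return item_reward[outcome]


# Exact expectation of the estimate over the realized outcome.
expected_estimate = {b: 0.0 for b in pi}
for o in (1, 2, 3):
    prob_o = sum(pi[b] * alloc[b][o] for b in pi)
    est = outcome_estimates(pi, alloc, reward, o)
    for b in pi:
        expected_estimate[b] += prob_o * est[b]

# Expected reward of each bid minus 1, which the estimate should match.
target = {b: sum(reward(b, o) * alloc[b][o] for o in (1, 2, 3)) - 1.0 for b in pi}
print(expected_estimate, target)
```

With two outcomes (win, not win) and zero reward for not winning, this estimator reduces to the win-only one.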

SLIDE 40

Application 1: Outcome-based feedback

Binary Outcome

Second-price auction
$O$ = {win, not win}
Recover [Weed et al., 16] result by choosing discretization appropriately

Value-per-click auction
$O$ = {get clicked, not clicked}

Non-Binary Outcome

Unit-demand $K$-items auctions
$O = \{1, 2, \ldots, K+1\}$, where outcome $K + 1$ is associated with not getting item

SLIDE 41

Extension 2: Continuous action spaces

Piecewise-Lipschitz rewards

Theorem 3 (Regret of WIN-EXP algorithm in the continuous action space with $\Delta_o$-Piecewise $L$-Lipschitz Average Utilities). WIN-EXP algorithm achieves regret at most $2\sqrt{2dT|O|\log\left(\max\left\{\frac{1}{\Delta_o}, LT\right\}\right)} + 1$

$\Delta_o$: length of minimum interval

SLIDE 42

Application 2: Continuous action spaces

Second-price auctions
$\Delta_o$ is the smallest difference between highest other bids at any two iterations $t$ and $t'$
$L = 0$

SLIDE 43

Application 2: Continuous action spaces

Second-price auctions
First-price and All-pay auctions
$\Delta_o$ is the smallest difference between highest bids at any two iterations $t$ and $t'$
$L = 1$

SLIDE 44

Application 2: Continuous action spaces

Second-price auctions
First-price and All-pay auctions
Weighted GSP auction
Each bidder is assigned a score $s_i \in [0, 1]$ (drawn by auctioneer)
Allocating with decreasing order of score-weighted bids $s_i \cdot b_i$
If bidder wins slot $k$, charge $\frac{\rho_{k+1}}{s_k}$, where $\rho_{k+1}$ is the score-weighted bid of the bidder who wins slot $k + 1$
Utility is Lipschitz if score is generated from distribution with Lipschitz CDF
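The weighted GSP rule described above (rank by score-weighted bid, charge the next score-weighted bid divided by the winner's own score) can be sketched in Python as follows. The function name `weighted_gsp` and the example numbers are hypothetical, and charging zero when there is no next-ranked bidder or when the winner's score is zero is an assumption made to keep the example well defined.

```python
def weighted_gsp(bids, scores, num_slots):
    """Return {bidder index: (slot, price)} for the bidders that win a slot."""
    ranked = sorted(range(len(bids)),
                    key=lambda i: scores[i] * bids[i],
                    reverse=True)
    outcome = {}
    for position, bidder in enumerate(ranked[:num_slots]):
        if position + 1 < len(ranked):
            nxt = ranked[position + 1]
            next_weighted_bid = scores[nxt] * bids[nxt]
        else:
            next_weighted_bid = 0.0  # assumption: no lower bidder, so no charge
        if scores[bidder] > 0:
            price = next_weighted_bid / scores[bidder]
        else:
            price = 0.0  # assumption: avoid dividing by a zero score
        outcome[bidder] = (position + 1, price)
    return outcome


# Made-up example: three bidders, two slots.
# Score-weighted bids are 0.5, 0.8 and 0.4, so bidder 1 ranks first.
result = weighted_gsp(bids=[1.0, 0.8, 0.5], scores=[0.5, 1.0, 0.8], num_slots=2)
print(result)
```

Because the price divides by the winner's own score, a bidder with a low score pays more for the same slot than a bidder with a high score would.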

SLIDE 45

Simulations

Set up:

Weighted GSP auctions; $v_i \in [0, 1]$, randomly draw 20 bidders, 3 slots
Consider three behaviors for other bidders (opponents): Stochastic, EXP3, WIN-EXP

SLIDE 46

Different discretization of bidding space

SLIDE 47

Robust to Noisy CTR Estimates: $N(0, \frac{1}{m})$

Stochastic adversaries

SLIDE 48

Robust to Noisy CTR Estimates: $N(0, \frac{1}{m})$

EXP3 adversaries

SLIDE 49

Robust to Noisy CTR Estimates: $N(0, \frac{1}{m})$

WIN-EXP adversaries

SLIDE 50

Robust to CTR/payment estimates w/ regression

SLIDE 51

Conclusion

Design an online learning algorithm (WIN-EXP) for bidding in repeated auctions without knowing your value
Utilize partial feedback to achieve better regret than a generic MAB algorithm
Applications to many auction settings
Robust experimental performance

SLIDE 52

Thanks for your attention!