Efficient Algorithms for Infinite-Armed Bandit Arghya Roy Chaudhuri - - PowerPoint PPT Presentation

SLIDE 1

Efficient Algorithms for Infinite-Armed Bandit

Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan

Department of Computer Science and Engineering, Indian Institute of Technology Bombay

Arghya Efficient Algorithms for Infinite-Armed Bandit

SLIDE 9

What is a Multi-Armed Bandit?

Machines (arms) and their mean rewards: 0.9, 0.5, 0.6, 0.7, 0.1, 0.7

[Figure: the arms are sampled over Rounds 1 to 6, with the observed rewards recorded.]

  • Objective: with high probability, output the arm with the highest expected reward, while using as few samples as possible.
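A minimal sketch of this setting, assuming Bernoulli (0/1) rewards, as the observed 1s in the figure suggest. `BernoulliBandit` and `uniform_sampling` are hypothetical names, and uniform sampling is only a naive baseline that pulls every arm equally often, not the method of this work. It makes explicit the cost the objective asks us to minimise: the total number of samples.

```python
import random


class BernoulliBandit:
    """Hypothetical environment: pulling an arm returns 1 with probability equal
    to its mean and 0 otherwise. Counts every sample drawn."""

    def __init__(self, means, seed=0):
        self.means = list(means)
        self.rng = random.Random(seed)
        self.samples = 0  # total pulls: the cost the objective asks us to minimise

    def pull(self, arm: int) -> int:
        self.samples += 1
        return 1 if self.rng.random() < self.means[arm] else 0


def uniform_sampling(bandit: BernoulliBandit, pulls_per_arm: int) -> int:
    """Naive baseline: pull every arm equally often, return the empirical best."""
    totals = [sum(bandit.pull(a) for _ in range(pulls_per_arm))
              for a in range(len(bandit.means))]
    return max(range(len(totals)), key=totals.__getitem__)


bandit = BernoulliBandit([0.9, 0.5, 0.6, 0.7, 0.1, 0.7])
best = uniform_sampling(bandit, pulls_per_arm=500)
print(best, bandit.samples)
```

The baseline fixes the budget in advance (here 6 × 500 = 3000 samples) regardless of how easy the instance is; adaptive rules stop as soon as the data justify it.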

SLIDE 11

Key Principle: Confidence Bounds

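[Figure: confidence intervals around the estimated mean of each of the six arms.]

With probability $1 - \delta$, the true mean $p$ of an arm sampled $u$ times lies between its Lower Confidence Bound (LCB) and Upper Confidence Bound (UCB) around the empirical mean $\hat{p}$:

$$\underbrace{\hat{p} - \sqrt{\frac{2}{u}\ln\frac{1}{\delta}}}_{\text{LCB}} \;\le\; p \;\le\; \underbrace{\hat{p} + \sqrt{\frac{2}{u}\ln\frac{1}{\delta}}}_{\text{UCB}}$$

  • Approach: track the confidence bounds of each arm.
  • Return an arm whose LCB exceeds the UCB of every other arm.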
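The confidence-bound stopping rule can be sketched as follows, assuming Bernoulli rewards and the half-width $\sqrt{(2/u)\ln(1/\delta)}$ from this slide, where $u$ is the number of samples of an arm. For simplicity every arm is sampled once per round and the same $\delta$ is reused for every arm and every round; a rigorous algorithm would split $\delta$ across arms and rounds (a union bound), so this sketch does not carry the full $1 - \delta$ guarantee. The function names are hypothetical.

```python
import math
import random


def confidence_radius(u: int, delta: float) -> float:
    """Half-width sqrt((2/u) ln(1/delta)) of the interval after u samples."""
    return math.sqrt((2.0 / u) * math.log(1.0 / delta))


def find_best_arm(means, delta=0.05, max_rounds=100_000, seed=0):
    """Sample every arm once per round until some arm's LCB exceeds the UCB
    of every other arm. Returns (arm index, total samples used).

    Assumes Bernoulli rewards; reuses one delta everywhere (no union bound).
    """
    rng = random.Random(seed)
    k = len(means)
    totals = [0.0] * k  # sum of observed rewards per arm
    for u in range(1, max_rounds + 1):
        for i, p in enumerate(means):
            totals[i] += 1.0 if rng.random() < p else 0.0
        radius = confidence_radius(u, delta)
        p_hat = [t / u for t in totals]
        lcb = [m - radius for m in p_hat]
        ucb = [m + radius for m in p_hat]
        # All arms share u, hence one radius: only the empirical leader can stop.
        leader = max(range(k), key=p_hat.__getitem__)
        if all(lcb[leader] > ucb[j] for j in range(k) if j != leader):
            return leader, u * k
    return None, max_rounds * k  # budget exhausted without separation


arm, samples = find_best_arm([0.9, 0.5, 0.6, 0.7, 0.1, 0.7])
print(arm, samples)
```

Unlike a fixed budget, the number of samples here adapts to the instance: the rule stops once the gap between the leader and the runner-up (0.9 versus 0.7 in this example) exceeds twice the common radius.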

SLIDE 13

Our Problem

What if the number of arms is too large?

Problem Definition: from an infinite set of arms, find an arm whose expected reward is greater than the $(1 - \rho)$-th quantile (for $0 < \rho < 1$) of the distribution of expected rewards over the arms.
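To make the quantile condition concrete, the sketch below assumes a hypothetical reservoir in which each arm's expected reward is drawn uniformly from [0, 1]. Then the $(1 - \rho)$-th quantile is simply $1 - \rho$, and a randomly drawn arm exceeds it with probability $\rho$. The uniform reservoir and the function name are illustrative assumptions, not part of the problem definition.

```python
import random


def fraction_above_quantile(rho: float, num_arms: int = 100_000, seed: int = 1) -> float:
    """Draw arms from a hypothetical Uniform(0, 1) reservoir of expected rewards
    and return the fraction whose mean exceeds the (1 - rho)-quantile."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    rng = random.Random(seed)
    threshold = 1.0 - rho  # (1 - rho)-quantile of Uniform(0, 1)
    means = [rng.random() for _ in range(num_arms)]
    return sum(m > threshold for m in means) / num_arms


print(fraction_above_quantile(0.1))  # should be close to 0.1
```

For other reservoirs the threshold changes, but by definition of the quantile a random arm still clears it with probability $\rho$ (for a continuous distribution of means).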

SLIDE 17

Key to our Approach

Consider a biased coin with P(HEAD) = 0.1 and P(TAIL) = 0.9. The probability of seeing no HEAD in n independent tosses is 0.9^n:

Number of tosses (n)    P(no HEAD)
1                       0.9
10                      0.349
20                      0.122
50                      0.005

Applications: large or continuous action spaces with discontinuous rewards.
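The probabilities on this slide follow from $P(\text{no HEAD in } n \text{ tosses}) = (1 - 0.1)^n$. The sketch below reproduces them and inverts the formula to find the smallest $n$ with $(1 - p)^n \le \delta$. Reading HEAD as "a randomly drawn arm lands above the $(1 - \rho)$-th quantile", which happens with probability $\rho$, is our interpretation of how the coin relates to the problem, not something stated on the slide; under it, `tosses_needed(rho, delta)` is the number of arms to draw so that at least one qualifies with probability at least $1 - \delta$. Both function names are hypothetical.

```python
import math


def prob_no_head(p_head: float, tosses: int) -> float:
    """Probability of seeing no HEAD in `tosses` independent tosses."""
    return (1.0 - p_head) ** tosses


def tosses_needed(p_head: float, delta: float) -> int:
    """Smallest n with (1 - p_head)^n <= delta.

    Hypothetical interpretation: with p_head = rho, the number of arms to draw
    so that at least one beats the (1 - rho)-quantile w.p. >= 1 - delta.
    """
    if not (0.0 < p_head < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("p_head and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - p_head))


for n in (1, 10, 20, 50):
    print(n, round(prob_no_head(0.1, n), 3))
print(tosses_needed(0.1, 0.005))
```

The required number of tosses grows only logarithmically in $1/\delta$, roughly $\ln(1/\delta)/p$ for small $p$, which is why a modest number of draws suffices.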
