Efficient Algorithms for Infinite-Armed Bandit
Arghya Roy Chaudhuri
under the guidance of Prof. Shivaram Kalyanakrishnan
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
What is a Multi-Armed Bandit?

Machine:       M1    M2    M3    M4    M5    M6
Mean reward:   0.9   0.5   0.6   0.7   0.1   0.7
Round 1:       1     1     0     1     0     0
Round 2:       -     0     -     -     -     -
Round 3:       1     -     -     -     -     -
Round 4:       1     -     -     -     -     -
Round 5:       0     -     -     -     -     -
Round 6:       -     -     -     1     -     -

Objective: output the arm with the highest expected reward with high probability, while incurring a minimal number of samples.
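The interaction above can be sketched as a simulation. This is a minimal sketch: the six mean rewards come from the slide's table, while the pull schedule and variable names are illustrative.

```python
import random

# Bernoulli arms with the mean rewards from the slide's table.
MEANS = [0.9, 0.5, 0.6, 0.7, 0.1, 0.7]

def pull(arm: int) -> int:
    """Sample a 0/1 reward from the chosen machine."""
    return 1 if random.random() < MEANS[arm] else 0

# A few rounds of the game: the learner picks an arm, observes a reward.
random.seed(0)
history = [(arm, pull(arm)) for arm in [0, 1, 0, 0, 0, 3]]
```

Each round reveals only the reward of the arm actually pulled, which is what makes the exploration/exploitation trade-off nontrivial.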
Key Principle: Confidence Bounds

[Figure: empirical mean of each arm with its upper and lower confidence interval]

After u samples of an arm with empirical mean p̂, with probability at least 1 − δ:

    p̂ − sqrt((1/(2u)) ln(2/δ))  ≤  p  ≤  p̂ + sqrt((1/(2u)) ln(2/δ))

where the left-hand side is the Lower Confidence Bound (LCB) and the right-hand side is the Upper Confidence Bound (UCB).

Approach: track confidence bounds for each arm.
Return an arm whose LCB exceeds the UCB of all the other arms.
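These Hoeffding-style bounds and the stopping rule can be sketched as follows (a minimal sketch assuming rewards in [0, 1]; the function names are illustrative, not from the slides):

```python
import math

def confidence_bounds(p_hat: float, u: int, delta: float) -> tuple:
    """Hoeffding-style (LCB, UCB) after u samples with empirical mean p_hat."""
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * u))
    return p_hat - radius, p_hat + radius

def separated_arm(stats, delta):
    """stats: list of (p_hat, u) per arm. Return the index of an arm whose
    LCB exceeds every other arm's UCB, or None if no arm separates yet."""
    bounds = [confidence_bounds(p, u, delta) for p, u in stats]
    for i, (lcb_i, _) in enumerate(bounds):
        if all(lcb_i > ucb_j for j, (_, ucb_j) in enumerate(bounds) if j != i):
            return i
    return None
```

With few samples the intervals overlap and no arm can be returned; as u grows the radius shrinks at rate 1/sqrt(u) and the best arm eventually separates.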
Our Problem

What if the number of arms is too large?

Problem definition: from an infinite set of arms, find an arm whose expected reward exceeds the (1 − ρ)-quantile (for 0 < ρ < 1) of the distribution of mean rewards over arms.
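For intuition about the quantile target, here is a hypothetical illustration (not from the slides): if arm means were drawn uniformly from [0, 1] and ρ = 0.1, any arm with mean above roughly 0.9 would be an acceptable answer.

```python
import random

random.seed(1)
rho = 0.1

# Hypothetical reward distribution over arms: means uniform on [0, 1].
arms = [random.random() for _ in range(100_000)]

# Empirical (1 - rho)-quantile; for Uniform[0, 1] this is close to 0.9.
threshold = sorted(arms)[int((1 - rho) * len(arms))]
```

The goal is no longer to find the single best arm (impossible among infinitely many) but any arm in the top-ρ fraction.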
Key to our Approach

Consider a biased coin with P(HEAD) = 0.1 and P(TAIL) = 0.9.

Number of tosses    P(no head)
1                   0.9
10                  0.349
20                  0.122
50                  0.005

Applications: large or continuous action spaces with discontinuous rewards.
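The analogy to the quantile problem: drawing a uniformly random arm is a coin toss that lands HEAD (a top-ρ arm) with probability ρ. A minimal sketch reproducing the table and computing how many draws guarantee a head with probability 1 − δ (variable names are illustrative):

```python
import math

p_head = 0.1  # probability a random draw hits a "good" (top-rho) arm
delta = 0.01  # allowed failure probability

# Probability of no head in n tosses: (1 - p)^n, as in the table.
no_head = {n: round((1 - p_head) ** n, 3) for n in (1, 10, 20, 50)}

# Draws needed so that P(no head) <= delta: n >= ln(delta) / ln(1 - p).
n_needed = math.ceil(math.log(delta) / math.log(1 - p_head))
```

So a constant number of random draws (independent of the total number of arms) suffices to capture a top-ρ arm with high probability, which is what makes the infinite-armed setting tractable.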