SLIDE 1

Online Optimization in X-Armed Bandits

CS101.2, January 20th, 2009
Paper by S. Bubeck, R. Munos, G. Stoltz, C. Szepesvári
Slides by C. Chang

SLIDE 2

Review of Bandits

Started with k arms

  • Integral, finite domain of arms
  • General idea: keep track of the average and confidence for each arm
  • Expected regret using UCB1 = $O(\log n)$
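The "average plus confidence" idea can be sketched in a few lines. This is a minimal UCB1 sketch, assuming Bernoulli rewards simulated with Python's `random` module; the function name `ucb1` and the arm means used below are hypothetical. Each round it pulls the arm whose empirical average plus confidence width sqrt(2 ln t / N_j) is largest.

```python
import math
import random


def ucb1(means, n, seed=0):
    """Play UCB1 for n rounds on Bernoulli arms with the given means; return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # N_j: how many times arm j was pulled
    totals = [0.0] * k  # sum of rewards observed from arm j
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every average is defined
        else:
            # index = empirical average + confidence width sqrt(2 ln t / N_j)
            arm = max(
                range(k),
                key=lambda j: totals[j] / counts[j]
                + math.sqrt(2 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], n=5000)
print(counts)  # most pulls should go to the 0.8 arm
```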
SLIDE 3

Review of Bandits

Last week: bandit arms against “adversaries”

  • Oblivious: $O(n^{2/3})$
  • Adaptive: $O(n^{3/4})$

SLIDE 4

Extending the Arms

What about infinitely many arms? Draw arms from $X = [0,1]^D$

  • A D-dimensional vector of values from 0 to 1

The mean payoff function, f, maps each arm in X to its expected payoff

  • No adversaries (fixed payoffs)
SLIDE 7

Extending the Arms

What if there are no restrictions on the shape of f?

Then we don’t know anything about arms we haven’t pulled

With infinitely many arms, this means we can’t do anything!

SLIDE 8

Extending the Arms

Okay, so no continuity at all goes too far. Instead, restrict the mean payoff function to be “pretty smooth”

That way, we can (hopefully) get information about a neighborhood of arms from a single pull

We will use Lipschitz continuity

SLIDE 9

Lipschitz Continuity

Intuitively, the slope of the function is bounded

That is, it never increases or decreases faster than a certain rate

This seems like it can give us information about an area with a single pull

SLIDE 10

Lipschitz Continuity

Formal definition: given a dissimilarity function d(x, y), a function f(x) is Lipschitz continuous if, for all x and y,

$$|f(x) - f(y)| \le k \, d(x, y)$$

k is the Lipschitz constant

SLIDE 11

Lipschitz Continuity

For a function f with a certain constant k, we call the function k-Lipschitz

We’ll assume 1-Lipschitz

  • For another k, we can just rescale the payoffs (divide them by k) to make the function 1-Lipschitz
  • We’re really just concerned with relative performance versus other strategies on the same f
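The rescaling in the first bullet can be checked in one line, assuming k > 1 (a k-Lipschitz function with k ≤ 1 is already 1-Lipschitz). Writing g = f/k:

```latex
\[
g(x) = \frac{f(x)}{k}
\quad\Longrightarrow\quad
|g(x) - g(y)| = \frac{|f(x) - f(y)|}{k} \le \frac{k \, d(x, y)}{k} = d(x, y).
\]
```

For a fixed sequence of pulled arms, the regret measured on g is the regret on f divided by k, so comparisons between strategies on the same f are unaffected.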

SLIDE 12

Lipschitz Continuity

Function will stay inside the green cone

(Graphic taken with permission from Wikipedia under GNU Free Documentation License 1.2)
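The cone can be turned into numbers: one pull at x0 bounds f everywhere. A minimal sketch, assuming one dimension, d(x, y) = |x − y|, and a noise-free pull (the observed value is taken as exactly f(x0)); `cone_bounds` is a hypothetical helper name.

```python
import math


def cone_bounds(x0, fx0, x, k=1.0):
    """Range a k-Lipschitz f (with d(x, y) = |x - y|) must lie in at x, given f(x0) = fx0.

    Assumes the pulled value fx0 is exact, i.e. no reward noise.
    """
    radius = k * abs(x - x0)
    return fx0 - radius, fx0 + radius


# One pull of sin (which is 1-Lipschitz) at x0 = 0.3 constrains it on all of [0, 1].
x0 = 0.3
for x in (0.0, 0.25, 0.5, 1.0):
    lo, hi = cone_bounds(x0, math.sin(x0), x)
    print(f"x = {x}: {lo:.3f} <= sin(x) = {math.sin(x):.3f} <= {hi:.3f}")
```

The cone narrows to a point at the pulled arm and widens linearly with distance, which is why a single pull says a lot about close neighbors and little about far ones.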

SLIDE 16

Lipschitz Functions

Examples of functions that are Lipschitz:

  • $f(x) = \sin(x)$
  • $f(x) = |x|$
  • $f(x, y) = x + y$

And functions that aren’t (over their whole domains):

  • $f(x) = x^2$
  • $f(x) = x/(x - 3)$
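A quick numerical check of these examples, assuming d(x, y) = |x − y| on an interval [−w, w]; `empirical_lipschitz` is a hypothetical helper that only sees a finite grid, so it gives a lower estimate of the true constant. The slopes of sin(x) and |x| stay at about 1 however wide the interval is, while the slope of x² grows with the width (about 2w on [−w, w]), so no single k works on the whole real line.

```python
import math


def empirical_lipschitz(f, lo, hi, samples=2001):
    """Largest slope |f(b) - f(a)| / (b - a) between neighbouring grid points of [lo, hi].

    A finite grid can only give a lower estimate of the true Lipschitz constant.
    """
    xs = [lo + (hi - lo) * j / (samples - 1) for j in range(samples)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))


def square(x):
    return x * x


for w in (1, 10, 100):
    print(
        w,
        round(empirical_lipschitz(math.sin, -w, w), 3),
        round(empirical_lipschitz(abs, -w, w), 3),
        round(empirical_lipschitz(square, -w, w), 3),
    )
```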
SLIDE 17

Application

Why would we need a bandit-arm strategy for nonlinear mean payoff functions?

SLIDE 18

Application

One example: modeling airflow over a plane wing

A parameter vector is an arm. Pulling an arm is costly

  • Difficult to actually calculate (computer models, PDEs…)

Still want to maximize some kind of result across the arms

SLIDE 19

Developing an Algorithm

Okay, so it’s useful. What kind of algorithm should we use? Random?

  • We’ve seen how well this works out

Other obvious approaches are less applicable with infinitely many arms…

SLIDE 20

Developing an Algorithm

We can reuse the ideas from the UCB1 algorithm

[Figure: arms labeled p1, p2, p3, p4]

SLIDE 21

Adjustments Needed

Not discrete arms, but a continuum

  • We will need a UCB for all arms over the arm space

We can get some confidence about any pulled arm’s neighbors because of Lipschitz continuity

SLIDE 22

Stumbling Around

Not discrete arms, but a continuum…

[Figure: the arm space, with endpoints labeled $[0]^D$ and $[1]^D$]

SLIDE 23

Stumbling Around

New points affect their neighbors

[Figure: the arm space, with endpoints labeled $[0]^D$ and $[1]^D$]

SLIDE 24

Adjustments Needed

We can also sharpen our estimates from nearby measurements

Retain “optimism in the face of the unknown”

We have the general idea… but how do we actually do it?

SLIDE 25

The Algorithm!

Split the arm space into regions. Every time you pick an arm from a region, divide it into more precise regions

Keep track of how good every region is through the results of itself and its children

SLIDE 26

Setup for the Algorithm

To remember regions, use a “Tree of Coverings”

A node in the tree with height h and row index i is represented as $P_{h,i}$, or just (h, i)

  • The children of $P_{h,i}$ are $P_{h+1,2i-1}$ and $P_{h+1,2i}$
  • The whole arm space is $X = P_{0,1}$

The children of a node cover their parent
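One concrete tree of coverings, assuming one dimension, X = [0, 1], and a dyadic split (an illustrative choice, not stated on the slide): node (h, i) is the interval [(i − 1)/2^h, i/2^h]. The helper names `children` and `region` are hypothetical; exact fractions make the "children cover their parent" check exact.

```python
from fractions import Fraction


def children(h, i):
    """Children of node (h, i) in the tree of coverings: (h+1, 2i-1) and (h+1, 2i)."""
    return (h + 1, 2 * i - 1), (h + 1, 2 * i)


def region(h, i):
    """Assumed dyadic covering of X = [0, 1]: node (h, i) is [(i-1)/2^h, i/2^h]."""
    width = Fraction(1, 2 ** h)
    return (i - 1) * width, i * width


# The root (0, 1) is the whole arm space.
assert region(0, 1) == (0, 1)

# The two children of a node split it exactly in half, so together they cover it.
for h, i in [(0, 1), (1, 2), (3, 5)]:
    (h1, i1), (h2, i2) = children(h, i)
    lo, hi = region(h, i)
    mid = (lo + hi) / 2
    assert region(h1, i1) == (lo, mid)
    assert region(h2, i2) == (mid, hi)
```

With this split, every node at height h has diameter 2^(−h) under d(x, y) = |x − y|.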

SLIDE 27

Setup for the Algorithm

We always choose a leaf node, then add its children to the tree

Each node has a “score”: we pick a new leaf by going down the tree, moving to the side with the greater score

Score:

$$B_{h,i}(n) = \min\Big\{U_{h,i}(n),\ \max\big\{B_{h+1,2i-1}(n),\ B_{h+1,2i}(n)\big\}\Big\}$$

where $U_{h,i}(n)$ is the upper confidence bound for the tree node (h, i)

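SLIDE 28

Setup for the Algorithm

One more caveat: for any node (h, i), the diameter (measured with d, the dissimilarity function) of the smallest circle that bounds the node is less than $\nu_1 \rho^h$ for some parameters $\nu_1 > 0$ and $0 < \rho < 1$

A little more formally,

$$U_{h,i}(n) = \hat{\mu}_{h,i}(n) + \sqrt{\frac{2 \ln n}{N_{h,i}(n)}} + \nu_1 \rho^h$$

where $\hat{\mu}_{h,i}(n)$ is the average reward of the arms pulled inside node (h, i), $N_{h,i}(n)$ is the number of those pulls, and the square-root term is the Chernoff confidence width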
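The three terms of the upper confidence bound can be computed directly. A minimal sketch; `upper_bound` is a hypothetical name, and the defaults ν₁ = 1 and ρ = 1/2 are an assumption that matches halving [0, 1] at every level with d(x, y) = |x − y|. A node with no pulls yet gets +∞, in the spirit of "optimism in the face of the unknown".

```python
import math


def upper_bound(mean, pulls, n, h, nu1=1.0, rho=0.5):
    """U_{h,i}(n) = empirical mean + sqrt(2 ln n / N_{h,i}(n)) + nu1 * rho**h."""
    if pulls == 0:
        return math.inf  # never pulled: be optimistic
    chernoff = math.sqrt(2 * math.log(n) / pulls)  # shrinks as the node gets more pulls
    lipschitz_slack = nu1 * rho ** h  # shrinks as nodes get smaller (deeper)
    return mean + chernoff + lipschitz_slack


# A node at height 3 with average reward 0.5 after 8 pulls, at round n = 100:
print(round(upper_bound(0.5, 8, 100, 3), 3))  # 0.5 + 1.073 + 0.125, about 1.698
```

The Chernoff term is the usual UCB1 width; the extra ν₁ρ^h term accounts for how much f can vary inside the node, which is where Lipschitz continuity enters.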

SLIDE 29

Setup for the Algorithm

Score:

$$B_{h,i}(n) = \min\Big\{U_{h,i}(n),\ \max\big\{B_{h+1,2i-1}(n),\ B_{h+1,2i}(n)\big\}\Big\}$$

What if you have no children?

SLIDE 30

Setup for the Algorithm

Score:

$$B_{h,i}(n) = \min\Big\{U_{h,i}(n),\ \max\big\{B_{h+1,2i-1}(n),\ B_{h+1,2i}(n)\big\}\Big\}$$

What if you haven’t been picked yet? Optimism in the face of uncertainty!

  • Set B to infinity
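Putting the pieces together, here is a HOO-style sketch of the algorithm on X = [0, 1]. Several choices are assumptions for illustration, not taken from the slides: the dyadic tree (node (h, i) covers [(i − 1)/2^h, i/2^h]), d(x, y) = |x − y| with ν₁ = 1 and ρ = 1/2, ties broken at random, and a fixed horizon n inside ln n (a simplification of U_{h,i}(n) that lets us refresh only the nodes on the current path). Each round adds the first not-yet-visited node reached by the walk, a close variant of "pick a leaf, add its children", since new nodes start at B = +∞. The name `hoo` and its parameters are hypothetical.

```python
import math
import random


def hoo(reward, n, nu1=1.0, rho=0.5, seed=0):
    """HOO-style sketch on X = [0, 1]; returns the pulled arms and per-node pull counts.

    Assumes node (h, i) covers [(i-1)/2^h, i/2^h] and d(x, y) = |x - y|, and uses
    the fixed horizon ln n so only nodes on the current path need new U and B values.
    """
    rng = random.Random(seed)
    log_n = math.log(n)
    count, total, b = {}, {}, {}  # per-node pull counts, reward sums, B-scores
    arms = []

    def kids(h, i):
        return (h + 1, 2 * i - 1), (h + 1, 2 * i)

    for _ in range(n):
        # Descend from the root toward the child with the larger B-score. Nodes not
        # yet in the tree have B = +infinity (optimism), so the walk stops at one.
        node, path = (0, 1), [(0, 1)]
        while node in count:
            left, right = kids(*node)
            bl, br = b.get(left, math.inf), b.get(right, math.inf)
            if bl == br:
                node = rng.choice((left, right))  # break ties at random
            else:
                node = left if bl > br else right
            path.append(node)

        # Pull an arm drawn uniformly from the new node's region.
        h, i = node
        x = rng.uniform((i - 1) / 2 ** h, i / 2 ** h)
        y = reward(x)
        arms.append(x)

        # Update counts, sums and B-scores bottom-up along the path (new node first).
        for h, i in reversed(path):
            count[(h, i)] = count.get((h, i), 0) + 1
            total[(h, i)] = total.get((h, i), 0.0) + y
            u = (total[(h, i)] / count[(h, i)]
                 + math.sqrt(2 * log_n / count[(h, i)])
                 + nu1 * rho ** h)
            left, right = kids(h, i)
            b[(h, i)] = min(u, max(b.get(left, math.inf), b.get(right, math.inf)))
    return arms, count
```

For example, `arms, count = hoo(lambda x: 1 - abs(x - 0.3), 5000)` runs it on a noise-free 1-Lipschitz function peaked at x = 0.3; the left half [0, 1/2], node (1, 1), should receive most of the pulls.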
SLIDE 31

Algorithm Example

[Figures for slides 31 to 35: the algorithm run step by step on an example function f, with a value labeled Y = 0.5 on slides 33 and 34]

SLIDE 36

Observations

Exploration comes from the pessimism of the B score and the optimism of the unknown

Exploitation comes from the optimism of the B score and fast elimination of bad parts of the function

SLIDE 37

Numerical Results

The following is taken from another talk by the author, Sébastien Bubeck

SLIDE 38

Numerical Results

SLIDE 39

Regret Analysis

Not going to go through all the math

  • If you want, read the paper...

Pretty similar to the regret analysis of UCB1

  • The number of times a bad arm is chosen is proportional to log(n) and inversely proportional to the square of its gap to the best arm
  • Add a lot of mess from the Lipschitzness
  • Actually, we only require “weak Lipschitz”, which is a sort of one-sided Lipschitz condition near the best arms
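The UCB1 part of this comparison can be made concrete with the standard bound of Auer, Cesa-Bianchi and Fischer (2002): an arm whose mean is Δ below the best is pulled at most 8 ln n / Δ² + 1 + π²/3 times in expectation. A small sketch with the hypothetical helper name `ucb1_pull_bound`:

```python
import math


def ucb1_pull_bound(gap, n):
    """Bound on the expected pulls of an arm `gap` below the best: 8 ln n / gap^2 + 1 + pi^2/3."""
    return 8 * math.log(n) / gap ** 2 + 1 + math.pi ** 2 / 3


for gap in (0.4, 0.2, 0.1):
    pulls = ucb1_pull_bound(gap, 10_000)
    # Regret from this arm is roughly gap * pulls, which grows like ln n / gap.
    print(gap, round(pulls), round(gap * pulls, 1))
```

Halving the gap quadruples the log term in the pull count but only doubles that arm's regret contribution.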

SLIDE 40

Regret Analysis

Main result:

$$\mathbb{E}[R_n] \le C(d') \, n^{(d'+1)/(d'+2)} (\ln n)^{1/(d'+2)}$$

  • C is some constant
  • d' is any number greater than d (the near-optimality dimension of f), and in most cases can be equal to d

SLIDE 41

Regret Analysis

$$\mathbb{E}[R_n] \le C(d') \, n^{(d'+1)/(d'+2)} (\ln n)^{1/(d'+2)}$$

For high d, we get closer and closer to linear regret...

  • "The Curse of Dimensionality"

This is proven to be tight (up to the logarithmic factor)! Tight!
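The approach to linear regret is easy to see from the exponent of n alone. A small sketch, with a hypothetical helper name and arbitrary illustrative values of d':

```python
def regret_exponent(d_prime):
    """Exponent of n in the bound C(d') n^((d'+1)/(d'+2)) (ln n)^(1/(d'+2))."""
    return (d_prime + 1) / (d_prime + 2)


for d_prime in (0, 1, 2, 5, 10, 100):
    print(d_prime, round(regret_exponent(d_prime), 3))
# prints: 0 0.5, 1 0.667, 2 0.75, 5 0.857, 10 0.917, 100 0.99 (one pair per line)
```

The exponent always stays below 1, so the average regret per pull still goes to 0, but at rate about n^(−1/(d'+2)), which is very slow for large d'.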

SLIDE 42

Dissimilarity Functions

We’ve just been using ordinary (straight-line) distance. d can be any metric:

  • d(x, y) = 0 iff x = y
  • d must be symmetric
  • Triangle inequality

With creative dissimilarity functions, this is surprisingly powerful!

SLIDE 43

Powerful Dissimilarities

Suppose we go back to the example of online ads

Ads sell all sorts of products (not quite infinite, but still more than we’d want to try individually!)

Can’t we get information from knowing that some ads are related?

SLIDE 44

Online Product Sales

SLIDE 45

Online Product Sales

The dissimilarity function should measure how, well, dissimilar two ads are

We can take the tree, weight the edges as, say, 1/h, and compute distances

We can now use the hierarchical algorithm! New dissimilarity functions add a lot of mileage...
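A sketch of such a tree dissimilarity, assuming a small hypothetical product catalogue and reading "1/h" as: the edge leading down into a node at depth h has weight 1/h. The distance between two ads is the total weight of the tree path between them (up to their lowest common ancestor and back down). With positive edge weights this path distance is a metric, and ads in the same fine-grained category come out closest.

```python
# Hypothetical product catalogue: each entry maps a node to its parent.
PARENT = {
    "electronics": "root", "books": "root",
    "phones": "electronics", "laptops": "electronics", "fiction": "books",
    "phone-ad-1": "phones", "phone-ad-2": "phones",
    "laptop-ad": "laptops", "novel-ad": "fiction",
}


def path_to_root(node):
    """List [node, parent, ..., "root"]."""
    path = [node]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    return len(path_to_root(node)) - 1


def tree_distance(a, b):
    """Sum of edge weights on the tree path from a to b; the edge into depth h weighs 1/h."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    common = set(up_a) & set(up_b)
    dist = 0.0
    for up in (up_a, up_b):
        for node in up:
            if node in common:
                break  # reached the lowest common ancestor
            dist += 1 / depth(node)
    return dist


print(tree_distance("phone-ad-1", "phone-ad-2"))  # same subcategory: closest
print(tree_distance("phone-ad-1", "laptop-ad"))   # same category
print(tree_distance("phone-ad-1", "novel-ad"))    # different categories: farthest
```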