SLIDE 1 Online Optimization in X-Armed Bandits
CS 101.2, January 20th, 2009
Paper by S. Bubeck, R. Munos, G. Stoltz, C. Szepesvári
Slides by C. Chang
SLIDE 2 Review of Bandits
Started with k arms
- Integral, finite domain of arms
- General idea: Keep track of average and confidence for each arm
- Expected regret using UCB1 = O(log n)
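As a reminder of what UCB1 does, here is a minimal sketch. The function names `ucb1` and `bernoulli` and the two test arms are illustrative assumptions, not part of the slides; the index used is the usual UCB1 one (average payoff plus sqrt(2 ln t / pulls)).

```python
import math
import random

def ucb1(arms, n_rounds, seed=0):
    """Run UCB1; `arms` is a list of functions rng -> payoff in [0, 1]."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # total payoff per arm

    def index(j, t):
        # Average payoff plus confidence width for arm j at round t.
        return sums[j] / counts[j] + math.sqrt(2.0 * math.log(t) / counts[j])

    for t in range(1, n_rounds + 1):
        if t <= k:
            i = t - 1                                   # pull every arm once first
        else:
            i = max(range(k), key=lambda j: index(j, t))
        sums[i] += arms[i](rng)
        counts[i] += 1
    return counts

def bernoulli(p):
    """Hypothetical test arm: pays 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0

counts = ucb1([bernoulli(0.2), bernoulli(0.8)], 2000)
```

The returned counts show how often each arm was pulled; the better arm should collect most of the pulls.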
SLIDE 3 Review of Bandits
Last week: Bandit arms against “adversaries”
O(n^{2/3})
O(n^{3/4})
SLIDE 4 Extending the Arms
What about infinitely many arms?
Draw arms from X = [0,1]^D
- D-dimensional vector of values from 0 to 1
Mean payoff function, f, maps from X to the reals
- No adversaries (fixed payoffs)
SLIDE 7
Extending the Arms
What if there are no restrictions on the shape of f?
Then we don’t know anything about arms we haven’t pulled
With infinitely many arms, this means we can’t do anything!
SLIDE 8
Extending the Arms
Okay, so no continuity at all goes too far
Generalize the mean payoff function to be “pretty smooth”
That way, we can (hopefully) get information about a neighborhood of arms from a single pull
We will use Lipschitz continuity
SLIDE 9
Lipschitz Continuity
Intuitively, the slope of the function is bounded
That is, it never increases or decreases faster than a certain rate
This seems like it can give us information about an area with a single pull
SLIDE 10
Lipschitz Continuity
Formal definition:
Given a dissimilarity function d(x,y), a function f is Lipschitz continuous if, for all x and y,
|f(x) - f(y)| ≤ k × d(x,y)
k is the Lipschitz constant
SLIDE 11 Lipschitz Continuity
For a function f with a certain constant k, we call the function k-Lipschitz
We’ll assume 1-Lipschitz
- For another k, we can just adjust the payoffs to make the function 1-Lipschitz
- We’re really just concerned with relative performance versus other strategies on the same f
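A short derivation of the rescaling step, as a sketch. It assumes k > 0 and introduces g as a name for the rescaled payoff function (g does not appear in the slides).

```latex
g(x) = \frac{f(x)}{k}
\quad\Longrightarrow\quad
|g(x) - g(y)| = \frac{|f(x) - f(y)|}{k} \le \frac{k \, d(x, y)}{k} = d(x, y)
```

Dividing by a positive constant does not move the maximizer, so the best arm for g is the best arm for f, and regret measured on g is regret on f divided by k.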
SLIDE 12
Lipschitz Continuity
Function will stay inside the green cone
(Graphic taken with permission from Wikipedia under GNU Free Documentation License 1.2)
SLIDE 16 Lipschitz Functions
Examples of functions that are Lipschitz:
- f(x) = sin(x)
- f(x) = |x|
- f(x,y) = x + y
And functions that aren’t:
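One rough way to see the difference numerically is to measure the steepest slope between neighboring grid points, sketched below. The helper `max_slope` is a hypothetical name, and f(x) = sqrt(x) on [0, 1] is an illustrative non-Lipschitz case chosen here, not one taken from the slides. A grid check only gives a lower bound on the Lipschitz constant, so it can refute but never prove the property.

```python
import math

def max_slope(f, lo, hi, steps=1000):
    """Largest |f(b) - f(a)| / (b - a) over adjacent grid points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))

sin_slope = max_slope(math.sin, 0.0, 2.0 * math.pi)    # stays at or below 1
abs_slope = max_slope(abs, -1.0, 1.0)                  # stays at or below 1
sqrt_coarse = max_slope(math.sqrt, 0.0, 1.0, steps=100)
sqrt_fine = max_slope(math.sqrt, 0.0, 1.0, steps=10000)
```

For sin and |x| the measured slope stays bounded however fine the grid is; for sqrt it keeps growing as the grid is refined near 0, which is what failing to be Lipschitz looks like.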
SLIDE 17
Application
Why would we need a bandit arm strategy for nonlinear mean payoff functions?
SLIDE 18 Application
One example: Modeling airflow over a plane wing
A parameter vector is an arm
Pulling an arm is costly
- Difficult to actually calculate (computer models, PDEs…)
Still want to maximize some kind of result across the arms
SLIDE 19 Developing an Algorithm
Okay, so it’s useful
What kind of algorithm should we use?
Random?
- We’ve seen how well this works out
Other obvious approaches are less applicable with infinitely many arms…
SLIDE 20
Developing an Algorithm
We can reuse the ideas from the UCB1 algorithm
[Figure labels: p1, p2, p3, p4]
SLIDE 21 Adjustments Needed
Not discrete arms, but a continuum
- We will need a UCB for all arms over the arm space
We can get some confidence about any pulled arm’s neighbors because of Lipschitz
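To make the neighbor idea concrete, here is a small sketch of the upper envelope that Lipschitz continuity gives on X = [0, 1]. It assumes noise-free observations of f and the distance |x - y|, which is simpler than the actual bandit setting; the name `lipschitz_upper_bound` and the sample values are made up for illustration.

```python
def lipschitz_upper_bound(x, observations, k=1.0):
    """Upper bound on f(x) implied by observed (x_i, f(x_i)) pairs for a k-Lipschitz f."""
    # Each observation caps f(x) at f(x_i) + k * |x - x_i|; keep the tightest cap.
    return min(fx + k * abs(x - xi) for xi, fx in observations)

observations = [(0.2, 0.5), (0.8, 0.9)]   # hypothetical pulls
bound_at_pull = lipschitz_upper_bound(0.2, observations)
bound_between = lipschitz_upper_bound(0.5, observations)
```

Each pull tightens the bound not only at the pulled arm but across its neighborhood, loosening linearly with distance.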
SLIDE 22
Stumbling Around
Not discrete arms, but a continuum…
[Figure axis: [0]xD to [1]xD]
SLIDE 23
Stumbling Around
New points affect their neighbors
[Figure axis: [0]xD to [1]xD]
SLIDE 24
Adjustments Needed
We can also sharpen our estimates from nearby measurements
Retain “optimism in the face of the unknown”
We have the general idea… but how do we actually do it?
SLIDE 25
The Algorithm!
Split the arm space into regions
Every time you pick an arm from a region, divide into more precise regions
Keep track of how good every region is through results of itself and its children.
SLIDE 26 Setup for the Algorithm
To remember regions, use a “Tree of Coverings”
A node in the tree with height h and row index i is represented as P_{h,i} or just (h,i)
- The children of P_{h,i} are P_{h+1,2i-1} and P_{h+1,2i}
- The whole arm space X = P_{0,1}
The children of a node cover their parent
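The indexing can be sketched in a few lines. The children of (h, i) are (h+1, 2i-1) and (h+1, 2i); the `region` helper additionally assumes X = [0, 1] and that each node is cut in half, which is one convenient partition rather than something the slides prescribe.

```python
def children(h, i):
    """Indices of the two children of node (h, i)."""
    return (h + 1, 2 * i - 1), (h + 1, 2 * i)

def region(h, i):
    """Interval covered by node (h, i), assuming X = [0, 1] is halved at each level."""
    width = 1.0 / 2 ** h
    return ((i - 1) * width, i * width)

left, right = children(0, 1)          # children of the root P_{0,1}
left_region = region(*left)           # left half of the arm space
right_region = region(*right)         # right half of the arm space
```

With this partition the two child intervals meet at the midpoint of the parent, so together they cover it.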
SLIDE 27
Setup for the Algorithm
We always choose a leaf node, then add its children to the tree.
Each node has a “score”: we pick a new leaf by going down the tree, going to the side with the greater score.
Score:
B_{h,i}(n) = min{ U_{h,i}(n), max over children [B_child(n)] }
where U_{h,i}(n) is the upper confidence bound for the tree node (h,i)
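SLIDE 28
Setup for the Algorithm
One more caveat: for any node (h,i), the diameter (determined by d, the dissimilarity function) of the smallest circle that bounds the node is less than ν_1 ρ^h for some parameters ν_1, ρ
A little more formally,
U_{h,i}(n) = μ̂_{h,i}(n) + Chernoff + ν_1 ρ^h
(Chernoff = sqrt[(2 ln n) / N_{h,i}(n)])
where μ̂_{h,i}(n) is the average payoff observed in the node and N_{h,i}(n) is the number of times it has been picked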
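A worked instance of the upper confidence bound, with made-up numbers chosen only for illustration (none of these values come from the slides or the paper): average payoff 0.6, N = 10 picks, round n = 100, depth h = 3, ν_1 = 1, ρ = 1/2.

```latex
U_{3,i}(100) = 0.6 + \sqrt{\frac{2 \ln 100}{10}} + 1 \cdot \left(\tfrac{1}{2}\right)^{3}
\approx 0.6 + 0.960 + 0.125 = 1.685
```

The Chernoff term dominates here; it shrinks as the node is picked more often, while the diameter term shrinks with depth.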
SLIDE 29
Setup for the Algorithm
Score:
B_{h,i}(n) = min{ U_{h,i}(n), max over children [B_child(n)] }
What if you have no children?
SLIDE 30 Setup for the Algorithm
Score:
B_{h,i}(n) = min{ U_{h,i}(n), max over children [B_child(n)] }
What if you haven’t been picked yet?
Optimism in the face of uncertainty!
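Putting the pieces together, here is a rough sketch of the hierarchical algorithm on X = [0, 1]. Several things are assumptions made for the sketch: regions are halved at each level, the arm is drawn uniformly inside the chosen leaf, ν_1 = 1 and ρ = 0.5, ties go left, and the whole tree is rescored every round for simplicity. A node that has never been picked gets a score of +infinity, which is one way to encode the optimism above. The names `Node`, `refresh`, and `hoo` are illustrative.

```python
import math
import random

class Node:
    def __init__(self, h, i):
        self.h, self.i = h, i
        self.pulls = 0        # N_{h,i}(n): pulls made inside this region
        self.total = 0.0      # sum of the payoffs of those pulls
        self.b = math.inf     # B-score; never-picked nodes stay at +infinity
        self.kids = None      # set to (left, right) once the node is picked

def refresh(node, n, nu1, rho):
    """Recompute B-scores bottom-up for the subtree under node."""
    if node.pulls == 0:
        node.b = math.inf     # optimism in the face of uncertainty
        return
    for kid in node.kids:
        refresh(kid, n, nu1, rho)
    u = (node.total / node.pulls
         + math.sqrt(2.0 * math.log(n) / node.pulls)
         + nu1 * rho ** node.h)
    node.b = min(u, max(kid.b for kid in node.kids))

def hoo(payoff, n_rounds, nu1=1.0, rho=0.5, seed=0):
    """Run the sketch on X = [0, 1]; returns the pulled arms and the tree root."""
    rng = random.Random(seed)
    root, xs = Node(0, 1), []
    for n in range(1, n_rounds + 1):
        # Walk down from the root toward the larger B-score until a leaf.
        path, node = [root], root
        while node.kids is not None:
            left, right = node.kids
            node = left if left.b >= right.b else right
            path.append(node)
        # Add the leaf's children, then pull an arm inside the leaf's region.
        node.kids = (Node(node.h + 1, 2 * node.i - 1), Node(node.h + 1, 2 * node.i))
        width = 1.0 / 2 ** node.h
        x = (node.i - 1 + rng.random()) * width
        y = payoff(x)
        xs.append(x)
        for visited in path:
            visited.pulls += 1
            visited.total += y
        refresh(root, n, nu1, rho)
    return xs, root

# Hypothetical noise-free, 1-Lipschitz payoff with its peak at x = 0.7.
xs, root = hoo(lambda x: 1.0 - abs(x - 0.7), 300)
```

Rescoring the whole tree each round costs time proportional to the tree size, so this version is quadratic in the number of rounds; it is meant to show the mechanics, not to be efficient.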
SLIDE 31
Algorithm Example
[Figures, slides 31 to 35: the algorithm run on a mean payoff function f; slides 33 and 34 mark the level Y = 0.5]
Observations Observations
Explorationcomesfromthepessimismof
theBscoreandtheoptimismofthe unknown
Exploitationcomesfromtheoptimismof
theBscoreandfasteliminationofbad partsofthefunction
SLIDE 37
Numerical Results
The following is taken from another talk by the author, Sébastien Bubeck
SLIDE 38
Numerical Results
SLIDE 39 Regret Analysis
Not going to go through all the math
Pretty similar to regret analysis of UCB1
- Number of times a bad arm is chosen grows like log(n) and shrinks with its gap to the best arm (for UCB1, on the order of log(n) / gap^2)
- Add a lot of mess from the Lipschitzness
- Actually, we only require “weak Lipschitz”, which is a sort of one-sided Lipschitz near the best arms
SLIDE 40 Regret Analysis
Main result: E(R_n) ≤ C(d') n^{(d'+1)/(d'+2)} (ln n)^{1/(d'+2)}
- C is some constant
- d is the near-optimality dimension of f, as defined in the paper
- d' is any number greater than d, and in most cases, can be equal to d
SLIDE 41 Regret Analysis
E(R_n) ≤ C(d') n^{(d'+1)/(d'+2)} (ln n)^{1/(d'+2)}
For high d, we get closer and closer to linear...
- "The Curse of Dimensionality"
This is proven to be tight!
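To see how quickly the rate approaches linear, the exponent on n can be tabulated directly. This sketch plugs d in place of d' and leaves out the constant C, whose value is not given here; the function names are illustrative.

```python
import math

def exponent(d):
    """Exponent on n in the regret bound, using d in place of d'."""
    return (d + 1) / (d + 2)

def regret_rate(n, d):
    """n^((d+1)/(d+2)) * (ln n)^(1/(d+2)): the bound without the constant C."""
    return n ** exponent(d) * math.log(n) ** (1 / (d + 2))

exponents = [exponent(d) for d in (0, 1, 2, 5, 10)]   # 1/2, 2/3, 3/4, 6/7, 11/12
```

At d = 0 the rate is the familiar square-root-of-n behavior (up to the log factor); every extra dimension pushes the exponent closer to 1.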
SLIDE 42 Dissimilarity Functions
We’ve just been using straight distance
d can be any metric
- d(x,y) = 0 iff x = y
- d must be symmetric
- Triangle inequality
With creative dissimilarity functions, this is surprisingly powerful!
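The three requirements are easy to spot-check on a handful of points, as sketched below. The helper `is_metric_on` and the sample points are made up for illustration, and passing on a sample does not prove that d is a metric everywhere; a failure, however, is a genuine counterexample.

```python
from itertools import product

def is_metric_on(points, d, tol=1e-12):
    """Check the three metric axioms for d on a finite sample of points."""
    for x, y in product(points, repeat=2):
        if (d(x, y) <= tol) != (x == y):
            return False          # d(x, y) = 0 iff x = y
        if abs(d(x, y) - d(y, x)) > tol:
            return False          # symmetry
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False          # triangle inequality
    return True

sample = [0.0, 0.25, 0.5, 1.0]
straight_ok = is_metric_on(sample, lambda x, y: abs(x - y))
squared_ok = is_metric_on(sample, lambda x, y: (x - y) ** 2)
```

Straight distance passes, while squared distance fails the triangle inequality (going from 0 to 1 directly costs 1, but via 0.5 costs only 0.5).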
SLIDE 43 Powerful Dissimilarities
Suppose we go back to the example of ads
Ads sell all sorts of products (not quite infinite, but still more than we’d want to try individually!)
Can’t we get information from knowing that some ads are related?
SLIDE 44
Online Product Sales
SLIDE 45
Online Product Sales
Dissimilarity function should measure how, well, dissimilar two ads are.
Can take the tree, weight the edges as, say, 1/h, and compute distance
Can now use the hierarchical algorithm!
New dissimilarity functions add a lot of mileage...
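One possible reading of the 1/h weighting, as a sketch: treat the tree as a product-category hierarchy, give the edge leading into a depth-h node the weight 1/h, and sum the weights along the path between two ads. The category names, the path representation, and the interpretation of h as the depth of the edge's lower end are all assumptions made for this example.

```python
def tree_distance(path_a, path_b):
    """Distance between two ads given their category paths below the root."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1               # length of the common prefix

    def weight_below(path):
        # Edges at depths shared+1 .. len(path), each weighted 1/depth.
        return sum(1.0 / h for h in range(shared + 1, len(path) + 1))

    return weight_below(path_a) + weight_below(path_b)

# Hypothetical ads, described by their category paths.
dslr = ("electronics", "cameras", "dslr")
compact = ("electronics", "cameras", "compact")
novel = ("books", "fiction")

near = tree_distance(dslr, compact)   # differ only at depth 3
far = tree_distance(dslr, novel)      # differ from the first level down
```

Ads that split apart deep in the tree end up close together, and ads that split near the root end up far apart, which is the kind of relatedness the dissimilarity function is supposed to capture.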