Online linear optimization and adaptive routing
Baruch Awerbuch, Robert Kleinberg
Motivation
- Overlay network routing – Send a packet from
source to target using the route with minimum delay
- The total route delay is revealed
- Graph example: a small network with edge delays (figure omitted)
Using previous algorithms
- We can use EXP3, treating each route as an arm. Since there can be n! routes, the regret bound O(√(K·G_max·ln K)) becomes O(√(n!·G_max·ln n!))
- We have also seen online shortest paths with full information, achieving E[cost] ≤ (1+ε)·mincost_T + O(mn·log n / ε)
Problem definition
- G = (V, E) – directed graph
- For each round j = 1, ..., T the adaptive adversary selects a cost function c_j : E → [0,1] on the edges
- The algorithm selects a path of length ≤ H
- It receives only the total cost of the entire path
- Goal: minimize the difference between the algorithm's expected total cost and the cost of the best single path from source to target
Regret
O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})
Pre-processing
- We will transform the graph G into a leveled directed acyclic graph G̃ = (Ṽ, Ẽ)
- Start by calculating G × {0, 1, ..., H}
  – Vertex set V × {0, 1, ..., H}
  – An edge e_i from (u, i−1) to (v, i) for every e = (u, v) in E
- The graph G̃ is obtained by:
  – Deleting vertices from which r cannot be reached
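The layering step above can be sketched as follows. `build_layered_dag` and its pruning rule are illustrative names for this sketch, not code from the paper:

```python
from collections import defaultdict

def build_layered_dag(edges, r, H):
    """Sketch of the pre-processing: vertex (v, i) means 'at v after i
    steps'; keep only vertices from which some copy of r is reachable."""
    layered = defaultdict(list)            # (u, i-1) -> list of (v, i)
    for i in range(1, H + 1):
        for (u, v) in edges:
            layered[(u, i - 1)].append((v, i))

    # Walk backwards from every copy of r to find vertices that reach it.
    reach = {(r, i) for i in range(H + 1)}
    changed = True
    while changed:
        changed = False
        for u, vs in layered.items():
            if u not in reach and any(v in reach for v in vs):
                reach.add(u)
                changed = True

    # Drop unreachable vertices and the edges into them.
    return {u: [v for v in vs if v in reach]
            for u, vs in layered.items() if u in reach}
```

The result is a leveled DAG, so every walk that follows edges forward terminates within H steps.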
Main idea
- We can traverse the graph by querying BEX (the best-expert subroutine) at each vertex for probabilities on its outgoing edges, until we reach r
- To do so we need to feed BEX with information on all experts
- We will run in phases; at each phase we estimate the cost of all experts, and at the end of each phase we update BEX
- We will feed BEX with the total path cost
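BEX is used here as a black-box best-expert algorithm. A minimal Hedge-style sketch (the class name and multiplicative-weights update are assumptions for illustration, not necessarily the exact BEX of the paper):

```python
import math

class BEX:
    """Minimal Hedge-style best-expert sketch.  One instance per vertex,
    one expert per outgoing edge."""
    def __init__(self, n_experts, eps):
        self.eps = eps
        self.w = [1.0] * n_experts

    def distribution(self):
        # Probabilities proportional to the current weights.
        total = sum(self.w)
        return [wi / total for wi in self.w]

    def update(self, costs):
        # Multiplicative-weights update with one cost per expert.
        self.w = [wi * math.exp(-self.eps * c)
                  for wi, c in zip(self.w, costs)]
```

At the end of each phase we would call `update` with the estimated costs c̃_φ(e) of the outgoing edges.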
Sampling experts
- We can sample the experts according to the distribution BEX returns (based on the costs observed in previous phases)
- The problem: we might starve edges that look bad now but could become better in later phases
- We will therefore add some exploration steps at each phase
Exploration
- Will occur with probability δ
- Choose an edge e = (u, v) uniformly at random from Ẽ
- Construct a path by joining prefix(u), e, and suffix(v)
Suffix
- Suffix(v) will return the distribution on v – r paths
- Implementation: choose an edge according to the BEX probabilities at the current vertex, traverse it, and repeat until r is reached
- Why can't it be random? (counterexample: a small network with one cheap and one very expensive route, figure omitted)
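The suffix(v) walk described above can be sketched directly; `out_edges` and `bex_dist` are assumed inputs mapping each vertex to its outgoing edges and their current BEX probabilities:

```python
import random

def sample_suffix(v, r, out_edges, bex_dist, rng=random.Random(0)):
    """Sample a v -> r path by repeatedly drawing an outgoing edge of the
    current vertex from its BEX distribution.  Terminates because the
    (layered) graph is a DAG of depth at most H."""
    path, cur = [], v
    while cur != r:
        edges = out_edges[cur]
        (u, w) = rng.choices(edges, weights=bex_dist[cur])[0]
        path.append((u, w))
        cur = w
    return path
```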
Prefix
- Prefix(v) will return the distribution on s – v paths
- Let suffix(u | v) be the distribution on u – v paths, obtained by sampling from suffix(u) conditioned on the event that the path passes through v
Prefix
- With probability (1−δ)·Pr(v ∈ suffix(s)) / P_φ(v), sample from suffix(s | v)
- For each e = (q, u) in Ẽ, with probability (δ/m̃)·Pr(v ∈ suffix(u)) / P_φ(v), sample from suffix(u | v), prepend e, and then prepend a sample from prefix(q)
- Here P_φ(v) is the probability that v is contained in the suffix of a path in phase φ
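One step of this mixture can be sketched as below. The probabilities `p_s = Pr(v ∈ suffix(s))` and `p_u[u] = Pr(v ∈ suffix(u))` are assumed precomputed, and `draw_suffix_cond` / `draw_prefix` stand in for the suffix(· | v) and prefix(·) samplers; all names are hypothetical:

```python
import random

def sample_prefix_step(delta, m_tilde, edges, p_s, p_u,
                       draw_suffix_cond, draw_prefix,
                       rng=random.Random(1)):
    """One mixture draw for prefix(v).  The normalizer P_phi(v) is just
    the sum of the unnormalized weights, so rng.choices handles it."""
    weights = [(1 - delta) * p_s]                       # suffix(s | v) branch
    weights += [(delta / m_tilde) * p_u[u] for (q, u) in edges]
    choice = rng.choices(range(len(weights)), weights=weights)[0]
    if choice == 0:
        return draw_suffix_cond("s")                    # sample suffix(s | v)
    q, u = edges[choice - 1]
    # prefix(q) + e + suffix(u | v)
    return draw_prefix(q) + [(q, u)] + draw_suffix_cond(u)
```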
Updating costs
- Phase length: τ = ⌈2mH·log(mHT)/δ⌉
- At each phase we will sum the costs for each edge, but only over rounds in which the edge was not part of the path chosen by prefix
- The reason is that we cannot control the probabilities with which those prefix edges were chosen
Updating costs
- At the end of each phase φ = 1, ..., ⌈T/τ⌉, covering rounds j = τ(φ−1)+1, τ(φ−1)+2, ..., τφ:

∀e ∈ Ẽ:  μ_φ(e) ← E[Σ_{j∈τ_φ} χ_j(e)],  c̃_φ(e) ← (Σ_{j∈τ_φ} χ_j(e)·c_j(π_j)) / μ_φ(e)
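The end-of-phase update can be sketched with the empirical counts standing in for the expectation in μ_φ(e); `phase_estimates` is an illustrative name:

```python
def phase_estimates(chi, path_costs, edges):
    """End-of-phase estimates: for each edge e,
    mu[e]      = sum_j chi_j(e)   (empirical count of usable observations)
    c_tilde[e] = sum_j chi_j(e) * c_j(pi_j) / mu[e].
    `chi[j]` maps edges to 0/1 indicators, `path_costs[j]` is the
    observed total cost of the round-j path."""
    mu, c_tilde = {}, {}
    for e in edges:
        mu[e] = sum(chi_j.get(e, 0) for chi_j in chi)
        s = sum(chi_j.get(e, 0) * c for chi_j, c in zip(chi, path_costs))
        c_tilde[e] = s / mu[e] if mu[e] > 0 else 0.0
    return mu, c_tilde
```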
Algorithm analysis
- Let

C⁻(v) = Σ_{j=1}^T E[c_j(prefix(v))]
C⁺(v) = Σ_{j=1}^T E[c_j(suffix(v))]
OPT(v) = min_{paths π: v→r} Σ_{j=1}^T c_j(π)
Algorithm analysis
- We know that for BEX:

Σ_{j=1}^t Σ_{i=1}^K p_j(i)·c_j(i) ≤ Σ_{j=1}^t c_j(k) + O(εt + (log K)/ε)·M

- Let p_φ be the probability distribution supplied by BEX(v) during phase φ. Then:

Σ_{φ=1}^t Σ_{e∈Δ(v)} p_φ(e)·c̃_φ(e) ≤ Σ_{φ=1}^t c̃_φ(e₀) + O(εHt + (H·log Δ)/ε)
Algorithm analysis
- We used the fact that the cost of a phase, M, is smaller than 3H with high probability. Since τ = 2mH·log(mHT)/δ, we have μ_φ > δτ/(mH) = 2·log(mHT), so by a Chernoff bound:

Pr(Σ_{j∈τ_φ} χ_j ≥ 3·2·log(mHT)) ≤ e^{−(2/3)·2·log(mHT)} ≤ 1/(mHT)
Algorithm analysis
- Now, by applying a union bound over all phases, this low-probability event contributes at most HT/(mHT) < 1 to the total cost, so we will ignore it
Algorithm analysis
- Expanding c̃_φ (Eq. 12):

Σ_{φ=1}^t Σ_{e∈Δ(v)} Σ_{j∈τ_φ} p_φ(e)·χ_j(e)·c_j(π_j)/μ_φ(e) ≤ Σ_{φ=1}^t Σ_{j∈τ_φ} χ_j(e₀)·c_j(π_j)/μ_φ(e₀) + O(εHt + (H/ε)·log Δ)
Algorithm analysis
- Claim 3.2. For every path π: s → v,

Pr(π ⊆ π_j | χ_j(e) = 1) = Pr(prefix(v) = π)
Algorithm analysis
- Proof of Claim 3.2
- χ_j(e) = 1 implies e ∈ π_j⁰ ∨ e ∈ π_j⁺ (the exploitation and exploration cases), so it suffices to show:

Pr(π ⊆ π_j | e ∈ π_j⁰) = Pr(prefix(v) = π)
Pr(π ⊆ π_j | e ∈ π_j⁺) = Pr(prefix(v) = π)

- The first equality holds by definition; let us prove the second
Algorithm analysis
- e is sampled independently of the path preceding v, so

Pr(π ⊆ π_j | e ∈ π_j⁺) = Pr(π ⊆ π_j | v ∈ π_j⁺)

Pr(v ∈ π_j⁺)·Pr(π ⊆ π_j | v ∈ π_j⁺) = Pr(π ⊆ π_j ∧ v ∈ π_j⁺)
= (1−δ)·Pr(v ∈ suffix(s))·Pr(π = suffix(s | v)) + Σ_{e'=(q,u)∈Ẽ} (δ/m̃)·Pr(v ∈ suffix(u))·Pr(π = prefix(q) ∪ {e'} ∪ suffix(u | v))
= Pr(v ∈ π_j⁺)·Pr(π = prefix(v))
Algorithm analysis
- Claim 3.3. If e = (v, w) then

E[χ_j(e)·c_j(π_j)] = (μ(e)/τ)·(A_j(v) + B_j(w) + c_j(e))

where A_j(v) = E[c_j(prefix(v))] and B_j(w) = E[c_j(suffix(w))]

- This follows from Claim 3.2: the portion of the path preceding e is distributed as prefix(v)
Algorithm analysis
- Taking the expectation of Eq. 12, the left side will become

Σ_{φ=1}^t Σ_{e∈Δ(v)} Σ_{j∈τ_φ} p_φ(e)·(A_j(v) + B_j(w) + c_j(e))/τ = (1/τ)·Σ_{j=1}^T Σ_{e∈Δ(v)} p_φ(e)·(A_j(v) + B_j(w) + c_j(e))

- The right side will become

(1/τ)·Σ_{j=1}^T (A_j(v) + B_j(w₀) + c_j(e₀))
Algorithm analysis
- After removing A_j(v) from both sides and noticing that

Σ_{e∈Δ(v)} p_φ(e)·(B_j(w) + c_j(e)) = E[c_j(suffix(v))]

- the left side will become

(1/τ)·Σ_{j=1}^T E[c_j(suffix(v))] = C⁺(v)/τ
Algorithm analysis
- The right side will become

(1/τ)·Σ_{j=1}^T c_j(e₀) + C⁺(w₀)/τ + O(εHt + (H/ε)·log Δ)

- Thus we have derived the local performance guarantee (Eq. 13):

C⁺(v) ≤ C⁺(w₀) + Σ_{j=1}^T c_j(e₀) + O(εHT + (τH/ε)·log Δ)
Global performance guarantee
- Claim 3.4.

C⁺(v) ≤ OPT(v) + O(εHT + (τH/ε)·log Δ)·h(v)

- To prove it we can use the following observation:

OPT(v) = min_{e₀=(v,w₀)} { Σ_{j=1}^T c_j(e₀) + OPT(w₀) }
Global performance guarantee
- Proof – by induction on h(v), using the local performance guarantee
- Let us mark F = O(εHT + (τH/ε)·log Δ)
- Now rewrite the claim and Eq. 13:

C⁺(v) ≤ C⁺(w₀) + Σ_{j=1}^T c_j(e₀) + F
C⁺(v) ≤ OPT(v) + F·h(v)
Global performance guarantee
- h(v) = 1: true by the local performance guarantee, since C⁺(r) = OPT(r) = 0:

C⁺(v) ≤ Σ_{j=1}^T c_j(e₀) + F  for every e₀ = (v, r)

so

C⁺(v) ≤ min_{e₀=(v,r)} { Σ_{j=1}^T c_j(e₀) + OPT(r) } + F = OPT(v) + F
Global performance guarantee
- h(v) = k + 1: let e_{k+1} = (v, v_k) be the edge achieving the minimum in the observation above, with h(v_k) ≤ k. Then

C⁺(v) ≤ C⁺(v_k) + Σ_{j=1}^T c_j(e_{k+1}) + F ≤ Σ_{j=1}^T c_j(e_{k+1}) + OPT(v_k) + kF + F = OPT(v) + (k+1)F
Regret
- Theorem 3.5. The algorithm suffers regret

O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})

- The exploration steps contribute at most δTH
- The exploitation steps contribute C⁺(s) − OPT(s)
- Also τ = 2mH·log(mHT)/δ
- Substituting into Claim 3.4 we get total exploitation cost

C⁺(s) − OPT(s) = O(εT + 2mH·log Δ·log(mHT)/(εδ))·H²
Regret
- We can assign

ε = δ = (2mH·log Δ·log(mHT))^{1/3}·T^{−1/3}

and we will get the desired regret:

Regret ≤ O(δT + εT + 2mH·log Δ·log(mHT)/(εδ))·H² = O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})
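A quick check that this choice of ε and δ balances the three terms, writing A := 2mH·log Δ·log(mHT):

```latex
\delta T \;=\; \epsilon T \;=\; A^{1/3} T^{-1/3}\cdot T \;=\; A^{1/3} T^{2/3},
\qquad
\frac{A}{\epsilon\delta} \;=\; \frac{A}{A^{2/3} T^{-2/3}} \;=\; A^{1/3} T^{2/3}
```

All three terms equal A^{1/3}·T^{2/3}, and multiplying by H² gives the stated regret bound.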