Online linear optimization and adaptive routing
Baruch Awerbuch, Robert Kleinberg
Motivation
- Overlay network routing – Send a packet from
source to target using the route with minimum delay
- The total route delay is revealed
- Graph example: a small network with edge delays (figure omitted)
Using previous algorithms
- We can use EXP3, treating each route as an arm. Since there can be n! routes, the regret bound O(√(K·G_max·ln K)) becomes O(√(n!·G_max·ln n!))
- We have also seen online shortest paths with full information, achieving E[cost] ≤ (1+ε)·mincost_T + O(mn·log n / ε)
Problem definition
- G = (V, E) – directed graph
- For each round j = 1, ..., T the adaptive adversary selects a cost function c_j : E → [0,1] on the edges
- The algorithm selects a path of length ≤ H
- It receives only the total cost of the entire path
- Goal: minimize the difference between the algorithm's expected total cost and the cost of the best single path from source to target
Regret
O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})
Pre-processing
- We will transform the graph G into a leveled directed acyclic graph G̃ = (Ṽ, Ẽ)
- Start by calculating G × {0, 1, ..., H}
  – Vertex set V × {0, 1, ..., H}
  – An edge e_i from (u, i−1) to (v, i) for every e = (u, v) in E
- The graph G̃ is obtained by:
  – Deleting vertices from which r cannot be reached
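The layering step above can be sketched as follows. `build_layered_dag` and its pruning rule are illustrative names for this sketch, not code from the paper:

```python
from collections import defaultdict

def build_layered_dag(edges, r, H):
    """Sketch of the pre-processing: vertex (v, i) means 'at v after i
    steps'; keep only vertices from which some copy of r is reachable."""
    layered = defaultdict(list)            # (u, i-1) -> list of (v, i)
    for i in range(1, H + 1):
        for (u, v) in edges:
            layered[(u, i - 1)].append((v, i))

    # Walk backwards from every copy of r to find vertices that reach it.
    reach = {(r, i) for i in range(H + 1)}
    changed = True
    while changed:
        changed = False
        for u, vs in layered.items():
            if u not in reach and any(v in reach for v in vs):
                reach.add(u)
                changed = True

    # Drop unreachable vertices and the edges into them.
    return {u: [v for v in vs if v in reach]
            for u, vs in layered.items() if u in reach}
```

The result is a leveled DAG, so every walk that follows edges forward terminates within H steps.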
Main idea
- We can traverse the graph by querying BEX (the best-expert subroutine) at each vertex for probabilities on its outgoing edges, until we reach r
- To do so we need to feed BEX with information on all experts
- We will run in phases; at each phase we estimate the cost of all experts, and at the end of each phase we update BEX
- We will feed BEX with the total path cost
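BEX is used here as a black-box best-expert algorithm. A minimal Hedge-style sketch (the class name and multiplicative-weights update are assumptions for illustration, not necessarily the exact BEX of the paper):

```python
import math

class BEX:
    """Minimal Hedge-style best-expert sketch.  One instance per vertex,
    one expert per outgoing edge."""
    def __init__(self, n_experts, eps):
        self.eps = eps
        self.w = [1.0] * n_experts

    def distribution(self):
        # Probabilities proportional to the current weights.
        total = sum(self.w)
        return [wi / total for wi in self.w]

    def update(self, costs):
        # Multiplicative-weights update with one cost per expert.
        self.w = [wi * math.exp(-self.eps * c)
                  for wi, c in zip(self.w, costs)]
```

At the end of each phase we would call `update` with the estimated costs c̃_φ(e) of the outgoing edges.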
Sampling experts
- We can sample the experts according to the distribution BEX returns (based on the costs observed in previous phases)
- The problem: we might starve edges that look bad now but could become better in later phases
- We will therefore add some exploration steps at each phase
Exploration
- Will occur with probability δ
- Choose an edge e = (u, v) uniformly at random from Ẽ
- Construct a path by joining prefix(u), e, and suffix(v)
Suffix
- Suffix(v) will return the distribution on v – r paths
- Implementation: choose an edge according to the BEX probabilities at the current vertex, traverse it, and repeat until r is reached
- Why can't it be random? (counterexample: a small network with one cheap and one very expensive route, figure omitted)
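The suffix(v) walk described above can be sketched directly; `out_edges` and `bex_dist` are assumed inputs mapping each vertex to its outgoing edges and their current BEX probabilities:

```python
import random

def sample_suffix(v, r, out_edges, bex_dist, rng=random.Random(0)):
    """Sample a v -> r path by repeatedly drawing an outgoing edge of the
    current vertex from its BEX distribution.  Terminates because the
    (layered) graph is a DAG of depth at most H."""
    path, cur = [], v
    while cur != r:
        edges = out_edges[cur]
        (u, w) = rng.choices(edges, weights=bex_dist[cur])[0]
        path.append((u, w))
        cur = w
    return path
```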
Prefix
- Prefix(v) will return the distribution on s – v paths
- Let suffix(u | v) be the distribution on u – v paths, obtained by sampling from suffix(u) conditioned on the event that the path passes through v
Prefix
- With probability (1−δ)·Pr(v ∈ suffix(s)) / P_φ(v), sample from suffix(s | v)
- For each e = (q, u) in Ẽ, with probability (δ/m̃)·Pr(v ∈ suffix(u)) / P_φ(v), sample from suffix(u | v), prepend e, and then prepend a sample from prefix(q)
- Here P_φ(v) is the probability that v is contained in the suffix of a path in phase φ
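One step of this mixture can be sketched as below. The probabilities `p_s = Pr(v ∈ suffix(s))` and `p_u[u] = Pr(v ∈ suffix(u))` are assumed precomputed, and `draw_suffix_cond` / `draw_prefix` stand in for the suffix(· | v) and prefix(·) samplers; all names are hypothetical:

```python
import random

def sample_prefix_step(delta, m_tilde, edges, p_s, p_u,
                       draw_suffix_cond, draw_prefix,
                       rng=random.Random(1)):
    """One mixture draw for prefix(v).  The normalizer P_phi(v) is just
    the sum of the unnormalized weights, so rng.choices handles it."""
    weights = [(1 - delta) * p_s]                       # suffix(s | v) branch
    weights += [(delta / m_tilde) * p_u[u] for (q, u) in edges]
    choice = rng.choices(range(len(weights)), weights=weights)[0]
    if choice == 0:
        return draw_suffix_cond("s")                    # sample suffix(s | v)
    q, u = edges[choice - 1]
    # prefix(q) + e + suffix(u | v)
    return draw_prefix(q) + [(q, u)] + draw_suffix_cond(u)
```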
Updating costs
- Phase length: τ = ⌈2mH·log(mHT)/δ⌉
- At each phase we will sum the costs for each edge, but only over rounds in which the edge was not part of the path chosen by prefix
- The reason is that we cannot control the probabilities with which those prefix edges were chosen
Updating costs
- At the end of each phase φ = 1, ..., ⌈T/τ⌉, covering rounds j = τ(φ−1)+1, τ(φ−1)+2, ..., τφ:

∀e ∈ Ẽ:  μ_φ(e) ← E[Σ_{j∈τ_φ} χ_j(e)],  c̃_φ(e) ← (Σ_{j∈τ_φ} χ_j(e)·c_j(π_j)) / μ_φ(e)
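The end-of-phase update can be sketched with the empirical counts standing in for the expectation in μ_φ(e); `phase_estimates` is an illustrative name:

```python
def phase_estimates(chi, path_costs, edges):
    """End-of-phase estimates: for each edge e,
    mu[e]      = sum_j chi_j(e)   (empirical count of usable observations)
    c_tilde[e] = sum_j chi_j(e) * c_j(pi_j) / mu[e].
    `chi[j]` maps edges to 0/1 indicators, `path_costs[j]` is the
    observed total cost of the round-j path."""
    mu, c_tilde = {}, {}
    for e in edges:
        mu[e] = sum(chi_j.get(e, 0) for chi_j in chi)
        s = sum(chi_j.get(e, 0) * c for chi_j, c in zip(chi, path_costs))
        c_tilde[e] = s / mu[e] if mu[e] > 0 else 0.0
    return mu, c_tilde
```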
Algorithm analysis
- Let

C⁻(v) = Σ_{j=1}^T E[c_j(prefix(v))]
C⁺(v) = Σ_{j=1}^T E[c_j(suffix(v))]
OPT(v) = min_{paths π: v→r} Σ_{j=1}^T c_j(π)
Algorithm analysis
- We know that for BEX:

Σ_{j=1}^t Σ_{i=1}^K p_j(i)·c_j(i) ≤ Σ_{j=1}^t c_j(k) + O(εt + (log K)/ε)·M

- Let p_φ be the probability distribution supplied by BEX(v) during phase φ. Then:

Σ_{φ=1}^t Σ_{e∈Δ(v)} p_φ(e)·c̃_φ(e) ≤ Σ_{φ=1}^t c̃_φ(e₀) + O(εHt + (H·log Δ)/ε)
Algorithm analysis
- We used the fact that the cost of a phase, M, is smaller than 3H with high probability. Since τ = 2mH·log(mHT)/δ, we have μ_φ > δτ/(mH) = 2·log(mHT), so by a Chernoff bound:

Pr(Σ_{j∈τ_φ} χ_j ≥ 3·2·log(mHT)) ≤ e^{−(2/3)·2·log(mHT)} ≤ 1/(mHT)
Algorithm analysis
- Now, by applying a union bound over all phases, this low-probability event contributes at most HT/(mHT) < 1 to the total cost, so we will ignore it
Algorithm analysis
- Expanding c̃_φ (Eq. 12):

Σ_{φ=1}^t Σ_{e∈Δ(v)} Σ_{j∈τ_φ} p_φ(e)·χ_j(e)·c_j(π_j)/μ_φ(e) ≤ Σ_{φ=1}^t Σ_{j∈τ_φ} χ_j(e₀)·c_j(π_j)/μ_φ(e₀) + O(εHt + (H/ε)·log Δ)
Algorithm analysis
- Claim 3.2. For every path π: s → v,

Pr(π ⊆ π_j | χ_j(e) = 1) = Pr(prefix(v) = π)
Algorithm analysis
- Proof of Claim 3.2
- χ_j(e) = 1 implies e ∈ π_j⁰ ∨ e ∈ π_j⁺ (the exploitation and exploration cases), so it suffices to show:

Pr(π ⊆ π_j | e ∈ π_j⁰) = Pr(prefix(v) = π)
Pr(π ⊆ π_j | e ∈ π_j⁺) = Pr(prefix(v) = π)

- The first equality holds by definition; let us prove the second
Algorithm analysis
- e is sampled independently of the path preceding v, so

Pr(π ⊆ π_j | e ∈ π_j⁺) = Pr(π ⊆ π_j | v ∈ π_j⁺)

Pr(v ∈ π_j⁺)·Pr(π ⊆ π_j | v ∈ π_j⁺) = Pr(π ⊆ π_j ∧ v ∈ π_j⁺)
= (1−δ)·Pr(v ∈ suffix(s))·Pr(π = suffix(s | v)) + Σ_{e'=(q,u)∈Ẽ} (δ/m̃)·Pr(v ∈ suffix(u))·Pr(π = prefix(q) ∪ {e'} ∪ suffix(u | v))
= Pr(v ∈ π_j⁺)·Pr(π = prefix(v))
Algorithm analysis
- Claim 3.3. If e = (v, w) then

E[χ_j(e)·c_j(π_j)] = (μ(e)/τ)·(A_j(v) + B_j(w) + c_j(e))

where A_j(v) = E[c_j(prefix(v))] and B_j(w) = E[c_j(suffix(w))]

- This follows from Claim 3.2: the portion of the path preceding e is distributed as prefix(v)
Algorithm analysis
- Taking the expectation of Eq. 12, the left side will become

Σ_{φ=1}^t Σ_{e∈Δ(v)} Σ_{j∈τ_φ} p_φ(e)·(A_j(v) + B_j(w) + c_j(e))/τ = (1/τ)·Σ_{j=1}^T Σ_{e∈Δ(v)} p_φ(e)·(A_j(v) + B_j(w) + c_j(e))

- The right side will become

(1/τ)·Σ_{j=1}^T (A_j(v) + B_j(w₀) + c_j(e₀))
Algorithm analysis
- After removing A_j(v) from both sides and noticing that

Σ_{e∈Δ(v)} p_φ(e)·(B_j(w) + c_j(e)) = E[c_j(suffix(v))]

- the left side will become

(1/τ)·Σ_{j=1}^T E[c_j(suffix(v))] = C⁺(v)/τ
Algorithm analysis
- The right side will become

(1/τ)·Σ_{j=1}^T c_j(e₀) + C⁺(w₀)/τ + O(εHt + (H/ε)·log Δ)

- Thus we have derived the local performance guarantee (Eq. 13):

C⁺(v) ≤ C⁺(w₀) + Σ_{j=1}^T c_j(e₀) + O(εHT + (τH/ε)·log Δ)
Global performance guarantee
- Claim 3.4.

C⁺(v) ≤ OPT(v) + O(εHT + (τH/ε)·log Δ)·h(v)

- To prove it we can use the following observation:

OPT(v) = min_{e₀=(v,w₀)} { Σ_{j=1}^T c_j(e₀) + OPT(w₀) }
Global performance guarantee
- Proof – by induction on h(v), using the local performance guarantee
- Let us mark F = O(εHT + (τH/ε)·log Δ)
- Now rewrite the claim and Eq. 13:

C⁺(v) ≤ C⁺(w₀) + Σ_{j=1}^T c_j(e₀) + F
C⁺(v) ≤ OPT(v) + F·h(v)
Global performance guarantee
- h(v) = 1: true by the local performance guarantee, since C⁺(r) = OPT(r) = 0:

C⁺(v) ≤ Σ_{j=1}^T c_j(e₀) + F  for every e₀ = (v, r)

so

C⁺(v) ≤ min_{e₀=(v,r)} { Σ_{j=1}^T c_j(e₀) + OPT(r) } + F = OPT(v) + F
Global performance guarantee
- h(v) = k + 1: let e_{k+1} = (v, v_k) be the edge achieving the minimum in the observation above, with h(v_k) ≤ k. Then

C⁺(v) ≤ C⁺(v_k) + Σ_{j=1}^T c_j(e_{k+1}) + F ≤ Σ_{j=1}^T c_j(e_{k+1}) + OPT(v_k) + kF + F = OPT(v) + (k+1)F
Regret
- Theorem 3.5. The algorithm suffers regret

O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})

- The exploration steps contribute at most δTH
- The exploitation steps contribute C⁺(s) − OPT(s)
- Also τ = 2mH·log(mHT)/δ
- Substituting into Claim 3.4 we get total exploitation cost

C⁺(s) − OPT(s) = O(εT + 2mH·log Δ·log(mHT)/(εδ))·H²
Regret
- We can assign

ε = δ = (2mH·log Δ·log(mHT))^{1/3}·T^{−1/3}

and we will get the desired regret:

Regret ≤ O(δT + εT + 2mH·log Δ·log(mHT)/(εδ))·H² = O(H²·(mH·log Δ·log(mHT))^{1/3}·T^{2/3})
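A quick check that this choice of ε and δ balances the three terms, writing A := 2mH·log Δ·log(mHT):

```latex
\delta T \;=\; \epsilon T \;=\; A^{1/3} T^{-1/3}\cdot T \;=\; A^{1/3} T^{2/3},
\qquad
\frac{A}{\epsilon\delta} \;=\; \frac{A}{A^{2/3} T^{-2/3}} \;=\; A^{1/3} T^{2/3}
```

All three terms equal A^{1/3}·T^{2/3}, and multiplying by H² gives the stated regret bound.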