Log-optimal Investment in Markovian Environments, Csaba Szepesvári

SLIDE 1

Log-optimal Investment as MDPs

Log-optimal Investment in Markovian Environments

Csaba Szepesvári

Computer and Automation Research Institute of the Hungarian Academy of Sciences
Kende u. 13-17, Budapest 1111, Hungary
E-mail: szcsaba@sztaki.hu

Morgan Stanley Quantitative and Financial Mathematics Conference
21 October, 2005

Co-workers: Remi Munos, András Antos

Csaba Szepesvári Log-optimal Investment as MDPs

SLIDE 2

Outline

1. Introduction
   - Markovian Decision Problems
2. Log-optimal Investment
   - FX Markets
   - Stock Market
3. Solution Methods for MDPs
   - Classics
   - Approximate Methods
   - Does it Work?
4. Application to Log-optimal Investment
5. Conclusions

SLIDE 8

Markovian Decision Problems

Definition: an MDP is a tuple $(X, A, P, r)$ with
- State space $X$ ($\subset \mathbb{R}^d$)
- Action space $A$
- Transition probabilities $P(\cdot \mid x, a)$
- Reward function $r(x, a)$

SLIDE 12

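Process View

[Diagram: state $X_t$, action $A_t$, reward $R_t$, next state $X_{t+1}$.]

Policy: $\pi : X \to A$

Value function, with discount factor $0 < \gamma < 1$:

$$V^\pi(x) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, X_0 = x, \pi\Big]$$

Action-value function:

$$Q^\pi(x, a) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, X_0 = x, A_0 = a, \pi\Big]$$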
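The definition of $V^\pi$ can be illustrated with a small Monte Carlo sketch. It is not part of the talk: the simulator interface `step(x, a, rng)`, the truncation horizon and the toy one-state MDP are assumptions made only for illustration.

```python
import random


def discounted_return(rewards, gamma):
    """Sum_t gamma^t * R_t for a finite (truncated) list of rewards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def mc_value(step, policy, x0, gamma, horizon=200, episodes=100, seed=0):
    """Monte Carlo estimate of V^pi(x0) from truncated rollouts.

    step(x, a, rng) -> (next_state, reward) is a hypothetical simulator.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        x, rewards = x0, []
        for _ in range(horizon):
            a = policy(x)
            x, r = step(x, a, rng)
            rewards.append(r)
        total += discounted_return(rewards, gamma)
    return total / episodes


# Toy check (assumed example): one state, reward 1 at every step,
# so the exact value is 1 / (1 - gamma).
def toy_step(x, a, rng):
    return x, 1.0


estimate = mc_value(toy_step, policy=lambda x: 0, x0=0, gamma=0.9)
```

Truncating at a finite horizon introduces a bias of at most $\gamma^{H} R_{\max} / (1 - \gamma)$, which is negligible for the horizon used here.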

SLIDE 13

Reinforcement Learning

Goal: finding an optimal policy ...
- ... in an unknown MDP, by just observing a trajectory
- ... when a generative model of the MDP is given
- ... (large MDP) when a model of the MDP is given

SLIDE 18

Simple FX Example

2-currency exchange rates:
- dollar: $p_{12}(t)$
- euro: $p_{21}(t)$

Notation:
- $p_{12}(t)$: amount of dollars purchased for 1 euro
- $W_t$: wealth (calculated in dollars)
- $\alpha_t$: relative portfolio; proportion of wealth in euros

SLIDE 24

FX: Dynamics and Bid-Ask Spread

2-currency exchange rates:
- dollar: $p_{12}(t)$
- euro: $p_{21}(t)$

Dynamics of dollar's exchange rate:

$$\frac{p_{12}(t+1)}{p_{12}(t)} = \rho_{t+1}$$

Bid-ask spread:

$$p_{12}(t+1)\,p_{21}(t+1) = \eta^2_{t+1} < 1$$

SLIDE 29

FX: Dynamics

$$\alpha_{t+1} = \frac{A_t \rho_{t+1}}{(1 - A_t) + A_t \rho_{t+1}} \stackrel{\text{def}}{=} f_0(A_t, \rho_{t+1})$$

[Diagram: trading takes $\alpha_t$ to $A_t$; market dynamics take $A_t$ to $\alpha_{t+1}$; wealth moves from $W_t$ to $W_{t+1}$.]

SLIDE 30

FX: Rewards

$$r_t = \log\frac{W_{t+1}}{W_t} = \log\big((1 - A_t) + A_t \rho_{t+1}\big) + \mathbb{I}(A_t \ge \alpha_t)\,\log\frac{\alpha_t + \eta^2_{t+1}(1 - \alpha_t)}{A_t + \eta^2_{t+1}(1 - A_t)}$$

... if we buy euro: ultimately we will suffer some conversion loss
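The portfolio update $f_0$ and the reward formula can be written out directly. The following is a minimal sketch that transcribes the two formulas as stated on the slides; the function names and the numerical inputs are illustrative assumptions, not values from the talk.

```python
import math


def next_alpha(a, rho):
    """f0(A_t, rho_{t+1}): proportion of wealth in euros after the market moves."""
    return a * rho / ((1.0 - a) + a * rho)


def fx_reward(alpha, a, rho, eta2):
    """Log-growth r_t = log(W_{t+1} / W_t) when rebalancing from alpha to a.

    alpha: euro proportion before trading (alpha_t)
    a:     euro proportion after trading (A_t)
    rho:   relative change of the exchange rate (rho_{t+1})
    eta2:  bid-ask factor eta^2_{t+1}, assumed to lie in (0, 1)
    """
    growth = math.log((1.0 - a) + a * rho)
    if a >= alpha:  # indicator I(A_t >= alpha_t): the conversion term is active
        growth += math.log((alpha + eta2 * (1.0 - alpha)) / (a + eta2 * (1.0 - a)))
    return growth


# Illustrative inputs (assumed): moving all wealth from dollars to euros
# in a flat market costs log(eta2).
loss = fx_reward(alpha=0.0, a=1.0, rho=1.0, eta2=0.98)
```

Because $\eta^2_{t+1} < 1$, the ratio inside the second logarithm is at most 1 whenever $A_t \ge \alpha_t$, so the conversion term is never positive.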

SLIDE 32

Markovian Dynamics

$(\varphi_t, \rho_t, \eta^2_t)$: Markovian dynamics

MDP:
- State: $X_t = (\varphi_t, \rho_t, \eta^2_t, \alpha_t)$
- Actions: $A = [0, 1]$
- Rewards: $r_t = r(\alpha_t, A_t, \rho_{t+1}, \eta^2_{t+1})$
- Time-evolution: $X_{t+1} = f(X_t, A_t, W_t)$, with $W_t$ "noise"

SLIDE 39

Stock Market

... similar equations can be given :)

SLIDE 41

Big Picture

[Diagram: policies on one side, value functions on the other. Policy evaluation maps $\pi$ to $V^\pi$; $V^*$ is the dominating value function; $\pi^*$ is the greedy policy.]

SLIDE 42

Value Iteration

[Diagram: policies and value functions, with $V^*$, $Q^*$ and $\pi^*$ marked.]

SLIDE 43

Value Iteration – Algorithmic View

[Diagram components: Model, Value Improvement, Value function $V^\pi$, Policy.]
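As a concrete instance of the value-improvement loop, here is a minimal sketch of value iteration on a finite MDP. The tabular representation (`P[x][a]` as a list of `(probability, next_state)` pairs, `R[x][a]` as expected reward) and the two-state example are assumptions chosen for illustration; the talk itself targets continuous state spaces.

```python
def q_values(P, R, V, gamma):
    """One-step lookahead: Q(x, a) = R[x][a] + gamma * sum_y P(y | x, a) * V[y]."""
    return [[R[x][a] + gamma * sum(p * V[y] for p, y in P[x][a])
             for a in range(len(P[x]))]
            for x in range(len(P))]


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate V <- max_a Q(., a) until successive iterates differ by less than tol."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(P, R, V, gamma)]
        done = max(abs(v - w) for v, w in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    # Greedy policy with respect to the final value function.
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


# Assumed toy MDP: action 0 stays, action 1 switches state;
# only staying in state 1 is rewarded.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]
V_star, greedy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the optimal values are 9 and 10, and the greedy policy switches out of state 0 and stays in state 1.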

SLIDE 44

Policy Iteration

[Diagram: policies and value functions, with $V^*$, $Q^*$ and $\pi^*$ marked.]

SLIDE 45

Policy Iteration – Algorithmic View

[Diagram components: Model, Policy Evaluation (Critic), Value function $Q^\pi$, Policy Improvement (Actor), Policy $\pi$.]
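The actor-critic loop of policy iteration can be sketched on a finite MDP as follows. This is an illustrative sketch under assumptions: the tabular format (`P[x][a]` as `(probability, next_state)` pairs), the iterative policy evaluation with a fixed number of sweeps, and the two-state example are not from the talk.

```python
def evaluate_policy(P, R, policy, gamma, sweeps=500):
    """Critic: approximate V^pi by repeatedly applying the Bellman operator of pi."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        V = [R[x][policy[x]] + gamma * sum(p * V[y] for p, y in P[x][policy[x]])
             for x in range(len(P))]
    return V


def improve_policy(P, R, V, gamma):
    """Actor: greedy policy with respect to the one-step lookahead on V."""
    policy = []
    for x in range(len(P)):
        q = [R[x][a] + gamma * sum(p * V[y] for p, y in P[x][a])
             for a in range(len(P[x]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, R, gamma, max_rounds=100):
    policy = [0] * len(P)
    V = evaluate_policy(P, R, policy, gamma)
    for _ in range(max_rounds):
        new_policy = improve_policy(P, R, V, gamma)
        if new_policy == policy:  # no change: the policy is greedy w.r.t. its own value
            break
        policy = new_policy
        V = evaluate_policy(P, R, policy, gamma)
    return V, policy


# Assumed toy MDP: action 0 stays, action 1 switches state;
# only staying in state 1 is rewarded.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]
V_pi, pi = policy_iteration(P, R, gamma=0.9)
```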

SLIDE 46

Value- and Policy Iteration

Good:
- Exact algorithms (asymptotically correct)
- Geometric convergence rate

Bad:
- Requires model (analytic form)
- Integration over state-space
- What if model is unknown?

SLIDE 54

Fitted Value Iteration

[Diagram components: Generated Samples, Value Improvement, Value Projection, Approximate value function ($\hat{Q}^\pi$), Policy.]
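A minimal sketch of sampling-based fitted value iteration with a generative model is given below. Everything concrete in it is an assumption made for illustration: the one-dimensional toy problem, the noise level, the fixed basis points, and the Gaussian kernel averager used as the projection step. It is not the algorithm configuration used in the talk.

```python
import math
import random

GAMMA = 0.9
ACTIONS = (-0.1, 0.0, 0.1)
BASIS = [i / 10 for i in range(11)]  # fixed basis points on [0, 1]


def reward(x):
    """Assumed toy reward: penalize the distance from the middle of [0, 1]."""
    return -abs(x - 0.5)


def sample_next(x, a, rng):
    """Hypothetical generative model: noisy move, clipped to [0, 1]."""
    return min(1.0, max(0.0, x + a + rng.gauss(0.0, 0.02)))


def kernel_average(x, values, bandwidth=0.05):
    """Projection step: convex combination of the values stored at the basis points."""
    weights = [math.exp(-((x - b) ** 2) / (2 * bandwidth ** 2)) for b in BASIS]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def fitted_value_iteration(iterations=30, samples=20, seed=0):
    rng = random.Random(seed)
    values = [0.0] * len(BASIS)
    for _ in range(iterations):
        new_values = []
        for x in BASIS:
            backups = []
            for a in ACTIONS:
                # Monte Carlo estimate of the expected next-state value.
                nxt = [sample_next(x, a, rng) for _ in range(samples)]
                mean_next = sum(kernel_average(y, values) for y in nxt) / samples
                backups.append(reward(x) + GAMMA * mean_next)
            new_values.append(max(backups))  # value improvement at the basis point
        values = new_values
    return values


fitted = fitted_value_iteration()
```

The kernel averager keeps every fitted value inside the range of the stored targets, which is the property that makes the iterates stay bounded in this sketch.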

SLIDE 55

Fitted Policy Iteration

[Diagram components: Samples, Policy Evaluation and Projection, Approximate Value Function $\hat{Q}^\pi$, Policy Improvement (Maximization), Greedy policy over $\hat{Q}^\pi$.]

SLIDE 57

Fitted Value Iteration for Navigation Problems¹

From: Boyan & Moore: “Generalization in Reinforcement Learning: Safely Approximating the Value Function”, NIPS-7, 1995.

[Figure: Continuous Gridworld and its value function $J^*(x, y)$ over $x, y \in [0, 1]$.]

¹ With thanks to Justin Boyan

SLIDE 59

Navigation II.

[Figure: value-function surfaces over the unit square. Panels: Iteration 12, Iteration 25, Iteration 40; Iteration 17, Iteration 43, Iteration 127.]

Value Iteration at Work

SLIDE 60

Averagers – A Solution

$$V_{t+1} = \Pi_F T V_t$$

Requirement: $\Pi_F T$ is a sup-norm contraction

Averagers (Gordon ’95):
- Kernel averaging (fixed kernel)
- Weighted k-nearest neighbors
- Bézier patches
- Linear interpolation on a triangular (or tetrahedral, etc.) mesh
- Bilinear interpolation on a square (or cubical, etc.) mesh
- ...
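The reason averagers meet the contraction requirement is that they are non-expansions in the sup-norm: the fitted function is a convex combination of stored values, so two fitted functions can never differ by more than their stored values do. A small numerical sketch with one-dimensional linear interpolation follows; the grid, the random values and the query points are assumptions for illustration.

```python
import random


def linear_interpolation(x, grid, values):
    """Piecewise-linear interpolation on a sorted 1-D grid.

    The output is a convex combination of at most two stored values,
    which is what makes this an averager.
    """
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for left in range(len(grid) - 1):
        lo, hi = grid[left], grid[left + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            return (1.0 - w) * values[left] + w * values[left + 1]


rng = random.Random(0)
grid = [i / 4 for i in range(5)]
u = [rng.uniform(-1.0, 1.0) for _ in grid]
v = [rng.uniform(-1.0, 1.0) for _ in grid]
queries = [i / 100 for i in range(101)]

# Sup-norm distance between the two fitted functions on the query points ...
gap_fitted = max(abs(linear_interpolation(x, grid, u) - linear_interpolation(x, grid, v))
                 for x in queries)
# ... compared with the sup-norm distance between the stored values.
gap_stored = max(abs(a - b) for a, b in zip(u, v))
```

Since $T$ is a $\gamma$-contraction in the sup-norm, composing it with a non-expansive $\Pi_F$ gives a $\gamma$-contraction again.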

SLIDE 62

Pushing the Edge – a Finite-Time Bound

Theorem²: Assume the MDP is regular. Fix $\delta > 0$, $\epsilon > 0$, $F$, $\rho$, $\mu$. Assume that $V$, the “capacity” of $F$, is finite. Assume that the Bellman-errors for functions in $F$ can be uniformly bounded:

$$\sup_{g \in F} \inf_{f \in F} \| f - Tg \|_{p,\mu} \le \epsilon.$$

Then it is possible to select $N$, $M$, $K$ such that after $K$ iterations of the sampling-based FVI algorithm run with $(\mu, N, M)$,

$$\| V^* - V^{\pi_K} \|_{p,\rho} \le \frac{4 C^{1/p}}{(1 - \gamma)^2}\,\epsilon$$

with probability at least $1 - \delta$. Further, $N$, $M$, $K$ are polynomial in $V$, $R_{\max}$, $1/\epsilon$, $\log |A|$, $\log(1/\delta)$, $1/(1 - \gamma)$.

Here $C$ is a constant related to how quickly future state distributions can concentrate away from $\rho$ relative to $\mu$.

² Munos & Szepesvári, ICML-2005

SLIDE 63

Extension to Fitted Policy Iteration

- Previous result required generative model
- Single sample path?

YES!

SLIDE 66

Log-optimal Investment – FX

- Fitted Value Iteration (with generative model): ⇒ +++
- Fitted Policy Iteration (single sample path): ⇒ - - -

Trick:
- $X_t = (\varphi_t, \rho_t, \eta^2_t, \alpha_t)$
- $\varphi_t, \rho_t, \eta^2_t$: market state (external)
- $\alpha_t$: portfolio state (internal)
- Systematic sampling of the portfolio-state ⇒ +++
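One way to read the trick: because the portfolio state is internal, each observed market step can be replayed for every portfolio state and action on a grid, turning a single market path into many transitions. The sketch below is an interpretation under assumptions, not the talk's implementation: it assumes the market state evolves independently of the trader's actions, it reuses the reward and $f_0$ formulas from the FX slides, and the market path values and grid size are made up for illustration.

```python
import math


def next_alpha(a, rho):
    """f0(A_t, rho_{t+1}) from the FX dynamics slide."""
    return a * rho / ((1.0 - a) + a * rho)


def fx_reward(alpha, a, rho, eta2):
    """Reward r_t from the FX rewards slide."""
    growth = math.log((1.0 - a) + a * rho)
    if a >= alpha:
        growth += math.log((alpha + eta2 * (1.0 - alpha)) / (a + eta2 * (1.0 - a)))
    return growth


def systematic_transitions(market_path, grid_size=5):
    """Replay each observed (rho_{t+1}, eta2_{t+1}) for all grid pairs (alpha_t, A_t)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    transitions = []
    for rho, eta2 in market_path:
        for alpha in grid:
            for a in grid:
                transitions.append(
                    (alpha, a, fx_reward(alpha, a, rho, eta2), next_alpha(a, rho))
                )
    return transitions


# Hypothetical single sample path of the external market state.
market_path = [(1.01, 0.999), (0.99, 0.998), (1.0, 0.999)]
samples = systematic_transitions(market_path)
```

Each tuple is `(alpha_t, A_t, r_t, alpha_{t+1})`; in a full method the market features would be carried along with each tuple.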

SLIDE 76

Results

Kernel-regression, $\varphi_t = \emptyset$, $N = 100$ samples

[Figure: action plotted against state (proportion of wealth in euro), both on [0, 1]; curves ’val_001_000.vfun’, ’val_010_000.vfun’, ’val_019_000.vfun’.]

Final yield: 0.0014
Yield of CBAL(0.5): 0.00076

SLIDE 77

Conclusions

- MDPs – not only in finite spaces
- Fitted Value/Policy Iteration
  - Generative model: OK
  - Single sample path: requires care
- Good: no “state”, just good enough features
- Alternatives: gradient methods³

³ Gerencsér et al.: Log-optimal Currency Portfolios and Control Lyapunov Exponent

SLIDE 83

Questions?