Log-optimal Investment in Markovian Environments

SLIDE 1

Log-optimal Investment as MDPs

Log-optimal Investment in Markovian Environments

Csaba Szepesvári

Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, Budapest 1111, Hungary. E-mail: szcsaba@sztaki.hu

Morgan Stanley Quantitative and Financial Mathematics Conference, 21 October 2005

Co-workers: Rémi Munos, András Antos

Csaba Szepesvári Log-optimal Investment as MDPs

SLIDE 2

Log-optimal Investment as MDPs Outline

Outline

1. Introduction
   Markovian Decision Problems
2. Log-optimal Investment
   FX Markets
   Stock Market
3. Solution Methods for MDPs
   Classics
   Approximate Methods
   Does it Work?
4. Application to Log-optimal Investment
5. Conclusions

SLIDE 8

Log-optimal Investment as MDPs Introduction Markovian Decision Problems

Markovian Decision Problems

Definition: an MDP is a tuple $(X, A, P, r)$ with
   State space $X$ ($\subset \mathbb{R}^d$)
   Action space $A$
   Transition probabilities $P(\cdot \mid x, a)$
   Reward function $r(x, a)$
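As a rough illustration of the definition, the tuple can be held in a small container whose transition part is exposed as a sampler (a generative model). This is only a sketch: the class name MDP, its field names, the finite action set and the toy dynamics on [0, 1] are assumptions made for the example and do not come from the talk.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]   # a point of X, a subset of R^d
Action = int                # assumption: a finite action set, indexed by integers


@dataclass(frozen=True)
class MDP:
    """Container for the tuple (X, A, P, r); the field names are illustrative."""
    actions: Sequence[Action]
    sample_next: Callable[[State, Action, random.Random], State]  # draws X' ~ P(.|x, a)
    reward: Callable[[State, Action], float]                      # r(x, a)


def toy_sample_next(x: State, a: Action, rng: random.Random) -> State:
    # Hypothetical dynamics: a noisy step left (a=0) or right (a=1), clipped to [0, 1].
    step = 0.1 if a == 1 else -0.1
    return (min(1.0, max(0.0, x[0] + step + rng.gauss(0.0, 0.01))),)


def toy_reward(x: State, a: Action) -> float:
    # Hypothetical reward: states near the right end of [0, 1] pay more.
    return x[0]


toy = MDP(actions=(0, 1), sample_next=toy_sample_next, reward=toy_reward)
```

Exposing P only through sample_next mirrors the "generative model" setting: the learner can draw next states for any (x, a) without knowing P in closed form.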

SLIDE 12

Log-optimal Investment as MDPs Introduction Markovian Decision Problems

Process View

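[Diagram: state space $X$; transition from $X_t$ to $X_{t+1}$ with action $A_t$ and reward $R_t$]

Policy: $\pi : X \to A$

$$V^\pi(x) = E\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, X_0 = x, \pi\Big], \qquad 0 < \gamma < 1$$

$$Q^\pi(x, a) = E\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, X_0 = x, A_0 = a, \pi\Big]$$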
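The two expectations can be approximated by averaging truncated discounted returns over simulated trajectories. The following is a minimal sketch under that assumption; the helper names discounted_return and mc_value are hypothetical, and truncating the infinite sum at a finite horizon introduces an error of at most gamma**T * Rmax / (1 - gamma).

```python
from typing import Callable, Iterable


def discounted_return(rewards: Iterable[float], gamma: float) -> float:
    """Sum of gamma**t * R_t over a finite reward sequence R_0, R_1, ..."""
    total, weight = 0.0, 1.0
    for r in rewards:
        total += weight * r
        weight *= gamma
    return total


def mc_value(rollout: Callable[[], Iterable[float]], gamma: float, n_rollouts: int) -> float:
    """Average discounted return over independent rollouts from one start state.

    `rollout()` is assumed to simulate the policy from X_0 = x and return
    the rewards it collected.
    """
    return sum(discounted_return(rollout(), gamma) for _ in range(n_rollouts)) / n_rollouts


# With a constant reward of 1 the value is close to 1 / (1 - gamma) = 10.
v_const = mc_value(lambda: [1.0] * 200, gamma=0.9, n_rollouts=5)
```

An estimate of $Q^\pi(x, a)$ follows the same pattern, with the first action of each rollout forced to $a$.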

SLIDE 13

Log-optimal Investment as MDPs Introduction Markovian Decision Problems

Reinforcement Learning

Goal: finding an optimal policy
   .. in an unknown MDP by just observing a trajectory
   .. when a generative model of the MDP is given
   .. large MDP
   .. when a model of the MDP is given

SLIDE 18

Log-optimal Investment as MDPs Log-optimal Investment FX Markets

Simple FX Example

2-currency exchange rates:

dollar: $p_{12}(t)$, euro: $p_{21}(t)$

$p_{12}(t)$: amount of dollars purchased for 1 euro
$W_t$: wealth (calculated in dollars)
$\alpha_t$: relative portfolio, the proportion of wealth held in euros

SLIDE 24

Log-optimal Investment as MDPs Log-optimal Investment FX Markets

FX: Dynamics and Bid-Ask Spread

2-currency exchange rates:

dollar: $p_{12}(t)$, euro: $p_{21}(t)$

Dynamics of the dollar's exchange rate: $\frac{p_{12}(t+1)}{p_{12}(t)} = \rho_{t+1}$

Bid-ask spread: $p_{12}(t+1)\, p_{21}(t+1) = \eta_{t+1}^2 < 1$

SLIDE 29

Log-optimal Investment as MDPs Log-optimal Investment FX Markets

FX: Dynamics

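$$\alpha_{t+1} = \frac{A_t \rho_{t+1}}{(1 - A_t) + A_t \rho_{t+1}} \overset{\text{def}}{=} f_0(A_t, \rho_{t+1})$$

[Diagram: trading takes $\alpha_t$ to $A_t$; market dynamics take $A_t$ to $\alpha_{t+1}$; wealth moves from $W_t$ to $W_{t+1}$]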
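The portfolio update $f_0$ translates directly into code. This is a small sketch: the function name f0 follows the slide, while the argument checks are an addition for the example.

```python
def f0(a: float, rho: float) -> float:
    """Euro proportion after the market moves: a * rho / ((1 - a) + a * rho).

    a   : proportion of wealth held in euros after trading (A_t), in [0, 1]
    rho : exchange-rate ratio p12(t+1) / p12(t), assumed positive
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if rho <= 0.0:
        raise ValueError("rho must be positive")
    return a * rho / ((1.0 - a) + a * rho)


# A rate move that doubles the dollar value of euros turns a 50/50 split into 2/3 euros.
alpha_next = f0(0.5, 2.0)
```

Note that the corner portfolios are fixed points: with all wealth in dollars (a = 0) or all in euros (a = 1) the proportion does not change, whatever the rate does.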

SLIDE 30

Log-optimal Investment as MDPs Log-optimal Investment FX Markets

FX: Rewards

$$r_t = \log\frac{W_{t+1}}{W_t} = \log\big((1 - A_t) + A_t \rho_{t+1}\big) + I(A_t \ge \alpha_t)\, \log\left(\frac{\alpha_t + \eta_{t+1}^2 (1 - \alpha_t)}{A_t + \eta_{t+1}^2 (1 - A_t)}\right)$$

.. if we buy euro: ultimately we will suffer some conversion loss
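The reward can be evaluated as follows. This sketch implements only the terms shown on the slide, so the conversion-loss term is applied when the euro share is not reduced ($A_t \ge \alpha_t$) and no cost is charged otherwise; the function and argument names are hypothetical.

```python
import math


def fx_reward(alpha: float, a: float, rho_next: float, eta2_next: float) -> float:
    """One-step log-growth of wealth, log(W_{t+1} / W_t).

    alpha     : euro proportion before trading (alpha_t)
    a         : euro proportion after trading (A_t)
    rho_next  : exchange-rate ratio rho_{t+1}
    eta2_next : bid-ask factor eta^2_{t+1}, assumed to lie in (0, 1)
    """
    growth = math.log((1.0 - a) + a * rho_next)
    if a >= alpha:
        # Indicator term from the slide; the ratio is at most 1, so the log is <= 0.
        cost = math.log((alpha + eta2_next * (1.0 - alpha)) / (a + eta2_next * (1.0 - a)))
    else:
        cost = 0.0
    return growth + cost


# Buying euros (0.2 -> 0.6) while the rate stays flat yields a small loss.
loss = fx_reward(alpha=0.2, a=0.6, rho_next=1.0, eta2_next=0.98)
```

When $A_t = \alpha_t$ the ratio inside the second logarithm equals 1, so holding the portfolio incurs no conversion loss.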

SLIDE 32

Log-optimal Investment as MDPs Log-optimal Investment FX Markets

Markovian Dynamics

$(\phi_t, \rho_t, \eta_t^2)$: Markovian dynamics

MDP:

   State: $X_t = (\phi_t, \rho_t, \eta_t^2, \alpha_t)$
   Actions: $A = [0, 1]$
   Rewards: $r_t = r(\alpha_t, A_t, \rho_{t+1}, \eta_{t+1}^2)$
   Time-evolution: $X_{t+1} = f(X_t, A_t, W_t)$, with $W_t$ the “noise”

SLIDE 39

Log-optimal Investment as MDPs Log-optimal Investment Stock Market

Stock Market

.. similar equations can be given :)

SLIDE 41

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Big Picture

[Diagram relating policies and value functions: policy evaluation takes a policy $\pi$ to $V^\pi$; $V^*$ is the dominating value function; $\pi^*$ is the greedy policy]

SLIDE 42

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Value Iteration

[Diagram over policies and value functions, marking $V^*$, $Q^*$ and $\pi^*$]

SLIDE 43

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Value Iteration – Algorithmic View

[Diagram components: Value function $V^\pi$, Value Improvement, Model, Policy]
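For a finite MDP with a known model, the value-improvement loop can be sketched as below. This is the textbook form of value iteration, not necessarily the exact variant behind the diagram; the nested-list layout P[a][s][t], r[s][a] and the two-state toy problem are assumptions made for the example.

```python
def value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """P[a][s][t]: probability of moving s -> t under action a; r[s][a]: reward.

    Returns the (approximately) optimal values and a greedy policy.
    """
    n_states, n_actions = len(r), len(r[0])

    def q_values(V):
        # Q[s][a] = r(s, a) + gamma * sum_t P(t | s, a) * V(t)
        return [[r[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)] for s in range(n_states)]

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(V)]      # value improvement step
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    Q = q_values(V)
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Toy problem: action 0 stays, action 1 switches state; only state 1 pays reward 1.
P_toy = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
r_toy = [[0.0, 0.0], [1.0, 1.0]]
V_toy, pi_toy = value_iteration(P_toy, r_toy, gamma=0.9)
```

In the toy problem the greedy policy switches out of state 0 and stays in state 1, giving values of about 9 and 10.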

SLIDE 44

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Policy Iteration

[Diagram over policies and value functions, marking $V^*$, $Q^*$ and $\pi^*$]

SLIDE 45

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Policy Iteration – Algorithmic View

[Diagram components: Value function $Q^\pi$, Policy Improvement (Actor), Model, Policy $\pi$, Policy Evaluation (Critic)]
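The actor/critic loop of the diagram can be sketched for a finite MDP as follows. It assumes the same hypothetical nested-list layout P[a][s][t], r[s][a] and evaluates policies by simple fixed-point iteration; with exact ties between actions and approximate evaluation the stopping test may need extra care.

```python
def evaluate_policy(P, r, gamma, policy, tol=1e-10):
    """Critic: iterate V <- r_pi + gamma * P_pi V until the update is below tol."""
    n = len(r)
    V = [0.0] * n
    while True:
        V_new = [r[s][policy[s]]
                 + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
                 for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            return V


def policy_iteration(P, r, gamma):
    """Alternate policy evaluation (critic) and greedy improvement (actor)."""
    n_states, n_actions = len(r), len(r[0])
    policy = [0] * n_states
    while True:
        V = evaluate_policy(P, r, gamma, policy)
        Q = [[r[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        improved = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
        if improved == policy:          # greedy policy is stable: stop
            return V, policy
        policy = improved


# Toy problem: action 0 stays, action 1 switches state; only state 1 pays reward 1.
P_toy = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
r_toy = [[0.0, 0.0], [1.0, 1.0]]
V_pi, pi_star = policy_iteration(P_toy, r_toy, gamma=0.9)
```

Starting from the "always stay" policy, a single improvement step already finds the policy that switches out of state 0 and stays in state 1.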

SLIDE 46

Log-optimal Investment as MDPs Solution Methods for MDPs Classics

Value- and Policy Iteration

Good
   Exact algorithms (asymptotically correct)
   Geometric convergence rate
Bad
   Requires model (analytic form)
   Integration over state-space
   What if model is unknown?

SLIDE 54

Log-optimal Investment as MDPs Solution Methods for MDPs Approximate Methods

Fitted Value Iteration

[Diagram components: Value Improvement, Policy, Approximate value function ($\hat{Q}^\pi$), Value Projection, Generated Samples]
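A sampling-based version of this loop might look like the sketch below. Several choices here are assumptions made for the example and are not taken from the talk: a one-dimensional state in [0, 1], a state-value function instead of the action-value function named in the diagram, Gaussian kernel regression as the projection step, and the toy dynamics.

```python
import math
import random


def kernel_fit(xs, ys, bandwidth):
    """Projection step: Nadaraya-Watson regression with a Gaussian kernel."""
    def predict(x):
        ws = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return predict


def fitted_value_iteration(sample_next, reward, actions, gamma,
                           n_states=30, n_next=5, n_iters=20, bandwidth=0.1, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_states)]    # base states drawn uniformly on [0, 1]
    V = lambda x: 0.0
    for _ in range(n_iters):
        targets = []
        for x in xs:
            # Monte Carlo estimate of (TV)(x) = max_a { r(x, a) + gamma * E[V(X')] }
            backups = []
            for a in actions:
                nxt = [sample_next(x, a, rng) for _ in range(n_next)]
                backups.append(reward(x, a) + gamma * sum(V(y) for y in nxt) / n_next)
            targets.append(max(backups))
        V = kernel_fit(xs, targets, bandwidth)      # fit the backed-up values
    return V


def toy_next(x, a, rng):
    # Hypothetical generative model: noisy step left (a=0) or right (a=1) on [0, 1].
    step = 0.1 if a == 1 else -0.1
    return min(1.0, max(0.0, x + step + rng.gauss(0.0, 0.02)))


def toy_reward(x, a):
    return x   # hypothetical reward: larger states pay more


V_hat = fitted_value_iteration(toy_next, toy_reward, actions=(0, 1), gamma=0.9)
```

Only sample_next is queried, so the sketch needs a generative model but never the transition probabilities in analytic form.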

SLIDE 55

Log-optimal Investment as MDPs Solution Methods for MDPs Approximate Methods

Fitted Policy Iteration

[Diagram components: Policy Improvement (Maximization), Samples, Greedy policy over $\hat{Q}^\pi$, Policy Evaluation and Projection, Approximate Value Function $\hat{Q}^\pi$]

SLIDE 57

Log-optimal Investment as MDPs Solution Methods for MDPs Does it Work?

Fitted Value Iteration for Navigation Problems ¹

From: Boyan & Moore: “Generalization in Reinforcement Learning: Safely Approximating the Value Function”, NIPS-7, 1995.

[Figure: Continuous Gridworld on the unit square (axes x, y) and a surface plot of J*(x,y)]

¹ With thanks to Justin Boyan

SLIDE 59

Log-optimal Investment as MDPs Solution Methods for MDPs Does it Work?

Navigation II.

[Figure: six surface plots over the unit square, with panel titles Iteration 12, Iteration 25, Iteration 40, Iteration 17, Iteration 43 and Iteration 127]

Value Iteration at Work

SLIDE 60

Log-optimal Investment as MDPs Solution Methods for MDPs Does it Work?

Averagers – A Solution

$V_{t+1} = \Pi_F T V_t$

Requirement: $\Pi_F T$ is a sup-norm contraction

Averagers (Gordon ’95): kernel averaging (fixed kernel), weighted k-nearest neighbors, Bézier patches, linear interpolation on a triangular (or tetrahedral, etc.) mesh, bilinear interpolation on a square (or cubical, etc.) mesh, ...
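The reason averagers satisfy the requirement is that each output is a convex combination of the fitted target values, so the fitting step cannot enlarge sup-norm distances. The sketch below checks this numerically for fixed-kernel averaging; the helper kernel_averager and the test vectors are assumptions for the example.

```python
import math


def kernel_averager(xs, bandwidth):
    """Return the map 'values at xs' -> function, using fixed normalised Gaussian weights."""
    def project(values):
        def f(x):
            ws = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
            total = sum(ws)
            # Weights w / total are non-negative and sum to 1: a convex combination.
            return sum(w / total * v for w, v in zip(ws, values))
        return f
    return project


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
project = kernel_averager(xs, bandwidth=0.2)

u = [1.0, -2.0, 0.5, 3.0, 0.0]
v = [0.5, -1.0, 1.5, 2.0, 0.2]
grid = [i / 100 for i in range(101)]

gap_in = max(abs(a - b) for a, b in zip(u, v))                       # distance between targets
gap_out = max(abs(project(u)(x) - project(v)(x)) for x in grid)      # distance after fitting
```

Since the Bellman operator $T$ is a $\gamma$-contraction in sup-norm, composing it with such a non-expansive fit keeps the overall update a contraction.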

SLIDE 62

Log-optimal Investment as MDPs Solution Methods for MDPs Does it Work?

Pushing the Edge – a Finite-Time Bound

Theorem ²: Assume the MDP is regular. Fix $\delta > 0$, $\epsilon > 0$, $F$, $\rho$, $\mu$. Assume that $V$, the “capacity” of $F$, is finite. Assume that the Bellman errors for functions in $F$ can be uniformly bounded:

$$\sup_{g \in F} \inf_{f \in F} \| f - Tg \|_{p,\mu} \le \epsilon.$$

Then it is possible to select $N$, $M$, $K$ such that after $K$ iterations of the sampling-based FVI algorithm run with $(\mu, N, M)$,

$$\| V^* - V^{\pi_K} \|_{p,\rho} \le \frac{4\, C^{1/p}}{(1 - \gamma)^2}\, \epsilon$$

with probability at least $1 - \delta$. Further, $N$, $M$, $K$ are polynomial in $V$, $R_{\max}$, $1/\epsilon$, $\log |A|$, $\log(1/\delta)$, $1/(1 - \gamma)$.

Here $C$ is a constant related to how quickly future state distributions can concentrate away from $\rho$ relative to $\mu$.

² Munos & Szepesvári, ICML-2005
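To get a feel for the size of the bound, the right-hand side can be evaluated for a few discount factors. The helper name fvi_bound and the numerical values of C, p and epsilon are purely illustrative.

```python
def fvi_bound(C: float, p: float, gamma: float, eps: float) -> float:
    """Right-hand side of the bound: 4 * C**(1/p) / (1 - gamma)**2 * eps."""
    return 4.0 * C ** (1.0 / p) / (1.0 - gamma) ** 2 * eps


# Illustrative values only: C = 1, p = 2, Bellman error eps = 0.01.
bound_090 = fvi_bound(C=1.0, p=2, gamma=0.90, eps=0.01)   # about 4
bound_099 = fvi_bound(C=1.0, p=2, gamma=0.99, eps=0.01)   # about 400
```

The factor $1/(1-\gamma)^2$ dominates: moving the discount factor from 0.9 to 0.99 loosens the guarantee by a factor of 100 for the same Bellman error.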

SLIDE 63

Log-optimal Investment as MDPs Solution Methods for MDPs Does it Work?

Extension to Fitted Policy Iteration

Previous result required a generative model
Single sample path?

YES!

SLIDE 66

Log-optimal Investment as MDPs Application to Log-optimal Investment

Log-optimal Investment – FX

Fitted Value Iteration (with generative model): ⇒ +++
Fitted Policy Iteration (single sample path): ⇒ - - -
Trick:

   $X_t = (\phi_t, \rho_t, \eta_t^2, \alpha_t)$
   $\phi_t, \rho_t, \eta_t^2$: market state (external)
   $\alpha_t$: portfolio state (internal)
   Systematic sampling of the portfolio-state ⇒ +++
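One plausible reading of the trick is sketched below: because the market state evolves independently of the trader's actions and the portfolio dynamics are known, a single observed market path can be reused to build transitions for every portfolio state and action on a grid. This interpretation, the function names and the omission of $\phi_t$ are assumptions for the example, not details confirmed by the slides.

```python
import math


def f0(a, rho):
    """Euro proportion after the market move (portfolio dynamics)."""
    return a * rho / ((1.0 - a) + a * rho)


def fx_reward(alpha, a, rho_next, eta2_next):
    """One-step log-growth, with the conversion-loss term applied when a >= alpha."""
    growth = math.log((1.0 - a) + a * rho_next)
    cost = 0.0
    if a >= alpha:
        cost = math.log((alpha + eta2_next * (1.0 - alpha)) / (a + eta2_next * (1.0 - a)))
    return growth + cost


def synthesize_transitions(market_path, grid):
    """market_path: observed (rho_t, eta2_t) pairs along ONE trajectory.

    For every consecutive pair of market states, sweep the portfolio state
    alpha and the action a over `grid` and compute the reward and next state.
    """
    samples = []
    for (rho, eta2), (rho_next, eta2_next) in zip(market_path, market_path[1:]):
        for alpha in grid:
            for a in grid:
                state = (rho, eta2, alpha)
                next_state = (rho_next, eta2_next, f0(a, rho_next))
                samples.append((state, a, fx_reward(alpha, a, rho_next, eta2_next), next_state))
    return samples


path = [(1.0, 0.99), (1.02, 0.98), (0.97, 0.99)]    # made-up market observations
grid = [0.0, 0.5, 1.0]
samples = synthesize_transitions(path, grid)
```

Each observed market step yields len(grid)**2 synthetic transitions, which covers the internal part of the state space far better than the single portfolio trajectory that was actually followed.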

SLIDE 76

Log-optimal Investment as MDPs Application to Log-optimal Investment

Results

Kernel regression, $\phi_t = \emptyset$, $N = 100$ samples

[Figure: action plotted against state (prop. of wealth in euro); curves 'val_001_000.vfun', 'val_010_000.vfun', 'val_019_000.vfun']

Final yield: 0.0014
Yield of CBAL(0.5): 0.00076

SLIDE 77

Log-optimal Investment as MDPs Conclusions

Conclusions

MDPs: not only in finite spaces
Fitted Value/Policy Iteration
   Generative model: OK
   Single-sample path: requires care
   Good: no “state”, just good enough features
Alternatives: Gradient Methods ³

³ Gerencsér et al.: Log-optimal Currency Portfolios and Control Lyapunov Exponent

SLIDE 83

Log-optimal Investment as MDPs Conclusions

Questions?
