Dive Deeper in Finance
GTC 2017 – San José, California
Daniel Egloff, Dr. sc. math., Managing Director, QuantAlea
May 7, 2017
Today
▪ Generative models for financial time series
  – Sequential latent Gaussian variational autoencoder
▪ Implementation in TensorFlow
  – Recurrent variational inference using TF control flow operations
▪ Applications to FX data
  – 1s to 10s OHLC aggregated data
  – Event-based models for tick data are work in progress
Generative Models and GPUs
▪ "What I cannot create, I do not understand" (Richard Feynman)
▪ Generative models are a recent innovation in deep learning
  – GANs – generative adversarial networks
  – VAEs – variational autoencoders
▪ Training is computationally demanding
  – Explorative modelling is not possible without GPUs
Deep Learning
▪ Deep learning in finance is complementary to existing models, not a replacement
▪ Deep learning benefits
  – Richer functional relationships between explanatory and response variables
  – Models complicated interactions
  – Automatic feature discovery
  – Capable of handling large amounts of data
  – Standard training procedures with backpropagation and SGD
  – Frameworks and tooling
Latent Variable – Encoding/Decoding
▪ The latent variable $z$ can be thought of as an encoded representation of $x$
▪ The likelihood $p(x|z)$ serves as decoder
▪ The posterior $p(z|x)$ provides the encoder
[Figure: $x$ → encoder $p(z|x)$ → $z$ → decoder $p(x|z)$ → $x$, with prior $p(z)$]
Intractable Maximum Likelihood
▪ Maximum likelihood is the standard model-fitting approach:
$$p(x) = \int p(x|z)\, p(z)\, dz \;\to\; \max$$
▪ Problem: the marginal $p(x)$ and the posterior
$$p(z|x) = \frac{p(x|z)\, p(z)}{p(x)}$$
are intractable, and their calculation suffers from exponential complexity
▪ Solutions
  – Markov chain MC, Hamiltonian MC
  – Approximation and variational inference
Variational Autoencoders
▪ Assume a latent space with prior $p(z)$
▪ Parameterize the likelihood $p_\varphi(x|z)$ with a deep neural network
▪ Approximate the intractable posterior $p(z|x)$ with a deep neural network $q_\theta(z|x)$
▪ Learn the parameters $\theta$ and $\varphi$ with backpropagation
[Figure: encoder network $q_\theta(z|x)$ maps $x$ to Gaussian parameters $(\mu, \sigma)$ of the latent $z$; decoder network $p_\varphi(x|z)$ maps $z$ back to $x$; prior $p(z)$]
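The slide says the parameters are learned with backpropagation, but gradients cannot flow through a raw sampling node; the standard fix is the reparameterization trick. A minimal TF 1.x sketch (function and argument names are mine, not from the talk):

```python
import tensorflow as tf

def sample_latent(z_mu, z_log_var):
    # Reparameterization trick: write z = mu + sigma * eps with
    # eps ~ N(0, I), so the gradient flows to mu and log_var while
    # the randomness sits in the parameter-free eps.
    eps = tf.random_normal(tf.shape(z_mu))
    return z_mu + tf.exp(0.5 * z_log_var) * eps
```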
Variational Inference
▪ Which loss do we optimize?
▪ Can we choose the posterior from a flexible family of distributions $Q$ by minimizing a distance to the real posterior?
$$q^*(z|x) = \operatorname{argmin}_{\theta \in Q} \mathrm{KL}\big(q_\theta(z|x) \,\|\, p_\varphi(z|x)\big)$$
▪ Problem: not computable, because it involves the marginal $p_\varphi(x)$:
$$\mathrm{KL}\big(q_\theta(z|x) \,\|\, p_\varphi(z|x)\big) = \mathbb{E}_{q_\theta(z|x)}\big[\log q_\theta(z|x)\big] - \mathbb{E}_{q_\theta(z|x)}\big[\log p_\varphi(x,z)\big] + \log p_\varphi(x) \;\ge\; 0$$
  – The KL divergence can be made small if $Q$ is flexible enough
Variational Inference
▪ Drop the left-hand side, because the KL divergence is positive:
$$0 \;\le\; \underbrace{\mathbb{E}_{q_\theta(z|x)}\big[\log q_\theta(z|x)\big] - \mathbb{E}_{q_\theta(z|x)}\big[\log p_\varphi(x,z)\big]}_{-\mathrm{ELBO}(\theta,\varphi)} + \log p_\varphi(x)$$
Variational Inference
▪ Obtain a tractable lower bound for the marginal:
$$\mathrm{ELBO}(\theta,\varphi) \;\le\; \log p_\varphi(x)$$
▪ Training criterion: maximize the evidence lower bound
Variational Inference
▪ To interpret the lower bound, write it as
$$\log p_\varphi(x) \;\ge\; \mathrm{ELBO}(\theta,\varphi) = \underbrace{\mathbb{E}_{q_\theta(z|x)}\big[\log p_\varphi(x|z)\big]}_{\text{reconstruction score}} - \underbrace{\mathrm{KL}\big(q_\theta(z|x) \,\|\, p(z)\big)}_{\text{penalty of deviation from prior}}$$
▪ The smaller $\mathrm{KL}\big(q_\theta(z|x) \,\|\, p_\varphi(z|x)\big)$, the tighter the lower bound
[Figure: $x$ encoded as $z \sim q_\theta(z|x)$, reconstructed through $p_\varphi(x|z)$]
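For a diagonal Gaussian posterior and the standard Gaussian prior used later in the talk, the penalty term has a well-known closed form (a standard identity, not shown on the slide):

$$\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}\sigma^2) \,\|\, \mathcal{N}(0, I)\big) = \tfrac{1}{2}\sum_{i=1}^{d}\big(\sigma_i^2 + \mu_i^2 - 1 - \log \sigma_i^2\big)$$

This is what makes the KL part of the loss computable analytically, while the reconstruction term is estimated by sampling.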
Applications to Time Series
▪ Sequence structure for observable and latent factor
▪ Model setup
  – Gaussian distributions with parameters calculated from a deep recurrent neural network
  – Standard Gaussian prior
  – Model training with variational inference
Inference and Training
[Figure: inference network $q_\theta(z|x)$ – RNN states $h_{t-1}, h_t, h_{t+1}$ process the observations $x_{t-1}, x_t, x_{t+1}$ and latents $z_{t-1}, z_t, z_{t+1}$ and emit Gaussian parameters $\mu_t, \sigma_t$ at each step]
Implied Factorization
▪ The probability distributions factorize:
$$p_\varphi(x_{\le T} \mid z_{\le T}) = \prod_{t=1}^{T} p_\varphi(x_t \mid x_{<t}, z_{\le t}) = \prod_{t=1}^{T} \mathcal{N}\big(x_t \mid \mu_\varphi(x_{<t}, z_{\le t}),\, \sigma_\varphi(x_{<t}, z_{\le t})\big)$$
$$q_\theta(z_{\le T} \mid x_{\le T}) = \prod_{t=1}^{T} q_\theta(z_t \mid x_{<t}, z_{<t}) = \prod_{t=1}^{T} \mathcal{N}\big(z_t \mid \mu_\theta(x_{<t}, z_{<t}),\, \sigma_\theta(x_{<t}, z_{<t})\big)$$
▪ Loss calculation
  – The distributions can easily be simulated to calculate the expectation term
  – The Kullback–Leibler term can be calculated analytically
Calculating the ELBO
▪ Loss calculation
  – The Kullback–Leibler term can be calculated analytically
  – For fixed $t$ the quantities $\mu_\varphi, \mu_\theta, \sigma_\varphi, \sigma_\theta$ depend on $z_t \sim \mathcal{N}\big(z_t \mid \mu_\theta(x_{<t}, z_{<t}),\, \sigma_\theta(x_{<t}, z_{<t})\big)$
  – Simulate from this distribution to estimate the expectation with a sample mean
$$\mathrm{ELBO}(\theta,\varphi) = -\mathbb{E}_q\Big[\sum_{t=1}^{T}\big\{(x_t - \mu_\varphi)^\top \sigma_\varphi^{-1}(x_t - \mu_\varphi) + \log\det\sigma_\varphi + \mu_\theta^\top \mu_\theta + \operatorname{tr}\sigma_\theta - \log\det\sigma_\theta\big\}\Big]$$
(up to additive constants and factors of ½; the expectation is approximated with Monte Carlo sampling from $q_\theta(z_{\le T} \mid x_{\le T})$)
Generation
[Figure: generative network – RNN states $h_{t-1}, h_t, h_{t+1}$ driven by latents $z_t \sim p(z)$ emit Gaussian parameters $\mu_t, \sigma_t$ of $p_\varphi(x|z)$, from which new observations $x_{t-1}, x_t, x_{t+1}$ are sampled]
Time Series Embedding
▪ A single historical value is not predictive enough
▪ Embedding – use a lag of ~20 historical observations at every time step, as in the sketch below
[Figure: overlapping lag windows at $t$, $t+1$, $t+2$ arranged along the batch and time-step dimensions]
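A minimal NumPy sketch of such a lag embedding (the window length of 20 comes from the slide; the function name is mine):

```python
import numpy as np

def embed(series, lag=20):
    # Stack the last `lag` observations at every time step, so each
    # model input is a window (x_{t-lag+1}, ..., x_t) rather than a
    # single value.
    return np.stack([series[t:t + lag]
                     for t in range(len(series) - lag + 1)])

windows = embed(np.arange(100, dtype=np.float32))  # shape (81, 20)
```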
Implementation
▪ Implementation in TensorFlow
▪ Running on P100 GPUs for model training
▪ Long time series and large batch sizes require substantial GPU memory
TensorFlow Dynamic RNN
▪ Unrolling an RNN with tf.nn.dynamic_rnn
  – Simple to use
  – Can handle variable sequence lengths
▪ Not flexible enough for generative networks (see the sketch below)
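For reference, a fused unroll with tf.nn.dynamic_rnn looks roughly like this (TF 1.x; the sizes are assumptions):

```python
import tensorflow as tf

# x: [batch, time, features] batches of embedded observations.
x = tf.placeholder(tf.float32, [None, None, 60])
cell = tf.nn.rnn_cell.LSTMCell(128)

# dynamic_rnn runs the whole sequence in one op; there is no hook to
# interleave custom per-step logic such as sampling z_t.
outputs, final_state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
```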
TensorFlow Control Structures
▪ Using tf.while_loop
  – More to program, and requires understanding TF control structures in more detail
  – Much more flexible, as the minimal example below shows
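A minimal tf.while_loop for orientation (a toy accumulator, not the talk's code): cond and body are traced once into the graph, and the body can interleave arbitrary ops, e.g. sampling z_t, between RNN steps.

```python
import tensorflow as tf

def cond(t, acc):
    return t < 10

def body(t, acc):
    # Arbitrary per-step computation can go here.
    return t + 1, acc + tf.cast(t, tf.float32)

t_final, acc_final = tf.while_loop(
    cond, body, [tf.constant(0), tf.constant(0.0)])
```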
Implementation
▪ Notation
Implementation
▪ Variable and weight setup
  – Recurrent neural network definition
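The original slide showed the actual setup code; a hedged reconstruction of what such a setup might look like in TF 1.x (all names and the hidden size are assumptions; the 60-dim embedding and 2-dim latent space come from the experiment setup):

```python
import tensorflow as tf

obs_dim, latent_dim, hidden_dim = 60, 2, 128  # hidden_dim is a guess

# One RNN cell for the inference network q_theta and one for the
# generator network p_phi.
inference_cell = tf.nn.rnn_cell.LSTMCell(hidden_dim)
generator_cell = tf.nn.rnn_cell.LSTMCell(hidden_dim)

def gaussian_params(h, dim, scope):
    # Map an RNN output to the mean and log-variance of a diagonal
    # Gaussian distribution.
    with tf.variable_scope(scope):
        mu = tf.layers.dense(h, dim, name="mu")
        log_var = tf.layers.dense(h, dim, name="log_var")
    return mu, log_var
```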
Implementation
▪ Allocate TensorArray objects
▪ Fill the input TensorArray objects with data
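A sketch of the TensorArray setup (sizes hypothetical): one array per sequence-valued quantity, with the input array filled by unstacking a time-major tensor.

```python
import tensorflow as tf

T, batch, obs_dim = 100, 32, 60  # hypothetical sizes

# Time-major input: x[t] is the batch of observations at step t.
x = tf.placeholder(tf.float32, [T, batch, obs_dim])

# Input TensorArray filled with data, plus empty arrays with one
# write slot per time step for everything the loop produces.
x_ta = tf.TensorArray(tf.float32, size=T).unstack(x)
z_ta = tf.TensorArray(tf.float32, size=T)    # sampled latents
mu_ta = tf.TensorArray(tf.float32, size=T)   # decoder means
```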
Implementation
▪ While loop body, inference part
  – Update the inference RNN state
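A hedged sketch of the inference part of the loop body (the exact conditioning and all names are assumptions; the factorization slide defines what the distribution should be):

```python
import tensorflow as tf

def inference_step(x_t, z_prev, state, cell, latent_dim=2):
    # Advance the inference RNN on the current input and the previous
    # latent, then sample z_t from q_theta via reparameterization.
    with tf.variable_scope("inference"):
        h, state = cell(tf.concat([x_t, z_prev], axis=-1), state)
        z_mu = tf.layers.dense(h, latent_dim, name="z_mu")
        z_log_var = tf.layers.dense(h, latent_dim, name="z_log_var")
    eps = tf.random_normal(tf.shape(z_mu))
    z_t = z_mu + tf.exp(0.5 * z_log_var) * eps
    return z_t, z_mu, z_log_var, state
```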
Implementation
▪ While loop body, generator part
  – Update the generator RNN state
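And a matching sketch of the generator part (again hypothetical names): the sampled z_t drives the generator RNN, which emits the parameters of the observation distribution.

```python
import tensorflow as tf

def generator_step(z_t, state, cell, obs_dim=60):
    # Advance the generator RNN on the sampled latent and emit the
    # Gaussian parameters of p_phi for the current observation.
    with tf.variable_scope("generator"):
        h, state = cell(z_t, state)
        x_mu = tf.layers.dense(h, obs_dim, name="x_mu")
        x_log_var = tf.layers.dense(h, obs_dim, name="x_log_var")
    return x_mu, x_log_var, state
```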
Implementation
▪ Call the while loop
▪ Stack the TensorArray objects back into dense tensors
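A self-contained toy version of the loop call and the stacking step (the real body would call the inference and generator steps sketched above):

```python
import tensorflow as tf

T, batch, latent_dim = 100, 32, 2
z_ta0 = tf.TensorArray(tf.float32, size=T)

def cond(t, z_ta):
    return t < T

def body(t, z_ta):
    # Stand-in for the inference/generator steps sketched above.
    z_t = tf.random_normal([batch, latent_dim])
    return t + 1, z_ta.write(t, z_t)

_, z_ta_final = tf.while_loop(cond, body, [tf.constant(0), z_ta0])
z = z_ta_final.stack()   # dense [T, batch, latent_dim] tensor again
```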
Implementation
▪ Loss calculation
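A sketch of the loss on the stacked [T, batch, dim] tensors, matching the per-step Gaussian terms of the ELBO slide up to additive constants (function and argument names are mine):

```python
import tensorflow as tf

def sequence_elbo(x, x_mu, x_log_var, z_mu, z_log_var):
    # Gaussian log-likelihood of the observations, summed over time
    # (axis 0) and features (axis 2); constants dropped.
    recon = -0.5 * tf.reduce_sum(
        x_log_var + tf.square(x - x_mu) / tf.exp(x_log_var), axis=[0, 2])
    # Closed-form KL against the standard Gaussian prior.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(z_log_var) + tf.square(z_mu) - 1.0 - z_log_var, axis=[0, 2])
    # Mean over the batch; training minimizes the negative of this.
    return tf.reduce_mean(recon - kl)
```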
FX Market
▪ The FX market is the largest and most liquid market in the world
▪ Decentralized over-the-counter market
  – Not necessary to go through a centralized exchange
  – No single price for a currency at a given point in time
▪ Fierce competition between market participants
▪ 24 hours a day, 5½ days per week
  – As one major forex market closes, another one opens
FX Data
▪ Collect tick data from a major liquidity provider, e.g. LMAX
▪ Aggregate to OHLC bars (1s, 10s, …)
▪ Focus on the US trading session
[Figure: world clock of trading sessions – US session 8am–5pm EST, London session 3am–12pm EST, Tokyo 7pm–4am EST (Asian session), Sydney 5pm–2am EST]
EURUSD 2016
Single Day
One Hour
10 Min Sampled at 1s
[Chart: at high frequency, FX prices fluctuate in a range of deci-pips (1/10 pip = 1 deci-pip); larger jumps are on the order of multiple pips and more; scale bar: 5 pips]
Setup
▪ Normalize the data with the standard deviation $\hat\sigma$ estimated over the training interval (sketch below)
▪ 260 trading days in 2016, one model per day
▪ 60-dim embedding, 2-dim latent space
[Figure: timeline split into the training interval, where $\hat\sigma$ is estimated, and the out-of-sample test interval]
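A tiny sketch of the normalization convention (the data and split are illustrative): the scale is estimated on the training interval only and reused out of sample, so no test information leaks in.

```python
import numpy as np

returns = np.random.randn(1000).astype(np.float32)  # stand-in data
train, test = returns[:800], returns[800:]

sigma_hat = train.std()                # estimated on training data only
train_n, test_n = train / sigma_hat, test / sigma_hat
```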
Results – Training
Out of Sample
Volatility of Prediction
Latent Variables
Pricing in E-Commerce
▪ Attend our talk on our latest work on AI and GPU-accelerated genetic algorithms with Jet.com