GTC 2018 Silicon Valley, California: Predictive Learning of Factor-Based Strategies using Deep Neural Networks for Investment and Risk Management


SLIDE 1

Predictive Learning of Factor Based Strategies using Deep Neural Networks for Investment and Risk Management

Yigal Jhirad and Blay Tarnoff March 27, 2018

GTC 2018 Silicon Valley, California

SLIDE 2

GTC 2018: Table of Contents

I. Deep Learning in Finance
— Forecasting Factor Regimes
— Machine Learning Landscape
— Deep Learning + Neural Networks
— Neural Networks – ANN, RNN, LSTM
— Optimization
II. Parallel Implementation
III. Summary
IV. Author Biographies

DISCLAIMER: This presentation is for information purposes only. The presenter accepts no liability for the content of this presentation, or for the consequences of any actions taken on the basis of the information provided. Although the information in this presentation is considered to be accurate, this is not a representation that it is complete or should be relied upon as a sole resource, as the information contained herein is subject to change.

2

SLIDE 3


GTC 2018: Deep Learning

• Investment & Risk Management
— Forecast Market Returns, Volatility Regimes, Factor Trends, Liquidity, Economic Cycles
— Big Data including Time Series Data, Interday, and Intraday
— Neural Networks: Black Box/Pattern Recognition
— Complement existing quantitative and qualitative signals

• Challenges include the state dependency and stochastic nature of markets
— Time Series
— Overfitting/Underfitting
— Stochastic Nature of Data

3

SLIDE 4

GTC 2018: Factor Analysis

• Factor Analysis
— Identify factors that are driving the market and predict relative factor performance
— Establish a portfolio of sectors or stocks that benefits from factor performance
— Align risk management with forecasts of volatility

• Identifying and assessing factors driving performance
— Look at factors such as Value vs. Growth, Large Cap vs. Small Cap, Volatility

[Chart period: 12/2016–12/2017]

4

SLIDE 5

Artificial Intelligence

Data: Structured/Unstructured
— Asset Prices, Volatility
— Fundamentals (P/E, PCE, Debt to Equity)
— Macro (GDP Growth, Interest Rates, Oil Prices)
— Technical (Momentum)
— News Events

Machine Learning
• Unsupervised Learning: Cluster Analysis, Principal Components, Expectation Maximization
• Supervised Learning (Linear/Nonlinear): Support Vector Machines, Classification & Regression Trees, K-Nearest Neighbors, Regression; Deep Learning (Neural Networks)
• Reinforcement Learning: Deep Q-Learning, Trial & Error

Jhirad, Yigal (2017)

5

SLIDE 6

Supervised Learning: Neural Networks

Inputs (Fundamental/Macro/Technical): Price/Earnings, Momentum/RSI, Realized & Implied Volatility, Value vs. Growth, GDP Growth/Interest Rates, Dollar Strength, Credit Spreads

Feature (Factor) Identification & Regularization

Forecast: Market Returns, Risk/Volatility, Liquidity

[Diagram: the inputs feed layers of summation/activation nodes (∑|∂) producing outputs y1–y5]

Jhirad, Yigal (2017)

6
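The network diagram above is lost to extraction, but its idea can be sketched in NumPy: factor inputs pass through a layer of summation/activation nodes to produce the five outputs y1–y5. All dimensions, weights, and the specific factor list here are illustrative assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative factor inputs for one period: P/E, momentum/RSI, realized
# and implied vol, value-vs-growth, GDP growth, rates, dollar, credit spread.
x = rng.standard_normal(8)

# One hidden layer of summation/activation nodes and five outputs y1..y5
# (e.g. market return, risk/volatility, and liquidity forecasts).
W1 = 0.1 * rng.standard_normal((16, 8))   # input -> hidden weights
b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal((5, 16))   # hidden -> output weights
b2 = np.zeros(5)

h = np.tanh(W1 @ x + b1)   # weighted sum followed by nonlinear activation
y = W2 @ h + b2            # linear read-out of the five forecasts
print(y.shape)
```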

SLIDE 7

Supervised Learning: Neural Networks

Jhirad, Yigal (2018)

[Diagram: two networks with identical inputs and forecasts (Market Returns, Risk/Volatility, Liquidity), built from summation/activation nodes (∑|∂) with outputs y1–y5. Left: Simple Feed-Forward Neural Network. Right: Recurrent Neural Network, with feedback connections between hidden nodes.]

7
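A minimal sketch (an assumed NumPy formulation, not the authors' code) of what distinguishes the two panels: the feed-forward network scores each period independently, while the recurrent network carries a hidden state across periods.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_hid = 4, 6
Wx = 0.1 * rng.standard_normal((n_hid, n_in))
Wh = 0.1 * rng.standard_normal((n_hid, n_hid))

def feed_forward(x):
    # Feed-forward: each period is scored independently of its history.
    return np.tanh(Wx @ x)

def recurrent(xs):
    # Recurrent: the hidden state carries information across periods,
    # h_t = tanh(Wx x_t + Wh h_{t-1}).
    h = np.zeros(n_hid)
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

xs = rng.standard_normal((3, n_in))   # three periods of factor data
h_last = recurrent(xs)
print(h_last.shape)
```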

SLIDE 8

Neural Network Work Flow

Input Data (Prices, Fundamentals, Macro, Technical; Structured/Unstructured) → Data Pre-Processing (Normalization & Determine Model Parameters) → Training/Validation/Test (Feedforward/Back Propagation/Genetic Algorithm) → Forecast Outcome

Jhirad, Yigal (2018)

8
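The pre-processing and data-split steps of the workflow might look like this in NumPy; the z-score normalization and the 60/20/20 split proportions are illustrative assumptions, not taken from the deck.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative panel: 250 periods x 8 factor inputs on assorted scales.
X = 5.0 * rng.standard_normal((250, 8)) + 2.0

# Normalization: z-score each factor so no single input dominates training.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Chronological training/validation/test split: no shuffling, since the
# deck's data are time series (the 60/20/20 proportions are assumptions).
n = len(X_norm)
train, val, test = np.split(X_norm, [int(0.6 * n), int(0.8 * n)])
print(len(train), len(val), len(test))
```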

SLIDE 9

GTC 2018: LSTM

Jhirad, Yigal (2018)

[Diagram: LSTM cell unrolled over time periods t−1, t, t+1. Forget, Input, and Output gates route information between Long Term Memory (cell state) and Short Term Memory (hidden state); the cell's output Y(t) feeds the next hidden layer/output.]

Gate vectors at time t are computed from the current inputs ({x1,…,xn}(t)) and the previous hidden state ({h1,…,hm}(t-1)): forget gates f1,…,fm; input gates i1,…,im; candidate values g1,…,gm; output gates o1,…,om. The prior cell state {c1,…,cm}(t-1) is carried forward through the forget and input gates.

Inputs/Factors, X ∈ ℝ^(factors × time periods): Economic (GDP, Interest Rates, Currency); Style/Factor (Momentum, Value/Growth, Volatility); Fundamental (P/E, Debt/Equity, Yield)

9
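A compact NumPy sketch of the standard LSTM cell the slide depicts, using the slide's gate names (forget f, input i, candidate g, output o) and the deck's convention that each gate matrix acts on the prior short-term memory, the current inputs, and a hard-coded bias input. Sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_mem = 5, 8   # n factor inputs, m memory cells (illustrative sizes)

# One weight matrix per gate; each acts on [h(t-1); x(t); 1], i.e. the prior
# short-term memory, the current inputs, and a hard-coded bias input.
Wf, Wi, Wg, Wo = [0.1 * rng.standard_normal((n_mem, n_mem + n_in + 1))
                  for _ in range(4)]

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t, [1.0]])
    f = sigmoid(Wf @ z)      # forget gate: how much long-term memory to keep
    i = sigmoid(Wi @ z)      # input gate: how much new information to admit
    g = np.tanh(Wg @ z)      # candidate values for the cell state
    o = sigmoid(Wo @ z)      # output gate: how much of the cell to expose
    c = f * c_prev + i * g   # long-term memory (cell state) update
    h = o * np.tanh(c)       # short-term memory (hidden state) update
    return h, c

h, c = np.zeros(n_mem), np.zeros(n_mem)
for x_t in rng.standard_normal((4, n_in)):   # four time periods
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)
```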

SLIDE 10

GTC 2018: Predicting Volatility Regimes with LSTM

10

SLIDE 11


GTC 2018: Neural Networks

• Neural Networks
— Feed-Forward vs. Recurrent Neural Networks
— LSTM captures the temporal nature of financial data
— Complement existing quantitative and qualitative signals

• Advantages
— Captures non-linearities that are prevalent in financial data
— Time Sequencing, Pattern Recognition
— Modularity
— Parallel Processing

• Considerations
— Black Box
— Overfitting/Underfitting
— Optimization/Local Minima

11

SLIDE 12

GTC 2018: Genetic Algorithms

  • Gradient Descent may not be efficient
  • Local minima pose a challenge
  • Genetic Algorithms complement traditional optimization techniques
  • Apply the computational power within CUDA to create a more robust evolutionary algorithm to drive multi-layer Neural Networks

[Diagram: objective surface with local maxima and local minima]

12

SLIDE 13

Neural Architecture

Feed forward: 4 layers: input, 2 hidden, output

[Diagram: network unrolled over t = 1, 2, 3. Input Layer → Hidden Layer 1 (LSTM) → Hidden Layer 2 (LSTM) → Output Layer (Normal); each LSTM cell passes its Long-term and Short-term Memory forward to the next time step.]

13

SLIDE 14

Neural Architecture

Transitional layer: Normal Weights (-1 to 1)


14

SLIDE 15

Neural Architecture

Transitional layer: LSTM

Forget Remember Input Output


15

SLIDE 16

Training

• 4 matrices of weights in each LSTM layer plus one in the normal layer equals 9 weight matrices
• The goal of training is to find weights to populate those matrices that convert the input values to output values which most accurately reflect reality
• Output values are computed from input values period-by-period and compared to actual values to yield a mean squared error
• Weight matrices are modified and the process is re-run repeatedly until the mean squared error ceases to improve (supervised)

16
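The modify-and-re-run loop described above can be sketched on a toy single-matrix problem. The random perturbation step here is a stand-in for the deck's back-propagation/genetic-algorithm machinery, and the data, sizes, and stopping patience are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy supervised problem: one weight matrix (here a vector) must map the
# inputs X to the actual values y; w_true and the sizes are illustrative.
X = rng.standard_normal((100, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

def mse(w):
    # Outputs computed from inputs and compared to actual values.
    return float(np.mean((X @ w - y) ** 2))

# Modify the weights, keep the change only if the mean squared error
# improves, and stop once it ceases to improve for a stretch of attempts.
w = rng.standard_normal(3)
mse0 = mse(w)
stall = 0
while stall < 200:
    cand = w + rng.normal(scale=0.1, size=3)   # modify weights
    if mse(cand) < mse(w):
        w, stall = cand, 0                     # keep the improvement
    else:
        stall += 1
print(round(mse0, 3), round(mse(w), 3))
```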

SLIDE 17

Training

Supervised: feed forward → compute mean squared error → modify weights → repeat

17

SLIDE 18

Training

Genetic algorithm: terminology
• Gene: one matrix of weights
• Organism: a set of 9 weight matrices
• Fitness: the mean squared error generated by an organism over the timeframe
• Breeding population: set of organisms that have the lowest mean squared errors
• Mating: process of splicing the corresponding genes of two organisms in randomly selected locations to produce new organisms
• Mutation: the re-setting of randomly selected weights to new random values during the mating process
• Generation: one iteration in which the breeding population mates and produces offspring

18
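The terminology maps naturally onto a small data structure. The gene shapes below are illustrative placeholders (not the deck's actual layer dimensions); only the 4 + 4 + 1 = 9 count follows the slides.

```python
import numpy as np
from dataclasses import dataclass

rng = np.random.default_rng(5)

# Gene: one matrix of weights. Organism: a set of 9 of them
# (4 per LSTM layer x 2 LSTM layers + 1 for the normal layer).
# These shapes are illustrative placeholders, not the deck's dimensions.
GENE_SHAPES = [(8, 14)] * 4 + [(8, 17)] * 4 + [(5, 9)]

@dataclass
class Organism:
    genes: list                     # the 9 weight matrices
    fitness: float = np.inf         # mean squared error over the timeframe

def random_organism():
    return Organism([0.1 * rng.standard_normal(s) for s in GENE_SHAPES])

population = [random_organism() for _ in range(20)]
print(len(population), len(population[0].genes))
```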

SLIDE 19

Training

Genetic algorithm
• Create a set of organisms (population) by creating a set of weight matrices for each, populated with random weights
• Evaluate the fitness of each organism by feeding forward the input matrix through the neural network period-by-period and comparing the outputs to the matrix of actual values, yielding a mean squared error
• Rank the organisms by their mean squared errors
• Select mates for the fittest organisms and produce offspring: two new organisms
• Add the offspring to the population, evaluate their fitness, and re-rank the population
• Drop the least fit organisms from the population to maintain the population size
• Repeat the previous three steps until no offspring survive the previous step for some number of generations
• The fittest organism is now a trained neural network

19
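The steps above can be sketched end-to-end in Python. Two simplifications are assumed: the fitness function is a toy objective standing in for the network's mean squared error, and the crossover is an element-wise swap mask rather than the deck's splice rule (which slide 20 specifies); population and gene sizes are also reduced for brevity.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in fitness: in the deck this is the network's mean squared error
# over the timeframe; a toy objective keeps the sketch short.
def fitness(org):
    return float(sum(np.sum(g ** 2) for g in org))

shapes = [(4, 4)] * 3                       # 3 genes instead of 9, for brevity

def random_org():
    return [rng.standard_normal(s) for s in shapes]

def mate(a, b, n_mut=2):
    # Simplified crossover (element-wise swap mask) plus random mutation;
    # slide 20 gives the deck's actual splice rule.
    kids = ([], [])
    for ga, gb in zip(a, b):
        mask = rng.random(ga.shape) < 0.5
        c1, c2 = np.where(mask, ga, gb), np.where(mask, gb, ga)
        for c in (c1, c2):
            idx = rng.integers(0, c.size, size=n_mut)
            c.ravel()[idx] = rng.standard_normal(n_mut)   # mutate
        kids[0].append(c1)
        kids[1].append(c2)
    return kids

pop = sorted((random_org() for _ in range(16)), key=fitness)
best0 = fitness(pop[0])
for _ in range(200):                        # generations
    for k in range(4):                      # breeding population: fittest 4
        partner = pop[rng.integers(1, 16)]  # a mate from the ranked population
        pop.extend(mate(pop[k], partner))   # add offspring to the population
    pop = sorted(pop, key=fitness)[:16]     # re-rank; drop the least fit
print(round(best0, 1), round(fitness(pop[0]), 1))
```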

SLIDE 20

Training

Genetic algorithm: mating
• For each member of the breeding population, randomly select one of the remaining members of the population as a mate
• For each weight matrix (gene), randomly select a splice point between 1 and half the size of the matrix
• Swap the section of each mate's matrix that begins at the splice point and ends at twice the splice point with the other mate, yielding two offspring
• Randomly pick a set number of weights and change them to new random values (mutate)

20
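The splice-and-mutate rule can be made concrete on flattened genes; using an all-zeros and an all-ones parent makes the swapped section visible. The 4×4 gene size and the mutation count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def splice(ga, gb, rng):
    # Splice point between 1 and half the size of the (flattened) matrix;
    # swap the section [splice_point, 2 * splice_point) between the mates.
    fa, fb = ga.ravel().copy(), gb.ravel().copy()
    sp = int(rng.integers(1, fa.size // 2 + 1))
    fa[sp:2 * sp], fb[sp:2 * sp] = fb[sp:2 * sp].copy(), fa[sp:2 * sp].copy()
    return fa.reshape(ga.shape), fb.reshape(gb.shape), sp

def mutate(g, n_mut, rng):
    # Re-set randomly selected weights to new random values.
    flat = g.ravel()
    flat[rng.integers(0, flat.size, size=n_mut)] = rng.standard_normal(n_mut)
    return g

ga = np.zeros((4, 4))   # parent A: all zeros, to make the swap visible
gb = np.ones((4, 4))    # parent B: all ones
ca, cb, sp = splice(ga, gb, rng)
print(sp, int(ca.sum()))   # the swapped-in section contributes exactly sp ones
```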

SLIDE 21

CUDA Architecture

Genetic algorithm: parallelism
• Each organism can be run in parallel at grid level
• Each matrix multiplication can be run in parallel at thread block level
• Each period can also be run in parallel at grid level, since periods are independent

A network composed entirely of non-recurrent layers enables 3 levels of parallelism.

21

SLIDE 22

CUDA Architecture

Genetic algorithm: parallelism
• Each matrix multiplication can be run in parallel at thread block level
• Each organism can be run in parallel at grid level

A network that contains a recurrent layer loses period-level parallelism at that layer.

22

SLIDE 23

CUDA Architecture

Genetic algorithm
• Create population: for each of the 9 weight matrices:
— Generate enough random numbers to populate all the organisms
— For each organism, 2D pitch copy the random numbers
— Launch a grid of one block per organism to convert the random numbers to weights (good enough)
• Evaluate initial population: launch a grid of one block per organism to evaluate its fitness, each block doing the following:
— For each period, feed forward the period's factors through the network and sum the squares of the differences between the output and the period's actual values
— Write the resulting mean squared error and the final 4 long- and short-term memory vectors to global memory
• Rank the initial population by their mean squared errors

23

SLIDE 24

CUDA Architecture

Genetic algorithm
• Generational loop: continues until some number of generations pass in which no offspring ranks better than the least fit organism:
— Prepare to mate: generate enough random numbers for each weight matrix of each breeding organism to select a mate, select a gene splice location, select which weights to mutate, and produce the mutated weights
— Mate: for each of the 9 weight matrices, launch a grid of one block per breeding organism that randomly selects a mate organism, swaps a randomly selected section of their weights, mutates some weights, and writes the resulting weight matrices in place of two of the lowest ranked organisms
— Evaluate offspring: launch a grid of one block per breeding organism to evaluate its fitness, as in the step prior to the generational loop, writing the resulting mean squared errors and memory vectors in place of two of the lowest ranking organisms
— Re-rank the population by their mean squared errors
• Write the weight matrices and memory vectors of the fittest organism to the host

24

SLIDE 25

CUDA Architecture

Neural network: weight matrix structure
• Short-term memory rows: data actually resides in the "short-term memory" variable in shared memory; each column is multiplied by that variable and summed to the output
• Input rows: data actually resides in the "input" variable in shared memory; each column is multiplied by that variable and summed to the output
• Bias row: does not actually exist (hard-coded); each element is added to the output
• Output variable resides in shared memory

25
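The row layout described above (memory rows, then input rows, then a hard-coded bias row that is never stored) can be checked numerically. This NumPy sketch is an assumed model of the kernel's indexing, not the actual CUDA code; sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

n_mem, n_in = 3, 4                     # illustrative sizes
W = rng.standard_normal((n_mem + n_in + 1, n_mem))   # logical weight matrix

h = rng.standard_normal(n_mem)   # "short-term memory" (shared memory in CUDA)
x = rng.standard_normal(n_in)    # "input" (shared memory in CUDA)

# Logical view: memory rows, then input rows, then one bias row whose
# input value is hard-coded to 1 and never stored.
z = np.concatenate([h, x, [1.0]])
out_logical = z @ W

# What the kernel effectively does: pick each multiplicand from the right
# variable by row index instead of materializing the concatenated vector.
out = np.zeros(n_mem)
for k in range(W.shape[0]):
    if k < n_mem:
        n = h[k]                     # row drawn from short-term memory
    elif k < n_mem + n_in:
        n = x[k - n_mem]             # row drawn from the input
    else:
        n = 1.0                      # the hard-coded bias row
    out += W[k] * n                  # multiply the row and accumulate

print(np.allclose(out, out_logical))  # True
```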

SLIDE 26

CUDA Architecture

Neural network: weight matrix structure, replicated across organisms

26

SLIDE 27

CUDA Architecture

Neural network: weight matrix structure
Thread block size: 128 (a power of two, a multiple of warp size)

Wide matrix: spans many thread block widths

27

SLIDE 28

CUDA Architecture

Neural network: weight matrix structure
Thread block size: 128 (a power of two, a multiple of warp size)

Narrow matrix: spans a fraction of one thread block width

• Maximal coalescence
• No bank conflicts (broadcast, except in rare circumstances)
• No __syncthreads necessary (until final summation of narrow matrix)

28

SLIDE 29

CUDA Architecture

Neural network: matrix multiplication simplified code

npk = number of logical rows of weight matrix per block (always 1 for wide matrices)
nk  = total number of blocks, i.e. rows of Weights (last block will usually be short for narrow matrices)
nO  = # output neurons (= # elements in Memory vector = logical width of weight matrix)
nI  = # input neurons (= # elements in Input vector)

// Multiplication of logical input vector by logical weight matrix
k: loop vertically, i.e. by one block of logical matrix rows at a time
  o: loop horizontally (strided), i.e. by one thread block width at a time, starting at threadIdx.x
    i = k * npk + o / nO                   // index to input vectors
    if i < nO:            n = Memory[i]
    else if i < nO + nI:  n = Input[i - nO]
    else:                 n = 1
    Output[o] = fmaf(Weights[k][o], n, Output[o])

// Final summation of Output
s: loop by diminishing strides (start at nO * blockDim.x / 2 and halve on each loop until 0)
    if s < npk * nO && s * 2 > warpSize: __syncthreads()
    if threadIdx.x + s < npk * nO:
        Output[threadIdx.x] = fadd(Output[threadIdx.x], Output[threadIdx.x + s])

Note that the outer (k) loop is trivial for wide matrices and the inner (o) loop is trivial for narrow matrices.
Note that the second term of i is always 0 for wide matrices, in which case i = k, the outer loop counter.

29

SLIDE 30

Summary

• Utilize an LSTM Neural Network to identify market regimes
• Propose an Augmented LSTM Process that can help drive deep learning by identifying appropriate factors across market regimes
— Enhance construction by utilizing an Optimization with Constraints function instead of a penalty function
— Utilize Genetic Algorithms
• CUDA leverages GPU hardware, providing the computational power to drive optimization algorithms and Deep Learning
• Application in Investment and Risk Management

30

SLIDE 31

Author Biographies

• Yigal D. Jhirad, Senior Vice President, is Director of Quantitative and Derivatives Strategies and a Portfolio Manager for Cohen & Steers' options and real assets strategies. Mr. Jhirad heads the firm's Investment Risk Committee. He has 30 years of experience. Prior to joining the firm in 2007, Mr. Jhirad was an executive director in the institutional equities division of Morgan Stanley, where he headed the company's portfolio and derivatives strategies effort. He was responsible for developing, implementing and marketing quantitative and derivatives products to a broad array of institutional clients, including hedge funds, active and passive funds, pension funds and endowments. Mr. Jhirad holds a BS from the Wharton School. He is a Financial Risk Manager (FRM), as certified by the Global Association of Risk Professionals.

• Blay A. Tarnoff is a senior applications developer and database architect. He specializes in array programming and database design and development. He has developed equity and derivatives applications for program trading, proprietary trading, quantitative strategy, and risk management. He is currently a consultant at Cohen & Steers and was previously at Morgan Stanley.

31