

SLIDE 1

Improving Forecasts of Extreme Values By Machine Learning Models Using Occam's Razor

William W. Hsieh

University of British Columbia (Visiting Scientist at Univ. of Victoria)

American Meteorological Society Annual Meeting, January 2018, Austin, TX

SLIDE 2

Introduction

• Machine learning (ML) methods were developed mostly for discrete data.

• In environmental science:
  § Mostly continuous data.
  § Extreme values are important.
  § Are ML methods unsuited for extreme values?

• Continuous data: wait long enough, and a new predictor value will lie outside the training range => the ML model is doing extrapolation!

• Extreme learning machine (ELM) -- a 1-hidden-layer artificial neural network (ANN) with random weights at the hidden nodes.
  § Ensemble-average the output from 100 runs.
  § 3 choices of activation function at the hidden layer: (a) sigmoidal, (b) Gaussian (RadBas), (c) softplus.
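The ELM described above can be sketched in a few lines of NumPy: the input-to-hidden weights are drawn at random and never trained, and only the hidden-to-output weights are solved, by linear least squares. This is a minimal illustration (function names, the tanh choice of sigmoidal activation, and the default sizes are illustrative, not the talk's exact configuration):

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit an extreme learning machine: a 1-hidden-layer network whose
    hidden weights are random and whose output weights are obtained by
    linear least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # sigmoidal hidden-layer output
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights (least squares)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the fitted ELM on new predictor values X."""
    return np.tanh(X @ W + b) @ beta
```

Because only `beta` is solved, training reduces to one least-squares problem, which is why the talk can afford 100-run (and later 200-trial) ensembles over different random weight draws.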

SLIDE 3

[Figure: ELM fits to training data x, where the dashed curve is the true signal y = x + 0.2x², the line is linear regression, black is the ELM, and + marks extrapolated points, for activation functions (a) sigmoidal, (b) radial basis, (c) softplus. Panel (d) shows the ELM solutions from (a), (b), (c) over an extended domain.]

SLIDE 4

• Occam's razor: among competing hypotheses, the one with the fewest assumptions should be selected (parsimony).

• In the extrapolation region, Occam would avoid nonlinear ML models with many parameters -- but instead use a linear model??

• New idea:
  1) In predictor space, determine which test data points involve extrapolation (based on the Mahalanobis distance to the training dataset).
  2) Use the nonlinear ML solution to perform linear extrapolation.

• E.g.: predict Vancouver airport (YVR) precipitation amount (on precipitation days). 3 predictors: SLP, humidity, Z500 (NCEP Reanalysis); training on 1971-76, testing on 1978-2000.
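Step 1 above can be sketched as follows: compute the Mahalanobis distance of each test point from the centre of the training data, and flag points beyond a threshold as extrapolation points. The function name and the threshold value are illustrative; the talk does not specify the cutoff used:

```python
import numpy as np

def extrapolation_mask(X_train, X_test, threshold=3.0):
    """Flag test points whose Mahalanobis distance from the centre of
    the training data exceeds `threshold` -- i.e. points where the
    trained model would be extrapolating."""
    mu = X_train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))
    diff = X_test - mu
    # squared Mahalanobis distance of each test point
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return np.sqrt(d2) > threshold
```

Unlike plain Euclidean distance, the Mahalanobis distance accounts for the covariance of the predictors, so a point can be flagged even when each predictor individually lies inside its training range.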

SLIDE 5

[Figure: predictor space (x1 vs x3), showing the training data, the test data, the centre of the training cluster, and an outlier test point lying outside the training data.]

Extrapolate from the nearest neighbour: use the ML model to compute the gradient to extrapolate.

SLIDE 6

[Figure: the same predictor space (x1 vs x3), with the training data, the test data, the cluster centre and the outlier.]

Extrapolate from the centre of the cluster: use these 2 points (marked in the figure) to compute the gradient for the extrapolation.
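Both schemes on slides 5 and 6 share the same core step: evaluate the nonlinear model at an anchor point inside the training data (the nearest neighbour, or the cluster centre), estimate its gradient there by finite differences, and extend linearly out to the outlier. A minimal sketch, where the model `f`, the anchor `x0` and the step `h` are illustrative names:

```python
import numpy as np

def linear_extrapolate(f, x0, x_out, h=0.01):
    """Extrapolate the model f linearly from the anchor point x0 to the
    outside point x_out, estimating the gradient of f at x0 by central
    finite differences with step h."""
    x0 = np.asarray(x0, dtype=float)
    x_out = np.asarray(x_out, dtype=float)
    grad = np.empty_like(x0)
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = h
        # central difference along predictor j
        grad[j] = (f(x0 + step) - f(x0 - step)) / (2.0 * h)
    # first-order Taylor expansion: f(x0) + grad . (x_out - x0)
    return f(x0) + grad @ (x_out - x0)
```

Choosing `h` fine or coarse gives the two gradient estimates per scheme that slide 7 combines into 4 extrapolation variants.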

SLIDE 7

• Use both extrapolation schemes (each with a fine and a coarse finite-difference estimate of the gradient for extrapolation) => 4 extrapolation schemes.
  § Take the median (of the 4 extrapolation schemes and the original value).

• Compute the mean absolute error (MAE) and get a skill score (SS) relative to the original ML model's MAE.

• 4 datasets: YVR precipitation, streamflow at Englishman River (ENG) and Stave River (STA), sediment concentration at Fraser River (FRA).
  § Also reversed the training and testing data (rev).

• Ran the ELM:
  § 200 trials with different random number sequences.
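The slides do not spell out the skill-score formula; the standard MAE-based form, SS = 1 - MAE_new / MAE_ref, is assumed in this sketch (positive SS means the linear-extrapolation forecast beats the original ML model):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mae_skill_score(y_true, y_new, y_ref):
    """Skill score of the new forecast relative to the reference
    (original ML model) forecast: SS = 1 - MAE_new / MAE_ref.
    SS > 0 means the new forecast improves on the reference;
    SS = 1 is a perfect forecast."""
    return 1.0 - mae(y_true, y_new) / mae(y_true, y_ref)
```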

SLIDE 8

[Figure: MAE skill score over the extrapolated data for the ENG, ENG(rev), STA, STA(rev), YVR, YVR(rev) and FRA(rev) cases, comparing the sigmoid, radbas and softplus activation functions.]

SLIDE 9

[Figure: MAE skill score over the extrapolated data, using MLR at the extrapolation points, for the ENG, ENG(rev), STA, STA(rev), YVR, YVR(rev) and FRA(rev) cases, comparing the sigmoid, radbas and softplus activation functions.]

Simple alternative: train MLR (multiple linear regression) and use its output for the extrapolation points.
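A sketch of this alternative, assuming the extrapolation points have already been flagged (the function name, the mask argument, and the data layout are illustrative):

```python
import numpy as np

def mlr_fallback(X_train, y_train, X_test, y_ml, extrap_mask):
    """Replace the nonlinear model's predictions y_ml with multiple
    linear regression (MLR) predictions at the flagged extrapolation
    points, leaving the interior points unchanged."""
    A = np.column_stack([np.ones(len(X_train)), X_train])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)     # fit the MLR
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    y = np.array(y_ml, dtype=float)                        # copy of ML output
    y[extrap_mask] = (A_test @ coef)[extrap_mask]          # swap in MLR values
    return y
```

This needs no finite-difference machinery at all, which is why the slides call it the simple alternative; the trade-off is that one global linear fit replaces a locally linearized nonlinear model.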

SLIDE 10

[Figure: boxplots of the medians of the skill scores (MAE SS, RMSE SS, correlation SS) for ELM with linear extrapolation and for MLR.]

Boxplot of the 21 medians (of the SS over the 200 trials) for MLR and for ELM (with linear extrapolation) over the extrapolated data.

SLIDE 11

Conclusion & future work

• For extreme values, ML models often perform nonlinear extrapolation.

• Following Occam, we proposed using linear extrapolation instead of nonlinear extrapolation:
  § Use the nonlinear ML solution to extrapolate linearly.
  § Or simply use the MLR model for the extrapolation points.

• Future improvements:
  § Determination of outliers by the Mahalanobis distance is not robust -- replace it with a more robust method.
  § Some predictors may be discrete variables -- the current linear extrapolation schemes will need to be modified.
