Incorporating Knowledge into DNN for Financial Numeral Classification




slide-1
SLIDE 1

ASNLU at NTCIR-14 FinNum Task:

Incorporating Knowledge into DNN for Financial Numeral Classification

ChaoChun Liang

Institute of Information Science, Academia Sinica, Taipei

June 12, 2019

ASNLU at the NTCIR-14 FinNum Task, June 12, 2019

slide-2
SLIDE 2
Outline

  • Proposed Approaches
  • Experimental Results
  • Discussion
  • Conclusion

slide-3
SLIDE 3
Task Overview

  • Purpose: To understand the fine-grained numeral information in financial Tweets

(T1) 8 breakouts: $CHMT (stop: $17.99), $FLO (200-day MA), $OMX (gap), $SIRO (gap). One sub-$1 stock. Modest selection on attempted swing low.

  • “8” is a numeral about quantity.
  • “200” is a parameter of a technical indicator (the 200-day moving average).
  • “17.99” is a stop-loss price.

slide-4
SLIDE 4
Proposed Approach 1/5

  • Model the Numeral Classification as a Sequence Labeling Process
      • Input Word Sequence: W1, W2, … Wn
      • Output Label Sequence: T1, T2, … Tn

M: main category class set; S: sub-category class set; O: not a target word to be classified
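A minimal sketch of this labeling scheme, applied to the start of the example tweet from the task overview. The function name `to_label_sequence`, the tokenization, and the `targets` input format are illustrative assumptions, not part of the original system. The category names come from the appendix table, and assigning "Monetary" to the stop-loss price is one reading of the example.

```python
def to_label_sequence(tokens, targets):
    """Return one label per token: the category for a target numeral, 'O' otherwise.

    `targets` maps a token index to its category (hypothetical input format).
    """
    return [targets.get(i, "O") for i in range(len(tokens))]


# Hand-tokenized fragment of the example tweet (tokenization is an assumption).
tokens = ["8", "breakouts", ":", "$CHMT", "(", "stop", ":", "$", "17.99", ")"]
targets = {0: "Quantity", 8: "Monetary"}

labels = to_label_sequence(tokens, targets)
```

Every non-target word receives `O`, so the output sequence has the same length as the input sequence.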

slide-5
SLIDE 5
Proposed Approach 2/5

  • Propose a token representation with external knowledge to represent the word meaning in Tweet sentences
  • Implement three vanilla neural network models

slide-6
SLIDE 6

Proposed Approach 3/5

  • Token Representation
      • W: Pre-trained Word Embedding
      • P: Part-of-Speech, N: Named entity Type
      • C: Category-Pattern Feature (#=6)
          • Company. (‘$NTNX’)
          • Money. (‘$20’ or ‘13$’)
          • Product number. (‘PS4’)
          • Date. (‘11/09/17’ or ‘11-09-17’)
          • Time. (‘6:45’ or ‘3:25 p.m.’)
          • Number. (‘68’)
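The six category patterns could be encoded as handcrafted rules along the following lines. This is only a sketch: the slides give example tokens, not the rules themselves, so every regular expression here is an assumption chosen to match those examples, and `category_pattern` and `one_hot` are hypothetical helper names.

```python
import re

# Hypothetical rules, listed in the same order as the six categories above.
CATEGORY_PATTERNS = [
    ("Company", re.compile(r"\$[A-Za-z]+")),                       # $NTNX
    ("Money", re.compile(r"\$\d+(?:\.\d+)?|\d+(?:\.\d+)?\$")),      # $20, 13$
    ("Product number", re.compile(r"[A-Za-z]+\d+")),                # PS4
    ("Date", re.compile(r"\d{1,2}([/-])\d{1,2}\1\d{2,4}")),         # 11/09/17
    ("Time", re.compile(r"\d{1,2}:\d{2}")),                         # 6:45
    ("Number", re.compile(r"\d+(?:\.\d+)?")),                       # 68
]


def category_pattern(token):
    """Return the name of the first pattern that matches the whole token, or None."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


def one_hot(token):
    """Encode the matched category as a 6-dimensional indicator vector."""
    matched = category_pattern(token)
    return [1 if name == matched else 0 for name, _ in CATEGORY_PATTERNS]
```

A token that matches no rule gets an all-zero vector. Multi-token expressions such as ‘3:25 p.m.’ are not handled by this per-token sketch.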

slide-7
SLIDE 7
Proposed Approach 4/5

  • CNN (detect local patterns, e.g. ‘85%’)
  • RNN (capture context information)
  • RNN+CNN (capture local info. in RNN)

slide-8
SLIDE 8
Proposed Approach 5/5

  • Rescoring in Prediction Time:
      • Exclude the Out-of-Category (‘O’) label from the candidate set for each target numeral to avoid inconsistency.
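The rescoring step can be sketched as an argmax over the label scores with ‘O’ removed. The function name `rescore`, the dictionary input, and the scores are illustrative assumptions; the slides do not show how the scores are produced.

```python
def rescore(label_scores):
    """Pick the best-scoring label for a target numeral, ignoring 'O'."""
    candidates = {label: s for label, s in label_scores.items() if label != "O"}
    return max(candidates, key=candidates.get)


# Made-up scores for one target numeral.
scores = {"O": 0.48, "Monetary": 0.30, "Quantity": 0.22}
best = rescore(scores)
```

In this made-up case a plain argmax would return ‘O’, which contradicts the fact that the token is a target numeral; after rescoring, the numeral still receives a category.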

slide-9
SLIDE 9
Experiment Setting

  • Pre-trained Embedding
      • GloVe 840B, 300d
  • CNN
      • Kernel sizes of 2, 3, 4 and 5
      • 32 filters for each kernel
  • RNN
      • Bi-GRUs with 128 hidden nodes
  • Dropout 0.5

slide-10
SLIDE 10

Overall Performance

Task-1 Test Set Performance

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63
+POS&NE             88.21  79.14   88.45  78.63   89.72  80.93
+POS&NE +Pattern    87.73  78.47   88.76  83.55   89.24  81.50

Task-2 Test Set Performance

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                69.88  58.66   75.22  71.72   73.94  65.54
+POS&NE             75.14  65.77   78.49  72.37   78.17  70.16
+POS&NE +Pattern    76.41  68.5    79.36  70.5    79.12  72.51

“None” denotes the NN models without incorporating any knowledge. “POS&NE” denotes the NN models with both POS and NE information. “Pattern” denotes the NN models that incorporate category patterns specified by handcrafted rules.
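The slides do not define the Micro and Macro columns. Assuming they are micro- and macro-averaged F-scores in their standard form, the difference between the two can be sketched as follows. The function names and the toy labels are illustrative, not taken from the task data.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro F1, macro F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[g] += 1  # gold category gains a false negative
    labels = sorted(set(gold) | set(pred))
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


gold = ["Monetary", "Monetary", "Monetary", "Quantity"]
pred = ["Monetary", "Monetary", "Quantity", "Quantity"]
micro, macro = micro_macro_f1(gold, pred)
```

Because macro averaging weights every category equally, a single weak category can pull the macro score well below the micro score.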

slide-11
SLIDE 11
Experimental Results 1/3

  • Division of classification results between CNN and RNN models

Task-1 testing set performance

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63

slide-12
SLIDE 12

Experimental Results 2/3

Task-1 testing set performance

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63

  • OOVs provide no useful information
      • OOVs: more than 30% on Development and Test sets
  • Linguistic information (POS&NE) attached to OOVs improved the performance significantly (4% ~ 10%).

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63
+POS&NE             88.21  79.14   88.45  78.63   89.72  80.93

slide-13
SLIDE 13

Experimental Results 3/3

Task-1 testing set performance

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63
+POS&NE             88.21  79.14   88.45  78.63   89.72  80.93
+POS&NE +Pattern    87.73  78.47   88.76  83.55   89.24  81.50

  • Category-pattern features offer small improvements or even degrade performance.
  • The manually encoded rules do not cover enough patterns.

slide-14
SLIDE 14
Discussion 1/2

  • Issue-1: High OOV rate
  • Issue-2: Diverse patterns in Tweets (not enough coverage with handcrafted patterns)
  • Solution: Numeral-Splitting
      • Most OOVs are concatenations of a numeral and other characters.
      • Split each token with numbers into individual sub-tokens.
          • e.g., “FY22” -> “FY” and “22”
          • e.g., “12/3/2017” -> “12”, “/”, “3”, “/”, “2017”

OOV Rate    Dev    Test
Before      36%    39%
After       22%    23%
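A minimal sketch of numeral splitting and of its effect on the OOV rate. The slides give only the two examples above, so the exact rule is an assumption: here a run of digits (optionally with one decimal part) becomes one sub-token and every other run of non-space characters becomes another. The names `split_numerals` and `oov_rate` and the toy vocabulary are hypothetical.

```python
import re

# Assumption: digits (with an optional decimal part) form a numeral;
# any other run of non-space characters is kept as its own sub-token.
_PIECES = re.compile(r"\d+(?:\.\d+)?|[^\d\s]+")


def split_numerals(token):
    """Split a token that contains digits; leave all other tokens unchanged."""
    if not any(ch.isdigit() for ch in token):
        return [token]
    return _PIECES.findall(token)


def oov_rate(tokens, vocab):
    """Fraction of tokens that are missing from the embedding vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


# Toy vocabulary standing in for the pre-trained embedding vocabulary.
vocab = {"FY", "22", "12", "/", "3", "2017", "earnings"}
before = ["FY22", "earnings", "12/3/2017"]
after = [piece for tok in before for piece in split_numerals(tok)]
```

In this toy case two of the three original tokens are out of vocabulary, while every sub-token produced by the split is covered.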

slide-15
SLIDE 15

Discussion 2/2

  • Performance improves significantly. E.g., 9% (micro), 18% (macro) in RNN+CNN (“None”).
  • Outperforms the handcrafted patterns.

Task-1 Test Set Performance (after Numeral Splitting)

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                89.56  83.17   92.27  86.60   92.11  88.18
+POS&NE             90.68  83.60   91.95  88.36   92.99  88.25

Task-1 Test Set Performance (before Numeral Splitting)

                    CNN            RNN            RNN+CNN
                    Micro  Macro   Micro  Macro   Micro  Macro
None                81.83  69.54   84.22  73.36   82.71  69.63
+POS&NE +Pattern    87.73  78.47   88.76  83.55   89.24  81.50
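The quoted gains can be checked directly against the RNN+CNN columns of the two Task-1 tables. The snippet below is only a worked subtraction; the variable names are illustrative.

```python
# RNN+CNN scores copied from the Task-1 tables on this slide.
before_none = {"micro": 82.71, "macro": 69.63}     # "None", before splitting
after_none = {"micro": 92.11, "macro": 88.18}      # "None", after splitting
before_pattern = {"micro": 89.24, "macro": 81.50}  # "+POS&NE +Pattern", before splitting

# Absolute gain from numeral splitting alone, in percentage points.
gain = {k: round(after_none[k] - before_none[k], 2) for k in before_none}

# Splitting without any added knowledge versus the handcrafted patterns.
beats_patterns = all(after_none[k] > before_pattern[k] for k in after_none)
```

The differences are about 9.4 points (micro) and 18.55 points (macro), matching the rounded figures in the bullet, and the split-only scores exceed the pattern-based scores on both measures.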

slide-16
SLIDE 16
Conclusion

  • The proposed token representation (with linguistic knowledge) improves performance significantly.
  • A suitable pre-processing (splitting numerals) to reduce OOV rates is essential.
  • Jointly adopting both approaches could offer additional benefits.

slide-17
SLIDE 17

Q & A

Thanks

slide-18
SLIDE 18
Appendix – P10 1/2

  • Errors made by RNN were due to the model missing local patterns
      • E.g., “num/num” (Temporal) in “10/24”, “num%” (Percentage) in “7.8%”
  • Errors made by CNN were due to the model missing context information
      • E.g., “You sold ESPR at 11 and CLVS at 29 but thanks for this tip.”
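The pattern names “num/num” and “num%” can be read as token shapes in which each number is replaced by a placeholder. A minimal sketch of that reading follows; `numeral_shape` is a hypothetical helper, and the slides do not say that the system computes such shapes explicitly.

```python
import re

# A number is a run of digits with an optional decimal part (assumption).
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeral_shape(token):
    """Replace every number in the token with the placeholder 'num'."""
    return _NUMBER.sub("num", token)


shapes = [numeral_shape(t) for t in ["10/24", "7.8%", "11"]]
```

The first two shapes are the local cues named on the slide. The bare “11” from the CNN example reduces to “num” alone, so its category has to come from the surrounding words.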

slide-19
SLIDE 19
Appendix – P10 2/2

  • Errors made by both RNN and CNN occurred when the number could not be categorized explicitly (i.e., more information is needed).
      • E.g., “$NGAS Buy on dips on $UGAZ $UNG. Dip to 3.075, NG is on wave 3 move to 3.27 on 8HR chart.”

slide-20
SLIDE 20
Appendix – P12

  • Category F-Score of RNN with POS&NE and Category-Patterns

                  +POS&NE   +POS&NE +Pattern
Monetary          0.9107    0.9085
Quantity          0.7727    0.7857
Percentage        0.9882    0.9882
Temporal          0.8978    0.8903
Product Number    0.3182    0.6818
Option            0.7727    0.7727
Indicator         0.7778    0.7037