SLIDE 1

Recursive Neural Networks and Its Applications

LU Yangyang

luyy11@sei.pku.edu.cn

KERE Seminar

Oct. 29, 2014
SLIDE 2

Outline

  • Recursive Neural Networks
  • RNNs for Factoid Question Answering
      • RNNs for Quiz Bowl
      • Experiments
  • RNNs for Anomalous Event Detection in Newswire
      • Neural Event Model (NEM)
      • Experiments


SLIDE 4

Introduction

Artificial Neural Networks:

  • For a single neuron:

Input x, output y, parameters W and b, activation function f:
    z = Wx + b,  y = f(z)

  • For a simple ANN:

z^(l) = W^(l) x^(l) + b^(l),  y^(l+1) = f(z^(l))
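To make this concrete, here is a minimal NumPy sketch of the single-neuron/feedforward computation above; the tanh activation and the layer sizes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def neuron(x, W, b, f=np.tanh):
    """Single neuron / layer: z = W x + b, y = f(z)."""
    z = W @ x + b
    return f(z)

# Illustrative two-layer forward pass with arbitrary dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # input vector
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

h = neuron(x, W1, b1)                         # hidden layer output
y = neuron(h, W2, b2)                         # final output
```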


SLIDE 7

Introduction (cont.)

Using neural networks: learning word vector representations
Word-level representation → Sentence-level representation?

One kind of solution:

  • Composition: using syntactic information

SLIDE 10

Recursive AutoEncoder [1]

Given a sentence s, we can get its binary parsing tree.

  • Child nodes: c_1, c_2
  • Parent node: p = f(W_e [c_1; c_2] + b)
  • W_e: encoding weight, f: activation function, b: bias weight

Training: encourage the decoded children to be close to the original child representations.

[1] R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS'11.
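As a rough illustration of one composition step of such a recursive autoencoder, the sketch below assumes tanh activations, 50-dimensional vectors, random weights, and a squared-error reconstruction loss; these choices (and all names) are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                                          # phrase/word vector size (assumed)
W_e = rng.normal(scale=0.1, size=(d, 2 * d))    # encoding weights
W_d = rng.normal(scale=0.1, size=(2 * d, d))    # decoding weights
b_e, b_d = np.zeros(d), np.zeros(2 * d)

def encode(c1, c2):
    """Parent representation: p = f(W_e [c1; c2] + b)."""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def reconstruction_loss(c1, c2):
    """Decode the parent back into two children and compare with the originals."""
    p = encode(c1, c2)
    c1_hat, c2_hat = np.split(np.tanh(W_d @ p + b_d), 2)
    return np.sum((c1 - c1_hat) ** 2) + np.sum((c2 - c2_hat) ** 2)

c1, c2 = rng.normal(size=d), rng.normal(size=d)
print(encode(c1, c2).shape, reconstruction_loss(c1, c2))
```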


SLIDE 13

Dependency Tree based RNNs [2]

Given a sentence s, we can get its dependency tree. Then we add a hidden node for each word node and get the reformed tree d.

For each node h_i in the reformed tree d:

    h_i = f(z_i)                                                          (1)
    z_i = (1 / l(i)) · ( W_v x_i + Σ_{j ∈ C(i)} l(j) W_{pos(i,j)} h_j )   (2)

where
    x_i, h_i, z_i ∈ R^n;  W_v, W_{pos(i,j)} ∈ R^{n×n}
    l(i): the number of leaf nodes under h_i
    C(i): the set of hidden (child) nodes under h_i
    pos(i, j): the position of h_j with respect to h_i, such as l1 or r1
    W_l = (W_{l1}, W_{l2}, ..., W_{l k_l}) ∈ R^{k_l × n × n},  W_r = (W_{r1}, W_{r2}, ..., W_{r k_r}) ∈ R^{k_r × n × n}
    k_l, k_r: the maximum left and right widths in the dataset

[2] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics'14.
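A minimal sketch of this node update follows, assuming tanh for f, that the leaf count l(i) includes the word at node i itself, and two illustrative position matrices W_l1 and W_r1; the dimensions, word vectors, and example tree are placeholders rather than the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                          # vector size (illustrative)
W_v = rng.normal(scale=0.1, size=(n, n))        # incorporates the word vector x_i
W_pos = {"l1": rng.normal(scale=0.1, size=(n, n)),   # one matrix per child position
         "r1": rng.normal(scale=0.1, size=(n, n))}

def compose(x_i, children):
    """DT-RNN update: h_i = f((1/l(i)) (W_v x_i + sum_j l(j) W_pos(i,j) h_j)).

    children is a list of (position, h_j, l_j) triples for the hidden child nodes."""
    total = W_v @ x_i
    leaf_count = 1                              # the word at node i counts as one leaf
    for pos, h_j, l_j in children:
        total += l_j * (W_pos[pos] @ h_j)
        leaf_count += l_j
    return np.tanh(total / leaf_count), leaf_count

# Leaves first, then their head word ("students ride bikes", with "ride" as root).
x = {w: rng.normal(size=n) for w in ["students", "ride", "bikes"]}
h_students, l_students = compose(x["students"], [])
h_bikes, l_bikes = compose(x["bikes"], [])
h_ride, _ = compose(x["ride"], [("l1", h_students, l_students),
                                ("r1", h_bikes, l_bikes)])
```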

SLIDE 14

Tasks using RNNs I

  • R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics.
  • M. Luong, R. Socher, and C. D. Manning. 2013. Better word representations with recursive neural networks for morphology. In CoNLL.
  • R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng. 2013a. Parsing With Compositional Vector Grammars. In ACL.
  • R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. 2013d. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP.
  • E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL.
  • R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. 2012a. Convolutional-Recursive Deep Learning for 3D Object Classification. In NIPS.
  • R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012b. Semantic Compositionality Through Recursive Matrix-Vector Spaces. In EMNLP.

SLIDE 15

Tasks using RNNs II

  • R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. 2011a. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In NIPS.
  • R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011b. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
  • R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011c. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.
  • R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
  • R. Socher and L. Fei-Fei. 2010. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR.
  • L-J. Li, R. Socher, and L. Fei-Fei. 2009. Towards total scene understanding: classification, annotation and segmentation in an automatic framework. In CVPR.


SLIDE 17

RNN for Factoid Question Answering

  • A Neural Network for Factoid Question Answering over Paragraphs
  • EMNLP’14
  • Mohit Iyyer (1), Jordan Boyd-Graber (2), Leonardo Claudino (1), Richard Socher (3), Hal Daumé III (1)

(1) University of Maryland, Department of Computer Science and UMIACS
(2) University of Colorado, Department of Computer Science
(3) Stanford University, Department of Computer Science

SLIDE 18

Introduction

Factoid Question Answering:

  • Given a description of an entity, identify the person, place, or thing discussed.

Quiz Bowl:

  • A task: mapping natural language text to entities
  • A challenging natural language problem with large amounts of diverse and compositional data

(Example question omitted; its answer: the Holy Roman Empire)

→ QANTA: a question answering neural network with trans-sentential averaging


SLIDE 20

Quiz Bowl

  • A game: mapping raw text to a large set of well-known entities
  • Questions: 4 to 6 sentences
  • Every sentence in a quiz bowl question is guaranteed to contain clues that uniquely identify its answer, even without the context of previous sentences.
  • A property, "pyramidality": sentences early in a question contain harder, more obscure clues, while later sentences are "giveaways".
  • Answering the question correctly requires an actual understanding of the sentence.
  • Factoid answers: e.g., history questions ask players to identify specific battles, presidents, or events

Solutions: Bag-of-Words vs. Recursive Neural Networks


SLIDE 22

How to represent question sentences?

For a single sentence:

  • A sentence → a dependency tree
  • Each node n is associated with a word w, a word vector x_w ∈ R^d, and a hidden vector h_n ∈ R^d
  • Weights:
      W_r ∈ R^{d×d}: associated with each dependency relation r
      W_v ∈ R^{d×d}: to incorporate x_w at a node into the node vector h_n
    (d = 100 in the experiments)


SLIDE 24

How to represent a single sentence?

For any node n with children K(n) and word vector x_w:

    h_n = f( W_v · x_w + b + Σ_{k ∈ K(n)} W_{R(n,k)} · h_k )

where R(n, k) is the dependency relation between node n and child node k.


SLIDE 27

Training

Goal: mapping questions to their corresponding answer entities
A limited number of possible answers → a multi-class classification task

  • A softmax layer over every node in the tree could predict answers
  • Observation: most answers are themselves words (features) in other questions [3] (e.g., a question on World War II might mention the Battle of the Bulge and vice versa)
  • Improving upon the existing DT-RNNs: jointly learning answer and question representations in the same vector space rather than learning them separately

[3] Different from the multimodal text-to-image mapping problem


SLIDE 29

Training (cont.)

Intuition: encourage the vectors of question sentences to be near their correct answers and far away from incorrect answers.

The error for a single sentence:

    C(S, θ) = Σ_{s ∈ S} Σ_{z ∈ Z} L(rank(c, s, Z)) · max(0, 1 − x_c · h_s + x_z · h_s)

where
    S: the set of all nodes in the sentence's dependency tree
    s: an individual node in S
    c: the correct answer
    Z: the set of randomly selected incorrect answers (|Z| = 100 in the experiments)
    z: an individual incorrect answer in Z
    rank(c, s, Z): the rank of the correct answer c with respect to the incorrect answers Z
    L(r) = Σ_{i=1}^{r} 1/i
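A rough sketch of how this per-sentence error might be computed, assuming rank(c, s, Z) is approximated by counting margin violations; the authors' exact rank approximation and sampling scheme may differ.

```python
import numpy as np

def sentence_error(h_nodes, x_correct, x_wrong):
    """C(S, θ) for one sentence.

    h_nodes: iterable of node vectors h_s; x_correct: answer vector x_c;
    x_wrong: array of shape (|Z|, d) holding incorrect-answer vectors x_z."""
    total = 0.0
    for h_s in h_nodes:
        scores_z = x_wrong @ h_s                      # x_z · h_s for every z in Z
        score_c = x_correct @ h_s                     # x_c · h_s
        rank = int(np.sum(scores_z + 1.0 > score_c))  # approximate rank of c
        weight = np.sum(1.0 / np.arange(1, rank + 1)) if rank > 0 else 0.0
        total += weight * np.sum(np.maximum(0.0, 1.0 - score_c + scores_z))
    return total
```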


SLIDE 31

Training (cont.)

The objective function of the whole model:

    J(θ) = (1/N) Σ_{t ∈ T} C(t, θ)

where
    T: all sentences in the training set
    N: the number of nodes in the training set
    θ = (W_{r ∈ R}, W_v, W_e, b)


SLIDE 35

From Sentences to Questions

The model just described considers each sentence in a quiz bowl question independently.
Previously-heard sentences within the same question contain useful information that we do not want our model to ignore.

Sentence-level representation → larger paragraph-level representation

  • The simplest and best aggregation method: averaging the representations of each sentence seen so far in a particular question

→ QANTA: a question answering neural network with trans-sentential averaging
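A minimal sketch of this aggregation, assuming each sentence has already been reduced to a d-dimensional DT-RNN vector (d = 100 above); the vectors below are placeholders.

```python
import numpy as np

def question_representation(sentence_vectors):
    """Trans-sentential averaging: mean of the sentence vectors heard so far."""
    return np.mean(np.asarray(sentence_vectors), axis=0)

# e.g., the question representation after hearing the first two sentences
h1, h2 = np.random.rand(100), np.random.rand(100)
q_after_two = question_representation([h1, h2])
```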



SLIDE 39

Datasets

  • 1. Expand previous [4]: 46,842 questions in 14 different categories
  • 2. NAQT [5]: 65,212 questions
  • 3. Selected: 21,041 literature and 22,956 history questions (> 40% of the corpus)
  • 4. Only consider a limited set of the most popular quiz bowl answers
  • 5. Wikipedia titles as training labels: mapping all raw answer strings to a canonical set (by Whoosh [6])
  • 6. Filtering out all answers that do not occur at least six times: 451 answers / 4,460 questions (history) and 595 answers / 5,685 questions (literature), about 12 questions per answer on average
  • 7. Replacing all occurrences of answers in the question with single entities (NER)

Final datasets (training/test):

    Category     Questions     Sentences
    History      3,761/699     14,217/2,768
    Literature   4,777/908     17,972/3,577

Word embedding initialization: word2vec

[4] Jordan Boyd-Graber, et al. 2012. Besting the quiz master: Crowdsourcing incremental classification games. In EMNLP.
[5] NAQT runs quiz bowl tournaments and generously shared all of their questions from 1998-2013.
[6] https://pypi.python.org/pypi/Whoosh/

SLIDE 40

Baselines

  • BOW: a logistic regression classifier trained on binary unigram indicators
  • BOW-DT: BOW + the feature set with dependency relation indicators
  • IR-QB: using the state-of-the-art Whoosh IR engine + a KB that contains "pages" associated with each answer

  • IR-WIKI: IR-QB + Wikipedia KB
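For intuition only, a minimal sketch of what a BOW-style baseline could look like with scikit-learn, using toy stand-in question/answer pairs; this is not the authors' code, feature set, or data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for (question sentence, answer entity) training pairs.
sentences = [
    "this battle in 1815 ended the hundred days",
    "this emperor was exiled to elba and later saint helena",
]
answers = ["Battle_of_Waterloo", "Napoleon"]

# Binary unigram indicators feeding a multi-class logistic regression.
bow = make_pipeline(
    CountVectorizer(binary=True),
    LogisticRegression(max_iter=1000),
)
bow.fit(sentences, answers)
print(bow.predict(["the exile of this emperor to elba"]))
```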
SLIDE 41

Human Comparison

  • Human records: 1,201 history guesses and 1,715 literature guesses from 22 of the quiz bowl players who answered the most questions
  • Standard quiz bowl scoring: +10 points for a correct answer, −5 points for an incorrect one
SLIDE 42

Discussion

  • Making predictions at early sentence positions
  • Where the Attribute Space Helps Answer Questions
  • Where all Models Struggle
  • Visualizing the Attribute Space
SLIDE 46

Summary

  • Problem: Factoid Question Answering
  • Quiz Bowl competition: question sentences → entities
  • Approach: QANTA
      • A question answering neural network with trans-sentential averaging
      • A limited number of possible answers → a multi-class classification task
      • Single sentence representation: DT-RNN
      • Jointly learning answer and question representations in the same vector space


SLIDE 48

RNN for Anomalous Event Detection in Newswire

  • Modeling Newswire Events using Neural Networks for Anomaly Detection
  • COLING'14
  • Pradeep Dasigi, Eduard Hovy

Carnegie Mellon University, Language Technologies Institute


SLIDE 54

Introduction

Problem: automatic anomalous event detection in Newswire (normal vs. anomalous)

  • What are anomalous events (in Newswire)?

Anomalous events are defined as those that are unusual compared to the general state of affairs and might invoke surprise when reported.

  • Understanding events requires knowledge about the role fillers

Hypothesis: anomaly is the result of an unexpected or unusual combination of semantic role fillers. → Encode the goodness of semantic role filler coherence.

  • Event-level anomaly is not the same as semantic incoherence

Anomalous events are defined to be the subclass of events that are semantically coherent, but are unusual based only on real-world knowledge.

SLIDE 55

What are Semantic Roles? [8]

Semantic Roles (i.e. Thematic Roles):

  • Used to indicate the role played by each entity in a sentence
  • Ranging from very specific to very general.
  • The entities that are labelled should have participated in an event.

[7] http://nlp.stanford.edu/projects/shallow-parsing.shtml
[8] http://language.worldofcomputing.net/semantics/semantic-roles.html


SLIDE 58

Semantic Roles (Semantic Arguments) [9]

  • AGENT: one who performs some action. (Joe played well and won the prize.)
  • CAUSE: one that causes something, or a reason for some happening. (Rain makes me happy.)
  • EXPERIENCER: one who experiences something. (Johan felt very painful when he heard of the sudden demise of his friend.)
  • BENEFICIARY: one who gets the benefit. (I prayed early in the morning for Susan.)
  • LOCATION: the location. (Steve was swimming in the river.)
  • MANNER: the way in which one behaves and talks with other people. (Tom behaved very gently even when he was insulted.)
  • INSTR: the instrument. (Tom broke the wooden box with the hammer.)
  • FROM-LOC: from location. (John received the prize from the President.)
  • TO-LOC: to location. (Susan threw a pen to John.)
  • AT-LOC: at location. (The box contains a ball.)
  • AT-TIME: at time. (I woke up at 5 o'clock to prepare for the examination.)

  • AGENT only: Joe walked.
  • AGENT + INSTR: Joe flies with a parachute.
  • AGENT + INSTR + BENEFICIARY: Joe flies with a parachute for charity.

[9] http://language.worldofcomputing.net/semantics/semantic-roles.html



SLIDE 62

Neural Event Model (NEM)

An event: the pair (V, A)

  • V: the predicate or a semantic verb [10]
  • A: the set of its semantic arguments [11]

→ Learning event embeddings explicitly guided by the semantic role structure

Neural Event Model: Recursive AutoEncoder (binary tree)

[10] e.g., "attacks" in "Terrorist attacks on the World Trade Center..."
[11] e.g., agent, patient, time, location

SLIDE 63

Training

  • Unsupervised: argument composition
      • Trained in a contrastive estimation fashion:
        s: the original entire argument
        s_c: a corrupted argument, formed by randomly replacing one of the words in the argument at a time
        V: the set of representations of all the words in the vocabulary
  • Supervised: event composition
      • Labeled training set: whether the event is normal or anomalous
        k: the number of semantic arguments per event
        L_event: the label operator
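A rough sketch of the contrastive-estimation idea for argument composition, assuming a margin-based hinge between the score of the original argument s and a word-corrupted copy s_c; the paper's actual scoring network and objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_hinge(score_original, score_corrupted, margin=1.0):
    """Prefer the original argument s over the corrupted s_c by a margin."""
    return max(0.0, margin - score_original + score_corrupted)

def corrupt(argument_words, vocabulary):
    """Build s_c by replacing one randomly chosen word with a random vocabulary word."""
    corrupted = list(argument_words)
    corrupted[rng.integers(len(corrupted))] = vocabulary[rng.integers(len(vocabulary))]
    return corrupted

# Toy usage: in practice the scores come from the composition network.
vocab = ["terrorist", "attacks", "world", "trade", "center", "banana"]
s = ["terrorist", "attacks"]
s_c = corrupt(s, vocab)
print(s_c, contrastive_hinge(score_original=0.9, score_corrupted=0.4))
```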


SLIDE 65

Datasets

Event Extraction:

  • Using the Semantic Role Labeling (SRL) tool in SENNA [12][13]
  • Considering only the roles A0 (AGENT), A1 (PATIENT), AM-TMP (TIME), and AM-LOC (LOCATION) as the arguments of the events

Data:

  • NBC Weird Events (NWE): crawled 3,684 "weird news" headlines available publicly on the website of NBC News [14] → 4,271 events extracted by SENNA → 3,771 used as negative training data
  • Gigaword Events (GWE): extracted events from headlines in the AFE section of Gigaword; sampled roughly 3,771 GWE events as positive data
  • Argument composition: 100k whole sentences from AFE headlines and the weird news headlines from which NWE are extracted

Word embedding initialization: SENNA

[12] Using tags from CoNLL 2005: http://www.lsi.upc.edu/~srlconll
[13] http://ml.nec-labs.com/senna/
[14] http://www.nbcnews.com

SLIDE 66

Evaluation

Test set annotation: 1,003 events annotated as HITs on AMT [15]

  • 3-way: highly unusual, strange, normal
  • 4-way: highly unusual, strange, normal, and cannot say

True label predictions: generated by MACE [16]

  • Merging the two anomaly classes → binary classification

[15] Human Intelligence Tasks on Amazon Mechanical Turk
[16] Dirk Hovy, et al. 2013. Learning whom to trust with MACE. In NAACL-HLT.

SLIDE 67

Summary

  • Problem: automatic anomalous event detection in Newswire (normal vs. anomalous)
  • Approach:
      • Semantic Role Labeling: event extraction and semantic structure
      • Neural Event Model: Recursive AutoEncoder (binary tree)
      • Unsupervised argument composition training + supervised event composition training
  • Data: NBC Weird News + Gigaword AFE section