SLIDE 1

Wrapup: IE, QA, and Dialog

Mausam

SLIDE 2

Grading

  • project: 50% → 40%
  • final exam: 20%
  • regular reviews: 15% → 20%
  • midterm survey: 15% → 10%
  • presentation: 10%
  • Extra credit: participation
SLIDE 3

Plan (1st half of the course)

  • Classical papers/problems in IE: Bootstrapping, NELL, Open IE
  • Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning

  • IE++
  • coreference
  • paraphrases
  • inference

Plan (2nd half of the course)

  • QA:
  • Conversational agents:
SLIDE 4

Plan (1st half++ of the course)

  • Classical papers/problems in IE: Bootstrapping, NELL, Open IE
  • Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning

  • IE++:
  • coreference
  • paraphrases
  • Inference: random walks, neural models
  • Inference: random walks, neural models

Plan (2nd half of the course)

  • QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network

  • Conversational agents: Gen. Hierarchical nets, GANs, MemNets
SLIDE 5

NLP (or any application course)

  • Techniques/Models
  • Bootstrapping
  • (coupled) Semi-SSL
  • PGMs: semi-CRF, MultiR, LDA
  • Tree Kernels
  • Multi-instance learning
  • Random walks over graphs
  • Reinforcement learning
  • CNN, LSTM, Bi-LSTM, Recursive NN
  • Attention, MemNets
  • GANs
  • Problems
  • NER
  • Entity/Rel/Event Extraction
  • Open Rel/Event Extraction
  • Multi-task learning
  • KB inference
  • Open QA
  • Machine comprehension
  • Task-oriented dialog w/ KB
  • General dialog
SLIDE 6

How much data?

  • Large supervised dataset: supervised learning
  • Trick to construct a large supervised dataset w/o noise
  • Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial)

  • Small supervised dataset: semi-supervised learning
  • Bootstrapping, co-training, Graph-based SSL
  • No supervised dataset: unsupervised learning/rules
  • TwitIE
  • ReVerb
  • Trick to construct a large supervised dataset with noise: distant supervision
  • MultiR, PCNNs
SLIDE 7

Non-deep Learning Ideas: Semi-supervised

  • Bootstrapping
  • (in a loop) automatic generation of training data by matching known facts
  • Multi-view / Multi-task co-training
  • Constraints between tasks; agreement between multiple classifiers for the same concept

  • Graph-based SSL
  • Agreement between nodes of the graph
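The bootstrapping loop above (generate training data by matching known facts against text, harvest patterns, extract new facts, repeat) can be sketched roughly as follows. The seed facts, toy corpus, and the `extract_context` helper are all hypothetical illustrations, not any specific system:

```python
# A minimal sketch of the bootstrapping loop for relation extraction.
# All names (seed pairs, the toy corpus, extract_context) are made up.

def extract_context(sentence, e1, e2):
    """Return the text between two entities, or None if absent."""
    if e1 in sentence and e2 in sentence:
        start = sentence.index(e1) + len(e1)
        end = sentence.index(e2)
        if start < end:
            return sentence[start:end].strip()
    return None

def bootstrap(corpus, seeds, iterations=3):
    facts = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # 1. Match known facts against text to harvest patterns.
        for sent in corpus:
            for e1, e2 in facts:
                ctx = extract_context(sent, e1, e2)
                if ctx:
                    patterns.add(ctx)
        # 2. Apply patterns to extract new candidate facts
        #    (naively: entities are whatever flanks the pattern).
        new_facts = set()
        for sent in corpus:
            for pat in patterns:
                if pat in sent:
                    left, _, right = sent.partition(pat)
                    new_facts.add((left.strip(), right.strip().rstrip('.')))
        facts |= new_facts
    return facts, patterns
```

A real system also needs the confidence scoring noted on the Bootstrapping slide below; without it this loop drifts semantically.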
SLIDE 8

Non-deep Learning Ideas: distant supervision

  • KB of facts: known. Extraction supervision: unknown
  • Bootstrap a training dataset: matching sentences with facts
  • Hypothesis 1: all such sentences are positive training for a fact: NOISY
  • Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER

  • Hypothesis 3: each bag can have multiple labels: EVEN BETTER
  • Multi-Instance Learning
  • Noisy OR in PGMs
  • maximize the max probability in the bag
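A minimal sketch of the two bag-level aggregations mentioned above: noisy-OR over sentence probabilities, and taking the max probability in the bag. The probabilities are made-up illustrative numbers:

```python
import math

# Two bag-level aggregations used in multi-instance learning for
# distant supervision. probs[i] is the probability that sentence i in
# the bag expresses the relation.

def noisy_or(probs):
    """Bag is positive if at least one sentence expresses the relation."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def bag_max(probs):
    """MultiR/PCNN-style: score the bag by its most confident sentence."""
    return max(probs)
```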
SLIDE 9

Non-deep Learning Ideas: No Intermediate Supervision

  • QA tasks: (Question, Answer) pairs known; inference chain: unknown
  • Distant Supervision: KB fact known; which sentence to extract from: unknown
  • OQA (which proof is better is not known)
  • Random walk inference (which path is better is not known)
  • MultiR (which sentence in the corpus expresses the fact is not known)
  • Approach
  • create a model for scoring each path/proof using weights on properties of each constituent
  • train using known supervision (perceptron style updates)
  • Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning
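The approach above (score each path/proof with weights on properties of its constituents, train with perceptron-style updates from the known answer) might look like this in outline; the candidate/feature representation is a hypothetical simplification:

```python
from collections import defaultdict

# Sketch of perceptron-style training with only end-to-end supervision:
# each candidate derivation (path/proof) carries a feature dict; the
# model scores it as a weighted feature sum. If the top-scoring
# candidate yields the wrong answer, update toward the best-scoring
# candidate that yields the gold answer (the latent "correct" derivation).

def score(weights, features):
    return sum(weights[f] * v for f, v in features.items())

def perceptron_update(weights, candidates, gold_answer, lr=1.0):
    """candidates: list of (answer, feature_dict) pairs."""
    predicted = max(candidates, key=lambda c: score(weights, c[1]))
    if predicted[0] != gold_answer:
        correct = [c for c in candidates if c[0] == gold_answer]
        if correct:
            best_correct = max(correct, key=lambda c: score(weights, c[1]))
            for f, v in best_correct[1].items():
                weights[f] += lr * v          # reward the good derivation
            for f, v in predicted[1].items():
                weights[f] -= lr * v          # penalize the bad one
    return weights
```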
SLIDE 10

Non-deep Learning Ideas: Sparsity

  • Tree Kernels: two features (paths) are similar to the extent that they share constituent elements; similarity is weighted by a penalty for the non-matching elements

  • Paraphrase dataset for QA
  • Open relations as supplements in KB inference
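A toy illustration of the kernel intuition above: shared elements count toward similarity, and non-matching elements incur a decay penalty. This is not the actual convolution tree kernel, just the idea:

```python
# Toy path-similarity "kernel": count shared elements, penalize the
# leftover (non-shared) ones by a multiplicative decay. Illustrative only.

def path_kernel(p1, p2, decay=0.5):
    shared = len(set(p1) & set(p2))       # elements the paths agree on
    leftover = len(set(p1) ^ set(p2))     # elements unique to either path
    return shared * (decay ** leftover)
```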
SLIDE 11

Deep Learning Models

  • Convolutional NNs
  • Handle fixed length contexts
  • Recurrent NNs
  • Handle small variable length histories
  • LSTMs/GRUs
  • Handle larger variable length histories
  • Bi-LSTMs
  • Handle larger variable length histories and futures
  • Recursive NNs
  • Handle variable length partially ordered histories
SLIDE 12

Deep Learning Models (contd)

  • Hierarchical Recurrent NNs
  • RNN over RNNs
  • Attention models
  • attach non-uniform importance to histories based on evidence (question)
  • Co-attention models
  • attach non-uniform importance to histories in two different NNs
  • MemNets
  • add an external storage with explicit read, write, updates
  • Generative Adversarial Nets
  • a better training procedure using actor-critic architecture
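The attention idea above (non-uniform importance over histories based on the evidence) reduces to a softmax-weighted sum of history states; a minimal dot-product sketch with plain Python lists:

```python
import math

# Minimal dot-product attention: weight each history state by its
# softmax-normalized similarity to an evidence/query vector, then
# return the weighted sum (the "context").

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, states):
    scores = [dot(s, query) for s in states]     # similarity per step
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * s[i] for w, s in zip(weights, states))
               for i in range(len(query))]
    return context, weights
```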
SLIDE 13

Hierarchical Models

  • Semi-CRFs: joint segmentation and labeling
  • A sentence is a sequence of segments, each of which is a sequence of words
  • Allows segment level features to be added
  • HRED: LSTM over LSTM
  • A document is a sequence of sentences, each of which is a sequence of words
  • A conversation is a sequence of utterances, each of which is a sequence of words
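The semi-CRF view above scores labeled segments rather than individual words; the space of segmentations it sums over can be enumerated for a toy sequence (a hypothetical helper that ignores labels and scores):

```python
# Enumerate all ways to split a length-n sequence into contiguous
# segments of length at most max_len -- the segmentation space a
# semi-CRF sums over (each segment would additionally carry a label
# and segment-level features).

def segmentations(n, max_len):
    if n == 0:
        return [[]]
    result = []
    for seg_len in range(1, min(max_len, n) + 1):
        for rest in segmentations(n - seg_len, max_len):
            result.append([seg_len] + rest)
    return result
```

A real semi-CRF replaces this exponential enumeration with dynamic programming over segment endpoints.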
SLIDE 14

RL for Text

  • Two uses
  • Use 1: search the Web to find easy documents for IE
  • Use 2: Policy gradient algorithm for updating weights for the generator in GANs

SLIDE 15

Bootstrapping

  • [Akshay] Fuzzy matching between seed tuples and text
  • [Shantanu] Named entity tags in patterns
  • [Gagan, Barun] Confidence level for each pattern and fact
  • Semantic drift
SLIDE 16

NELL

  • Never-ending/lifelong learning
  • Human supervision to guide the learning
  • [many] multi-view multi-task co-training
  • [many] coupling constraints for high precision.
  • [Dinesh] ontology to define the constraints
SLIDE 17

Open IE

  • [many] ontology-free, scalability
  • [Surag] data-driven research through extensive error analysis
  • [Dinesh] reusing datasets from one task to another
  • [Partha] open relations as supplementary knowledge to reduce sparsity

SLIDE 18

Tree Kernels

  • [Shantanu] major info about the relation lies in the shortest path of the dependency parse

SLIDE 19

Semi-CRFs

  • [many] segment level features in CRF
  • [Dinesh] joint segmentation and labeling?
  • Order-L CRFs vs. Semi-CRFs
SLIDE 20

MultiR

  • [Rishab] Use of KB to create a training set
  • [Surag] multi-instance learning in PGMs
  • [Akshay] relationship between sentence-level and aggregate extractions
  • [Gagan] Viterbi approximation (replace expectation with max)
SLIDE 21

PCNNs

  • [Haroun] Max pooling to make layers independent of sentence size
  • [Akshay] Piecewise max pooling to capture arg1, rel, arg2
  • [Akshay] Multi-instance learning in neural nets
  • Positional embeddings
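The piecewise max pooling above can be sketched as follows: split the convolution output at the two entity positions and max-pool each of the three pieces separately, so the pooled representation keeps the arg1 / relation / arg2 structure. Shapes and numbers are illustrative:

```python
# Piecewise max pooling in the PCNN style. conv_out is a list of
# per-position filter activation rows (toy numbers, one filter here).

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """Split at the two entity positions and max-pool each piece.
    Returns one per-filter max vector for each of the three pieces."""
    pieces = [conv_out[:e1_pos + 1],
              conv_out[e1_pos + 1:e2_pos + 1],
              conv_out[e2_pos + 1:]]
    n_filters = len(conv_out[0])
    return [[max(row[i] for row in piece) for i in range(n_filters)]
            for piece in pieces]
```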
SLIDE 22

TwitIE

  • [Haroun] tweets are challenging, but redundancy is good
  • [Dinesh] G2 test for ranking entities for a given date
  • [Shantanu] event type discovery using topic models
SLIDE 23

RL for IE

  • [many] active querying for gathering external evidence
SLIDE 24

PRA for KB inference

  • [Haroun, Akshay] low variance sampling
  • [Arindam] learning non-functional relations
  • [Nupur] paths as features in a learning model
SLIDE 25

Joint MF-TF

  • [Akshay, Shantanu] OOV handling
  • [Nupur] loss function in joint modeling
SLIDE 26

Open QA

  • [Surag] structured perceptron in a pipeline model
  • [Akshay] paraphrase corpus for question rewriting
  • [Shantanu] mining paraphrase operators from corpus
  • [Arindam] decomposition of scoring over derivation steps
SLIDE 27

LSTMs

  • [Haroun] attention > depth
  • [Akshay] cool way to construct the dataset
  • [Dinesh] two types of readers
SLIDE 28

Co-attention

  • [many] iterative refinement of answer span selection
SLIDE 29

HRED

  • [Akshay] pretraining dialog model with a QA dataset
  • [Arindam] passing intermediate context improves coherence?
  • [Barun] split of local dialog generator and global state tracker
SLIDE 30

MSQU

  • [many] partially annotated data
  • [many] natural language -> SQL
SLIDE 31

GANs

  • [many] teacher forcing
  • [Akshay] interesting heuristics
  • [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

SLIDE 32

MemNets

  • [Surag] typed OOVs
  • [Haroun] hops
  • [Shantanu, Gagan] subtask-styled evaluation
SLIDE 33

Open/Next Issues

  • IE: mature?
  • Event extraction
  • Temporal extraction
  • Rapid retargetability
  • KB Inference
  • Long way to go
  • Combining DL and path-based models
SLIDE 34

Open/Next Issues

  • QA systems
  • Dataset-driven research: [MC] SQuAD – tremendous progress
  • Answering in the wild: not clear (large answer spaces?)
  • Deep learning for large-scale QA
  • Conversational agents
  • [Task driven] how to get DL model to issue a variety of queries
  • [General] how to get the system to say something interesting?
  • DL: what are the systems really capturing!?
SLIDE 35

Conclusions

  • Learn key historical developments in IE
  • Learn (some) state of the art in IE, inference, QA and dialog
  • Learn how to critique strengths and weaknesses of a paper
  • Learn how to brainstorm next steps and future directions
  • Learn how to summarize an advanced area of research
  • Learn to do research at the cutting edge
SLIDE 36

Exam

  • Bring a laptop
  • Internet-enabled
  • pdfLaTeX-enabled
  • Bring a mobile
  • for taking a picture
  • Extension cords
  • It is ok even if you have not deeply understood every paper
SLIDE 37

Project Presentations

  • Motivation & Problem definition
  • 1 Slide of Contribution
  • Background
  • Technical Approach
  • Experiments
  • Analysis
  • Conclusions
  • Future Work