SLIDE 1

Wrapup: IE, QA, and Dialog

Mausam

SLIDE 2

Grading

  • project: 50% → 40%
  • final exam: 20%
  • regular reviews: 15% → 20%
  • midterm survey: 15% → 10%
  • presentation: 10%
  • Extra credit: participation
SLIDE 3

Plan (1st half of the course)

  • Classical papers/problems in IE: Bootstrapping, NELL, Open IE
  • Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning

  • IE++
  • coreference
  • paraphrases
  • inference

Plan (2nd half of the course)

  • QA:
  • Conversational agents:
SLIDE 4

Plan (1st half++ of the course)

  • Classical papers/problems in IE: Bootstrapping, NELL, Open IE
  • Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning

  • IE++:
  • coreference
  • paraphrases
  • Inference: random walks, neural models
  • Inference: random walks, neural models

Plan (2nd half of the course)

  • QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network

  • Conversational agents: Gen. Hierarchical nets, GANs, MemNets
SLIDE 5

NLP (or any application course)

  • Techniques/Models
  • Bootstrapping
  • (coupled) Semi-SSL
  • PGMs: semi-CRF, MultiR, LDA
  • Tree Kernels
  • Multi-instance learning
  • Random walks over graphs
  • Reinforcement learning
  • CNN, LSTM, Bi-LSTM, Recursive NN
  • Attention, MemNets
  • GANs
  • Problems
  • NER
  • Entity/Rel/Event Extraction
  • Open Rel/Event Extraction
  • Multi-task learning
  • KB inference
  • Open QA
  • Machine comprehension
  • Task-oriented dialog w/ KB
  • General dialog
SLIDE 6

How much data?

  • Large supervised dataset: supervised learning
  • Trick to construct a large supervised dataset w/o noise
  • Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial)

  • Small supervised dataset: semi-supervised learning
  • Bootstrapping, co-training, Graph-based SSL
  • No supervised dataset: unsupervised learning/rules
  • TwitIE
  • ReVerb
  • Trick to construct a large supervised dataset with noise: distant supervision
  • MultiR, PCNNs
SLIDE 7

Non-deep Learning Ideas: Semi-supervised

  • Bootstrapping
  • (in a loop) automatic generation of training data by matching known facts
  • Multi-view / Multi-task co-training
  • Constraints between tasks; agreement between multiple classifiers for the same concept

  • Graph-based SSL
  • Agreement between nodes of the graph
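The bootstrapping loop above (generate training data by matching known facts against text, harvest patterns, extract new facts, repeat) can be sketched roughly as follows. The seed facts, toy corpus, and the `extract_context` helper are all hypothetical illustrations, not any specific system:

```python
# A minimal sketch of the bootstrapping loop for relation extraction.
# All names (seed pairs, the toy corpus, extract_context) are made up.

def extract_context(sentence, e1, e2):
    """Return the text between two entities, or None if absent."""
    if e1 in sentence and e2 in sentence:
        start = sentence.index(e1) + len(e1)
        end = sentence.index(e2)
        if start < end:
            return sentence[start:end].strip()
    return None

def bootstrap(corpus, seeds, iterations=3):
    facts = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # 1. Match known facts against text to harvest patterns.
        for sent in corpus:
            for e1, e2 in facts:
                ctx = extract_context(sent, e1, e2)
                if ctx:
                    patterns.add(ctx)
        # 2. Apply patterns to extract new candidate facts
        #    (naively: entities are whatever flanks the pattern).
        new_facts = set()
        for sent in corpus:
            for pat in patterns:
                if pat in sent:
                    left, _, right = sent.partition(pat)
                    new_facts.add((left.strip(), right.strip().rstrip('.')))
        facts |= new_facts
    return facts, patterns
```

A real system also needs the confidence scoring noted on the Bootstrapping slide below; without it this loop drifts semantically.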
SLIDE 8

Non-deep Learning Ideas: distant supervision

  • KB of facts: known. Extraction supervision: unknown
  • Bootstrap a training dataset: matching sentences with facts
  • Hypothesis 1: all such sentences are positive training for a fact: NOISY
  • Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER

  • Hypothesis 3: each bag can have multiple labels: EVEN BETTER
  • Multi-Instance Learning
  • Noisy OR in PGMs
  • maximize the max probability in the bag
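A minimal sketch of the two bag-level aggregations mentioned above: noisy-OR over sentence probabilities, and taking the max probability in the bag. The probabilities are made-up illustrative numbers:

```python
import math

# Two bag-level aggregations used in multi-instance learning for
# distant supervision. probs[i] is the probability that sentence i in
# the bag expresses the relation.

def noisy_or(probs):
    """Bag is positive if at least one sentence expresses the relation."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def bag_max(probs):
    """MultiR/PCNN-style: score the bag by its most confident sentence."""
    return max(probs)
```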
SLIDE 9

Non-deep Learning Ideas: No Intermediate Supervision

  • QA tasks: (Question, Answer) pairs known; inference chain: unknown
  • Distant Supervision: KB fact known; which sentence to extract from: unknown
  • OQA (which proof is better is not known)
  • Random walk inference (which path is better is not known)
  • MultiR (which sentence in the corpus expresses the fact is not known)
  • Approach
  • create a model for scoring each path/proof using weights on properties of each constituent
  • train using known supervision (perceptron style updates)
  • Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning
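The approach above (score each path/proof with weights on properties of its constituents, train with perceptron-style updates from the known answer) might look like this in outline; the candidate/feature representation is a hypothetical simplification:

```python
from collections import defaultdict

# Sketch of perceptron-style training with only end-to-end supervision:
# each candidate derivation (path/proof) carries a feature dict; the
# model scores it as a weighted feature sum. If the top-scoring
# candidate yields the wrong answer, update toward the best-scoring
# candidate that yields the gold answer (the latent "correct" derivation).

def score(weights, features):
    return sum(weights[f] * v for f, v in features.items())

def perceptron_update(weights, candidates, gold_answer, lr=1.0):
    """candidates: list of (answer, feature_dict) pairs."""
    predicted = max(candidates, key=lambda c: score(weights, c[1]))
    if predicted[0] != gold_answer:
        correct = [c for c in candidates if c[0] == gold_answer]
        if correct:
            best_correct = max(correct, key=lambda c: score(weights, c[1]))
            for f, v in best_correct[1].items():
                weights[f] += lr * v          # reward the good derivation
            for f, v in predicted[1].items():
                weights[f] -= lr * v          # penalize the bad one
    return weights
```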
SLIDE 10

Non-deep Learning Ideas: Sparsity

  • Tree Kernels: two features (paths) are similar to the extent that they share constituent elements; similarity is weighted by a penalty for the non-matching elements

  • Paraphrase dataset for QA
  • Open relations as supplements in KB inference
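A toy illustration of the kernel intuition above: shared elements count toward similarity, and non-matching elements incur a decay penalty. This is not the actual convolution tree kernel, just the idea:

```python
# Toy path-similarity "kernel": count shared elements, penalize the
# leftover (non-shared) ones by a multiplicative decay. Illustrative only.

def path_kernel(p1, p2, decay=0.5):
    shared = len(set(p1) & set(p2))       # elements the paths agree on
    leftover = len(set(p1) ^ set(p2))     # elements unique to either path
    return shared * (decay ** leftover)
```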
SLIDE 11

Deep Learning Models

  • Convolutional NNs
  • Handle fixed length contexts
  • Recurrent NNs
  • Handle small variable length histories
  • LSTMs/GRUs
  • Handle larger variable length histories
  • Bi-LSTMs
  • Handle larger variable length histories and futures
  • Recursive NNs
  • Handle variable length partially ordered histories
SLIDE 12

Deep Learning Models (contd)

  • Hierarchical Recurrent NNs
  • RNN over RNNs
  • Attention models
  • attach non-uniform importance to histories based on evidence (question)
  • Co-attention models
  • attach non-uniform importance to histories in two different NNs
  • MemNets
  • add an external storage with explicit read, write, updates
  • Generative Adversarial Nets
  • a better training procedure using actor-critic architecture
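The attention idea above (non-uniform importance over histories based on the evidence) reduces to a softmax-weighted sum of history states; a minimal dot-product sketch with plain Python lists:

```python
import math

# Minimal dot-product attention: weight each history state by its
# softmax-normalized similarity to an evidence/query vector, then
# return the weighted sum (the "context").

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, states):
    scores = [dot(s, query) for s in states]     # similarity per step
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * s[i] for w, s in zip(weights, states))
               for i in range(len(query))]
    return context, weights
```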
SLIDE 13

Hierarchical Models

  • Semi-CRFs: joint segmentation and labeling
  • A sentence is a sequence of segments, each of which is a sequence of words
  • Allows segment level features to be added
  • HRED: LSTM over LSTM
  • A document is a sequence of sentences, each of which is a sequence of words
  • A conversation is a sequence of utterances, each of which is a sequence of words
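The semi-CRF view above scores labeled segments rather than individual words; the space of segmentations it sums over can be enumerated for a toy sequence (a hypothetical helper that ignores labels and scores):

```python
# Enumerate all ways to split a length-n sequence into contiguous
# segments of length at most max_len -- the segmentation space a
# semi-CRF sums over (each segment would additionally carry a label
# and segment-level features).

def segmentations(n, max_len):
    if n == 0:
        return [[]]
    result = []
    for seg_len in range(1, min(max_len, n) + 1):
        for rest in segmentations(n - seg_len, max_len):
            result.append([seg_len] + rest)
    return result
```

A real semi-CRF replaces this exponential enumeration with dynamic programming over segment endpoints.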
SLIDE 14

RL for Text

  • Two uses
  • Use 1: search the Web to find easy documents for IE
  • Use 2: Policy gradient algorithm for updating weights for the generator in GANs

SLIDE 15

Bootstrapping

  • [Akshay] Fuzzy matching between seed tuples and text
  • [Shantanu] Named entity tags in patterns
  • [Gagan, Barun] Confidence level for each pattern and fact
  • Semantic drift
SLIDE 16

NELL

  • Never-ending/lifelong learning
  • Human supervision to guide the learning
  • [many] multi-view multi-task co-training
  • [many] coupling constraints for high precision.
  • [Dinesh] ontology to define the constraints
SLIDE 17

Open IE

  • [many] ontology-free, scalability
  • [Surag] data-driven research through extensive error analysis
  • [Dinesh] reusing datasets from one task to another
  • [Partha] open relations as supplementary knowledge to reduce sparsity

SLIDE 18

Tree Kernels

  • [Shantanu] major info about the relation lies in the shortest path of the dependency parse

SLIDE 19

Semi-CRFs

  • [many] segment level features in CRF
  • [Dinesh] joint segmentation and labeling?
  • Order-L CRFs vs. Semi-CRFs
SLIDE 20

MultiR

  • [Rishab] Use of KB to create a training set
  • [Surag] multi-instance learning in PGMs
  • [Akshay] relationship between sentence-level and aggregate extractions
  • [Gagan] Viterbi approximation (replace expectation with max)
SLIDE 21

PCNNs

  • [Haroun] Max pooling to make layers independent of sentence size
  • [Akshay] Piecewise max pooling to capture arg1, rel, arg2
  • [Akshay] Multi-instance learning in neural nets
  • Positional embeddings
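The piecewise max pooling above can be sketched as follows: split the convolution output at the two entity positions and max-pool each of the three pieces separately, so the pooled representation keeps the arg1 / relation / arg2 structure. Shapes and numbers are illustrative:

```python
# Piecewise max pooling in the PCNN style. conv_out is a list of
# per-position filter activation rows (toy numbers, one filter here).

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """Split at the two entity positions and max-pool each piece.
    Returns one per-filter max vector for each of the three pieces."""
    pieces = [conv_out[:e1_pos + 1],
              conv_out[e1_pos + 1:e2_pos + 1],
              conv_out[e2_pos + 1:]]
    n_filters = len(conv_out[0])
    return [[max(row[i] for row in piece) for i in range(n_filters)]
            for piece in pieces]
```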
SLIDE 22

TwitIE

  • [Haroun] tweets are challenging, but redundancy is good
  • [Dinesh] G2 test for ranking entities for a given date
  • [Shantanu] event type discovery using topic models
SLIDE 23

RL for IE

  • [many] active querying for gathering external evidence
SLIDE 24

PRA for KB inference

  • [Haroun, Akshay] low variance sampling
  • [Arindam] learning non-functional relations
  • [Nupur] paths as features in a learning model
SLIDE 25

Joint MF-TF

  • [Akshay, Shantanu] OOV handling
  • [Nupur] loss function in joint modeling
SLIDE 26

Open QA

  • [Surag] structured perceptron in a pipeline model
  • [Akshay] paraphrase corpus for question rewriting
  • [Shantanu] mining paraphrase operators from corpus
  • [Arindam] decomposition of scoring over derivation steps
SLIDE 27

LSTMs

  • [Haroun] attention > depth
  • [Akshay] cool way to construct the dataset
  • [Dinesh] two types of readers
SLIDE 28

Co-attention

  • [many] iterative refinement of answer span selection
SLIDE 29

HRED

  • [Akshay] pretraining dialog model with a QA dataset
  • [Arindam] passing intermediate context improves coherence?
  • [Barun] split of local dialog generator and global state tracker
SLIDE 30

MSQU

  • [many] partially annotated data
  • [many] natural language -> SQL
SLIDE 31

GANs

  • [many] teacher forcing
  • [Akshay] interesting heuristics
  • [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

SLIDE 32

MemNets

  • [Surag] typed OOVs
  • [Haroun] hops
  • [Shantanu, Gagan] subtask-styled evaluation
SLIDE 33

Open/Next Issues

  • IE: mature?
  • Event extraction
  • Temporal extraction
  • Rapid retargetability
  • KB Inference
  • Long way to go
  • Combining DL and path-based models
SLIDE 34

Open/Next Issues

  • QA systems
  • Dataset-driven research: [MC] SQuAD – tremendous progress
  • Answering in the wild: not clear (large answer spaces?)
  • Deep learning for large-scale QA
  • Conversational agents
  • [Task driven] how to get DL model to issue a variety of queries
  • [General] how to get the system to say something interesting?
  • DL: what are the systems really capturing!?
SLIDE 35

Conclusions

  • Learn key historical developments in IE
  • Learn (some) state of the art in IE, inference, QA and dialog
  • Learn how to critique strengths and weaknesses of a paper
  • Learn how to brainstorm next steps and future directions
  • Learn how to summarize an advanced area of research
  • Learn to do research at the cutting edge
SLIDE 36

Exam

  • Bring a laptop
  • Internet-enabled
  • pdfLaTeX-enabled
  • Bring a mobile
  • for taking a picture
  • Extension cords
  • It is ok even if you have not deeply understood every paper
SLIDE 37

Project Presentations

  • Motivation & Problem definition
  • 1 Slide of Contribution
  • Background
  • Technical Approach
  • Experiments
  • Analysis
  • Conclusions
  • Future Work