SLIDE 1

Episodic Memory in Lifelong Language Learning

NeurIPS 2019
Cyprien de Masson d’Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama (DeepMind)

Xiachong Feng

SLIDE 2

Outline

  • Author
  • Background
  • Task
  • Model
  • Experiment
  • Result
SLIDE 3

Author

  • Cyprien de Masson d’Autume (DeepMind)
  • Sebastian Ruder (DeepMind)
  • Lingpeng Kong (孔令鹏, DeepMind)
  • Dani Yogatama (DeepMind)

SLIDE 4

Background

  • Lifelong learning
SLIDE 5

Background

  • Catastrophic Forgetting
SLIDE 6

Task

  • Text classification
  • Question answering
SLIDE 7

Model

  • Example encoder
  • Task decoder
  • Episodic memory module
SLIDE 8

Example encoder & Task decoder
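The encoder/decoder figures from the original slide are not reproduced here. Below is a minimal, illustrative sketch of the two components, assuming (as in the paper) a pretrained BERT example encoder whose token representations feed a linear classification head over [CLS] and a span-prediction head for QA. The class name, tensor shapes, and the 33-class count (the merged text-classification label set from the experiments slide) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

HIDDEN = 768  # hidden size of BERT-base, the example encoder in the paper

class TaskDecoder(nn.Module):
    """Sketch of the task decoder on top of the example encoder's outputs.

    Both heads are shown in one module for brevity; classification and QA
    are separate setups in the paper.
    - Text classification: linear + softmax over the [CLS] vector.
    - Question answering: start/end span logits over the sequence tokens.
    """
    def __init__(self, num_classes: int = 33):
        super().__init__()
        self.classifier = nn.Linear(HIDDEN, num_classes)  # classification head
        self.span_head = nn.Linear(HIDDEN, 2)             # start/end logits for QA

    def forward(self, hidden_states: torch.Tensor):
        cls_vec = hidden_states[:, 0]                     # [CLS] position
        class_logits = self.classifier(cls_vec)
        start_logits, end_logits = self.span_head(hidden_states).unbind(dim=-1)
        return class_logits, start_logits, end_logits

# hidden_states stands in for the output of the (pretrained BERT) example encoder
hidden_states = torch.randn(4, 128, HIDDEN)               # (batch, seq_len, hidden)
class_logits, start_logits, end_logits = TaskDecoder()(hidden_states)
```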

SLIDE 9

Episodic Memory

  • Key-value memory block
  • Key: computed by a frozen pretrained BERT model (the key network)
      • Text classification: the [CLS] representation
      • Question answering: the representation of the first token of the question
  • Value: the input x together with its label
      • Text classification: x is a document to be classified
      • Question answering: x is a concatenation of a context paragraph and a question separated by [SEP]
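A minimal sketch of the key-value memory described above: keys come from a frozen key network (a pretrained BERT in the paper), values store the raw example plus its label, and the module supports uniform random sampling (for replay) and K-nearest-neighbour lookup by Euclidean distance (for local adaptation). The class name, `key_fn`, and the method names are illustrative placeholders.

```python
import numpy as np

class EpisodicMemory:
    """Key-value episodic memory (sketch, not the authors' implementation)."""

    def __init__(self, key_fn):
        self.key_fn = key_fn          # frozen key network, e.g. BERT in the paper
        self.keys, self.values = [], []

    def write(self, example, label):
        # store the key vector plus the raw example and its label
        self.keys.append(np.asarray(self.key_fn(example), dtype=np.float32))
        self.values.append((example, label))

    def sample(self, n, rng=np.random):
        # uniform random sampling, used for sparse experience replay
        idx = rng.choice(len(self.values), size=min(n, len(self.values)), replace=False)
        return [self.values[i] for i in idx]

    def nearest(self, query_example, k):
        # K nearest neighbours by Euclidean distance, used for local adaptation
        q = np.asarray(self.key_fn(query_example), dtype=np.float32)
        d = np.linalg.norm(np.stack(self.keys) - q, axis=1)
        return [self.values[i] for i in np.argsort(d)[:k]]

# toy usage with a hypothetical 4-dimensional key function
mem = EpisodicMemory(key_fn=lambda ex: np.random.randn(4))
for i in range(1000):
    mem.write(f"doc-{i}", i % 3)
replay_batch = mem.sample(100)
neighbours = mem.nearest("new doc", k=32)
```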

SLIDE 10

Episodic Memory

The episodic memory module is used in two ways:
  • Sparse experience replay (during training)
  • Local adaptation (during inference)

SLIDE 11

Model - Training

  • Write
      • Based on random write
  • Read: sparse experience replay (see the sketch below)
      • Uniformly random sampling from memory
      • Perform gradient updates based on the retrieved examples
      • Sparse: randomly retrieve 100 examples for every 10,000 new examples
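A rough sketch of the training loop with sparse experience replay at the 1% rate quoted on the slide (100 replayed examples per 10,000 new ones). `stream`, `memory`, and `train_step` are hypothetical placeholders, not the paper's implementation.

```python
import random

REPLAY_EVERY = 10_000   # new examples between replay rounds
REPLAY_COUNT = 100      # stored examples retrieved per replay round (1% rate)

def lifelong_train(stream, memory, train_step):
    seen = 0
    for example, label in stream:
        train_step([(example, label)])      # one gradient update on the new example
        memory.append((example, label))     # write the example to episodic memory
        seen += 1
        if seen % REPLAY_EVERY == 0 and memory:
            replayed = random.sample(memory, min(REPLAY_COUNT, len(memory)))
            train_step(replayed)            # sparse replay: updates on old examples

# toy usage with no-op placeholders
if __name__ == "__main__":
    toy_stream = [(f"doc-{i}", i % 3) for i in range(25_000)]
    lifelong_train(toy_stream, memory=[], train_step=lambda batch: None)
```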

SLIDE 12

Model - Inference

  • Read: local adaptation (see the sketch below)
  • Key network → query vector
  • Retrieve K nearest neighbors using the Euclidean distance function

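A rough sketch of local adaptation at inference time, assuming the MbPA-style procedure: take a few gradient steps on the K retrieved neighbours while an L2 term keeps the adapted weights close to the trained base weights, then discard the adapted weights after the prediction. The function name, hyperparameter values, and the linear toy model are illustrative assumptions, not the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def locally_adapt(model, neighbours, steps=30, lr=1e-3, lam=1e-3):
    """Adapt a copy of the model to the retrieved neighbours (sketch)."""
    base = [p.detach().clone() for p in model.parameters()]
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x, y = neighbours                       # batch of retrieved inputs and labels
        loss = F.cross_entropy(adapted(x), y)   # fit the neighbours
        loss = loss + lam * sum(((p - b) ** 2).sum()
                                for p, b in zip(adapted.parameters(), base))
        loss.backward()
        opt.step()
    return adapted                              # used only for this one prediction

# toy usage with a linear model and random neighbours
model = torch.nn.Linear(16, 3)
neighbours = (torch.randn(8, 16), torch.randint(0, 3, (8,)))
prediction = locally_adapt(model, neighbours)(torch.randn(1, 16)).argmax(-1)
```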

SLIDE 13

Experiments

  • Text classification
      • News classification (AGNews), sentiment analysis (Yelp, Amazon), Wikipedia article classification (DBPedia), and questions-and-answers categorization (Yahoo)
      • AGNews (4 classes), Yelp (5 classes), DBPedia (14 classes), Amazon (5 classes), and Yahoo (10 classes)
      • Since the Yelp and Amazon datasets have similar semantics (product ratings), their classes are merged
  • Question answering
      • SQuAD 1.1, TriviaQA, QuAC
  • A balanced version of all datasets is created
SLIDE 14

Results

Results tables for text classification and QA (not reproduced here), including a multitask model and a variant that uses randomly retrieved examples for local adaptation.

SLIDE 15

Result

SLIDE 16

Result

Results when storing only 50% and 10% of training examples in memory.

SLIDE 17

Result

SLIDE 18

Thanks!