SLIDE 1

Episodic Memory in Lifelong Language Learning

NeurIPS 2019
Cyprien de Masson d’Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama (DeepMind)

Xiachong Feng

SLIDE 2

Outline

  • Author
  • Background
  • Task
  • Model
  • Experiment
  • Result
SLIDE 3

Author

  • Cyprien de Masson d’Autume (DeepMind)
  • Sebastian Ruder (DeepMind)
  • Lingpeng Kong (孔令鹏, DeepMind)
  • Dani Yogatama (DeepMind)

SLIDE 4

Background

  • Lifelong learning
SLIDE 5

Background

  • Catastrophic Forgetting
SLIDE 6

Task

  • Text classification
  • Question answering
SLIDE 7

Model

  • Example encoder
  • Task decoder
  • Episodic memory module
SLIDE 8

Example encoder & Task decoder
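The encoder/decoder figures from the original slide are not reproduced here. Below is a minimal, illustrative sketch of the two components, assuming (as in the paper) a pretrained BERT example encoder whose token representations feed a linear classification head over [CLS] and a span-prediction head for QA. The class name, tensor shapes, and the 33-class count (the merged text-classification label set from the experiments slide) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

HIDDEN = 768  # hidden size of BERT-base, the example encoder in the paper

class TaskDecoder(nn.Module):
    """Sketch of the task decoder on top of the example encoder's outputs.

    Both heads are shown in one module for brevity; classification and QA
    are separate setups in the paper.
    - Text classification: linear + softmax over the [CLS] vector.
    - Question answering: start/end span logits over the sequence tokens.
    """
    def __init__(self, num_classes: int = 33):
        super().__init__()
        self.classifier = nn.Linear(HIDDEN, num_classes)  # classification head
        self.span_head = nn.Linear(HIDDEN, 2)             # start/end logits for QA

    def forward(self, hidden_states: torch.Tensor):
        cls_vec = hidden_states[:, 0]                     # [CLS] position
        class_logits = self.classifier(cls_vec)
        start_logits, end_logits = self.span_head(hidden_states).unbind(dim=-1)
        return class_logits, start_logits, end_logits

# hidden_states stands in for the output of the (pretrained BERT) example encoder
hidden_states = torch.randn(4, 128, HIDDEN)               # (batch, seq_len, hidden)
class_logits, start_logits, end_logits = TaskDecoder()(hidden_states)
```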

SLIDE 9

Episodic Memory

  • Key-value memory block
  • Key: computed by a frozen pretrained BERT model (the key network)
      • Text classification: the [CLS] representation
      • Question answering: the representation of the first token of the question
  • Value: the input x together with its label
      • Text classification: x is a document to be classified
      • Question answering: x is a concatenation of a context paragraph and a question separated by [SEP]
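A minimal sketch of the key-value memory described above: keys come from a frozen key network (a pretrained BERT in the paper), values store the raw example plus its label, and the module supports uniform random sampling (for replay) and K-nearest-neighbour lookup by Euclidean distance (for local adaptation). The class name, `key_fn`, and the method names are illustrative placeholders.

```python
import numpy as np

class EpisodicMemory:
    """Key-value episodic memory (sketch, not the authors' implementation)."""

    def __init__(self, key_fn):
        self.key_fn = key_fn          # frozen key network, e.g. BERT in the paper
        self.keys, self.values = [], []

    def write(self, example, label):
        # store the key vector plus the raw example and its label
        self.keys.append(np.asarray(self.key_fn(example), dtype=np.float32))
        self.values.append((example, label))

    def sample(self, n, rng=np.random):
        # uniform random sampling, used for sparse experience replay
        idx = rng.choice(len(self.values), size=min(n, len(self.values)), replace=False)
        return [self.values[i] for i in idx]

    def nearest(self, query_example, k):
        # K nearest neighbours by Euclidean distance, used for local adaptation
        q = np.asarray(self.key_fn(query_example), dtype=np.float32)
        d = np.linalg.norm(np.stack(self.keys) - q, axis=1)
        return [self.values[i] for i in np.argsort(d)[:k]]

# toy usage with a hypothetical 4-dimensional key function
mem = EpisodicMemory(key_fn=lambda ex: np.random.randn(4))
for i in range(1000):
    mem.write(f"doc-{i}", i % 3)
replay_batch = mem.sample(100)
neighbours = mem.nearest("new doc", k=32)
```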

SLIDE 10

Episodic Memory

The episodic memory module is used in two ways:
  • Sparse experience replay (during training)
  • Local adaptation (during inference)

SLIDE 11

Model - Training

  • Write
      • Based on random write
  • Read: sparse experience replay (see the sketch below)
      • Uniformly random sampling from memory
      • Perform gradient updates based on the retrieved examples
      • Sparse: randomly retrieve 100 examples for every 10,000 new examples
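A rough sketch of the training loop with sparse experience replay at the 1% rate quoted on the slide (100 replayed examples per 10,000 new ones). `stream`, `memory`, and `train_step` are hypothetical placeholders, not the paper's implementation.

```python
import random

REPLAY_EVERY = 10_000   # new examples between replay rounds
REPLAY_COUNT = 100      # stored examples retrieved per replay round (1% rate)

def lifelong_train(stream, memory, train_step):
    seen = 0
    for example, label in stream:
        train_step([(example, label)])      # one gradient update on the new example
        memory.append((example, label))     # write the example to episodic memory
        seen += 1
        if seen % REPLAY_EVERY == 0 and memory:
            replayed = random.sample(memory, min(REPLAY_COUNT, len(memory)))
            train_step(replayed)            # sparse replay: updates on old examples

# toy usage with no-op placeholders
if __name__ == "__main__":
    toy_stream = [(f"doc-{i}", i % 3) for i in range(25_000)]
    lifelong_train(toy_stream, memory=[], train_step=lambda batch: None)
```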

SLIDE 12

Model - Inference

  • Read: local adaptation (see the sketch below)
  • Key network → query vector
  • Retrieve K nearest neighbors using the Euclidean distance function

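A rough sketch of local adaptation at inference time, assuming the MbPA-style procedure: take a few gradient steps on the K retrieved neighbours while an L2 term keeps the adapted weights close to the trained base weights, then discard the adapted weights after the prediction. The function name, hyperparameter values, and the linear toy model are illustrative assumptions, not the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def locally_adapt(model, neighbours, steps=30, lr=1e-3, lam=1e-3):
    """Adapt a copy of the model to the retrieved neighbours (sketch)."""
    base = [p.detach().clone() for p in model.parameters()]
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x, y = neighbours                       # batch of retrieved inputs and labels
        loss = F.cross_entropy(adapted(x), y)   # fit the neighbours
        loss = loss + lam * sum(((p - b) ** 2).sum()
                                for p, b in zip(adapted.parameters(), base))
        loss.backward()
        opt.step()
    return adapted                              # used only for this one prediction

# toy usage with a linear model and random neighbours
model = torch.nn.Linear(16, 3)
neighbours = (torch.randn(8, 16), torch.randint(0, 3, (8,)))
prediction = locally_adapt(model, neighbours)(torch.randn(1, 16)).argmax(-1)
```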

SLIDE 13

Experiments

  • Text classification
      • News classification (AGNews), sentiment analysis (Yelp, Amazon), Wikipedia article classification (DBPedia), and questions-and-answers categorization (Yahoo)
      • AGNews (4 classes), Yelp (5 classes), DBPedia (14 classes), Amazon (5 classes), and Yahoo (10 classes)
      • Since the Yelp and Amazon datasets have similar semantics (product ratings), their classes are merged
  • Question answering
      • SQuAD 1.1, TriviaQA, QuAC
  • A balanced version of all datasets is created
SLIDE 14

Results

Results tables for text classification and QA (not reproduced here), including a multitask model and a variant that uses randomly retrieved examples for local adaptation.

SLIDE 15

Result

SLIDE 16

Result

Results when storing only 50% and 10% of training examples in memory.

SLIDE 17

Result

SLIDE 18

Thanks!