
SLIDE 1

Incremental Classification: First Step into Lifelong Learning

PAN Xinyu MMLab, Department of IE

SLIDE 2

Multi-task Incremental Classification: Setup

[Figure: training data for a sequence of tasks arrives incrementally and is used to update a single target model]


SLIDE 5

Multi-task Incremental Classification: Baselines

  • Feature Extraction: sub-optimal for the new task.
  • Re-training: time-consuming.
  • Finetuning: catastrophic forgetting.

SLIDE 6

Potential Application Scenarios

  • Limited storage budget that cannot keep all the sequential data.
  • The collected data will expire due to privacy issues.
  • Efficient deployment of the model for incremental data.
  • ...
SLIDE 7

Lifelong Learning via Progressive Distillation and Retrospection

Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2

1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University

[* indicates joint first authorship] (Accepted in ECCV 2018)

SLIDE 8

Handle Catastrophic Forgetting

Finetuning on the new task leads to catastrophic forgetting. How can we prevent the performance drop on the old task?


SLIDE 11

Handle Catastrophic Forgetting

How to prevent the performance drop on the old task during training? We need an indicator.

How to construct an indicator if we do not reserve any of the old data? Take the new data as fake old data.

SLIDE 12

Learning without Forgetting (Accepted in ECCV 2016)

[Figure: a shared feature extractor with task-specific classifiers for the old and new tasks; the new data is also passed through the original CNN, whose recorded outputs supervise the old-task classifier via a distillation loss, while the new-task classifier is trained with a standard classification loss]
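The "fake old data" trick can be sketched numerically: before training on the new task, the original CNN's outputs on the new images are recorded, and the updated model is penalized for drifting away from those soft targets. A minimal NumPy sketch, assuming a temperature-scaled softmax (the function names and T=2 are illustrative, not taken from the slides):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between the old model's softened outputs (recorded
    before training on the new task) and the new model's outputs."""
    p_old = softmax(old_logits, T)  # soft targets from the original CNN
    p_new = softmax(new_logits, T)
    return float(-(p_old * np.log(p_new + 1e-12)).sum(axis=-1).mean())
```

In LwF this term is added to the cross-entropy on the new-class labels, so only the new-task images are needed.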

SLIDE 13

Adaptation by Distillation

[Figure: an Expert CNN is first trained on the new data; the target model, with a shared feature extractor and task-specific classifiers, is then trained by distilling the new-task knowledge from the Expert CNN and the old-task knowledge from the original CNN]

What if we reserve a small fraction of the old data?
SLIDE 14

Adaptation by Distillation + Retrospection

[Figure: the same architecture as before, but the training data now also includes a small number of reserved old-task samples (Retrospection) alongside the new data]
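In training terms, retrospection amounts to mixing a small reserved exemplar set into every batch. A minimal sketch with a hypothetical make_batch helper; batch_size and old_ratio are assumed values, not from the paper:

```python
import random

def make_batch(new_data, reserved_old, batch_size=32, old_ratio=0.25):
    """Hypothetical helper: mix new-task samples with a few reserved
    old-task exemplars ('retrospection') in each training batch.
    batch_size and old_ratio are illustrative, not from the paper."""
    n_old = int(batch_size * old_ratio)     # e.g. 8 reserved samples
    n_new = batch_size - n_old              # e.g. 24 new samples
    batch = random.sample(new_data, min(n_new, len(new_data)))
    batch += random.choices(reserved_old, k=n_old)  # reserved set is tiny, so sample with replacement
    random.shuffle(batch)
    return batch
```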

SLIDE 15

Overview of Distillation and Retrospection

[Figure: for each task in the sequence, an Expert CNN is trained on that task's data; adaptation by distillation transfers the new-task knowledge to the target model, while retrospection on reserved samples of the old tasks preserves previous knowledge, enabling lifelong learning]

SLIDE 16

Dataset

SLIDE 17

Some Results

SLIDE 18

Ablation Study on #Reserved Samples

SLIDE 19

Learning a Unified Classifier Incrementally via Rebalancing

Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2

1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University

[* indicates joint first authorship] (To appear in CVPR 2019)

SLIDE 20

From Multi-task to Multi-class

[Figure: the multi-task setting, with a separate classifier per task]

In the multi-task setting, there is an oracle to tell which classifier should be used at inference time.


SLIDE 22

From Multi-task to Multi-class

[Figure: the multi-task setting (one classifier per task) vs. the multi-class setting (a single unified classifier over all classes seen so far)]

In the multi-class setting there is no oracle. But can we simply adapt distillation and retrospection to this setup?

SLIDE 23

A Toy Example to Visualize Imbalance

SLIDE 24

Handle the Imbalance: Cosine Normalization

The magnitudes of the old-class embeddings and the new-class embeddings are imbalanced; cosine normalization removes this bias.

(We will use "embedding" and "the weights of the last fully-connected layer" interchangeably in the following.)
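Cosine normalization can be sketched as follows: both the features and the class embeddings are L2-normalized before the dot product, so a class's logit no longer depends on the magnitude of its embedding. A minimal NumPy sketch (the function name and the fixed eta are illustrative; in the paper the scale is learned):

```python
import numpy as np

def cosine_logits(features, weights, eta=1.0):
    """Cosine-normalized classifier: logits are scaled cosine similarities
    between L2-normalized features and L2-normalized class embeddings
    (the rows of the last fully-connected layer)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return eta * f @ w.T  # shape (batch, num_classes), values in [-eta, eta]
```

Because magnitudes are normalized away, old-class and new-class logits become directly comparable.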

SLIDE 25

Handle the Imbalance: Less-Forget Constraint

The features of the new model deviate from previous knowledge; a less-forget constraint penalizes this deviation.
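The less-forget constraint can be sketched as a feature-level distillation term: the cosine similarity between the features produced by the frozen old model and by the model being trained should stay high. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def less_forget_loss(feat_old, feat_new):
    """Less-forget constraint (sketch): penalize the angle between the
    features from the frozen old model and from the model being trained,
    i.e. 1 - cosine similarity, averaged over the batch."""
    fo = feat_old / np.linalg.norm(feat_old, axis=1, keepdims=True)
    fn = feat_new / np.linalg.norm(feat_new, axis=1, keepdims=True)
    return float((1.0 - (fo * fn).sum(axis=1)).mean())
```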

SLIDE 26

Handle the Imbalance: Inter-Class Separation

Ambiguities arise between the old-class embeddings and the new-class embeddings; a ranking loss over anchor, positive, and negative samples enforces inter-class separation.
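Inter-class separation can be sketched with a margin ranking loss: for an old-class sample (the anchor), its score for the ground-truth embedding (positive) should exceed its scores for the most confusing new-class embeddings (negatives) by a margin. A minimal sketch; margin=0.5 is an assumed value:

```python
def margin_ranking_loss(pos_score, neg_scores, margin=0.5):
    """Inter-class separation (sketch): hinge on the gap between the
    anchor's cosine score for its true old class (pos_score) and its
    scores for the top confusing new classes (neg_scores)."""
    losses = [max(0.0, margin - pos_score + s) for s in neg_scores]
    return sum(losses) / len(losses)
```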

SLIDE 27

Overview

[Figure: the frozen old model and the new model extract CNN features from new samples and reserved samples; the class embeddings are trained with a classification loss, a less-forget (distillation) loss, and a margin-ranking loss]
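The three components can be combined into a single objective. A sketch, assuming the less-forget term is weighted adaptively by the ratio of new to old classes (lam_base and the exact weighting scheme are assumptions, not taken from the slides):

```python
import math

def total_loss(l_ce, l_dis, l_mr, n_old_classes, n_new_classes, lam_base=5.0):
    """Combined objective (sketch): classification loss over all classes
    seen so far, plus the less-forget term with an adaptive weight that
    grows with the ratio of new to old classes, plus the margin-ranking
    term. lam_base and the weighting scheme are assumptions."""
    lam = lam_base * math.sqrt(n_new_classes / n_old_classes)
    return l_ce + lam * l_dis + l_mr
```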


SLIDE 32

Some Results

10-phase results and a 5-phase ablation study.

SLIDE 33

Thank you!