Nov 18, 2022

SLIDE 1

Incremental Classification: First Step into Lifelong Learning

PAN Xinyu MMLab, Department of IE

SLIDE 2

Multi-task Incremental Classification: Setup

[Diagram labels: Training Data; Target Model]

SLIDE 5

Multi-task Incremental Classification: Baseline

Feature Extraction: sub-optimal for the new task

Re-training: time consuming

Finetuning: catastrophic forgetting

SLIDE 6

Potential Application Scenarios

A limited storage budget that cannot keep all sequential data.

The collected data will expire due to privacy issues.

Efficient deployment of the model for incremental data.

SLIDE 7

Lifelong Learning via Progressive Distillation and Retrospection

Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2

1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University

[* indicates joint first authorship] (Accepted in ECCV 2018)

SLIDE 11

Handle Catastrophic Forgetting

Finetuning: catastrophic forgetting

How can we prevent a performance drop on the old task during training? We need an indicator.

How can we construct an indicator if we do not reserve any of the old data? Take the new data as fake old data.
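The idea of taking new data as fake old data can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear maps `old_weights` and `drifted` are hypothetical stand-ins for the network before and during finetuning, and the mean squared drift is an assumed choice of indicator.

```python
import numpy as np

def record_old_responses(old_weights, new_data):
    """Run the frozen old model on the NEW data and store its outputs."""
    return new_data @ old_weights

def forgetting_indicator(current_weights, new_data, recorded):
    """Mean squared drift of the current old-task outputs on the new data."""
    current = new_data @ current_weights
    return float(np.mean((current - recorded) ** 2))

rng = np.random.default_rng(0)
new_data = rng.normal(size=(8, 4))      # new-task inputs; no old data is kept
old_weights = rng.normal(size=(4, 3))   # hypothetical old-task head (3 old classes)

# Recorded once, before finetuning on the new task starts.
recorded = record_old_responses(old_weights, new_data)

# Before any update the indicator is exactly zero.
indicator_start = forgetting_indicator(old_weights, new_data, recorded)

# Once the weights drift during finetuning, the indicator becomes positive.
drifted = old_weights + 0.1 * rng.normal(size=old_weights.shape)
indicator_after = forgetting_indicator(drifted, new_data, recorded)
```

Keeping this indicator small during training is one way to penalize changes to the old-task behaviour without storing any old samples.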

SLIDE 12

Learning without Forgetting (Accepted in ECCV 2016)

[Diagram labels: New Data; Training Data; Original CNN; Feature Extractor (F*, F); Task-specific Classifiers (T_n*, T_o*); Loss_new; Loss_old]
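The two losses on this slide can be sketched in NumPy. This is a hedged sketch, not the original code: it assumes Loss_old is a knowledge-distillation loss between the outputs recorded from the original CNN and the current old-task outputs, that Loss_new is a standard cross-entropy, and that a temperature of 2 and a weight `weight_old` of 1 are reasonable defaults. All function names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(recorded_old_logits, current_old_logits, temperature=2.0):
    """Cross-entropy between softened recorded outputs and softened current outputs."""
    target = softmax(recorded_old_logits, temperature)
    pred = softmax(current_old_logits, temperature)
    return float(-np.mean(np.sum(target * np.log(pred + 1e-12), axis=-1)))

def cross_entropy(new_logits, labels):
    """Standard cross-entropy on the new task's ground-truth labels."""
    probs = softmax(new_logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def lwf_objective(recorded_old_logits, current_old_logits, new_logits, labels, weight_old=1.0):
    """Loss_new + weight_old * Loss_old, both evaluated on new data only."""
    loss_old = distillation_loss(recorded_old_logits, current_old_logits)
    loss_new = cross_entropy(new_logits, labels)
    return loss_new + weight_old * loss_old

recorded = np.array([[2.0, 0.5, -1.0]])
loss_same = distillation_loss(recorded, recorded)
loss_drifted = distillation_loss(recorded, np.array([[-1.0, 0.5, 2.0]]))
total = lwf_objective(recorded, recorded, np.array([[0.2, 1.5]]), np.array([1]))
```

The distillation term is smallest when the current old-task outputs match the recorded ones, so it grows as the old-task behaviour drifts.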

SLIDE 13

Adaptation by Distillation

[Diagram labels: New Data; Training Data; Expert CNN (F_n, T_n); Original CNN; Feature Extractor (F*, F); Task-specific Classifiers (T_n*, T_o*); Loss_new; Loss_old]

What if we reserve a small fraction of old data?

SLIDE 14

Adaptation by Distillation + Retrospection

[Diagram labels: New Data + Retrospection; Training Data; Expert CNN (F_n, T_n); Original CNN; Feature Extractor (F*, F); Task-specific Classifiers (T_n*, T_o*); Loss_new; Loss_old]
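Retrospection relies on a small reserved subset of old data. How that subset is chosen is not stated on the slides, so the sketch below simply assumes uniform random selection with a fixed per-class budget. The helper `reserve_samples` and its parameters are hypothetical.

```python
import numpy as np

def reserve_samples(labels, per_class, seed=0):
    """Return sorted indices of a random subset with at most `per_class` samples per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    kept = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        take = min(per_class, idx.size)       # never ask for more than exist
        kept.extend(rng.choice(idx, size=take, replace=False).tolist())
    return sorted(kept)

old_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
reserved = reserve_samples(old_labels, per_class=2)
reserved_labels = [old_labels[i] for i in reserved]
```

The reserved indices would then be mixed with the new data during training, so that the old-task loss is also evaluated on real old samples.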

SLIDE 15

Overview of Distillation and Retrospection

Lifelong learning: retrospection from old tasks, adaptation by distillation to new tasks.

[Diagram labels: Training Data; Expert CNN for Task 1; Expert CNN for Task 2]

SLIDE 16

Dataset

SLIDE 17

Some Results

SLIDE 18

Ablation Study on #Reserved Samples

SLIDE 19

Learning a Unified Classifier Incrementally via Rebalancing

Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2

1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University

[* indicates joint first authorship] (To appear in CVPR 2019)

SLIDE 20

From Multi-task to Multi-class

Multi-task Setting: there is an oracle to tell which classifier should be used at inference time.

SLIDE 22

From Multi-task to Multi-class

[Diagram: Multi-task Setting compared with Multi-class Setting]

Multi-class Setting: there is no oracle here. But can we simply adapt distillation and retrospection to this setup?
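The difference between the two settings at inference time can be made concrete with a toy example. This is an illustrative sketch only: the logits are made-up numbers, and both prediction helpers are hypothetical.

```python
import numpy as np

def predict_multi_task(logits_per_task, task_id):
    """Multi-task: an oracle supplies task_id, so only that classifier is consulted."""
    return int(np.argmax(logits_per_task[task_id]))

def predict_multi_class(logits_per_task):
    """Multi-class: no oracle, so all classes compete in one unified classifier."""
    unified = np.concatenate(logits_per_task)
    return int(np.argmax(unified))

# Made-up outputs for one sample that truly belongs to old class 1.
logits_old = np.array([1.0, 3.0])        # 2 old classes
logits_new = np.array([4.0, 0.5, 2.0])   # 3 new classes, with larger magnitudes

# Index is local to task 0 (the old task).
with_oracle = predict_multi_task([logits_old, logits_new], task_id=0)

# Index is global over all 5 classes; global indices 2..4 are the new classes.
without_oracle = predict_multi_class([logits_old, logits_new])
```

With the oracle the sample is assigned to old class 1. Without it, the larger new-class logit wins and the sample lands on a new class, which is the kind of imbalance the following slides address.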

SLIDE 23

A Toy Example to Visualize Imbalance

SLIDE 24

Handle the Imbalance

Imbalanced Magnitudes: Cosine Normalization

[Diagram labels: old class embeddings; new class embeddings]

(We will use "embedding" and "the weights of the last fully-connected layer" interchangeably in the following.)
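A minimal sketch of cosine normalization in the last layer, assuming both the feature and each class embedding are L2-normalized before the dot product. The `scale` factor is an assumed hyperparameter and the numbers are illustrative, not taken from the slides.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cosine_logits(features, class_embeddings, scale=10.0):
    """features: (batch, dim); class_embeddings: (num_classes, dim)."""
    return scale * l2_normalize(features) @ l2_normalize(class_embeddings).T

features = np.array([[1.0, 0.2]])
class_embeddings = np.array([
    [0.5, 0.0],   # old class: small magnitude, well aligned with the feature
    [3.0, 3.0],   # new class: large magnitude, less well aligned
])

plain = features @ class_embeddings.T                 # plain dot product
cosine = cosine_logits(features, class_embeddings)    # magnitudes removed

plain_choice = int(plain.argmax())
cosine_choice = int(cosine.argmax())
```

The plain dot product picks the large-magnitude new class, while the cosine version picks the better-aligned old class, because only the direction of each embedding matters after normalization.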

SLIDE 25

Handle the Imbalance

Deviation: Less-Forget Constraint

[Diagram label: previous knowledge]
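One way to express a less-forget constraint is to penalize the change in direction between the features of the old model and those of the new model on the same input. The sketch below assumes a loss of the form 1 minus cosine similarity; the function name and inputs are hypothetical.

```python
import numpy as np

def less_forget_loss(old_features, new_features, eps=1e-12):
    """Mean of (1 - cosine similarity) between old-model and new-model features."""
    old_n = old_features / (np.linalg.norm(old_features, axis=1, keepdims=True) + eps)
    new_n = new_features / (np.linalg.norm(new_features, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(old_n * new_n, axis=1)))

old_features = np.array([[1.0, 0.0], [0.0, 2.0]])

# Same directions, only rescaled: no penalty.
unchanged = less_forget_loss(old_features, 3.0 * old_features)

# Every feature rotated by 90 degrees: penalty of 1 per sample.
rotated = less_forget_loss(old_features, np.array([[0.0, 1.0], [2.0, 0.0]]))
```

Because the loss depends only on orientation, it fits the cosine-normalized classifier, where feature magnitude no longer affects the prediction.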

SLIDE 26

Handle the Imbalance

Ambiguities: Inter-Class Separation

[Diagram labels: Anchor; Positive; Negative; old class embeddings; new class embeddings]
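The anchor, positive, and negative roles suggest a margin ranking loss. The sketch below is an assumption-laden illustration: the anchor is the feature of a reserved old sample, the positive is its ground-truth class embedding, and the negatives are the `top_k` most similar new class embeddings. The margin of 0.5 and `top_k` of 2 are arbitrary example values.

```python
import numpy as np

def margin_ranking_loss(feature, embeddings, true_class, new_classes, margin=0.5, top_k=2):
    """Hinge penalty when a new class is within `margin` of the true old class."""
    f = feature / np.linalg.norm(feature)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ f                                # cosine similarity to every class
    positive = sims[true_class]
    # Hard negatives: the new classes most similar to the anchor.
    negatives = np.sort(sims[new_classes])[::-1][:top_k]
    return float(np.sum(np.maximum(margin - positive + negatives, 0.0)))

anchor = np.array([1.0, 0.0])   # feature of a reserved old sample (class 0)

# Class 2 (new) points partly in the anchor's direction: ambiguous.
ambiguous = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
loss_ambiguous = margin_ranking_loss(anchor, ambiguous, true_class=0, new_classes=[1, 2])

# New classes are far from the anchor: no penalty.
separated = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
loss_separated = margin_ranking_loss(anchor, separated, true_class=0, new_classes=[1, 2])
```

Only negatives that come closer to the anchor than the margin allows contribute, so well-separated new classes add nothing to the loss.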

SLIDE 27

Overview

[Diagram labels: New Samples; Reserved Samples; Old Model; New Model; CNN Features; Class Embedding. Symbols: F*, F; f*_old, f_old, f_new; L_ce, L_dis, L_mr; G]
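The three losses in this overview can be combined into one training objective. The form below is an assumption, since the slides do not state the formula: N denotes the training batch (new samples plus reserved samples), N_o the reserved old samples within it, and λ a hypothetical weight on the distillation term.

```latex
L = \frac{1}{|N|} \sum_{x \in N} \bigl( L_{ce}(x) + \lambda \, L_{dis}(x) \bigr)
  + \frac{1}{|N_o|} \sum_{x \in N_o} L_{mr}(x)
```

Under this reading, L_ce and L_dis apply to every sample, while L_mr applies only to the reserved old samples that serve as anchors.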

SLIDE 32

Some Results

[Figure panels: 10 phases; 5-phase ablation study]

SLIDE 33

Thank you!