Incremental Classification: First Step into Lifelong Learning
PAN Xinyu
MMLab, Department of IE
Multi-task Incremental Classification: Setup
[Figure: Training Data and Target Model]
Multi-task Incremental Classification: Baseline
- Feature Extraction: sub-optimal for the new task
- Re-training: time consuming
- Finetuning: catastrophic forgetting
Potential Application Scenarios
- A limited storage budget cannot hold all sequential data.
- Collected data may expire because of privacy issues.
- The model must be deployed efficiently as data arrive incrementally.
- ...
Lifelong Learning via Progressive Distillation and Retrospection
Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2
1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University
[* indicates joint first authorship] (Accepted at ECCV 2018)
Handle Catastrophic Forgetting
Finetuning alone leads to catastrophic forgetting.
How can we prevent the performance on the old task from dropping during training? We need an indicator. How can we construct such an indicator if we do not reserve any old data? Treat the new data as fake old data: the original model's outputs on the new data stand in for the old-task targets.
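One way to read "take new data as fake old data" is sketched below, assuming the old-task head produces logits. Before training, the original model's temperature-softened old-task outputs on the new data are recorded; during training, the cross-entropy between those recorded targets and the current model's outputs acts as the indicator of drift. The function names, the example logits, and the temperature T = 2 are illustrative assumptions, not taken from the slides.

```python
import math
from typing import List


def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def drift_indicator(recorded_logits: List[List[float]],
                    current_logits: List[List[float]],
                    T: float = 2.0) -> float:
    """Mean cross-entropy between the soft targets recorded from the original
    model on new data and the current model's soft outputs on the same data."""
    total = 0.0
    for rec, cur in zip(recorded_logits, current_logits):
        target = softmax(rec, T)
        pred = softmax(cur, T)
        total -= sum(t * math.log(p) for t, p in zip(target, pred))
    return total / len(recorded_logits)


# Hypothetical old-task logits of the original model on two new-data images.
recorded = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
unchanged = drift_indicator(recorded, recorded)
drifted = drift_indicator(recorded, [[-1.0, 0.5, 2.0], [1.5, 0.1, 0.3]])
print(unchanged < drifted)  # True
```

The indicator's minimum is the entropy of the recorded targets rather than zero; subtracting that constant would turn it into a KL divergence without changing the gradients.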
Learning without Forgetting
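[Figure: New Data (the training data) passes through the Original CNN (feature extractor F*, task-specific classifier T_o*) and the model being trained (feature extractor F, task-specific classifiers T_o and T_n); Loss_new trains T_n on the new task, and Loss_old keeps the outputs of T_o close to those of T_o*.]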
(Accepted at ECCV 2016)
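A minimal sketch of the Learning without Forgetting objective for a single new-data sample, assuming Loss_new is a cross-entropy on the new-task head T_n and Loss_old is a temperature-softened distillation term that keeps T_o close to the recorded outputs of T_o*. The defaults T = 2 and lam = 1 are values commonly used with LwF but should be treated as assumptions here; all function names are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Numerically stable log-softmax with temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def loss_new(new_logits: List[float], label: int) -> float:
    """Loss_new: cross-entropy of the new-task head T_n against the label."""
    return -log_softmax(new_logits)[label]


def loss_old(recorded_old_logits: List[float],
             current_old_logits: List[float], T: float = 2.0) -> float:
    """Loss_old: T_o should reproduce the outputs recorded from T_o*."""
    target = [math.exp(v) for v in log_softmax(recorded_old_logits, T)]
    log_pred = log_softmax(current_old_logits, T)
    return -sum(t * lp for t, lp in zip(target, log_pred))


def lwf_loss(new_logits: List[float], label: int,
             recorded_old_logits: List[float],
             current_old_logits: List[float],
             lam: float = 1.0, T: float = 2.0) -> float:
    """Total objective for one new-data sample: Loss_new + lam * Loss_old."""
    return (loss_new(new_logits, label)
            + lam * loss_old(recorded_old_logits, current_old_logits, T))


# Hypothetical logits: T_n has 2 classes, the old task has 3.
loss = lwf_loss([0.2, 1.7], 1, [2.0, 0.5, -1.0], [1.8, 0.6, -0.9])
```

Setting lam to 0 reduces this to plain finetuning on the new task, which is exactly the setting that suffers from catastrophic forgetting.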
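[Figure: New Data (the training data) passes through an Expert CNN trained on the new task (F_n, T_n*), the Original CNN (F*, T_o*), and the target model (F, T_n, T_o); Loss_new distills the expert's outputs into T_n, and Loss_old distills the original CNN's outputs into T_o.]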
Adaptation by Distillation
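Adaptation by distillation can be sketched by swapping the label-based Loss_new for a second distillation term: the target model's T_n mimics the expert CNN's T_n*, while T_o still mimics the original CNN's T_o*. This assumes Loss_new is a pure distillation term; the paper may combine it with a label term, and the names and defaults below are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Numerically stable log-softmax with temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distill(teacher_logits: List[float], student_logits: List[float],
            T: float = 2.0) -> float:
    """Cross-entropy between the teacher's and the student's softened outputs."""
    target = [math.exp(v) for v in log_softmax(teacher_logits, T)]
    log_pred = log_softmax(student_logits, T)
    return -sum(t * lp for t, lp in zip(target, log_pred))


def adaptation_by_distillation_loss(expert_new: List[float],
                                    target_new: List[float],
                                    original_old: List[float],
                                    target_old: List[float],
                                    lam: float = 1.0, T: float = 2.0) -> float:
    # Loss_new: the target model's T_n mimics the expert CNN's T_n*.
    # Loss_old: the target model's T_o mimics the original CNN's T_o*.
    return (distill(expert_new, target_new, T)
            + lam * distill(original_old, target_old, T))
```

Because the expert was trained only on the new task, its soft outputs carry inter-class similarity information that hard labels lack, which is the motivation for distilling from it.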
What if we reserve a small fraction of old data?
[Figure: New Data plus Retrospection (a small set of reserved old data) passes through the Expert CNN (F_n, T_n*), the Original CNN (F*, T_o*), and the target model (F, T_n, T_o), trained with Loss_new and Loss_old.]
Adaptation by Distillation + Retrospection
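Retrospection only changes what goes into the training set. The sketch below keeps a fixed number of randomly chosen samples per old class and appends them to the new data; the random per-class selection, the per_class budget, and the sample format are assumptions for illustration, since the slides do not say how the reserved samples are chosen. In this sketch the reserved samples would then pass through the same distillation losses as the new data.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (hypothetical image id, class label)


def reserve(old_data: List[Sample], per_class: int, seed: int = 0) -> List[Sample]:
    """Randomly keep at most `per_class` samples of every old class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Sample]] = defaultdict(list)
    for sample in old_data:
        by_class[sample[1]].append(sample)
    kept: List[Sample] = []
    for label in sorted(by_class):
        items = by_class[label]
        kept.extend(rng.sample(items, min(per_class, len(items))))
    return kept


def training_set(new_data: List[Sample], reserved: List[Sample]) -> List[Sample]:
    """New Data + Retrospection: the reserved old samples join the new data."""
    return list(new_data) + list(reserved)


old = [(f"old_{i}", i % 3) for i in range(30)]      # 3 old classes, 10 each
new = [(f"new_{i}", 3 + i % 2) for i in range(20)]  # 2 new classes
data = training_set(new, reserve(old, per_class=2))
print(len(data))  # 26
```

Keeping the budget per class rather than as a global fraction keeps every old class represented even when the old classes differ in size.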
[Figure: Lifelong learning combines retrospection from old tasks with adaptation by distillation to new tasks; panels show the Training Data, the Expert CNN for Task 1, and the Expert CNN for Task 2.]
Overview of Distillation and Retrospection
Dataset
Some Results
Ablation Study on the Number of Reserved Samples
Learning a Unified Classifier Incrementally via Rebalancing
Saihui Hou1* Xinyu Pan2* Chen Change Loy3 Zilei Wang1 Dahua Lin2
1 University of Science and Technology of China 2 The Chinese University of Hong Kong 3 Nanyang Technological University
[* indicates joint first authorship] (To appear in CVPR 2019)
From Multi-task to Multi-class
Multi-task Setting: there is an oracle that tells which classifier should be used at inference time.
Multi-class Setting: there is no oracle, so a single classifier must distinguish among all classes seen so far. Can we simply adapt distillation and retrospection to this setup?
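The difference between the two settings shows up at inference time. In this sketch (function names and logits are hypothetical), the multi-task predictor is told which head to use, while the multi-class predictor concatenates all heads and takes one global argmax, so old and new classes compete directly.

```python
from typing import List, Sequence


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def predict_multi_task(head_logits: List[List[float]], task_id: int) -> int:
    """Multi-task: the oracle supplies task_id, so only that head is consulted.
    Returns the class index local to that task."""
    return argmax(head_logits[task_id])


def predict_multi_class(head_logits: List[List[float]]) -> int:
    """Multi-class: no oracle; all heads form one unified classifier and the
    argmax is taken over every class seen so far."""
    flat = [z for head in head_logits for z in head]
    return argmax(flat)


# Hypothetical logits: head 0 covers old classes 0-2, head 1 new classes 3-4.
logits = [[2.0, 0.5, 0.1], [3.1, 0.2]]
print(predict_multi_task(logits, 0))  # 0
print(predict_multi_class(logits))    # 3
```

With the oracle, the sample is correctly routed to the old task; without it, a new class with larger logits wins the global argmax, which is the kind of bias toward new classes the following slides address.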
A Toy Example to Visualize Imbalance
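Cosine Normalization, addressing Imbalanced Magnitudes: the embeddings of new classes have larger magnitudes than those of old classes, which biases predictions toward new classes.
[Figure: old class embeddings and new class embeddings]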
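A minimal sketch of cosine normalization, assuming a bias-free last layer: the plain logit for class c is the dot product of its embedding with the feature, whereas the normalized logit is eta times their cosine similarity. The scale eta is fixed at 10 here for illustration (in the CVPR 2019 method it is learned), and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def plain_logits(feature: Vector, weights: List[Vector]) -> List[float]:
    """Bias-free FC layer: larger-norm embeddings get larger logits."""
    return [dot(w, feature) for w in weights]


def cosine_logits(feature: Vector, weights: List[Vector], eta: float = 10.0) -> List[float]:
    """Cosine normalization: eta * cos(w_c, feature); the norm of w_c no longer matters."""
    f = l2_normalize(feature)
    return [eta * dot(l2_normalize(w), f) for w in weights]


feature = [1.0, 0.2]
weights = [[1.0, 0.0],   # old class embedding, small magnitude
           [1.2, 2.4]]   # new class embedding, large magnitude
print(plain_logits(feature, weights))   # new class scores higher
print(cosine_logits(feature, weights))  # old class scores higher
```

The feature points almost exactly along the old class embedding, yet the plain dot product favors the new class purely because of its larger norm; normalizing both vectors removes that effect.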
Handle the Imbalance
(In the following, "embedding" and "weights of the last fully-connected layer" are used interchangeably.)
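Less-Forget Constraint, addressing Deviation from previous knowledge: the features extracted by the new model are kept close in direction to those extracted by the old model.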
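A sketch of the less-forget constraint as we recall it from the CVPR 2019 paper: L_dis^G = 1 - cos(F*(x), F(x)), weighted by an adaptive lambda = lambda_base * sqrt(|C_n| / |C_o|), where |C_n| and |C_o| are the numbers of new and old classes. The function names and the example values of lambda_base and the class counts are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def less_forget_loss(old_feature: Vector, new_feature: Vector) -> float:
    """L_dis^G = 1 - cos(F*(x), F(x)): zero when the directions agree."""
    return 1.0 - cosine(old_feature, new_feature)


def adaptive_lambda(lambda_base: float, num_new_classes: int, num_old_classes: int) -> float:
    """The weight grows with the ratio of new classes to old classes."""
    return lambda_base * math.sqrt(num_new_classes / num_old_classes)


print(less_forget_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because only the angle matters, rescaling the new model's feature (for example [1, 2] versus [2, 4]) costs nothing; the constraint preserves the geometry of the old feature space without pinning exact activations.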
Handle the Imbalance
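Inter-Class Separation, addressing Ambiguities between old and new classes: for a reserved old sample (the anchor), its similarity to its ground-truth old class embedding (the positive) must exceed its similarity to the most similar new class embeddings (the negatives) by a margin.
[Figure: old class embeddings and new class embeddings]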
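The inter-class separation term can be sketched as a margin ranking loss over hard negatives: the top_k new class embeddings most similar to the anchor each contribute max(margin - cos(positive, anchor) + cos(negative, anchor), 0). The margin of 0.5, top_k of 2, and the toy embeddings are illustrative assumptions, not values from the slides.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def margin_ranking_loss(feature: Vector, embeddings: Dict[int, Vector],
                        gt_class: int, new_classes: List[int],
                        margin: float = 0.5, top_k: int = 2) -> float:
    """Inter-class separation for one reserved old sample (the anchor)."""
    positive = cosine(embeddings[gt_class], feature)
    # Hard negatives: the top_k new class embeddings most similar to the anchor.
    hard = sorted((cosine(embeddings[c], feature) for c in new_classes),
                  reverse=True)[:top_k]
    return sum(max(margin - positive + s, 0.0) for s in hard)


embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0],   # old classes
              2: [1.0, 1.0], 3: [-1.0, 0.0]}  # new classes
loss = margin_ranking_loss([1.0, 0.0], embeddings, gt_class=0, new_classes=[2, 3])
```

Here only new class 2 violates the margin (cosine about 0.71 against a positive of 1.0), so it alone contributes to the loss; class 3 points the opposite way and is ignored.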
Handle the Imbalance
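[Figure: New Samples and Reserved Samples are fed to the Old Model (feature extractor F*, old class embeddings f_old*) and the New Model (feature extractor F, class embeddings f_old and f_new); the CNN features and class embeddings are trained with L_ce, L_dis^G and L_mr.]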
Overview
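For reference, the overall objective combining L_ce, L_dis^G and L_mr, as we recall it from the CVPR 2019 paper, is written below. The notation is ours rather than the slides': \bar{F}(x) is the L2-normalized output of F, \bar{w}_c the normalized embedding of class c, \bar{w}(x) the embedding of the ground-truth class of x, \bar{w}^k the k-th hardest new class embedding, \mathcal{N} a training batch, \mathcal{N}_o its reserved old samples, and m the margin. Please check the weighting against the paper before relying on it.

```latex
\mathcal{L} = \frac{1}{|\mathcal{N}|}\sum_{x\in\mathcal{N}}\Big(L_{ce}(x) + \lambda\, L_{dis}^{G}(x)\Big)
            + \frac{1}{|\mathcal{N}_o|}\sum_{x\in\mathcal{N}_o} L_{mr}(x),
\qquad
\lambda = \lambda_{base}\sqrt{\frac{|C_n|}{|C_o|}}

L_{dis}^{G}(x) = 1 - \big\langle \bar{F}^{*}(x), \bar{F}(x) \big\rangle,
\qquad
L_{mr}(x) = \sum_{k=1}^{K} \max\!\Big(m - \big\langle \bar{w}(x), \bar{F}(x) \big\rangle
            + \big\langle \bar{w}^{k}, \bar{F}(x) \big\rangle,\ 0\Big)
```

Here L_ce is the usual cross-entropy computed on the cosine-normalized logits \eta \langle \bar{w}_c, \bar{F}(x) \rangle over all classes seen so far.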
[Figures: results over 10 phases; 5-phase ablation study]