CS 4803 / 7643: Deep Learning
Topics: Moving beyond supervised learning
Zsolt Kira, Georgia Tech
Administrativia
- Projects!
– Due April 30th
– Template online
– Can use MS Word but follow the organization/rubric!
- No posters/presentations
(C) Zsolt Kira 2
Project Note
- Important note:
– Your project should include doing something beyond just downloading open-source code, fine-tuning, and showing the result
– This can include:
- implementation of additional approaches (even if leveraging open-source code),
- a thorough analysis/investigation of some phenomena or hypothesis, or
- theoretical analysis
- When using external resources, provide references to anything you used in the write-up!
Supervised Learning
- ML has been focused largely on this
- Lots of other problem settings are now coming up:
○ What if we have unlabeled data?
○ What if we have many datasets?
○ What if we only have one example per (new) class?
But wait, there’s more!
- Transfer Learning
- Semi-supervised learning
- One/Few-shot learning
- Un/Self-Supervised Learning
- Domain adaptation
- Meta-Learning
- Zero-shot learning
- Continual / Lifelong-learning
- Multi-modal learning
- Multi-task learning
- Active learning
- …
Setting               | Source           | Target             | Shift Type
Semi-supervised       | Single labeled   | Single unlabeled   | None
Domain Adaptation     | Single labeled   | Single unlabeled   | Non-semantic
Domain Generalization | Multiple labeled | Unknown            | Non-semantic
Cross-Task Transfer   | Single labeled   | Single unlabeled   | Semantic
Few-Shot Learning     | Single labeled   | Single few-labeled | Semantic
Un/Self-Supervised    | Single unlabeled | Many labeled       | Both/Task
An Entire Class on this!
- Deep Unsupervised Learning class (UC Berkeley)
- Link:
– https://sites.google.com/view/berkeley-cs294-158-sp20/home
But wait, there’s more!
What is Semi-Supervised Learning?
(Figure: Supervised Learning vs. Semi-Supervised Learning)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
Semi-Supervised Learning
- Classification: Fully Supervised
○ Training data: (image, label), predict label for new images.
- What if we have a few labeled samples and many unlabeled samples? Labeling is generally time-consuming and expensive in certain domains.
- Semi-Supervised Learning
○ Training data: Labeled data (image, label) and Unlabeled data (image)
○ Goal: Use the unlabeled data to make supervised learning better
○ Note: If we have lots of labeled data, this goal is much harder
Why Semi-Supervised Learning?
Slide: Thang Luong
- My take: Reality might be in-between:
- Might be able to improve upon high-labeled data regime but with exponentially increasing unlabeled data (of the proper type)
- See
Agenda
■ Core concepts
  ■ Confidence vs Entropy
    ■ Pseudo Labeling
    ■ Entropy minimization
    ■ Virtual Adversarial Training
  ■ Label Consistency
    ■ Make sure augmentations of the sample have the same class
    ■ Pi-Model, Temporal Ensembling, Mean Teacher
  ■ Regularization
    ■ Weight decay
    ■ Dropout
    ■ Data-Augmentation (MixUp, CutOut)
■ Unsupervised Data Augmentation (UDA), MixMatch
■ Co-Training / Self-Training / Pseudo Labeling (Noisy Student)
Pseudo Labeling
- Simple idea:
○ Train on labeled data
○ Make predictions on unlabeled data
○ Add confident predictions to training data
○ Can do these both end-to-end (no need to separate stages)
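As a toy illustration of the steps above (not any paper's actual code), a confidence-threshold pseudo-labeling step can be sketched in NumPy; the function name and the 0.95 threshold are illustrative choices:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label(logits, threshold=0.95):
    """Keep only unlabeled samples whose top predicted probability
    clears the confidence threshold; return their indices and the
    hard pseudo-labels to add to the training set."""
    probs = softmax(logits)
    confident = np.where(probs.max(axis=1) >= threshold)[0]
    return confident, probs[confident].argmax(axis=1)
```

Confident rows become new (input, pseudo-label) training pairs; low-confidence rows are simply skipped this round.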
Issue: Confidences on New Data
- Predictions on unlabeled data may be too flat (high entropy)
- Solution: Entropy minimization
- Several ways to achieve this
○ Explicit loss
○ Sharpening function (e.g. temperature scaling)
Image Credit: Figure modified from MixMatch paper
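A minimal sketch of the sharpening option, assuming NumPy: raising a distribution to the power 1/T with T < 1 and renormalizing lowers its entropy (the `sharpen` name follows MixMatch's terminology; the exact form here is illustrative):

```python
import numpy as np

def sharpen(p, T=0.5):
    """Temperature sharpening: raise probabilities to 1/T and
    renormalize. T < 1 makes the distribution more peaked;
    T -> 0 approaches a one-hot vector."""
    q = p ** (1.0 / T)
    return q / q.sum()

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```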
Label Consistency with Data Augmentation
Could be Unlabeled or Labeled
Make sure that the logits are similar
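The label-consistency idea can be written as a tiny loss term, sketched here in NumPy under the assumption of a Pi-Model-style squared error between the predictions for two augmented views (the function name is mine):

```python
import numpy as np

def softmax(z):
    """Softmax over a single logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def consistency_loss(logits_a, logits_b):
    """Mean squared difference between the class distributions
    predicted for two augmentations of the same input; zero when
    the network treats both views identically."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return float(((pa - pb) ** 2).mean())
```

Note this needs no label at all, which is why the input could be unlabeled or labeled.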
More Data Augmentation -> Regularization
Realistic Evaluation of Semi-Supervised Learning
Outline
■ Realistic Evaluation of Semi-Supervised Learning
■ pi-model
■ Temporal Ensembling
■ Mean Teacher
■ Virtual Adversarial Training
pi-Model
Temporal Ensembling for Semi-Supervised Learning
Comparison
Varying number of labels
Class Distribution Mismatch
MixMatch
MixUp
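The MixUp ingredient can be sketched in a few lines of NumPy (an illustrative toy, not MixMatch's actual implementation): blend two input/one-hot-label pairs with a Beta-sampled weight, taking `lam = max(lam, 1 - lam)` so the result stays closer to the first pair, as MixMatch does:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75):
    """Convex combination of two (input, one-hot label) pairs with a
    Beta(alpha, alpha) mixing weight; lam >= 0.5 keeps the mixed
    example dominated by the first pair."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

The mixed label stays a valid distribution, so the usual cross-entropy loss applies unchanged.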
FixMatch
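FixMatch combines the two ideas above: a hard pseudo-label from a weakly augmented view supervises the strongly augmented view, masked out when the weak prediction is not confident. A NumPy sketch in that spirit (names and the 0.95 threshold are illustrative, not the paper's code):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(weak_logits, strong_logits, tau=0.95):
    """Cross-entropy between the weak view's hard pseudo-label and the
    strong view's prediction, zeroed for unconfident weak predictions."""
    p_weak = softmax(weak_logits)
    conf = p_weak.max(axis=1)
    pseudo = p_weak.argmax(axis=1)
    p_strong = softmax(strong_logits)
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((ce * (conf >= tau)).mean())
```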
FixMatch - Results
But wait, there’s more!
Few-Shot Learning
Slide Credit: Hugo Larochelle
Normal Approach
- Do what we always do: Fine-tuning
– Train classifier on base classes
– Freeze features
– Learn classifier weights for new classes using small amounts of labeled data (during “query” time!)
A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
Cons of Normal Approach
- The training we do on the base classes does not take the task into account
- No notion that we will be performing a bunch of N-way tests
- Idea: simulate what we will see during test time
Meta-Training Approach
- Set up a set of smaller tasks during training which simulate what we will be doing during testing
– Can optionally pre-train features on held-out base classes (not typical)
- Testing stage is now the same, but with new classes
https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
Meta-Learning Approaches
- Learning a model conditioned on support set
More Sophisticated Meta-Learning Approaches
- Learn gradient descent:
– Parameter initialization and update rules
– Output:
- Parameter initialization
- Meta-learner that decides how to update parameters
- Learn just an initialization and use normal gradient
descent (MAML)
– Output:
- Just parameter initialization!
- We are using SGD
(C) Dhruv Batra & Zsolt Kira 44
Meta-Learner
- How to parametrize learning algorithms?
- Two approaches to defining a meta-learner
– Take inspiration from a known learning algorithm
- kNN/kernel machine: Matching networks (Vinyals et al. 2016)
- Gaussian classifier: Prototypical Networks (Snell et al. 2017)
- Gradient Descent: Meta-Learner LSTM (Ravi &amp; Larochelle, 2017), MAML (Finn et al. 2017)
– Derive it from a black box neural network
- MANN (Santoro et al. 2016)
- SNAIL (Mishra et al. 2018)
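The Gaussian-classifier style of meta-learner (Prototypical Networks) is simple enough to sketch directly; assuming NumPy and pre-computed embeddings, a single episode's classification step might look like this (the function name is mine):

```python
import numpy as np

def proto_classify(support, support_labels, query, n_classes):
    """Prototypical-network step: each class prototype is the mean
    embedding of its support examples; each query is assigned to the
    nearest prototype in Euclidean distance."""
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

During meta-training the embedding network is learned so that this nearest-prototype rule works well across many sampled N-way episodes.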
Meta-Learner LSTM
Model-Agnostic Meta-Learning (MAML)
Slide Credit: Sergey Levine
Comparison
But wait, there’s more!
Deep Learning is one way to learn features
Deep Unsupervised Learning:
1. Learn representations without labels
2. Subset of Deep Learning, which is a subset of Representation Learning, which is a subset of Machine Learning

Self-Supervised Learning:
1. Often used interchangeably with unsupervised learning
2. Self-Supervised: Create your own supervision through pretext tasks
Motivation
Yann LeCun’s cake
Slide: LeCun
Current List of Tasks
■ Reconstruct from a corrupted (or partial) version
  ■ Denoising Autoencoder
  ■ In-painting
  ■ Colorization, Split-Brain Autoencoder
■ Visual common sense tasks
  ■ Relative patch prediction
  ■ Jigsaw puzzles
  ■ Rotation
■ Contrastive Learning
  ■ word2vec
  ■ Contrastive Predictive Coding (CPC)
  ■ Instance Discrimination
  ■ Recent State-of-the-art progress
Relative Position of Image Patches
Task: Predict the relative position of the second patch with respect to the first
Slide: Zisserman; Doersch, Gupta, Efros
Solving Jigsaw Puzzles
Rotation
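The rotation pretext task is easy to set up: every image yields four training examples, one per rotation, with the rotation index as the free label. A NumPy sketch (the function name is mine):

```python
import numpy as np

def rotation_batch(images):
    """Build the rotation pretext dataset: rotate each image by
    0/90/180/270 degrees and label it with the rotation index
    (0-3), which a network is then trained to predict."""
    xs, ys = [], []
    for img in images:
        for k in range(4):
            xs.append(np.rot90(img, k))
            ys.append(k)
    return np.stack(xs), np.array(ys)
```

Predicting the rotation forces the network to recognize canonical object orientation, a useful feature for downstream tasks.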
Instance Discrimination
(Figure: attract / repel)
- 1. MoCo
- 2. SimCLR
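Both MoCo and SimCLR train with a contrastive (InfoNCE-style) objective: the embedding of one augmented view should score high against the other view of the same image ("attract") and low against other images ("repel"). A single-query NumPy sketch, with cosine similarity and an illustrative temperature:

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss for one query embedding: cross-entropy of picking
    the positive key among [positive] + negatives, scored by cosine
    similarity divided by a temperature."""
    unit = lambda v: v / np.linalg.norm(v)
    keys = np.stack([unit(positive)] + [unit(n) for n in negatives])
    logits = keys @ unit(query) / temperature
    logits = logits - logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
```

The loss is near zero when the query aligns with its positive and is far from all negatives, and grows when a negative looks more similar than the positive.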
Momentum Contrast (MoCo)
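MoCo's distinguishing trick is the momentum-updated key encoder: instead of backpropagating into it, its parameters track the query encoder as an exponential moving average, so keys stored in the queue stay consistent over time. Sketched on plain parameter lists (an illustration, not the paper's code):

```python
def momentum_update(key_params, query_params, m=0.999):
    """MoCo key-encoder update: each key parameter moves a small
    step (1 - m) toward the corresponding query parameter."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

With m close to 1 the key encoder evolves slowly; a separate fixed-size queue of past keys then supplies the large set of negatives.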
Results
Results – Object Detection
We have come a long way
- ML background – error decomposition, overfitting, features, etc.
- Linear classifiers
– & softmax
- Computation Graph
- Gradient Descent
- Adding layers
- Backpropagation, automatic differentiation
- Optimization – regularization/normalization (batch norm, dropout), augmentation, different optimizers (adam, adagrad, etc.)
- Convolution and Pooling layers
- Modern CNNs - AlexNet, VGG, Inception, ResNet
- 3D CNNs
- Recurrent Neural Networks and LSTMs
– NLP, word/sentence vectors, attention, etc.
- Unsupervised feature learning
- Generative models (GANs, VAEs)
- Deep Reinforcement Learning
- Other applications: Few-Shot Learning, structure
Things to Watch out For
- Research is cyclical
– SVMs, boosting, probabilistic graphical models &amp; Bayes Nets, Structural Learning, Sparse Coding, Deep Learning
– Deep learning is unique in its depth and breadth, but...
– Deep learning may be improved, reinvented, combined, overtaken
- Learn fundamentals for techniques across the field:
– Know the span of ML techniques and choose the ones that fit your problem!
– Be responsible in 1) how you use it, 2) promises you make and how you convey it
- Try to understand landscape of the field
– Look out for what is coming up next, not where we are
- Have fun!
Some current/upcoming topics
- Current / Recent Past
– AutoML
– Meta-learning
– Unsupervised, semi-supervised, domain adaptation, zero/one/few-shot learning
– Continual/lifelong learning without forgetting
– Memory
– Visual question answering, embodied question answering
– Adversarial Examples
- More recent
– Deep Learning and logic!
– Deep Learning and SAT problems
– World modeling, learning intuitive/physics models
– Visual dialogue, agents, chatbots
– Fixing reinforcement learning
- First you have to admit you have a problem
- Exploration and world modeling
– Simulation frameworks, joint perception, planning, and action
- Navigation, mapping
– Just scaling everything up and watching the magic!