CS 4803 / 7643: Deep Learning
Topics: Moving beyond supervised learning


SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira Georgia Tech

Topics:

– Moving beyond supervised learning

SLIDE 2

Administrivia

  • Projects!

– Due April 30th
– Template online
– Can use MS Word, but follow the organization/rubric!

  • No posters/presentations

SLIDE 3

Project Note

  • Important note:

– Your project should include doing something beyond just downloading open-source code, fine-tuning, and showing the result
– This can include:

  • implementation of additional approaches (even if leveraging open-source code),
  • a thorough analysis/investigation of some phenomenon or hypothesis, or
  • theoretical analysis

  • When using external resources, provide references to anything you used in the write-up!

SLIDE 4

Supervised Learning


  • ML has largely focused on this setting
  • Lots of other problem settings are now coming up:

What if we have unlabeled data?

What if we have many datasets?

What if we only have one example per (new) class?

SLIDE 5

But wait, there’s more!

  • Transfer Learning
  • Semi-supervised learning
  • One/Few-shot learning
  • Un/Self-Supervised Learning
  • Domain adaptation
  • Meta-Learning
  • Zero-shot learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning

Setting                 Source            Target              Shift Type
Semi-supervised         Single labeled    Single unlabeled    None
Domain Adaptation       Single labeled    Single unlabeled    Non-semantic
Domain Generalization   Multiple labeled  Unknown             Non-semantic
Cross-Task Transfer     Single labeled    Single unlabeled    Semantic
Few-Shot Learning       Single labeled    Single few-labeled  Semantic
Un/Self-Supervised      Single unlabeled  Many labeled        Both/Task

SLIDE 6

An Entire Class on this!

  • Deep Unsupervised Learning class (UC Berkeley)
  • Link:

– https://sites.google.com/view/berkeley-cs294-158-sp20/home

SLIDE 7

But wait, there’s more!

  • Transfer Learning
  • Semi-supervised learning
  • One/Few-shot learning
  • Un/Self-Supervised Learning
  • Domain adaptation
  • Meta-Learning
  • Zero-shot learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning

SLIDE 8

What is Semi-Supervised Learning?

[Figure: Supervised Learning vs. Semi-Supervised Learning]

Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

SLIDE 9

What is Semi-Supervised Learning?

[Figure: Supervised Learning vs. Semi-Supervised Learning]

Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

SLIDE 10

Semi-Supervised Learning

  • Classification: Fully Supervised

○ Training data: (image, label), predict label for new images.

  • What if we have a few labeled samples and many unlabeled samples? Labeling is generally time-consuming and expensive in certain domains.

  • Semi-Supervised Learning

○ Training data: Labeled data (image, label) and Unlabeled data (image)
○ Goal: Use the unlabeled data to make supervised learning better
○ Note: If we have lots of labeled data, this goal is much harder

Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

SLIDE 11

Why Semi-Supervised Learning?

Slide: Thang Luong


Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  • My take: Reality might be in between
  • Might be able to improve upon the high-labeled-data regime, but with exponentially increasing unlabeled data (of the proper type)

  • See
SLIDE 12

Agenda

■ Core concepts
  – Confidence vs Entropy
  – Pseudo Labeling
  – Entropy minimization
  – Virtual Adversarial Training
■ Label Consistency
  – Make sure augmentations of the sample have the same class
  – Pi-Model, Temporal Ensembling, Mean Teacher
■ Regularization
  – Weight decay
  – Dropout
  – Data Augmentation (MixUp, CutOut)
■ Unsupervised Data Augmentation (UDA), MixMatch
■ Co-Training / Self-Training / Pseudo Labeling (Noisy Student)

Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

SLIDE 13

Pseudo Labeling

  • Simple idea:

○ Train on labeled data
○ Make predictions on unlabeled data
○ Add confident predictions to the training data
○ Can do this end-to-end (no need for separate stages); a minimal sketch follows below

Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
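
To make these steps concrete, here is a minimal PyTorch-style sketch of one pseudo-labeling training step. The model, optimizer, batches, the confidence threshold tau, and the weight lambda_u are illustrative assumptions, not the lecture's specific setup:

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, x_lab, y_lab, x_unlab,
                      tau=0.95, lambda_u=1.0):
    """One end-to-end training step with pseudo-labels (illustrative)."""
    # Supervised loss on the labeled batch
    loss_sup = F.cross_entropy(model(x_lab), y_lab)

    # Predict on the unlabeled batch; keep only confident predictions
    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf >= tau  # confidence threshold (assumed value)

    loss_unsup = torch.zeros((), device=x_lab.device)
    if mask.any():
        loss_unsup = F.cross_entropy(model(x_unlab[mask]), pseudo_y[mask])

    loss = loss_sup + lambda_u * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```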

SLIDE 14

Issue: Confidences on New Data

  • Predictions on unlabeled data may be too flat (high entropy)
  • Solution: Entropy minimization
  • Several ways to achieve this (see the sketch below):

○ Explicit loss
○ Sharpening function (e.g. temperature scaling)

Image Credit: Figure modified from MixMatch paper
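
Both options are a few lines in code. A minimal sketch, assuming a standard softmax classifier; the temperature value is illustrative (MixMatch, cited in the figure credit, uses this kind of sharpening):

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits):
    """Explicit entropy penalty: pushes predictions away from uniform."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def sharpen(probs, T=0.5):
    """Temperature sharpening: T < 1 makes the distribution peakier,
    and T -> 0 approaches a hard (one-hot) label."""
    p = probs ** (1.0 / T)
    return p / p.sum(dim=1, keepdim=True)
```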

SLIDE 15

Label Consistency with Data Augmentation

SLIDE 16

Label Consistency with Data Augmentation

Could be Unlabeled or Labeled

SLIDE 17

Label Consistency with Data Augmentation

SLIDE 18

Label Consistency with Data Augmentation

Make sure that the logits are similar
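
In code, label consistency is just a penalty on the distance between the model's outputs for two augmented views of the same image; this MSE form matches the Pi-model family discussed a few slides later, and augment here is an assumed stand-in for whatever augmentation pipeline is used:

```python
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    """Pi-model style consistency: two stochastic augmentations of the
    same image (labeled or unlabeled) should yield similar predictions."""
    p1 = F.softmax(model(augment(x)), dim=1)
    p2 = F.softmax(model(augment(x)), dim=1)
    # Implementations often detach one branch so it acts as the target.
    return F.mse_loss(p1, p2.detach())
```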

SLIDE 19

More Data Augmentation -> Regularization

SLIDE 20

Realistic Evaluation of Semi-Supervised Learning

SLIDE 21

Outline

■ Realistic Evaluation of Semi-Supervised Learning
  – Pi-Model
  – Temporal Ensembling
  – Mean Teacher
  – Virtual Adversarial Training

SLIDE 22

Pi-Model


Temporal Ensembling for Semi-Supervised Learning

SLIDE 23

Pi-Model


Temporal Ensembling for Semi-Supervised Learning
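
Temporal Ensembling, the paper named on these slides, replaces the Pi-model's second forward pass with an exponential moving average of each sample's past predictions. A sketch under those assumptions; the paper accumulates once per epoch, while this version updates per batch for brevity:

```python
import torch
import torch.nn.functional as F

class TemporalEnsemble:
    """Per-sample EMA of past predictions, used as consistency targets
    (sketch of Laine & Aila's temporal ensembling)."""
    def __init__(self, n_samples, n_classes, alpha=0.6):
        self.alpha = alpha
        self.Z = torch.zeros(n_samples, n_classes)  # accumulated outputs
        self.epoch = 0  # advance once per pass over the data

    def targets(self, idx, probs):
        self.Z[idx] = self.alpha * self.Z[idx] + (1 - self.alpha) * probs
        # Bias correction, as with Adam's moment estimates
        return self.Z[idx] / (1 - self.alpha ** (self.epoch + 1))

def temporal_consistency_loss(model, x, idx, ensemble):
    probs = F.softmax(model(x), dim=1)
    targets = ensemble.targets(idx, probs.detach())
    return F.mse_loss(probs, targets)
```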

SLIDE 24

Comparison

SLIDE 25

Comparison

SLIDE 26

Varying number of labels

SLIDE 27

Class Distribution Mismatch

SLIDE 28

MixMatch

SLIDE 29

MixMatch

SLIDE 30

MixMatch

MixUp
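
MixUp, called out on this slide, blends pairs of examples and their (soft) labels with a Beta-distributed coefficient. A minimal sketch, assuming y is already one-hot or soft; taking the max of lam and 1 - lam is the MixMatch variant, which keeps each mixed example closer to its first argument:

```python
import torch

def mixup(x, y, alpha=0.75):
    """MixUp: blend a batch with a shuffled copy of itself.
    x: inputs; y: one-hot / soft labels with the same leading dim."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)          # MixMatch keeps the larger weight
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```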

SLIDE 31

MixMatch

SLIDE 32

MixMatch

SLIDE 33

MixMatch

SLIDE 34

MixMatch
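
Putting the MixMatch slides together: guess labels for unlabeled data by averaging predictions over K augmentations, sharpen the guess with temperature T, then MixUp the combined labeled + unlabeled batch. A sketch of the label-guessing step, with augment, K, and T as assumed stand-ins for the paper's setup:

```python
import torch
import torch.nn.functional as F

def mixmatch_guess(model, x_unlab, augment, K=2, T=0.5):
    """MixMatch label guessing: average predictions over K augmentations
    of each unlabeled image, then apply temperature sharpening."""
    with torch.no_grad():
        p = torch.stack([F.softmax(model(augment(x_unlab)), dim=1)
                         for _ in range(K)]).mean(dim=0)
    p = p ** (1.0 / T)                       # temperature sharpening
    return p / p.sum(dim=1, keepdim=True)
```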

SLIDE 35

FixMatch
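
FixMatch boils the recipe down further: pseudo-label a weakly augmented view, keep the label only when confident, and train a strongly augmented view to match. A sketch, where weak_aug and strong_aug are assumed augmentation callables; tau = 0.95 is the threshold reported in the paper:

```python
import torch
import torch.nn.functional as F

def fixmatch_unsup_loss(model, x_unlab, weak_aug, strong_aug, tau=0.95):
    """FixMatch unlabeled loss: hard pseudo-label from the weak view,
    cross-entropy on the strong view, masked by confidence."""
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlab)), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= tau).float()
    logits_strong = model(strong_aug(x_unlab))
    per_example = F.cross_entropy(logits_strong, pseudo_y,
                                  reduction='none')
    return (mask * per_example).mean()
```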

SLIDE 36

FixMatch - Results

SLIDE 37

But wait, there’s more!

  • Transfer Learning
  • Semi-supervised learning
  • One/Few-shot learning
  • Un/Self-Supervised Learning
  • Domain adaptation
  • Meta-Learning
  • Zero-shot learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning

SLIDE 38

Few-Shot Learning


Slide Credit: Hugo Larochelle

SLIDE 39

Few-Shot Learning


Slide Credit: Hugo Larochelle

SLIDE 40

Normal Approach

  • Do what we always do: fine-tuning (see the sketch below)

– Train a classifier on the base classes
– Freeze the features
– Learn classifier weights for the new classes using small amounts of labeled data (during "query" time!)

A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
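
A minimal sketch of this baseline, assuming a pretrained PyTorch backbone that maps images to feature vectors; we freeze it and fit only a new linear head on the few labeled support examples (all names and hyperparameters illustrative):

```python
import torch
import torch.nn as nn

def finetune_baseline(backbone, n_novel_classes, support_x, support_y,
                      steps=100, lr=0.01):
    """Few-shot 'normal approach': freeze pretrained features and train
    only a new linear classifier on the small labeled support set."""
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    with torch.no_grad():
        feats = backbone(support_x)          # (n_support, feat_dim)

    head = nn.Linear(feats.size(1), n_novel_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(feats), support_y)
        loss.backward()
        opt.step()
    return head
```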

SLIDE 41

Cons of Normal Approach

  • The training we do on the base classes does not take the task into account
  • No notion that we will be performing a bunch of N-way tests
  • Idea: simulate what we will see during test time

SLIDE 42

Meta-Training Approach

  • Set up a set of smaller tasks during training which simulate what we will be doing during testing (see the episode-sampling sketch below)

– Can optionally pre-train features on held-out base classes (not typical)

  • Testing stage is now the same, but with new classes

https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
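
To make the "smaller tasks" concrete, here is an illustrative N-way K-shot episode sampler; the data layout (a dict from class id to a tensor of that class's examples) is an assumption:

```python
import random
import torch

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot task: a small labeled support set to
    adapt on, plus a query set to evaluate on, mirroring the test-time
    protocol."""
    classes = random.sample(list(data_by_class), n_way)
    support, query, s_y, q_y = [], [], [], []
    for label, c in enumerate(classes):
        idx = torch.randperm(data_by_class[c].size(0))[:k_shot + n_query]
        examples = data_by_class[c][idx]
        support.append(examples[:k_shot])
        s_y += [label] * k_shot
        query.append(examples[k_shot:])
        q_y += [label] * n_query
    return (torch.cat(support), torch.tensor(s_y),
            torch.cat(query), torch.tensor(q_y))
```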

SLIDE 43

Meta-Learning Approaches

  • Learning a model conditioned on the support set

SLIDE 44

More Sophisticated Meta-Learning Approaches

  • Learn gradient descent:
– Parameter initialization and update rules
– Output:
  • Parameter initialization
  • Meta-learner that decides how to update parameters

  • Learn just an initialization and use normal gradient descent (MAML):
– Output:
  • Just the parameter initialization!
  • We are using SGD

SLIDE 45

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner:

– Take inspiration from a known learning algorithm
  • kNN/kernel machine: Matching Networks (Vinyals et al., 2016)
  • Gaussian classifier: Prototypical Networks (Snell et al., 2017)
  • Gradient descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al., 2017)

– Derive it from a black-box neural network
  • MANN (Santoro et al., 2016)
  • SNAIL (Mishra et al., 2018)

Slide Credit: Hugo Larochelle
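
As a concrete instance from the list above, Prototypical Networks reduce the meta-learner to a Gaussian-classifier-like rule: embed everything, average each class's support embeddings into a prototype, and classify queries by distance. A sketch assuming an embedding network embed:

```python
import torch

def proto_logits(embed, support_x, support_y, query_x, n_way):
    """Prototypical Networks: a class prototype is the mean embedding of
    its support examples; queries are scored by negative squared
    Euclidean distance to each prototype (use with cross-entropy)."""
    z_s = embed(support_x)                     # (n_support, d)
    z_q = embed(query_x)                       # (n_query, d)
    protos = torch.stack([z_s[support_y == c].mean(dim=0)
                          for c in range(n_way)])
    return -torch.cdist(z_q, protos) ** 2      # (n_query, n_way)
```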

SLIDE 46

More Sophisticated Meta-Learning Approaches

  • Learn gradient descent:
– Parameter initialization and update rules
– Output:
  • Parameter initialization
  • Meta-learner that decides how to update parameters

  • Learn just an initialization and use normal gradient descent (MAML):
– Output:
  • Just the parameter initialization!
  • We are using SGD

SLIDE 47

Meta-Learner LSTM


Slide Credit: Hugo Larochelle

SLIDE 48

Meta-Learner LSTM


Slide Credit: Hugo Larochelle

SLIDE 49

Meta-Learner LSTM


Slide Credit: Hugo Larochelle

SLIDE 50

Meta-Learner LSTM


Slide Credit: Hugo Larochelle

SLIDE 51

Model-Agnostic Meta-Learning (MAML)


Slide Credit: Hugo Larochelle
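
The MAML slides are mostly figures, so a compact sketch may help: adapt a copy of the parameters with one (or a few) gradient steps on a task's support set, then update the shared initialization so the adapted parameters do well on the query set. This is the first-order simplification (no second derivatives), using PyTorch's torch.func.functional_call; everything else is illustrative:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

def maml_outer_step(model, tasks, meta_opt, inner_lr=0.01):
    """First-order MAML sketch. tasks yields
    (support_x, support_y, query_x, query_y) tuples."""
    meta_opt.zero_grad()
    names, params = zip(*model.named_parameters())
    for sx, sy, qx, qy in tasks:
        # Inner loop: one SGD step on this task's support set
        logits = functional_call(model, dict(zip(names, params)), (sx,))
        grads = torch.autograd.grad(F.cross_entropy(logits, sy), params)
        fast = {n: p - inner_lr * g
                for n, p, g in zip(names, params, grads)}
        # Outer objective: adapted weights should do well on the queries
        q_logits = functional_call(model, fast, (qx,))
        F.cross_entropy(q_logits, qy).backward()
    meta_opt.step()
```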

SLIDE 52

Model-Agnostic Meta-Learning (MAML)


Slide Credit: Sergey Levine

SLIDE 53

Model-Agnostic Meta-Learning (MAML)


Slide Credit: Sergey Levine

SLIDE 54

Comparison


Slide Credit: Sergey Levine

SLIDE 55

But wait, there’s more!

  • Transfer Learning
  • Semi-supervised learning
  • One/Few-shot learning
  • Un/Self-Supervised Learning
  • Domain adaptation
  • Meta-Learning
  • Zero-shot learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning

SLIDE 56

Deep Learning is one way to learn features

Deep Unsupervised Learning:
1. Learn representations without labels
2. A subset of Deep Learning, which is a subset of Representation Learning, which is a subset of Machine Learning

Self-Supervised Learning:
1. Often used interchangeably with unsupervised learning
2. Self-supervised: create your own supervision through pretext tasks

SLIDE 57

Motivation


Yann LeCun’s cake

SLIDE 58

Motivation

Yann LeCun's cake (Slide: LeCun)

SLIDE 59

Current List of Tasks

■ Reconstruct from a corrupted (or partial) version
  – Denoising Autoencoder
  – In-painting
  – Colorization, Split-Brain Autoencoder
■ Visual common sense tasks
  – Relative patch prediction
  – Jigsaw puzzles
  – Rotation
■ Contrastive Learning
  – word2vec
  – Contrastive Predictive Coding (CPC)
  – Instance Discrimination
  – Recent state-of-the-art progress

SLIDE 60

Relative Position of Image Patches


Task: Predict the relative position of the second patch with respect to the first

Slide: Zisserman

SLIDE 61

Relative Position of Image Patches

Slide: Zisserman; Doersch, Gupta, Efros

SLIDE 62

Relative Position of Image Patches


SLIDE 63

Solving Jigsaw Puzzles

SLIDE 64

Solving Jigsaw Puzzles

SLIDE 65

Rotation
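
Across these Rotation slides the pretext task is: rotate an image by one of four angles and train the network to predict which one. A minimal sketch of generating the self-supervised batch, assuming image tensors shaped (N, C, H, W):

```python
import torch

def rotation_batch(x):
    """RotNet-style pretext task: rotate each image by 0/90/180/270
    degrees; the self-supervised label is the rotation index (0..3).
    Training a classifier on this task yields transferable features."""
    rotated = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = [torch.full((x.size(0),), k, dtype=torch.long)
              for k in range(4)]
    return torch.cat(rotated), torch.cat(labels)
```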

SLIDE 66

Rotation

SLIDE 67

Rotation

SLIDE 68

Rotation

SLIDE 69

Rotation

SLIDE 70

Instance Discrimination

[Figure: views of the same instance attract; different instances repel]

SLIDE 71

Instance Discrimination

[Figure: views of the same instance attract; different instances repel]

1. MoCo (momentum-update sketch on the next slide)
2. SimCLR (contrastive-loss sketch below)
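
Both methods train with a contrastive (InfoNCE-style) objective: two augmented views of the same image attract, and views of different images repel. A SimCLR-flavored sketch, simplified to use only cross-view negatives; the temperature is illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Instance discrimination via InfoNCE (simplified sketch).
    z1, z2: (N, d) embeddings of two augmented views of the same N
    images. The matching row is the positive; all other rows repel."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)        # positives on diagonal
```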
SLIDE 72

Momentum Contrast (MoCo)
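
What distinguishes MoCo is a large queue of negative keys produced by a momentum-updated key encoder. A sketch of that momentum update; m = 0.999 is the paper's typical value, and the two encoders are assumed to share the same architecture:

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """MoCo's key-encoder update: an exponential moving average of the
    query encoder, keeping the queued keys' representation consistent."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1 - m)
```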

SLIDE 73

Momentum Contrast (MoCo)

SLIDE 74

Momentum Contrast (MoCo)

SLIDE 75

Momentum Contrast (MoCo)

SLIDE 76

Momentum Contrast (MoCo)

SLIDE 77

Momentum Contrast (MoCo)

SLIDE 78

Results

SLIDE 79

Results – Object Detection

SLIDE 80

We have come a long way

  • ML background – error decomposition, overfitting, features, etc.
  • Linear classifiers

– & softmax

  • Computation Graph
  • Gradient Descent
  • Adding layers
  • Backpropagation, automatic differentiation
  • Optimization – regularization/normalization (batch norm, dropout), augmentation, different optimizers (Adam, Adagrad, etc.)

  • Convolution and Pooling layers
  • Modern CNNs - AlexNet, VGG, Inception, ResNet
  • 3D CNNs
  • Recurrent Neural Networks and LSTMs

– NLP, word/sentence vectors, attention, etc.

  • Unsupervised feature learning
  • Generative models (GANs, VAEs)
  • Deep Reinforcement Learning
  • Other applications: Few-Shot Learning, structure

SLIDE 81

Things to Watch out For

  • Research is cyclical

– SVMs, boosting, probabilistic graphical models & Bayes Nets, Structural Learning, Sparse Coding, Deep Learning
– Deep learning is unique in its depth and breadth, but...
– Deep learning may be improved, reinvented, combined, or overtaken

  • Learn fundamentals for techniques across the field:

– Know the span of ML techniques and choose the ones that fit your problem!
– Be responsible in 1) how you use it, and 2) the promises you make and how you convey them

  • Try to understand landscape of the field

– Look out for what is coming up next, not where we are

  • Have fun!
SLIDE 82

Some current/upcoming topics

  • Current / Recent Past

– AutoML
– Meta-learning
– Unsupervised, semi-supervised, domain adaptation, zero/one/few-shot learning
– Continual/lifelong learning without forgetting
– Memory
– Visual question answering, embodied question answering
– Adversarial examples

  • More recent

– Deep Learning and logic!
– Deep Learning and SAT problems
– World modeling, learning intuitive/physics models
– Visual dialogue, agents, chatbots
– Fixing reinforcement learning

  • First you have to admit you have a problem
  • Exploration and world modeling

– Simulation frameworks, joint perception, planning, and action

  • Navigation, mapping

– Just scaling everything up and watching the magic!