ECE 6504: Deep Learning for Perception (lecture slides)


SLIDE 1

ECE 6504: Deep Learning for Perception

Dhruv Batra, Virginia Tech

Topics:

– Recurrent Neural Networks (RNNs)
– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients
– [Abhishek:] Lua / Torch Tutorial

SLIDE 2

Administrivia

  • HW3

– Out today
– Due in 2 weeks
– Please please please please please start early
– https://computing.ece.vt.edu/~f15ece6504/homework3/


SLIDE 3

Plan for Today

  • Model

– Recurrent Neural Networks (RNNs)

  • Learning

– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients

  • [Abhishek:] Lua / Torch Tutorial


SLIDE 4

New Topic: RNNs


Image Credit: Andrej Karpathy

SLIDE 5

Synonyms

  • Recurrent Neural Networks (RNNs)
  • Recursive Neural Networks

– General family; think graphs instead of chains

  • Types:

– Long Short-Term Memory networks (LSTMs)
– Gated Recurrent Units (GRUs)
– Hopfield networks
– Elman networks
– …

  • Algorithms

– BackProp Through Time (BPTT)
– BackProp Through Structure (BPTS)


SLIDE 6

What’s wrong with MLPs?

  • Problem 1: Can’t model sequences

– Fixed-sized inputs & outputs
– No temporal structure

  • Problem 2: Pure feed-forward processing

– No “memory”, no feedback


Image Credit: Alex Graves, book

SLIDE 7

Sequences are everywhere…


Image Credit: Alex Graves and Kevin Gimpel

SLIDE 8


Even where you might not expect a sequence…

Image Credit: Vinyals et al.

SLIDE 9

Even where you might not expect a sequence…

  • Input ordering = sequence

Image Credit: Ba et al.; Gregor et al.

SLIDE 10


Image Credit: [Pinheiro and Collobert, ICML14]

SLIDE 11

Why model sequences?

Figure Credit: Carlos Guestrin

SLIDE 12

Why model sequences?


Image Credit: Alex Graves

SLIDE 13

Name that model

Hidden Markov Model (HMM)

[Figure: an HMM over a handwritten word; hidden letters Y1…Y5, each taking values in {a,…,z}, emit observed character images X1…X5.]

Figure Credit: Carlos Guestrin
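For reference, the joint distribution this chain structure encodes is the standard HMM factorization (written out here as an aid; the notation follows the figure):

    p(X_{1:5}, Y_{1:5}) = p(Y_1)\, p(X_1 \mid Y_1) \prod_{t=2}^{5} p(Y_t \mid Y_{t-1})\, p(X_t \mid Y_t)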

SLIDE 14

How do we model sequences?

  • No input


Image Credit: Bengio, Goodfellow, Courville

SLIDE 15

How do we model sequences?

  • With inputs


Image Credit: Bengio, Goodfellow, Courville

SLIDE 16

How do we model sequences?

  • With inputs and outputs


Image Credit: Bengio, Goodfellow, Courville
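As equations, the three preceding diagrams correspond to these recurrences (a sketch in assumed notation: h_t is the hidden state, x_t the input, y_t the output, and f, g are learned maps):

    h_t = f(h_{t-1})          % no input
    h_t = f(h_{t-1}, x_t)     % with inputs
    y_t = g(h_t)              % with inputs and outputs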

SLIDE 17

How do we model sequences?

  • With Neural Nets


Image Credit: Alex Graves
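Concretely, the standard ("vanilla") RNN cell these diagrams depict can be sketched in a few lines of numpy. This is a minimal sketch, not code from the lecture; the weight names W_xh, W_hh, W_hy are assumptions:

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
        # New hidden state mixes the previous state with the current input
        h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
        # Output is a linear readout of the hidden state
        y_t = W_hy @ h_t + b_y
        return h_t, y_t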

SLIDE 18

How do we model sequences?

  • It’s a spectrum…


– Input: No sequence. Output: No sequence. Example: “standard” classification / regression problems
– Input: No sequence. Output: Sequence. Example: Im2Caption
– Input: Sequence. Output: No sequence. Example: sentence classification, multiple-choice question answering
– Input: Sequence. Output: Sequence. Example: machine translation, video captioning, open-ended question answering, video question answering

Image Credit: Andrej Karpathy

SLIDE 19

Things can get arbitrarily complex


Image Credit: Herbert Jaeger

SLIDE 20

Key Ideas

  • Parameter Sharing + Unrolling

– Keeps the number of parameters in check
– Allows arbitrary sequence lengths! (see the sketch after this list)

  • “Depth”

– Measured in the usual sense of layers
– Not unrolled timesteps

  • Learning

– Is tricky even for “shallow” models due to unrolling
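To make the parameter-sharing point concrete, here is a minimal unrolled forward pass, reusing the notation from the rnn_step sketch above (again an illustrative sketch, not lecture code):

    import numpy as np

    def rnn_forward(xs, h0, W_xh, W_hh, b_h):
        # The SAME W_xh, W_hh, b_h are applied at every timestep, so the
        # parameter count is fixed while the sequence length is arbitrary.
        h, hs = h0, []
        for x_t in xs:  # xs can be any length
            h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
            hs.append(h)
        return hs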


SLIDE 21

Plan for Today

  • Model

– Recurrent Neural Networks (RNNs)

  • Learning

– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients

  • [Abhishek:] Lua / Torch Tutorial


SLIDE 22

BPTT



Image Credit: Richard Socher
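The slide's derivation is an image and is not reproduced here; what BPTT computes is simply the chain rule pushed through the unrolled graph. For a loss E_t at timestep t and shared recurrent weights W (standard notation as in Pascanu et al., an assumption on my part):

    \frac{\partial E_t}{\partial W}
      = \sum_{k=1}^{t}
        \frac{\partial E_t}{\partial h_t}
        \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
        \frac{\partial h_k}{\partial W}

The product of Jacobians in the middle is the term that vanishes or explodes over long time spans, which is the subject of the next slide.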

SLIDE 23

Illustration [Pascanu et al.]

  • Intuition (formalized in the bound below)
  • Error surface of a single-hidden-unit RNN, with high-curvature walls
  • Solid lines: standard gradient descent trajectories
  • Dashed lines: gradients rescaled to fix the problem
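Following Pascanu et al., each factor in the Jacobian product from the BPTT equation is bounded by the spectral norm of the recurrent weight matrix (here γ bounds the derivative of the nonlinearity, e.g. γ = 1 for tanh, and a_j is the pre-activation):

    \left\| \frac{\partial h_j}{\partial h_{j-1}} \right\|
      \le \left\| \mathrm{diag}\big(f'(a_j)\big) \right\| \left\| W_{hh} \right\|
      \le \gamma\, \sigma_{\max}(W_{hh})

So over t - k steps the product behaves like (γ σ_max)^{t-k}: it vanishes when γ σ_max < 1 and can explode when it is > 1.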


SLIDE 24

Fix #1

  • Pseudocode: gradient norm clipping (see the sketch below)


Image Credit: Richard Socher
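The pseudocode image itself is not reproduced here. The fix presented at this point in Pascanu et al. (and in Socher's notes) is gradient norm clipping: rescale the gradient whenever its norm exceeds a threshold. A minimal sketch, with the threshold value an assumption:

    import numpy as np

    def clip_gradient(g, threshold=5.0):
        # If the gradient norm exceeds the threshold, rescale it onto the
        # threshold sphere: direction is preserved, magnitude is capped.
        norm = np.linalg.norm(g)
        if norm > threshold:
            g = (threshold / norm) * g
        return g

Clipping addresses exploding gradients; it does nothing for vanishing ones, which motivates Fix #2.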

SLIDE 25

Fix #2

  • Smart initialization and ReLUs

– [Socher et al. 2013]
– A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, Le et al. 2015 (sketched below)
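A minimal sketch of the Le et al. 2015 recipe (the "IRNN"): initialize the recurrent matrix to the identity and the biases to zero, then use ReLU instead of tanh. The input-weight scale here is an assumption:

    import numpy as np

    def irnn_init(hidden_size, input_size, input_scale=0.001):
        # Identity recurrent weights: the hidden state is copied forward
        # unchanged by default, so gradients along the recurrent path
        # neither vanish nor explode at initialization.
        W_hh = np.eye(hidden_size)
        W_xh = input_scale * np.random.randn(hidden_size, input_size)
        b_h = np.zeros(hidden_size)
        return W_xh, W_hh, b_h

    def irnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # ReLU replaces tanh as the recurrent nonlinearity
        return np.maximum(0.0, W_xh @ x_t + W_hh @ h_prev + b_h)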
