

SLIDE 1

Deep Learning Basics Lecture 5: Convolution

Princeton University COS 495 Instructor: Yingyu Liang

SLIDE 2

Convolutional neural networks

  • Strong empirical application performance
  • Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers, i.e., $h = \sigma(W^T x + b)$ for a specific kind of weight matrix $W$

SLIDE 3

Convolution

SLIDE 4

Convolution: math formula

  • Given functions $u(t)$ and $w(t)$, their convolution is a function $s(t)$
  • Written as

$$s(t) = \int u(a)\, w(t - a)\, da, \qquad s = u * w$$

  • Or

$$s(t) = (u * w)(t)$$

SLIDE 5

Convolution: discrete version

  • Given arrays $u_t$ and $w_t$, their convolution is a function $s_t$
  • Written as
  • When $u_t$ or $w_t$ is not defined, it is taken to be 0

$$s_t = \sum_{a=-\infty}^{+\infty} u_a\, w_{t-a}, \qquad s = u * w$$

  • Or

$$s_t = (u * w)_t$$

SLIDE 6

Illustration 1

Input $u = [a, b, c, d, e, f]$, kernel $w = [z, y, x]$: aligning the kernel over $(b, c, d)$ gives the output entry $xb + yc + zd$.

SLIDE 7

Illustration 1

Sliding the kernel one step right, over $(c, d, e)$, gives $xc + yd + ze$.

SLIDE 8

Illustration 1

One more step, over $(d, e, f)$, gives $xd + ye + zf$.

SLIDE 9

Illustration 1: boundary case

At the boundary, only $x$ and $y$ overlap the input, giving $xe + yf$.

SLIDE 10

Illustration 1 as matrix multiplication

The same outputs can be written as a banded matrix times the input vector:

$$\begin{bmatrix} y & z & & & & \\ x & y & z & & & \\ & x & y & z & & \\ & & x & y & z & \\ & & & x & y & z \\ & & & & x & y \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}$$
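
A sketch of the same observation in code: build the banded (Toeplitz) matrix pictured above and check that multiplying by it matches the same-length convolution. `conv_matrix` is a name invented here:

```python
import numpy as np

def conv_matrix(w, n):
    """Banded (Toeplitz) matrix T with T @ u equal to the
    same-length convolution of u with kernel w."""
    k = len(w)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            idx = i - j + k // 2   # kernel entry multiplying u_j in output i
            if 0 <= idx < k:
                T[i, j] = w[idx]
    return T

w = np.array([3.0, 2.0, 1.0])    # plays the role of [z, y, x]
u = np.arange(6, dtype=float)    # plays the role of [a, ..., f]
assert np.allclose(conv_matrix(w, 6) @ u, np.convolve(u, w, mode="same"))
```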

SLIDE 11

Illustration 2: two dimensional case

Input $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{bmatrix}$, kernel $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$: the top-left window gives $wa + bx + ey + fz$.

SLIDE 12

Illustration 2

Sliding the kernel one step right gives the next entry, $bw + cx + fy + gz$, alongside the first entry $wa + bx + ey + fz$.

SLIDE 13

Illustration 2

Terminology: the small grid of weights is the kernel (or filter), the output grid is the feature map, and the large grid is the input.
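
A sketch of the two-dimensional case pictured above; like most deep learning libraries, it slides the kernel without flipping it (strictly speaking, cross-correlation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output entry is the
    sum of elementwise products over one window."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(12, dtype=float).reshape(3, 4)   # the a..l grid
kernel = np.array([[1.0, 2.0],                     # the [[w, x],
                   [3.0, 4.0]])                    #      [y, z]] grid
print(conv2d_valid(image, kernel))                 # 2x3 feature map
```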

SLIDE 14

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Fully connected layer: $m \times n$ edges ($m$ output nodes, $n$ input nodes)

SLIDE 15

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Convolutional layer: $\leq m \times k$ edges ($m$ output nodes, $n$ input nodes, $k$ kernel size)
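
A back-of-envelope comparison with made-up sizes, just to make the $m \times n$ versus $m \times k$ gap concrete:

```python
n = 10_000   # input nodes (e.g. a 100x100 image), chosen for illustration
m = 10_000   # output nodes
k = 9        # kernel size

print(m * n)             # fully connected: 100,000,000 edges
print(m * k)             # convolutional: at most 90,000 edges
print(m * n // (m * k))  # roughly 1111x fewer
```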

SLIDE 16

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Multiple convolutional layers: larger receptive field

SLIDE 17

Advantage: parameter sharing

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

The same kernel is used repeatedly. E.g., the black edge denotes the same weight in the kernel, reused at every position.

SLIDE 18

Advantage: equivariant representations

  • Equivariant: transforming the input = transforming the output
  • Example: input is an image, transformation is shifting
  • Convolution(shift(input)) = shift(Convolution(input))
  • Useful when we care only about the existence of a pattern, rather than its location
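
A quick numerical check of the identity above; to make the equality exact, this sketch assumes a circular convolution and a circular shift (`np.roll`), an assumption not spelled out on the slide:

```python
import numpy as np

def circ_conv(u, w):
    """Circular discrete convolution: indices wrap around mod len(u)."""
    n = len(u)
    s = np.zeros(n)
    for t in range(n):
        for a, wa in enumerate(w):
            s[t] += wa * u[(t - a) % n]
    return s

rng = np.random.default_rng(0)
u = rng.normal(size=16)
w = np.array([1.0, 2.0, 1.0])
shift = lambda v: np.roll(v, 3)

# Convolution(shift(input)) == shift(Convolution(input))
print(np.allclose(circ_conv(shift(u), w), shift(circ_conv(u, w))))  # True
```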

SLIDE 19

Pooling

SLIDE 20

Terminology

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 21

Pooling

  • Summarizing the input (e.g., outputting the max of the input)

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
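
A minimal sketch of what "summarize a window" means, for max pooling over a 1-D input; the window width and names are chosen here for illustration, not taken from the slide:

```python
import numpy as np

def max_pool1d(x, width=3):
    """Each output is the max over one window of `width` neighbors."""
    return np.array([x[i:i + width].max()
                     for i in range(len(x) - width + 1)])

x = np.array([0.1, 1.0, 0.2, 0.1, 1.2, 0.3, 0.1])
print(max_pool1d(x))   # [1.  1.  1.2 1.2 1.2]
```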

SLIDE 22

Advantage

Induces invariance to small translations of the input

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 23

Motivation from neuroscience

  • David Hubel and Torsten Wiesel studied the early visual system in the brain (V1, or primary visual cortex), and won a Nobel Prize for this work

  • V1 properties
  • 2D spatial arrangement
  • Simple cells: inspire convolution layers
  • Complex cells: inspire pooling layers
SLIDE 24

Variants of convolution and pooling

SLIDE 25

Variants of convolutional layers

  • Multi-dimensional convolution
  • Input and kernel can be 3D
  • E.g., images have (width, height, RGB channels)
  • Multiple kernels lead to multiple feature maps (also called channels)
  • A mini-batch of images is a 4D tensor: (image_id, width, height, RGB channels)
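
A shape-only sketch of how these tensors fit together; the sizes (32 images, 64x64 pixels, 8 kernels of size 5x5) are made up for illustration:

```python
import numpy as np

batch = np.zeros((32, 64, 64, 3))   # (image_id, width, height, channels)
kernels = np.zeros((8, 5, 5, 3))    # 8 kernels, each 5x5 across 3 channels

def conv_output_shape(batch, kernels):
    """Valid convolution: each 3D kernel collapses the input channels
    into one feature map, so 8 kernels give 8 output channels."""
    n, w, h, _ = batch.shape
    k, kw, kh, _ = kernels.shape
    return (n, w - kw + 1, h - kh + 1, k)

print(conv_output_shape(batch, kernels))   # (32, 60, 60, 8)
```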

SLIDE 26

Variants of convolutional layers

  • Padding: valid

Valid padding: the kernel stays fully inside the input, so the last output entry is $xd + ye + zf$ and the output is shorter than the input.

SLIDE 27

Variants of convolutional layers

  • Padding: same

Same padding: the input is zero-padded so the output has the same length as the input; at the right boundary the entry is $xe + yf$.
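
`np.convolve` exposes both behaviors directly, so a two-line comparison makes the valid/same difference visible; the kernel values here are arbitrary:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # the input [a, ..., f]
w = np.array([1.0, 1.0, 1.0])                  # an arbitrary length-3 kernel

print(np.convolve(u, w, mode="valid"))  # kernel stays inside: 4 outputs
print(np.convolve(u, w, mode="same"))   # zero-padded: 6 outputs
```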

SLIDE 28

Variants of convolutional layers

  • Stride

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
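
Stride-$s$ convolution keeps every $s$-th output of the stride-1 result; a minimal sketch (the function name is invented here):

```python
import numpy as np

def conv1d_strided(u, w, stride=2):
    """Compute the stride-1 valid convolution, then downsample."""
    return np.convolve(u, w, mode="valid")[::stride]

u = np.arange(8, dtype=float)
w = np.array([1.0, 1.0])
print(np.convolve(u, w, mode="valid"))   # 7 outputs at stride 1
print(conv1d_strided(u, w, stride=2))    # 4 outputs at stride 2
```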

SLIDE 29

Variants of convolutional layers

  • Others:
  • Tiled convolution
  • Channel-specific convolution
  • …
SLIDE 30

Variants of pooling

  • Stride and padding

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 31

Variants of pooling

  • Max pooling: $y = \max\{x_1, x_2, \ldots, x_k\}$
  • Average pooling: $y = \mathrm{mean}\{x_1, x_2, \ldots, x_k\}$
  • Others like max-out
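
The two formulas side by side on a small made-up window of $k = 4$ inputs:

```python
import numpy as np

x = np.array([0.3, 1.5, 0.2, 0.9])   # the k inputs to one pooling unit
print(x.max())    # max pooling:     1.5
print(x.mean())   # average pooling: 0.725
```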