SLIDE 1

Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.

Rémi Emonet Saintélyon Deep Learning Workshop − 2015-11-26

SLIDE 2

Disclaimer

SLIDE 3

Introductory Poll

Have you ever used any of these?

- Caffe
- Theano
- Lasagne
- Torch
- TensorFlow
- other?

Any experience to share?

SLIDE 4

Overview

Deep Learning? − Abstraction in Frameworks − A Tour of Existing Frameworks − More Discussions?

SLIDE 5

Overview

Deep Learning? − Abstraction in Frameworks − A Tour of Existing Frameworks − More Discussions?

SLIDE 6


Finding Parameters of a Function (supervised)

Notations

- input i, output o
- function f: given; parameters θ: to be learned
- we suppose: o = f_θ(i)

How to optimize it: how to find the best θ?

- need some regularity assumptions (usually, at least differentiability)

Remark: a more generic view

- o = f_θ(i) = f(θ, i)

SLIDE 7

Gradient Descent

We want to find the best parameters

- we suppose: o = f_θ(i)
- we have examples of inputs i_n and target outputs t_n
- we want to minimize the sum of errors: L(θ) = Σ_n L(f_θ(i_n), t_n)
- we suppose f and L are differentiable

Gradient descent (gradient = vector of partial derivatives)

- start with a random θ
- compute the gradient and update θ: θ_{t+1} = θ_t − γ ∇_θ L(θ)

Variations

- stochastic gradient descent (SGD)
- conjugate gradient descent
- BFGS, L-BFGS
- ...
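To make the update rule concrete, here is a minimal numpy sketch of this loop (my illustration, not from the slides); the linear model f_θ(i) = θ·i, the squared-error loss, and the learning rate γ = 0.1 are arbitrary choices for the example:

```python
import numpy as np

# Toy data: inputs i_n and targets t_n generated from a known linear model.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
targets = inputs @ true_theta

theta = rng.normal(size=3)   # start with a random theta
gamma = 0.1                  # learning rate

for step in range(200):
    predictions = inputs @ theta          # f_theta(i_n) for all n
    errors = predictions - targets
    # gradient of L(theta) = sum_n (f_theta(i_n) - t_n)^2, averaged over n
    grad = 2 * inputs.T @ errors / len(inputs)
    theta = theta - gamma * grad          # theta_{t+1} = theta_t - gamma * grad

print(theta)  # converges towards [1.0, -2.0, 0.5]
```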

SLIDE 8

Finding Parameters of a “Deep” Function

Idea

- f is a composition of functions
- 2 layers: o = f_θ(i) = f_θ2(f_θ1(i))
- 3 layers: o = f_θ(i) = f_θ3(f_θ2(f_θ1(i)))
- K layers: o = f_θ(i) = f_θK(… f_θ3(f_θ2(f_θ1(i))) …)
- with all f_l differentiable

How can we optimize it? The chain rule! Many versions (with F = f ∘ g):

- (f ∘ g)′ = (f′ ∘ g) ⋅ g′
- F′(x) = f′(g(x)) ⋅ g′(x)
- dF/dx = (df/dg) ⋅ (dg/dx)
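As a quick sanity check (my example, not from the slides), the chain rule can be verified numerically against a finite-difference approximation; f = sin and g = x² are arbitrary choices:

```python
import math

# F(x) = f(g(x)) with f = sin and g = x**2, so by the chain rule:
# F'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2*x
def F(x):
    return math.sin(x ** 2)

x = 1.3
analytic = math.cos(x ** 2) * 2 * x
eps = 1e-6
numeric = (F(x + eps) - F(x - eps)) / (2 * eps)  # finite-difference estimate
print(analytic, numeric)  # the two values agree to ~1e-9
```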

SLIDE 9

Finding Parameters of a “Deep” Function

Reminders:

- K layers: o = f_θ(i) = f_θK(… f_θ3(f_θ2(f_θ1(i))) …)
- minimize the sum of errors: L(θ) = Σ_n L(f_θ(i_n), t_n)
- chain rule: dF/dx = (df/dg) ⋅ (dg/dx)

Goal: compute ∇_θ L for gradient descent

- ∇_θK L = dL/dθK = (dL/df_K) ⋅ (df_K/dθK)
- ∇_θK−1 L = dL/dθK−1 = (dL/df_K) ⋅ (df_K/df_K−1) ⋅ (df_K−1/dθK−1)
- ⋯
- ∇_θ1 L = dL/dθ1 = (dL/df_K) ⋅ (df_K/df_K−1) ⋅ ⋯ ⋅ (df_2/df_1) ⋅ (df_1/dθ1)

Everything we need is available:

- dL/df_K: gradient of the loss with respect to its input ✔
- df_k/df_k−1: gradient of a function with respect to its input ✔
- df_k/dθk: gradient of a function with respect to its parameters ✔

SLIDE 10

Deep Learning and Composite Functions

Deep Learning?

- NN can be deep, CNN can be deep
- “any” composition of differentiable functions can be optimized with gradient descent
- some other models are also deep... (hierarchical models, etc.)

Evaluating a composition f_θ(i) = f_θK(… f_θ3(f_θ2(f_θ1(i))) …)

- “forward pass”: evaluate each function successively

Computing the gradient ∇_θ L (for gradient descent)

- compute the gradient of the loss with respect to the output o (from the output error)
- for each f_K, f_K−1, …: compute the parameter gradient (from the output gradient) and compute the input gradient (from the output gradient), to pass down
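A minimal numpy sketch of these two passes (my illustration of the idea; the two tanh layers and the squared-error loss are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers f_k(x) = tanh(W_k @ x), plus a squared-error loss at the end.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
i = rng.normal(size=3)          # input
t = np.array([0.5, -0.5])       # target

# Forward pass: evaluate each function successively, keeping intermediates.
h = np.tanh(W1 @ i)             # f_theta1(i)
o = np.tanh(W2 @ h)             # f_theta2(f_theta1(i))
loss = np.sum((o - t) ** 2)

# Backward pass: from the output error down to theta1.
dL_do = 2 * (o - t)             # gradient of the loss w.r.t. the output o
dL_dz2 = dL_do * (1 - o ** 2)   # back through the second tanh
dL_dW2 = np.outer(dL_dz2, h)    # parameter gradient of f_2
dL_dh = W2.T @ dL_dz2           # input gradient of f_2, passed down to f_1
dL_dz1 = dL_dh * (1 - h ** 2)   # back through the first tanh
dL_dW1 = np.outer(dL_dz1, i)    # parameter gradient of f_1
```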

SLIDE 11

Back to “seeing parameters as inputs”

Parameters (θ_k): just another input of f_k

- can be rewritten, e.g., as f_k(θ_k, x)

More generic:

- inputs can be constants
- inputs can be parameters
- inputs can be produced by another function (e.g. f(g(x), h(x)))
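A small sketch of this shift in viewpoint (my illustration, not from the slides): when the layer takes its parameters explicitly, the very same code accepts constants, learned parameters, or values produced by another function:

```python
import numpy as np

def linear(theta, x):
    # f_k written as f_k(theta_k, x): parameters are just another input
    W, b = theta
    return W @ x + b

x = np.ones(3)

# inputs can be (learned) parameters...
theta = (np.eye(3), np.zeros(3))
print(linear(theta, x))

# ...or produced by another function, as in f(g(x), h(x))
def g(x):
    return (np.outer(x, x), x)  # a function whose output plays the role of parameters
print(linear(g(x), x))
```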

SLIDE 12

Overview

Deep Learning? − Abstraction in Frameworks − A Tour of Existing Frameworks − More Discussions?


SLIDE 13

Function/Operator/Layer

The functions that we can use for the f_k. Many choices:

- fully connected layers
- convolution layers
- activation functions (element-wise)
- soft-max
- pooling
- ...

Loss functions: the same, but with no parameters.

In the wild:

- Torch: module
- Theano: operator
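As an illustration of what such an operator/layer abstraction typically looks like (a sketch in the spirit of Torch's module API, not actual framework code), each layer knows how to evaluate itself and how to turn an output gradient into a parameter gradient and an input gradient:

```python
import numpy as np

class Linear:
    """A fully connected layer as an operator/module with forward/backward."""

    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_out, n_in) * 0.1   # parameters
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                                    # saved for the backward pass
        return self.W @ x + self.b

    def backward(self, grad_out):
        self.grad_W = np.outer(grad_out, self.x)      # gradient w.r.t. parameters
        self.grad_b = grad_out
        return self.W.T @ grad_out                    # gradient w.r.t. the input

layer = Linear(3, 2)
y = layer.forward(np.ones(3))
grad_in = layer.backward(np.ones(2))
```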

SLIDE 14

Data/Blob/Tensor

The data: inputs, intermediate results, parameters, gradients, ...

Usually a tensor (an n-dimensional array).

In the wild:

- Torch: tensor
- Theano: tensors, scalars, numpy arrays
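For instance (my example), a mini-batch of RGB images is naturally a single 4-dimensional tensor:

```python
import numpy as np

# A mini-batch of 32 RGB images of size 224x224: one tensor
# with shape (batch, channels, height, width).
batch = np.zeros((32, 3, 224, 224), dtype=np.float32)
print(batch.ndim, batch.shape)   # 4 (32, 3, 224, 224)
```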

SLIDE 15

Overview

Deep Learning? − Abstraction in Frameworks − A Tour of Existing Frameworks − More Discussions?


SLIDE 16

Contenders

- Caffe
- Torch
- Theano
- Lasagne
- TensorFlow
- Deeplearning4j
- ...

SLIDE 17

Overview

Basics

- install (CUDA/cuBLAS/OpenBLAS, cuDNN)
- blobs/tensors, blocks/layers/losses, parameters
- open source

Control flow

- define a composite function (a graph)
- choose an optimizer
- forward, backward

Extend

- write a new operator/module: a "forward" and a "backward" (gradParam, gradInput)

SLIDE 18

Caffe

"made with expression, speed, and modularity in mind" "developed by the Berkeley Vision and Learning Center (BVLC)" "released under the BSD 2-Clause license" C++ layers-oriented http://caffe.berkeleyvision.org/tutorial /layers.html plaintext protocol buffer schema (prototxt) to describe models (and so save them too) 1,068 / 7,184 / 4,077

SLIDE 19

Torch7

By

- Ronan Collobert (Idiap, now Facebook)
- Clément Farabet (NYU, then Madbits, now Twitter)
- Koray Kavukcuoglu (Google DeepMind)

Lua (+ C)

- needs to be learned
- easy to embed

Layer-oriented

- easy to use
- sometimes difficult to extend (merging sources)

418 / 3,267 / 757

SLIDE 20

Theano

"is a Python library" "allows you to define, optimize, and evaluate mathematical expressions" "involving multi-dimensional arrays" "efficient symbolic differentiation" "transparent use of a GPU" "dynamic C code generation" Use symbolic expressions: reasoning on the graph

write numpy-like code no forced “layered” architecture computation graph

263 / 2,447 / 878
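To show the symbolic flavor, here is the classic minimal Theano example (standard usage of the 2015-era library, not code from the talk): you build an expression graph, differentiate it symbolically, and compile it.

```python
import theano
import theano.tensor as T

x = T.dscalar('x')             # a symbolic scalar
y = x ** 2                     # builds a symbolic expression graph
gy = T.grad(y, x)              # symbolic differentiation on that graph
f = theano.function([x], gy)   # compiles the graph (GPU / generated C code)

print(f(4.0))                  # 8.0
```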

SLIDE 21

Lasagne (Keras, etc)

- an overlay on top of Theano
- provides a layer API close to Caffe/Torch, etc.
- layer-oriented
- 133 / 1,401 / 342
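A small sketch of the Lasagne layer API (standard 2015-era usage, not code from the talk), stacking layers Caffe/Torch-style while everything underneath remains a Theano graph:

```python
import lasagne

# Stack layers on top of Theano expressions.
l_in = lasagne.layers.InputLayer(shape=(None, 784))
l_hid = lasagne.layers.DenseLayer(
    l_in, num_units=100, nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(
    l_hid, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

# Underneath, it is still a Theano computation graph.
prediction = lasagne.layers.get_output(l_out)
params = lasagne.layers.get_all_params(l_out, trainable=True)
```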

SLIDE 22

TensorFlow

By Google, Nov. 2015

Selling points

- easy to move from a cluster to a mobile phone
- easy to distribute

Currently slow? Not fully open yet?

1,303 / 13,232 / 3,375
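A minimal sketch using the graph-and-session API TensorFlow shipped with in late 2015 (not code from the talk; this API was removed in TensorFlow 2.x):

```python
import tensorflow as tf

# Build a static graph first...
x = tf.placeholder(tf.float32, shape=(None, 3))
W = tf.Variable(tf.zeros((3, 1)))
y = tf.matmul(x, W)

# ...then execute it in a session (the same graph can target CPUs, GPUs,
# a cluster, or a mobile phone).
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```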

SLIDE 23

Deeplearning4j

- “Deep Learning for Java, Scala & Clojure on Hadoop, Spark & GPUs”
- Apache 2.0-licensed
- Java
- high-level (layer-oriented), typed API
- 236 / 1,648 / 548

SLIDE 24

Overview

Deep Learning? − Abstraction in Frameworks − A Tour of Existing Frameworks − More Discussions?


SLIDE 25

Be creative! Anything differentiable can be tried!

SLIDE 26

How to choose a framework?

SLIDE 27

Any experience to share?

SLIDE 28

Thanks

Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al. Rémi Emonet Saintélyon Deep Learning Workshop − 2015-11-26