Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Rémi Emonet Saintélyon Deep Learning Workshop − 2015-11-26
Disclaimer

Introductory Poll: did you ever use?
- Caffe
- Theano
- Lasagne
- Torch
- TensorFlow
- Other?
Setting: an input i, an output o, a function f (given), and parameters θ (to be learned). We suppose o = f_θ(i).
This needs some regularity assumptions: usually, at least differentiability.
We suppose o = f_θ(i); we have examples of inputs i_n with target outputs t_n; we want to minimize the sum of errors

L(θ) = ∑_n L(f_θ(i_n), t_n)

We suppose f and L are differentiable. Gradient descent: start with a random θ, then repeatedly compute the gradient and update θ (with learning rate λ):

θ_{t+1} = θ_t − λ ∇_θ L(θ_t)

Variants: stochastic gradient descent (SGD), conjugate gradient descent, BFGS, L-BFGS, ...
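A minimal numpy sketch of this loop on a one-parameter model o = f_θ(i) = θ ⋅ i (the data and the learning rate are made-up illustration values):

```python
import numpy as np

rng = np.random.RandomState(0)
inputs = rng.randn(100)                        # examples i_n
targets = 3.0 * inputs + 0.1 * rng.randn(100)  # targets t_n (true theta is 3.0)

theta = rng.randn()   # start with a random theta
lam = 0.005           # learning rate (lambda)
for step in range(100):
    errors = theta * inputs - targets
    grad = np.sum(errors * inputs)  # gradient of L(theta) = sum_n 0.5*(theta*i_n - t_n)^2
    theta = theta - lam * grad      # the update theta_{t+1} = theta_t - lambda * grad
print(theta)          # converges close to 3.0
```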
2 layers: o = f_θ(i) = f^2_{θ_2}(f^1_{θ_1}(i))
3 layers: o = f_θ(i) = f^3_{θ_3}(f^2_{θ_2}(f^1_{θ_1}(i)))
K layers: o = f_θ(i) = f^K_{θ_K}(⋯ f^2_{θ_2}(f^1_{θ_1}(i)) ⋯)
with all f^k differentiable
Reminders:
- K layers: o = f_θ(i) = f^K_{θ_K}(⋯ f^2_{θ_2}(f^1_{θ_1}(i)) ⋯)
- minimize the sum of errors L(θ) = ∑_n L(f_θ(i_n), t_n)

Chain rule: d f(g(x)) / dx = (df/dg) ⋅ (dg/dx)

Goal: compute ∇_θ L for gradient descent. Applying the chain rule layer by layer:

∇_{θ_K} L = dL/dθ_K = (dL/df^K) ⋅ (df^K/dθ_K)
∇_{θ_{K−1}} L = dL/dθ_{K−1} = (dL/df^K) ⋅ (df^K/df^{K−1}) ⋅ (df^{K−1}/dθ_{K−1})
⋯
∇_{θ_1} L = dL/dθ_1 = (dL/df^K) ⋅ (df^K/df^{K−1}) ⋯ (df^2/df^1) ⋅ (df^1/dθ_1)

In general, ∇_{θ_k} L chains (dL/df^K) through the (df^j/df^{j−1}) down to (df^k/dθ_k). So everything we need is:
✔ the gradient of the loss with respect to its input (dL/df^K)
✔ the gradient of each function with respect to its input (df^k/df^{k−1})
✔ the gradient of each function with respect to its parameters (df^k/dθ_k)
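As a concrete check of these formulas, a minimal scalar sketch (the choices f^1_{θ_1}(i) = θ_1 ⋅ i, f^2 = tanh, and the squared-error loss are made up for illustration); the chain-rule gradient is compared against a finite difference:

```python
import numpy as np

theta1, i, t = 0.7, 2.0, 0.5  # parameter, input, target (arbitrary values)

# forward pass
h = theta1 * i                # f1(i) = theta1 * i
o = np.tanh(h)                # f2(h) = tanh(h)
L = 0.5 * (o - t) ** 2        # loss L(o, t)

# backward pass, following the chain rule
dL_do = o - t                 # dL/df2
dL_dh = dL_do * (1 - o ** 2)  # dL/df1 = dL/df2 * df2/df1  (tanh' = 1 - tanh^2)
dL_dtheta1 = dL_dh * i        # dL/dtheta1 = dL/df1 * df1/dtheta1

# finite-difference check
eps = 1e-6
L_eps = 0.5 * (np.tanh((theta1 + eps) * i) - t) ** 2
print(dL_dtheta1, (L_eps - L) / eps)  # the two numbers should match closely
```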
- NNs can be deep, CNNs can be deep
- “any” composition of differentiable functions can be optimized with gradient descent
- some other models are also deep (hierarchical models, etc.)
“Forward pass”: evaluate each function in turn, from f^1 to f^K.
“Backward pass”: compute the output gradient from the output error; then, for each f^K, f^{K−1}, ..., f^1:
- compute the parameter gradient (from the output gradient)
- compute the input gradient (from the output gradient)
A worked numpy example follows.
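A minimal numpy sketch of both passes on a two-layer net (the tanh + linear architecture and all shapes are made-up illustration choices):

```python
import numpy as np

rng = np.random.RandomState(0)
i = rng.randn(4)      # input
t = rng.randn(3)      # target
W1 = rng.randn(5, 4)  # parameters theta_1 of f1
W2 = rng.randn(3, 5)  # parameters theta_2 of f2

# forward pass: evaluate each function in turn, keeping intermediate values
h = np.tanh(W1 @ i)   # f1
o = W2 @ h            # f2 (linear)
L = 0.5 * np.sum((o - t) ** 2)

# backward pass: start from the output error, walk the layers in reverse
dL_do = o - t                 # gradient at the output
dL_dW2 = np.outer(dL_do, h)   # parameter gradient of f2
dL_dh = W2.T @ dL_do          # input gradient of f2
dL_da = dL_dh * (1 - h ** 2)  # through tanh (a = W1 @ i)
dL_dW1 = np.outer(dL_da, i)   # parameter gradient of f1
dL_di = W1.T @ dL_da          # input gradient of f1
```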
In the computation graph, the inputs of a function can be:
- constants
- parameters
- values produced by another function (e.g. f(g(x), h(x)))
A small sketch of such a graph follows.
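A minimal sketch of such a graph in Python (all class and function names are hypothetical, not any framework's API):

```python
import numpy as np

class Node:
    """A graph node: applies op to the values of its input nodes."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs
    def value(self):
        return self.op(*(n.value() for n in self.inputs))

class Const(Node):
    """A constant input."""
    def __init__(self, v):
        self.v = v
    def value(self):
        return self.v

class Param(Const):
    """A parameter: a constant that learning will update."""
    pass

# f(g(x), h(x)) with toy choices for f, g, h
x = Const(np.array([1.0, 2.0]))
w = Param(np.array([0.5, -0.5]))
g = Node(lambda a, b: a * b, x, w)  # g(x) uses a parameter
h = Node(lambda a: a ** 2, x)       # h(x) reuses the same input
f = Node(lambda a, b: a + b, g, h)
print(f.value())                    # array([1.5, 3.0])
```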
Typical building blocks for the f^k (a Torch “module”, a Theano “operator”):
- fully connected layers
- convolution layers
- activation functions (element-wise)
- soft-max
- pooling
- ...
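For instance, two of these blocks written directly in numpy (a hypothetical illustration, not a framework API):

```python
import numpy as np

def relu(x):
    """Element-wise activation function."""
    return np.maximum(0.0, x)

def softmax(x):
    """Soft-max over a vector."""
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()
```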
Data containers: Torch tensors; Theano tensors, scalars, and numpy arrays.
- install CUDA / cuBLAS / OpenBLAS (and optionally cuDNN); manipulate blobs/tensors, blocks/layers/losses, and parameters
- define a composite function (a graph), choose an optimizer, run the forward and backward passes
- to extend a framework, write a new operator/module: a "forward" and a "backward" that produce gradParam and gradInput (see the sketch below)
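Torch modules are written in Lua; here is a Python sketch of the same contract for a fully connected layer (class and attribute names are hypothetical, though gradParam/gradInput follow the slide):

```python
import numpy as np

class Linear:
    """A fully connected module with the forward/backward contract."""
    def __init__(self, n_in, n_out, rng=np.random):
        self.W = 0.1 * rng.randn(n_out, n_in)  # parameters
        self.gradW = np.zeros_like(self.W)     # gradParam

    def forward(self, x):
        self.x = x                             # keep the input for backward
        return self.W @ x

    def backward(self, grad_output):
        self.gradW = np.outer(grad_output, self.x)  # parameter gradient
        return self.W.T @ grad_output               # gradInput

layer = Linear(4, 3)
out = layer.forward(np.ones(4))
grad_in = layer.backward(np.ones(3))  # fills layer.gradW, returns gradInput
```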
Torch:
- authors: Ronan Collobert (Idiap, now Facebook), Clément Farabet (NYU, then Madbits, now Twitter), Koray Kavukcuoglu (Google DeepMind)
- Lua needs to be learned, but is easy to embed
- easy to use; sometimes difficult to extend (merging sources)
Theano:
- write numpy-like code
- no forced “layered” architecture
- the code builds a computation graph (see the sketch below)
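A minimal Theano sketch (logistic regression; the shapes and the learning rate are made-up illustration values):

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')   # symbolic inputs: numpy-like code
y = T.ivector('y')
W = theano.shared(np.zeros((784, 10)), name='W')  # parameters
b = theano.shared(np.zeros(10), name='b')

p = T.nnet.softmax(T.dot(x, W) + b)               # builds a computation graph
loss = -T.mean(T.log(p)[T.arange(y.shape[0]), y])
gW, gb = T.grad(loss, [W, b])                     # symbolic differentiation of the graph

train = theano.function([x, y], loss,
                        updates=[(W, W - 0.1 * gW), (b, b - 0.1 * gb)])
```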
TensorFlow:
- easy to move from a cluster to a mobile phone
- easy to distribute (a minimal example below)
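A minimal sketch in the 2015-era TensorFlow API (sessions and placeholders; the shapes and the learning rate are made-up illustration values):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

pred = tf.nn.softmax(tf.matmul(x, W) + b)
loss = -tf.reduce_mean(tf.reduce_sum(y * tf.log(pred), 1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # the 2015 initializer name
    # sess.run(train_step, feed_dict={x: ..., y: ...}) on mini-batches
```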