
Dioptics, etc. (@davidad): Outline

  • Gradient-Based Learners: Motivation · “Simple Essence” · Abstract version · Reconstitution · Backpropagation · “Categories of Optics” · “Backprop as Functor”
  • Dioptics: Dbl(C) · Dbl(Optic_C) · Quotienting · Dioptic_{F,G} · Gradient descent · Getting to Learn
  • Open Games: “Compos. Game Thy.” · As Dioptics · Caveat
  • Future work: Truthfulness? · Functor recipe? · ReLU
  • References · Thanks

Dioptics: a common generalization of gradient-based learners and open games

David A. Dalrymple @davidad

Protocol Labs

SYCO 5, Birmingham, UK, 2019-09-05


About This Talk

  • Clarifying connections between (a lot of) prior work
  • Besides abstractions, the main novelty: generalizing backpropagation and gradient descent to Lie groups and framed Riemannian manifolds
  • Work in progress; dubious provenance

Haven’t I seen this talk already?

There is a lot of overlap with Jules’ talk earlier. A couple of differences:

  • I only deal with trivializable bundles, TX ≅ X × X′
  • I’m aiming to cover more than just backpropagation

Notations

  • Composition: (f ⨟ g)(x) ≡ g(f(x)) ≡ x ⨟ f ⨟ g
  • Homs: C⟨A, B⟩ means hom_C(A, B). [A, B] denotes the internal hom from A to B. A ⊸ B denotes the space of (literally) linear maps from A to B.
  • Definitions are written with name, type, bound variables, and expression in one line, e.g.

      eval_{X,Y} : (X ⊸ Y) ⊗ X → Y := ⟨f, x⟩ → f(x)

    which means the same as eval_{X,Y}⟨f, x⟩ = f(x).


Section 1 Gradient-Based Learners


Machine Learning in 60 seconds

[diagram: a neural-net layer drawn as ∗ → ∗ → + → σ]

  • A (supervised) machine learning problem is a function approximation problem.
  • A pretty practical class of functions to approximate things with is neural nets.
  • Deep learning is, in part, about composing layers. The deepness is (sequential) composition depth.
  • Modern deep learning (e.g. TensorFlow, PyTorch) uses computational graphs. How much of modern deep learning can be understood from this perspective?
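The “deepness as sequential composition depth” point can be sketched in a few lines of Python; the layer shape, weights, and helper names here are my own illustration, not from the talk.

```python
import math

# A layer is just a function; a "deep" net is a sequential composite of layers.
def affine(w, b):
    """Affine layer: x -> w @ x + b, with w a list of rows."""
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) + bi
                      for row, bi in zip(w, b)]

def sigma(x):
    """Elementwise logistic nonlinearity."""
    return [1.0 / (1.0 + math.exp(-xi)) for xi in x]

def compose(*fs):
    """Sequential composition; depth = len(fs)."""
    def h(x):
        for f in fs:
            x = f(x)
        return x
    return h

net = compose(affine([[1.0, -1.0]], [0.0]), sigma)
assert len(net([2.0, 2.0])) == 1
```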


Backpropagation

[diagram: forward layers ∗, ∗, +, σ above their backward counterparts ∗′, ∗′, +′, σ′]

  • Forward pass computes x → y
  • Backward pass computes d̄x ← d̄y
  • Technically, the name “backpropagation” implies codomain R. Else, reverse-mode automatic differentiation.
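A minimal reverse-mode AD sketch: each operation records a backward closure, and `backward()` propagates d̄y back to d̄x. The class and its API are my own illustration, not the talk’s construction.

```python
class Var:
    """Scalar node in a computational graph; .grad accumulates d̄(self)."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # product rule: pull d back as d * (the other factor's value)
        return Var(self.value * other.value,
                   [(self, lambda d: d * other.value),
                    (other, lambda d: d * self.value)])

    def __add__(self, other):
        # sum rule: pass d through unchanged to both parents
        return Var(self.value + other.value,
                   [(self, lambda d: d), (other, lambda d: d)])

    def backward(self, d=1.0):
        """Backward pass: propagate the incoming cotangent d to the leaves."""
        self.grad += d
        for parent, pull in self.parents:
            parent.backward(pull(d))

x = Var(3.0)
y = x * x + x      # y = x^2 + x
y.backward()       # dy/dx = 2x + 1 = 7 at x = 3
assert abs(x.grad - 7.0) < 1e-9
```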


Two ideas about how “backpropagation is a functor”:

  • “The Simple Essence of Automatic Differentiation”, arXiv:1804.00746 [cs.PL], Conal Elliott
  • “Backprop as Functor” (presented at SYCO 1!), arXiv:1711.10455 [math.CT], Brendan Fong, David Spivak, Rémy Tuyéras

How do these relate?


What’s a Derivative?

  • Elliott constructs a “derivative” functor D⁺.
  • For X, Y : Euc and f : X → Y, let Df : X → (X ⊸ Y) := x → the unique linear g =: f′(x) such that

      lim_{ε→0} ‖f(x + ε) − (f(x) + f′(x)(ε))‖ / ‖ε‖ = 0

  • Chain rule: D(f ⨟ g)(x) = Df(x) ⨟ Dg(f(x))
  • Problem: not functorial, since it depends on the un-D’d f.
  • Let D⁺f : X → Y × (X ⊸ Y) := x → ⟨f(x), Df(x)⟩
  • Proposition (Elliott). D⁺ is a symmetric monoidal functor from Euc into a category with objects of Euc and morphisms of type X →_Euc Y × (X ⊸ Y).
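D⁺ can be sketched numerically: the composite is computed from the D⁺-images alone, which is exactly what makes it functorial where D is not. Scalar maps only; helper names are mine.

```python
def dplus(f, df):
    """D⁺f : x -> (f(x), Df(x)), with Df(x) a linear map given as a closure."""
    return lambda x: (f(x), df(x))

def compose_dplus(Fp, Gp):
    """Compose D⁺-images directly: no access to the un-D'd functions needed."""
    def h(x):
        y, dfx = Fp(x)
        z, dgy = Gp(y)
        return z, lambda v: dgy(dfx(v))   # chain rule on the linear parts
    return h

F = dplus(lambda x: x * x, lambda x: (lambda v: 2 * x * v))   # D⁺(x²)
G = dplus(lambda y: 3 * y, lambda y: (lambda v: 3 * v))       # D⁺(3y)
z, dz = compose_dplus(F, G)(2.0)
assert z == 12.0 and dz(1.0) == 12.0    # d(3x²)/dx = 6x = 12 at x = 2
```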

What do we really need to assume?

We can work in any category E which...

  • is cartesian closed and locally cartesian closed
  • has a product-preserving endofunctor T (given a space X : E, TX is interpreted as its tangent bundle)
  • has a “base point” natural transformation p : ∀X. TX → X (that is, p : T ⇒ id_E)
  • has a semiadditive subcategory EVect of “vector-like spaces” enriched in E
  • has a subcategory ETriv of “trivializable spaces” s.t. for all X : ETriv, there is some X′ : EVect satisfying the isomorphism (of bundles over X) TX ≅ X × X′
      • Observation: TX ≅ X × X′ looks like a constant-complement lens from TX to X
  • satisfies one last hard-to-state assumption about “linearity of derivatives”

What’s a Derivative, Again?

  • If X, Y : ETriv, then T(f : X → Y) : TX → TY, i.e. X × X′ → Y × Y′.
  • By naturality of the base-point projection p : T ⇒ id, we have Tf⟨x, ·⟩ = ⟨f(x), ·⟩.
  • Therefore Tf⟨x, x′⟩ = ⟨f(x), π₂Tf⟨x, x′⟩⟩.
  • So we can define T⁺f : X → Y × (X′ → Y′) := x → ⟨f(x), λx′. π₂Tf⟨x, x′⟩⟩.
  • Our last assumption is that T⁺(f)(x) is, in fact, a linear map X′ ⊸ Y′.
  • Then T⁺f : X → Y × (X′ ⊸ Y′), just like Elliott’s D⁺.

All That and a Pony

Two ways to instantiate those assumptions:

  1. E can be the microlinear spaces of a well-adapted model of synthetic differential geometry, like the Dubuc/Cahiers topos
       • Here, TX is representable as [D, X] where D is the infinitesimal interval
  2. E can be the category of diffeological spaces due to Souriau

In either case, ETriv includes all manifolds with trivializable tangent bundles, e.g.

  • open subsets of Euclidean spaces
  • affine spaces
  • Lie groups
  • framed manifolds

Backpropagation (Reverse-mode automatic differentiation)

(forward-mode) T⁺f : X → Y × (X′ ⊸ Y′)

  T⊳_Z(f : X → Y) : X → Y × ((Y′ ⊸ Z′) → (X′ ⊸ Z′))
    := x → ⟨f(x), k → d → ⟨x, d⟩ ⨟ γ⁻¹_X ⨟ Tf ⨟ γ_Y ⨟ π₂ ⨟ k⟩

  where γ_X : TX ≅ X × X′, γ_Y : TY ≅ Y × Y′, k : Y′ ⊸ Z′, and d : X′.

[diagram: the tile T⊳_Z f, with “value in” X, “value out” Y, “gradient in” k : Y′ ⊸ Z′, and “gradient out” X′ ⊸ Z′; the continuation k is applied via ev]

On objects, T⊳_Z : ETriv → Optic_E := X → (X, X′ ⊸ Z′)

“Categories of Optics”

Definition [Ril18]. In any symmetric monoidal category C,

  Optic_C⟨(X, X⁻), (Y, Y⁻)⟩ := ∫^{M:C} C(X, M ⊗ Y) × C(M ⊗ Y⁻, X⁻)

Theorem (Riley). Optic_C is a symmetric monoidal category with objects of C × Cᵒᵖ.

If C is cartesian, Optic_C is equivalent to

  Lens_C⟨(X, X⁻), (Y, Y⁻)⟩ := C(X, Y) (get) × C(X × Y⁻, X⁻) (put)

If C is symmetric monoidal closed, Optic_C is equivalent to

  CurriedLens_C⟨(X, X⁻), (Y, Y⁻)⟩ := X → Y ⊗ [Y⁻, X⁻]

Theorem (de Paiva). If C is cartesian closed and locally cartesian closed, Optic_C is a symmetric monoidal closed category, with internal hom defined as

  [(X, X⁻), (Y, Y⁻)]_{Optic_C} = ([X, Y × [Y⁻, X⁻]]_C, X × Y⁻)
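A concrete Set-instance of the get/put presentation above, with sequential composition; the helper names and the toy lenses are my own illustration.

```python
def lens(get, put):
    """A lens in Set: get : X -> Y, put : X x Y⁻ -> X⁻."""
    return (get, put)

def compose_lens(l1, l2):
    """Sequential composition: forward through both gets,
    backward through both puts (l2's put first)."""
    g1, p1 = l1
    g2, p2 = l2
    return lens(lambda x: g2(g1(x)),
                lambda x, dz: p1(x, p2(g1(x), dz)))

# Example lenses: project the first component, and "double"
# (whose put is the transpose of its derivative).
fst = lens(lambda xy: xy[0], lambda xy, y: (y, xy[1]))
double = lens(lambda x: 2 * x, lambda x, d: 2 * d)

l = compose_lens(fst, double)
assert l[0]((3, 9)) == 6           # get
assert l[1]((3, 9), 1) == (2, 9)   # put
```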

Backpropagation (Reverse-mode automatic differentiation), continued

As on the previous slide, T⊳_Z(f : X → Y) : X → Y × ((Y′ ⊸ Z′) → (X′ ⊸ Z′)), and on objects T⊳_Z : ETriv → Optic_E := X → (X, X′ ⊸ Z′). This matches the curried-lens presentation of optics:

  Optic_E⟨(X, X⁻), (Y, Y⁻)⟩ = X → Y × [Y⁻, X⁻]   with X⁻ := (X′ ⊸ Z′)


“Backprop as Functor”: Learn

Definition [FST17]. Given X, Y : Set, a learner ℓ from X → Y is defined by:

  S_ℓ : Set                   the parameter space
  I_ℓ : S_ℓ × X → Y           the implementation function
  r_ℓ : S_ℓ × X × Y → X       the request function
  U_ℓ : S_ℓ × X × Y → S_ℓ     the update function

Equivalently, a learner ℓ : X → Y is exactly

  • a family of lenses, i.e. a set S_ℓ and for each s : S_ℓ a lens ℓ_s : (X, X) → (Y, Y) in Lens_Set
  • U_ℓ : S_ℓ × X × Y → S_ℓ

Observation: also equivalently, a learner ℓ : X → Y is exactly a set S_ℓ and a lens (S_ℓ, S_ℓ) → [(X, X), (Y, Y)] in Lens_Set, or (S_ℓ, S_ℓ) ⊗ (X, X) → (Y, Y).

Proposition [FST17]. There is a symmetric monoidal category Learn whose objects are sets and whose morphisms are equivalence classes of learners.
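The sequential composite of two learners in Learn can be sketched directly from the definition; the helper names and the toy update rule below are mine, not FST17’s notation.

```python
def compose_learners(l1, l2):
    """Composite of learners (S1,I1,r1,U1) and (S2,I2,r2,U2):
    parameters pair up; requests flow backward through r2 then r1."""
    S1, I1, r1, U1 = l1
    S2, I2, r2, U2 = l2
    def I(s, x):                          # s = (s1, s2)
        return I2(s[1], I1(s[0], x))
    def r(s, x, z):
        y = I1(s[0], x)
        return r1(s[0], x, r2(s[1], y, z))
    def U(s, x, z):
        y = I1(s[0], x)
        return (U1(s[0], x, r2(s[1], y, z)), U2(s[1], y, z))
    return ((S1, S2), I, r, U)

# Toy learner: parameterized addition on numbers.
add = (float,
       lambda s, x: s + x,                         # implement
       lambda s, x, y: y - s,                      # request
       lambda s, x, y: s + 0.5 * (y - (s + x)))    # update (nudge)

S, I, r, U = compose_learners(add, add)
assert I((1.0, 2.0), 3.0) == 6.0
```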


Section 2 Dioptics


Dbl(C)

Definition (Sprunger, Katsumata). Given a cartesian category C, the double category Dbl(C) has

  • one 0-cell, written as ·
  • horizontal and vertical 1-cells both given by objects of C, composed with ×_C, with identity given by the terminal object of C
  • a 2-cell with boundary X, Y, S, S′ given by a morphism C⟨S × X, S′ × Y⟩

Dbl(Optic_C)

2-cells/tiles of Dbl(Optic_C) are morphisms Optic_C⟨(S, S⁻) ⊗ (X, X⁻), (S′, S′⁻) ⊗ (Y, Y⁻)⟩.

If we look only at tiles with trivial vertical codomain (the monoidal unit), we get Optic_C⟨(S × X, S⁻ × X⁻), (Y, Y⁻)⟩, exactly the desired structure:

[diagram: a tile ⟨ℓ₀, r₀⟩ with ports S (parameters/strategies), S⁻ (updates), X (inputs/observations), X⁻ (request/feedback), Y (outputs/moves), Y⁻ (gradient/coutility)]


Quotienting by equivalence of parameter space S

If composition of parameterized morphisms involves Cartesian-product-ing their parameter spaces, then associativity of composition does not (directly, strictly) hold. Ways to solve this:

  • Make the parameter space into an “opaque” or “existential” type:
      1. Explicit meta-theoretic quotient (as for Learn, Para, Game)
      2. Bind it with a coend (as for Optic): this is what I do for now with Dioptic_{F,G}
  • Give up strict associativity; define a bicategory instead. (2-morphisms are reparameterizations.)
  • Construct the double category Dbl(C), using monoidal strictification.

Question: how do we recover a symmetric monoidal category?


Recovering a symmetric monoidal category from Dbl(Optic_C)

Proposed approach: Cat(Cat) →? SymMon2Cat →(forgetful) SymMonCat



Dioptic_{F,G}

[diagram: a tile ⟨ℓ₀, r₀⟩ whose vertical ports are the two components of G S̈ (parameters/strategies, updates) and whose horizontal ports are the components of F Ẍ (inputs/observations, gradient/feedback/request) and F Ÿ (outputs/moves, gradient/coutility)]

The construction takes as input:

  • C, a cartesian closed, locally cartesian closed category
  • S, T, symmetric monoidal categories
  • F : T → Optic_C and G : S → Optic_C, symmetric oplax monoidal functors
      • The canonical embedding (C × Cᵒᵖ) ↪ Optic_C can be useful
  • Conjecture: if F is strong symmetric monoidal, Dioptic_{F,G} is symmetric monoidal.

  Dioptic_{F,G} : Tᵒᵖ × T → Set := (Ẍ, Ÿ) → ∫^{S̈:S} Optic_C(G S̈, [F Ẍ, F Ÿ]) = ∫^{S̈:S} Optic_C(G S̈ × F Ẍ, F Ÿ)


Para as Dioptic_{Fwd,Fwd}

Let E be the Dubuc topos or the category of diffeological spaces; then let

  Fwd : Euc → E × Eᵒᵖ := X → (X, 1)

Then we have

  Dioptic_{Fwd,Fwd}(X, Y) = ∫^{S:Euc} Optic_E⟨(S, 1), [(X, 1), (Y, 1)]⟩
                          = ∫^{S:Euc} Lens_E((S, 1) ⊗ (X, 1), (Y, 1))
                          ≅ ∫^{S:Euc} E(S × X, Y)


GradLearn := Dioptic_{T⊳_R, T⊳_R}

Let E be the Dubuc topos or the category of diffeological spaces, with ETriv the subcategory with trivializable bundles (TX ≅ X × X′). Then T⊳_R : ETriv → Optic_E = X → (X, X′ ⊸ R). We have

  Dioptic_{T⊳_R, T⊳_R}(X, Y)
    = ∫^{S:ETriv} Optic_E⟨(S, S′ ⊸ R), [(X, X′ ⊸ R), (Y, Y′ ⊸ R)]⟩
    = ∫^{S:ETriv} Lens_E⟨(S, S′*) ⊗ (X, X′*), (Y, Y′*)⟩
    = ∫^{S:ETriv} Lens_E⟨(S × X, S′* × X′*), (Y, Y′*)⟩
    = ∫^{S:ETriv} E⟨S × X, Y⟩ × E⟨S × X × Y′*, S′* × X′*⟩

T⊳_Z is strong symmetric monoidal: (X × Y)′ ⊸ Z ≅ (X′ ⊸ Z) × (Y′ ⊸ Z), due to product-preservation of T and semiadditivity of EVect.

Learn as Dioptic_{∆⇄_Set, ∆⇄_Set}

Let ∆⇄_Set : Core(Set) → Optic_Set := X → (X, X). Then we have

  Dioptic_{∆⇄_Set, ∆⇄_Set}(X, Y) = ∫^{S:Set} Optic_Set⟨(S, S), [(X, X), (Y, Y)]⟩
    = ∫^{S:Set} Set⟨S × X, Y⟩ × Set⟨S × X × Y, S × X⟩
    = Learn(X, Y)


Gradient descent

Earlier, I said that instead of computing an unknown Y′ from an unknown X′, we want to compute X′ ⊸ R (that is, X′*) from Y′ ⊸ R (that is, Y′*). Actually, we want to compute a new value of type S! Fortunately, we have a covector c : S′ ⊸ R to work with. Steps to compute a new value for S, assuming S is equipped with a Riemannian structure (a symmetric, nonnegative, nondegenerate bilinear form g : S′ × S′ ⊸ R):

  • There exists a unique vector v such that c = λd. g(v, d).
  • Scale the vector by an arbitrary learning rate η : R (and probably −1, if you’re minimizing a loss).
      • Handling hyperparameters like η internal to the theory is very WIP, but should work.
  • Using the Riemannian structure, compute the unique torsion-free Levi-Civita connection for parallel transport.
  • Apply some appropriate theorem for the existence and uniqueness of solutions of differential equations to integrate the tangent vector −ηv along a geodesic a(t) starting from the current parameter state sᵢ : S.
  • Let sᵢ₊₁ := a(1).

By vertically composing all that machinery on top of a gradient-based learner of type (S, S′*) ⊗ (X, X′*) → (Y, Y′*), we obtain a dioptic (S, S) ⊗ (X, X′*) → (Y, Y′*).
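In the Euclidean special case, the steps above collapse to the familiar update rule: with g the dot product, the vector v dual to the covector c is c itself, geodesics are straight lines, and integrating −ηv for unit time gives sᵢ₊₁ = sᵢ − ηv. A minimal sketch (function names mine):

```python
def gradient_step(s, c, eta=0.1):
    """One gradient-descent step on flat R^n.
    s: current parameter point; c: gradient covector (as coordinates);
    returns s_{i+1} = s - eta * v, where v is c with its index raised
    by the identity metric."""
    v = c                       # g = dot product, so v and c have the
                                # same coordinates
    return [si - eta * vi for si, vi in zip(s, v)]

s = [1.0, -2.0]
c = [0.5, 0.5]                  # pretend this came from backpropagation
s_next = gradient_step(s, c)
assert all(abs(a - b) < 1e-12 for a, b in zip(s_next, [0.95, -2.05]))
```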


Getting to Learn

  • We now have (S, S) ⊗ (X, X′*) → (Y, Y′*), but in Learn we have dioptics of type (S, S) ⊗ (X, X) → (Y, Y).
  • Getting to Learn requires a bit of a hack: we need to package up the loss function and the gradient of its inverse into every morphism (tile). This introduces a lot of unnecessary operations, and the same is true for [FST17]’s original functor from Para → Learn.
  • Given a positive number η : R (the step size) and a differentiable function e(x, y) : R × R → R (the loss function) such that ∂e/∂x(z, −) : R → R is invertible for all z : R, we can define a faithful, injective-on-objects, symmetric monoidal functor L_{e,η} : Para → Learn that sends each parametrised function f : S × X → Y to the learner (S, f, U_f, r_f) defined by

      U_f(s, x, y) := s − η ∇_s Σ_j e(f(s, x)_j, y_j)
      r_f(s, x, y) := f_x(∇_x Σ_j e(f(s, x)_j, y_j))

    where f_x is componentwise application of the inverse of ∂e/∂x(x_i, −) for each i.
  • The same trick works in a dioptic context, but only for bona fide Euclidean spaces.


Section 3 Open Games


“Compositional Game Theory”: Game

Definition [GHWZ18]. Given X, X⁻, Y, Y⁻ : Set, an open game G from (X, X⁻) → (Y, Y⁻) is defined by:

  S_G : Set                                the strategy profile space
  P_G : S_G × X → Y                        the play function
  C_G : S_G × X × Y⁻ → X⁻                  the coplay function
  E_G : S_G × X × (Y → Y⁻) → S_G → 2       the equilibrium function

We define these auxiliary functors, with codomain Set × Setᵒᵖ:

  E⁺ := S → (S, 2)
  C⁺ := (X, X⁻) → (X, [X, X⁻])
  B⁺ := E⁺ ⨟ C⁺ = S → (S, [S, 2])

The oplaxator of E⁺ is defined using conjunction:

  E⁺.∆_{S,T} : E⁺(S × T) = (S × T, 2) →_{Set×Setᵒᵖ} (S × T, 2 × 2) = E⁺S ⊗ E⁺T
    := ⟨(s, t) → (s, t), (a ∧ b) ← (a, b)⟩

Conjecture. Game has a faithful, identity-on-objects functor into Dioptic_{C⁺,B⁺}.
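A one-player toy instance of the Game data (S_G, P_G, C_G, E_G) in Python; the concrete payoff, the currying of E, and fixing x rather than abstracting it are my own illustrative choices, not from [GHWZ18].

```python
# Strategy space, play, coplay, and equilibrium for a single maximizer.
S = [0, 1, 2]
P = lambda s, x: s + x                  # play: move = strategy + observation
C = lambda s, x, r: r                   # coplay: pass utility back unchanged

def E(x, k):
    """Equilibrium predicate: s is best response to continuation k at x."""
    return lambda s: all(k(P(s, x)) >= k(P(t, x)) for t in S)

k = lambda y: -(y - 3) ** 2             # continuation: utility of move y
best = [s for s in S if E(1, k)(s)]
assert best == [2]                      # s + 1 = 3 maximizes k
```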

Open Games as Dioptics

  Dioptic_{C⁺,B⁺}⟨(X, X⁻), (Y, Y⁻)⟩
    = ∫^{S̈:Set} Optic_Set⟨B⁺S̈, [C⁺(X, X⁻), C⁺(Y, Y⁻)]⟩
    = ∫^{S:Set} Optic_Set⟨(S, S → 2), [(X, X → X⁻), (Y, Y → Y⁻)]⟩
    ≅ ∫^{S:Set} Optic_Set⟨(S, S → 2), (X → Y × ((Y → Y⁻) → (X → X⁻)), X × (Y → Y⁻))⟩
    ≅ ∫^{S:Set} Set⟨S, X → Y × ((Y → Y⁻) → (X → X⁻))⟩ × Set⟨S × X × (Y → Y⁻), S → 2⟩
    = ∫^{S:Set} Set⟨S × X, Y⟩ × Set⟨S × X × (Y → Y⁻), X → X⁻⟩ × Set⟨X × (Y → Y⁻), S × S → 2⟩

There is a map φ into this last set from

  ∫^{S:Set} Set⟨S × X, Y⟩ (play P) × Set⟨S × X × Y⁻, X⁻⟩ (coplay C) × Set⟨X × (Y → Y⁻), (S × S) → 2⟩ (best-response B)

into which Game⟨(X, X⁻), (Y, Y⁻)⟩ embeds, where

  φ  := ⟨S, P, C, B⟩ → ⟨S, P, ⟨s, x, k⟩ → (x → C(s, x, k(P(s, x)))), B⟩
  φ← := ⟨S, P, K, B⟩ → ⟨S, P, ⟨s, x, y⁻⟩ → K(s, x, (y → y⁻))(x), B⟩

Dishonest morphisms

[diagram: a tile ⟨E_G, P_G, C_G⟩ with ports S (strategies), [S, 2] (equilib?), X (observations), [X, X⁻] (continuation), Y (moves), [Y, Y⁻] (continuation)]

  • Nothing I’ve done says the continuation that’s output to the left has to be true.
  • The sequential composition rule holds up, but a “dishonest” tile can corrupt a whole diagram out of the subcategory that corresponds to Game.

But it’s not even monoidal

  • Unfortunately Dioptic_{C⁺,B⁺} fails to be monoidal, because C⁺ is not a strong monoidal functor (merely bilax monoidal and Frobenius monoidal).
  • There are natural transformations

      μ_{(X,X⁻),(Y,Y⁻)} : C⁺(X, X⁻) ⊗ C⁺(Y, Y⁻) → C⁺((X, X⁻) ⊗ (Y, Y⁻))
      ∆_{(X,X⁻),(Y,Y⁻)} : C⁺((X, X⁻) ⊗ (Y, Y⁻)) → C⁺(X, X⁻) ⊗ C⁺(Y, Y⁻)

    that is, between (X, [X, X⁻]) ⊗ (Y, [Y, Y⁻]) and (X × Y, [X × Y, X⁻ × Y⁻]), but they are not inverses.
  • The backwards part (put) of μ, of type X × Y × [X × Y, X⁻ × Y⁻] → [X, X⁻] × [Y, Y⁻], is lossy.
  • As a result, in Dioptic_{C⁺,B⁺}, id_X ⊗ id_Y ≇ id_{X⊗Y}.

Is there a different way?

  • Problem: passing X → X⁻ to f and Y → Y⁻ to g loses information about the joint dependency X × Y → X⁻ × Y⁻.
  • Perhaps continuations can be upgraded to some kind of “nominal diagrams” that express dependencies on all uncles, and from which joint information can be recovered. [diagram: a nominal diagram with numbered wires]
  • Decorated cospans?

Future work

  • Actually proving stuff
  • Working out the quotienting machinery
  • Blue-sky idea: if it works, can it replace coend in the definition of Optic itself?

    Optic_C((X, X−), (Y, Y−)) := ∫^{M:C} C(X, M ⊗ Y) × C(M ⊗ Y−, X−)

  • Proving stuff in Coq
  • Generalizing to nontrivializable bundles (merge with Jules’)
  • Trying more computable base fields than R
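For intuition about the formula above, here is a hedged Python sketch of one concrete representative of that coend: an explicit residual M together with fwd : X → M ⊗ Y and bwd : M ⊗ Y− → X−, with sequential composition. Class and method names are illustrative; the real definition quotients out the choice of M.

```python
class Optic:
    """One representative (M, fwd, bwd) of Optic_C((X,X-),(Y,Y-)):
    fwd : X -> M x Y  and  bwd : M x Y- -> X-.
    The coend in the actual definition quotients out the residual M."""
    def __init__(self, fwd, bwd):
        self.fwd = fwd
        self.bwd = bwd

    def __rshift__(self, other):
        """Sequential composition; residuals accumulate as pairs."""
        def fwd(x):
            m, y = self.fwd(x)
            n, z = other.fwd(y)
            return (m, n), z
        def bwd(mn, z_minus):
            m, n = mn
            return self.bwd(m, other.bwd(n, z_minus))
        return Optic(fwd, bwd)

# A lens-like optic focusing on the first component of a pair (residual M = A):
first = Optic(fwd=lambda xa: (xa[1], xa[0]),
              bwd=lambda a, x_minus: (x_minus, a))
# An optic with trivial residual M = 1, doubling forwards and backwards:
double = Optic(fwd=lambda x: ((), 2 * x),
               bwd=lambda _, dy: 2 * dy)

composed = first >> double
m, y = composed.fwd((3, "ctx"))
assert y == 6
assert composed.bwd(m, 1) == (2, "ctx")
```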
slide-38
SLIDE 38

Characterizing truthfulness

  • Would be nice to axiomatize which dioptics are in the image of the faithful functor Game ↪ Dioptic_{C+,B+}
  • Naïvely, one might hope that “truthful” ≈ “lawful”, but there seems to be no applicable definition of “lawful”

slide-39
SLIDE 39

Synthesizing functors between categories of dioptics

  • The alleged functors

    T* : Para ≅ Dioptic_{Fwd,Fwd} → Dioptic_{T⊳_R, T⊳_R} =: GradLearn

    L*_{e,η} : GradLearn := Dioptic_{T⊳_R, T⊳_R} → Dioptic_{Δ⇄Set, Δ⇄Set} ≅ Learn

    D*_η : GradLearn := Dioptic_{T⊳_R, T⊳_R} → Dioptic_{T⊳_R, Δ⇄Set} =: GradDesc

    all go from one category of dioptics to another.

  • Is there a generic “recipe”?
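As a down-to-earth instance of moving from a differentiable parametrized map to a learner, the shape of “Backprop as Functor” (implement/update/request) can be sketched numerically. This is a hedged illustration with made-up helper names, specialized to squared-error loss and a scalar model:

```python
def grad_descent_learner(f, df_dp, df_dx, lr=0.1):
    """Turn a differentiable parametrized map f(p, x) into a learner,
    in the style of "Backprop as Functor": implement/update/request,
    here for squared-error loss. df_dp, df_dx are partial derivatives of f."""
    def implement(p, x):
        return f(p, x)
    def update(p, x, y):
        err = f(p, x) - y
        return p - lr * err * df_dp(p, x)   # gradient step on the parameter
    def request(p, x, y):
        err = f(p, x) - y
        return x - lr * err * df_dx(p, x)   # backpropagated "requested" input
    return implement, update, request

# Scalar linear model f(p, x) = p * x, trained towards f(p, 2) = 6 (i.e. p = 3):
impl, upd, req = grad_descent_learner(lambda p, x: p * x,
                                      df_dp=lambda p, x: x,
                                      df_dx=lambda p, x: p)
p = 0.0
for _ in range(100):
    p = upd(p, 2.0, 6.0)
assert abs(impl(p, 2.0) - 6.0) < 1e-3
```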
slide-40
SLIDE 40

Nonsmooth activation functions

[Plot: y = ReLU(x) on x ∈ [−2, 2]]

ReLU(x) := max{x, 0}

  • ReLU (“Rectified Linear Unit”) is a pervasive ML primitive
  • At least 5 ways to handle the nondifferentiability at 0:

    0. Pretend ReLU′(0) := 1
    1. Smooth almost everywhere
    2. Subdifferentiable
    3. Semismooth from the right
    4. ReLU′(0) := ⊥
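A few of the conventions above are easy to state concretely. This is a hedged Python sketch (function names are illustrative); the conventions only disagree at x = 0, where `None` stands in for the undefined value ⊥:

```python
def relu(x):
    """ReLU(x) := max{x, 0}."""
    return max(x, 0.0)

def relu_prime_pretend(x):
    """Option 0: pick a value at 0 by fiat (here ReLU'(0) := 1)."""
    return 1.0 if x >= 0 else 0.0

def relu_prime_right(x):
    """Option 3: the one-sided (right) derivative, which is 1 at 0."""
    return 1.0 if x >= 0 else 0.0

def relu_prime_bottom(x):
    """Option 4: ReLU'(0) := bottom; None stands in for the undefined value."""
    if x == 0:
        return None
    return 1.0 if x > 0 else 0.0

assert relu(-2.0) == 0.0 and relu(2.0) == 2.0
assert relu_prime_pretend(0.0) == relu_prime_right(0.0) == 1.0
assert relu_prime_bottom(0.0) is None
```

Away from 0 all conventions coincide with the classical derivative; the choice only matters on the measure-zero set {0}, which is why option 1 ("smooth almost everywhere") is often considered good enough in practice.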

slide-41
SLIDE 41

Selected References

  • Riley, Mitchell. Categories of Optics, 7th Sept. 2018, arXiv: 1809.00738v2 [math.CT] (cited on p. 14).
  • Fong, Brendan, David I. Spivak and Rémy Tuyéras. Backprop as Functor: A compositional perspective on supervised learning, 13th Dec. 2017, arXiv: 1711.10455v2 [math.CT] (cited on pp. 16, 17, 30).
  • Ghani, Neil, Jules Hedges, Viktor Winschel and Philipp Zahn. “Compositional game theory”, Logic in Computer Science, LICS ’18 (Oxford, UK), 9th–12th July 2018, doi: 10.1145/3209108.3209165 (cited on p. 32); preliminary version on arXiv: 1603.04641v3 [cs.GT] (15th Mar. 2016).
  • Elliott, Conal. “The simple essence of automatic differentiation”, Proc. ACM on Programming Languages, ICFP 2018 (St. Louis, MO, USA), vol. 2.70, 24th–26th Sept. 2018, doi: 10.1145/3236765; extended version on arXiv: 1804.00746v4 [cs.PL]; url: http://conal.net/papers/essence-of-ad/ (Mar. 2018).
  • Sprunger, David and Shin-ya Katsumata. Differentiable Causal Computations via Delayed Trace, 4th Mar. 2019, arXiv: 1903.01093v1 [cs.LO].

slide-42
SLIDE 42

Acknowledgments

  • Thanks to Eliana Lorch for key insights.
  • Thanks to Jules Hedges, David Spivak, and Brendan Fong for support and conversations.

slide-43
SLIDE 43

Thank you for your attention!

David A. Dalrymple @davidad

Protocol Labs

SYCO 5 Birmingham, UK 2019-09-05