Probabilistic Programming. Hongseok Yang, University of Oxford.



slide-1
SLIDE 1

Probabilistic Programming

Hongseok Yang University of Oxford

slide-2
SLIDE 2

Manchester Univ. 1953


Manchester Univ. Computer. Produced by Strachey’s “Love Letter” (1952)

Generated by the reimplementation in http://www.gingerbeardman.com/loveletter/

slide-6
SLIDE 6

Strachey’s program

Implements a simple randomised algorithm:

  1. Randomly pick two opening words.
  2. Repeat the following five times:
     • Pick a sentence structure randomly.
     • Fill the structure with random words.
  3. Randomly pick closing words.
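The three steps above can be sketched in Python. The word lists and sentence templates here are illustrative stand-ins, not Strachey's originals:

```python
import random

# Hypothetical word lists for illustration (not Strachey's vocabulary).
OPENINGS = ["DARLING SWEETHEART", "HONEY DEAR"]
ADJECTIVES = ["beautiful", "precious", "tender", "loving"]
NOUNS = ["desire", "devotion", "fondness", "longing"]
VERBS = ["treasures", "adores", "craves"]
CLOSINGS = ["YOURS KEENLY", "YOURS WISTFULLY"]

def love_letter(rng):
    lines = [rng.choice(OPENINGS)]                 # 1. pick opening words
    for _ in range(5):                             # 2. five random sentences
        if rng.random() < 0.5:                     #    pick a sentence structure
            s = f"You are my {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}."
        else:                                      #    fill it with random words
            s = (f"My heart {rng.choice(VERBS)} your "
                 f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}.")
        lines.append(s)
    lines.append(rng.choice(CLOSINGS))             # 3. pick closing words
    return "\n".join(lines)

print(love_letter(random.Random(0)))
```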
slide-7
SLIDE 7

Strachey’s Program

Implements a simple randomised algorithm:

  1. Randomly pick two opening words.
  2. Repeat the following five times (i.e., random N times):
     • Pick a sentence structure randomly.
     • Fill the structure with random words.
  3. Randomly pick closing words.

Two directions for improvement, using data:

  1. More randomness.
  2. Adjust randomness.


slide-9
SLIDE 9

What is probabilistic programming?

slide-13
SLIDE 13

(Bayesian) probabilistic modelling of data

  • 1. Develop a new probabilistic (generative) model.
  • 2. Design an inference algorithm for the model.
  • 3. Using the algo., fit the model to the data.

as a program in a prob. prog. language

with a generic inference algo. of the language

slide-14
SLIDE 14

Line fitting

[Scatter plot of the data points; axes X and Y]

slide-15
SLIDE 15

Line fitting

f(x) = s*x + b

[Scatter plot of the data with a candidate line f; axes X and Y]

slide-16
SLIDE 16

Bayesian generative model

[Graphical model: nodes s and b, observed node yi inside a plate i = 1..5]

slide-17
SLIDE 17

Bayesian generative model

[Graphical model: nodes s and b, observed node yi inside a plate i = 1..5]

s ~ normal(0, 10)
b ~ normal(0, 10)
f(x) = s*x + b
yi ~ normal(f(i), 1)  where i = 1..5

Q: posterior of (s, b) given y1 .. y5?


slide-19
SLIDE 19

Bayesian generative model

[Graphical model: nodes s and b, observed node yi inside a plate i = 1..5]

s ~ normal(0, 10)
b ~ normal(0, 10)
f(x) = s*x + b
yi ~ normal(f(i), 1)  where i = 1..5

Q: posterior of (s, b) given y1 = 2.5, …, y5 = 10.1?
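A minimal forward-simulation sketch of this generative model in Python (`random.gauss` stands in for the normal distribution):

```python
import random

# Forward sampling from the model on the slide:
#   s ~ normal(0, 10), b ~ normal(0, 10), yi ~ normal(s*i + b, 1)
def sample_dataset(rng):
    s = rng.gauss(0, 10)
    b = rng.gauss(0, 10)
    f = lambda x: s * x + b
    ys = [rng.gauss(f(i), 1) for i in range(1, 6)]
    return s, b, ys

s, b, ys = sample_dataset(random.Random(1))
print(len(ys))  # five observations y1..y5
```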

slide-20
SLIDE 20

Posterior of s and b given yi's

P(s, b | y1, .., y5) = P(y1, .., y5 | s, b) × P(s, b) / P(y1, .., y5)
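This posterior can be approximated numerically. A grid-approximation sketch in Python, using the observations from the later slides (2.5, 3.8, 4.5, 8.9, 10.1) and approximating the evidence P(y1, .., y5) by a sum over the grid:

```python
import math

ys = [2.5, 3.8, 4.5, 8.9, 10.1]   # observed y1..y5

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_joint(s, b):
    # log P(y1..y5 | s, b) + log P(s, b), following the model on the slides
    lp = normal_logpdf(s, 0, 10) + normal_logpdf(b, 0, 10)
    for i, y in enumerate(ys, start=1):
        lp += normal_logpdf(y, s * i + b, 1)
    return lp

# Coarse grid over (s, b) in [-5, 5] x [-5, 5]; the normaliser `total`
# plays the role of P(y1, .., y5) restricted to the grid.
grid = [(s / 10, b / 10) for s in range(-50, 51) for b in range(-50, 51)]
ws = [math.exp(log_joint(s, b)) for s, b in grid]
total = sum(ws)
s_mean = sum(s * w for (s, b), w in zip(grid, ws)) / total
b_mean = sum(b * w for (s, b), w in zip(grid, ws)) / total
```

For this data the posterior mean of s comes out close to 2, matching the roughly linear trend of the observations.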


slide-25
SLIDE 25

Anglican program

(let [s (sample (normal 0 10))
      b (sample (normal 0 10))
      f (fn [x] (+ (* s x) b))]
  (observe (normal (f 1) 1) 2.5)
  (observe (normal (f 2) 1) 3.8)
  (observe (normal (f 3) 1) 4.5)
  (observe (normal (f 4) 1) 8.9)
  (observe (normal (f 5) 1) 10.1)
  f)

slide-26
SLIDE 26

Anglican program

(let [s (sample (normal 0 10))
      b (sample (normal 0 10))
      f (fn [x] (+ (* s x) b))]
  (observe (normal (f 1) 1) 2.5)
  (observe (normal (f 2) 1) 3.8)
  (observe (normal (f 3) 1) 4.5)
  (observe (normal (f 4) 1) 8.9)
  (observe (normal (f 5) 1) 10.1)
  (predict :f f))

slide-27
SLIDE 27

Anglican program

(let [s (sample (normal 0 10))
      b (sample (normal 0 10))
      f (fn [x] (+ (* s x) b))]
  (observe (normal (f 1) 1) 2.5)
  (observe (normal (f 2) 1) 3.8)
  (observe (normal (f 3) 1) 4.5)
  (observe (normal (f 4) 1) 8.9)
  (observe (normal (f 5) 1) 10.1)
  (predict :sb [s b]))
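One way a generic inference algorithm can treat such a program is likelihood weighting: run the program many times, drawing each `sample` from its prior and multiplying a per-trace weight at each `observe`. A Python sketch of that idea (not Anglican's actual implementation):

```python
import math
import random

# The (x, y) pairs from the observe statements above.
data = [(1, 2.5), (2, 3.8), (3, 4.5), (4, 8.9), (5, 10.1)]

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def run_trace(rng):
    s = rng.gauss(0, 10)          # (sample (normal 0 10))
    b = rng.gauss(0, 10)          # (sample (normal 0 10))
    # each (observe (normal (f x) 1) y) adds a log-density to the trace weight
    logw = sum(normal_logpdf(y, s * x + b, 1) for x, y in data)
    return (s, b), logw

rng = random.Random(0)
traces = [run_trace(rng) for _ in range(100_000)]
m = max(lw for _, lw in traces)            # stabilise the exponentials
ws = [math.exp(lw - m) for _, lw in traces]
total = sum(ws)
s_mean = sum(w * sb[0] for (sb, _), w in zip(traces, ws)) / total
```

The weighted average of s over the traces approximates its posterior mean; sampling from a vague prior makes this inefficient, which is one motivation for the smarter generic algorithms that prob. prog. languages ship with.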

slide-29
SLIDE 29

Samples from posterior

[Scatter plot: the data with lines drawn from the posterior over (s, b); axes X and Y]

slide-30
SLIDE 30

Why should one care about prob. programming?

slide-31
SLIDE 31

My favourite answer

“Because probabilistic programming is a good way to build an AI.” (My ML colleague)

slide-32
SLIDE 32

Procedural modelling

SOSMC-Controlled Sampling

Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

slide-34
SLIDE 34

Procedural modelling

Asynchronous function call via future

Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

slide-35
SLIDE 35

Captcha solving

[Figure: captcha examples with noise]

Le, Baydin, Wood [2016]

slide-36
SLIDE 36

Compilation (expensive / slow): a probabilistic program p(x, y), training data drawn from it, and an NN architecture are used to train a compilation artifact q(x | y; φ), minimising DKL( p(x | y) || q(x | y; φ) ).

Inference (cheap / fast): given test data y, SIS with proposals from q approximates the posterior p(x | y).

Le, Baydin, Wood [2016]

slide-37
SLIDE 37

Approximating prob. programs by neural nets.

Le, Baydin, Wood [2016]

slide-38
SLIDE 38

Nonparametric Bayesian: Indian buffet process

(define (ibp-stick-breaking-process concentration base-measure)
  (let ((sticks (mem (lambda (j) (random-beta 1.0 concentration))))
        (atoms (mem (lambda (j) (base-measure)))))
    (lambda ()
      (let loop ((j 1) (dualstick (sticks 1)))
        (append
         (if (flip dualstick)  ;; with prob. dualstick
             (atoms j)         ;; add feature j
             '())
         (loop (+ j 1)         ;; otherwise, next stick
               (* dualstick (sticks (+ j 1)))))))))

Roy et al. 2008
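The role of mem can be sketched in Python: memoising a random function turns it into a lazily materialised infinite array of random draws, shared across calls. This is an illustrative truncated sketch, not the Church program's exact semantics (concentration fixed at 2.0, a uniform base measure, and truncation at 50 features are all choices made here):

```python
import functools
import random

rng = random.Random(0)

@functools.lru_cache(maxsize=None)
def sticks(j):
    # stick j ~ beta(1, concentration); lru_cache plays the role of mem
    return rng.betavariate(1.0, 2.0)

@functools.lru_cache(maxsize=None)
def atoms(j):
    # atom j ~ base measure (here uniform on [0, 1])
    return rng.random()

def ibp_row(max_features=50):
    # include feature j with prob. dualstick = sticks(1) * ... * sticks(j);
    # truncated at max_features instead of recursing forever
    row, dualstick = [], 1.0
    for j in range(1, max_features + 1):
        dualstick *= sticks(j)
        if rng.random() < dualstick:
            row.append(atoms(j))
    return row

# Two draws share the same sticks and atoms thanks to memoisation.
row1, row2 = ibp_row(), ibp_row()
```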

slide-39
SLIDE 39

Nonparametric Bayesian: Indian buffet process

(program as on slide 38)

Lazy infinite array: mem makes sticks and atoms behave as lazily materialised infinite arrays of random draws.

Roy et al. 2008

slide-40
SLIDE 40

Nonparametric Bayesian: Indian buffet process

(program as on slide 38)

Higher-order parameter: base-measure is itself a function, passed as an argument.

Roy et al. 2008

slide-41
SLIDE 41

My research : Denotational semantics

Joint work with Chris Heunen, Ohad Kammar, Sam Staton, Frank Wood [LICS 2016]

slide-42
SLIDE 42

(let [s (sample (normal 0 10))
      b (sample (normal 0 10))
      f (fn [x] (+ (* s x) b))]
  (observe (normal (f 1) 1) 2.5)
  (observe (normal (f 2) 1) 3.8)
  (observe (normal (f 3) 1) 4.5)
  (observe (normal (f 4) 1) 8.9)
  (observe (normal (f 5) 1) 10.1)
  (predict :sb [s b]))

slide-44
SLIDE 44

(let [s (sample (normal 0 10))
      b (sample (normal 0 10))
      f (fn [x] (+ (* s x) b))]
  (observe (normal (f 1) 1) 2.5)
  (observe (normal (f 2) 1) 3.8)
  (observe (normal (f 3) 1) 4.5)
  (observe (normal (f 4) 1) 8.9)
  (observe (normal (f 5) 1) 10.1)
  (predict :f f))

Generates a random function of type R→R. But its mathematical meaning is not clear.

slide-45
SLIDE 45

Measurability issue

  • Measure theory is the foundation of probability theory that avoids paradoxes.
  • But it is silent about higher-order functions:
     • [Halmos] ev(f, a) = f(a) is not measurable.
     • The category of measurable spaces is not cartesian closed (CCC).
  • Yet Anglican supports higher-order functions.
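The measurability failure behind the second bullet can be stated precisely; a sketch of the statement in LaTeX (the result is commonly attributed to Aumann, in the setting the slide credits to Halmos):

```latex
\textbf{Claim.} Write $M$ for the set of measurable functions
$f : \mathbb{R} \to \mathbb{R}$. There is no $\sigma$-algebra on $M$
making the evaluation map
\[
  \mathrm{ev} : M \times \mathbb{R} \to \mathbb{R},
  \qquad \mathrm{ev}(f, a) = f(a),
\]
measurable (with the product $\sigma$-algebra on $M \times \mathbb{R}$
and the Borel $\sigma$-algebra on $\mathbb{R}$).
Consequently $\mathbf{Meas}$ has no exponential object
$\mathbb{R}^{\mathbb{R}}$, so it is not cartesian closed.
```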

slide-51
SLIDE 51

Use category theory to extend measure theory.

[Diagram: Meas embeds by the Yoneda embedding into the functor category [Meas^op, Set]_∏; the probability monad on Meas extends by left Kan extension to a monad on [Meas^op, Set]_∏.]

  • Enough structure for function types.
  • Preserves nearly all the structures.

slide-52
SLIDE 52

[Question] Are all definable functions from R to R in a higher-order probabilistic PL measurable? Our semantics says that the answer is yes for a core call-by-value language, such as Anglican.

slide-53
SLIDE 53

The monad M(⟦R→R⟧) at ⟦R→R⟧ consists of equivalence classes of measurable functions f : Ω × R → R, for probability spaces Ω. Such an f is what probabilists call a measurable stochastic process.

slide-54
SLIDE 54

The extended monad M describes computations with dynamically allocated read-only variables:

M(T)(w) = { [(a, f)]~ | ∃v. a ∈ T(v) ⋀ f : w →m Prob(v) }

slide-55
SLIDE 55

M(T)(w) = { [(a, f)]~ | ∃v. a ∈ T(v) ⋀ f : w →m Prob(v) }

The extended monad M describes computations with dynamically allocated read-only variables.

  • T is the type of a value.
  • w represents the space of all random variables allocated so far.
  • v extends w with new random variables, according to f.


slide-58
SLIDE 58

Try a probabilistic prog. language. It is fun.

  • Anglican: http://www.robots.ox.ac.uk/~fwood/anglican/index.html
  • WebPPL: http://webppl.org/