

SLIDE 1

Probability Intro Part II: Bayes’ Rule

Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314)
Spring 2016, lecture 13

SLIDE 2

Quick recap

  • Random variable X takes on different values according to a probability distribution
  • discrete: probability mass function (pmf)
  • continuous: probability density function (pdf)
  • marginalization: summing (“splatting”)
  • conditionalization: “slicing” (see the sketch below)
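
Marginalization and conditionalization are one-liners on a discrete joint table. A minimal sketch in Python with numpy (the joint pmf below is invented for illustration):

    import numpy as np

    # made-up joint pmf P(x, y): rows index x in {0, 1, 2}, columns index y in {0, 1}
    P_xy = np.array([[0.10, 0.20],
                     [0.30, 0.15],
                     [0.15, 0.10]])

    # marginalization ("splatting"): sum out the other variable
    P_x = P_xy.sum(axis=1)              # P(x) = sum over y of P(x, y)
    P_y = P_xy.sum(axis=0)              # P(y) = sum over x of P(x, y)

    # conditionalization ("slicing"): fix y = 1, renormalize by the marginal
    P_x_given_y1 = P_xy[:, 1] / P_y[1]  # P(x | y=1) = P(x, y=1) / P(y=1)
    print(P_x, P_y, P_x_given_y1)       # the conditional sums to 1 again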
SLIDES 3–5

conditionalization (“slicing”)

The conditional is the joint divided by the marginal:

    P(y | x) = P(x, y) / P(x)

[Figure: a joint density P(x, y) plotted over x, y ∈ (−3, 3); slicing the joint at a fixed x and renormalizing by the marginal P(x) yields the conditional density.]

SLIDES 6–7

conditional densities

[Figure: the family of conditional densities obtained by slicing the joint at several locations; both axes run from −3 to 3.]

SLIDE 8

Bayes’ Rule

Writing the joint in terms of conditional densities both ways gives P(x, y) = P(x | y) P(y) = P(y | x) P(x); dividing through by P(y) yields Bayes’ rule:

    P(x | y) = P(y | x) P(x) / P(y)

posterior = likelihood × prior / marginal probability of y (“normalizer”)

SLIDE 9

A little math: Bayes’ rule

  • very simple formula for manipulating probabilities

    P(B | A) = P(A | B) P(B) / P(A)

P(B | A) is a conditional probability: the “probability of B given that A occurred”. P(A) and P(B) are the probabilities of A and B on their own.

simplified form:

    P(B | A) ∝ P(A | B) P(B)
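
The simplified form maps directly onto code: multiply each hypothesis’s likelihood by its prior, then normalize so the results sum to 1. A minimal sketch in Python with numpy (the function name and example numbers are illustrative, not from the lecture):

    import numpy as np

    def posterior(likelihoods, priors):
        """Bayes' rule over discrete hypotheses B_i:
        P(B_i | A) = P(A | B_i) P(B_i) / P(A)."""
        unnorm = np.asarray(likelihoods) * np.asarray(priors)
        return unnorm / unnorm.sum()    # dividing by the sum supplies P(A)

    print(posterior([0.8, 0.3], [0.5, 0.5]))    # -> [0.727 0.273]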

SLIDE 10

A little math: Bayes’ rule    P(B | A) ∝ P(A | B) P(B)

Example: 2 coins

  • one coin is fake: “heads” on both sides (H / H)
  • one coin is standard: (H / T)

You grab one of the coins at random and flip it. It comes up “heads”. What is the probability that you’re holding the fake?

    p(Fake | H) ∝ p(H | Fake) p(Fake) = (1)(½) = ½
    p(Nrml | H) ∝ p(H | Nrml) p(Nrml) = (½)(½) = ¼

The posterior probabilities must sum to 1, so normalizing gives p(Fake | H) = (½) / (½ + ¼) = ⅔ and p(Nrml | H) = ⅓.

SLIDE 11

A little math: Bayes’ rule    P(B | A) ∝ P(A | B) P(B)

Example: 2 coins (continued)

[Figure: tree diagram from “start” branching to the fake coin (H / H) and the normal coin (H / T), alongside the same computation: p(Fake | H) ∝ (1)(½) = ½, p(Nrml | H) ∝ (½)(½) = ¼; probabilities must sum to 1.]

SLIDE 12

A little math: Bayes’ rule    P(B | A) ∝ P(A | B) P(B)

Example: 2 coins. Experiment #2: the flip comes up “tails”. What is the probability that you’re holding the fake?

    p(Fake | T) ∝ p(T | Fake) p(Fake) = (0)(½) = 0
    p(Nrml | T) ∝ p(T | Nrml) p(Nrml) = (½)(½) = ¼

The posteriors must sum to 1, so p(Fake | T) = 0 and p(Nrml | T) = 1: a single tail rules out the fake completely.

[Figure: the same tree diagram, fake (H / H) vs. normal (H / T).]
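
Both experiments written out numerically, as a sketch in Python with numpy (array entries follow the slide: index 0 is the fake coin, index 1 the normal one):

    import numpy as np

    prior = np.array([0.5, 0.5])          # p(Fake), p(Nrml): coin grabbed at random

    # Experiment 1: the flip comes up heads
    lik_H = np.array([1.0, 0.5])          # p(H | Fake), p(H | Nrml)
    post_H = lik_H * prior / (lik_H * prior).sum()
    print(post_H)                         # [0.667 0.333] -> p(Fake | H) = 2/3

    # Experiment 2: the flip comes up tails
    lik_T = np.array([0.0, 0.5])          # p(T | Fake), p(T | Nrml)
    post_T = lik_T * prior / (lik_T * prior).sum()
    print(post_T)                         # [0. 1.] -> the fake is ruled out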

SLIDE 13

Is the middle circle popping “out” or “in”?

SLIDE 14

What we want to know: P(OUT | image) vs. P(IN | image)

    P(image | OUT & light is above) = 1
    P(image | IN & light is below) = 1

  • the image is equally likely to be OUT or IN given the sensory data alone

Apply Bayes’ rule:

    P(OUT | image) ∝ P(image | OUT & light above) × P(OUT) × P(light above)
    P(IN | image) ∝ P(image | IN & light below) × P(IN) × P(light below)

The likelihoods are equal, so the prior decides: which is greater, P(light above) or P(light below)?
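
A sketch of this comparison in Python; the prior values are invented for illustration, and only their ordering (light from above being more probable) matters:

    # circle percept: OUT vs. IN, decided by the prior over light direction
    p_light_above, p_light_below = 0.9, 0.1
    p_out, p_in = 0.5, 0.5                # no prior preference for OUT vs. IN
    likelihood = 1.0                      # each scene reproduces the image exactly

    score_out = likelihood * p_out * p_light_above   # proportional to P(OUT | image)
    score_in = likelihood * p_in * p_light_below     # proportional to P(IN | image)
    print(score_out > score_in)           # True: the circle pops OUT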

SLIDE 15

Bayesian Models for Perception

Bayes’ rule, P(B | A) ∝ P(A | B) P(B), is a formula for computing P(what’s in the world | sensory data), with B the state of the world and A the sensory data. (This is what our brain wants to know!)

    P(world | sense data) ∝ P(sense data | world) P(world)

  • Posterior: P(world | sense data), the resulting beliefs about the world
  • Likelihood: P(sense data | world), given by laws of physics; ambiguous because many world states could give rise to the same sense data
  • Prior: P(world), given by past experience

SLIDES 16–17

Helmholtz: perception as “optimal inference”

Hermann von Helmholtz (1821–1894): “Perception is our best guess as to what is in the world, given our current sensory evidence and our prior experience.”

    P(world | sense data) ∝ P(sense data | world) P(world)

  • Posterior: the resulting beliefs about the world
  • Likelihood: given by laws of physics; ambiguous because many world states could give rise to the same sense data
  • Prior: given by past experience

SLIDE 18

Many different 3D scenes can give rise to the same 2D retinal image

The Ames Room

SLIDE 19

Many different 3D scenes can give rise to the same 2D retinal image

The Ames Room, continued: how does our brain decide between interpretations A and B?

P(image | A) and P(image | B) are equal! (both A and B could have generated this image). Let’s use Bayes’ rule:

    P(A | image) = P(image | A) P(A) / Z
    P(B | image) = P(image | B) P(B) / Z

Since the likelihoods are equal, the priors P(A) and P(B) determine which interpretation we perceive.

SLIDE 20

Hollow Face Illusion

http://www.richardgregory.org/experiments/

SLIDE 21

Hollow Face Illusion

Hypothesis #1: face is concave. Hypothesis #2: face is convex.

    P(convex | video) ∝ P(video | convex) P(convex)
    P(concave | video) ∝ P(video | concave) P(concave)
    (posterior ∝ likelihood × prior)

P(convex) > P(concave) ⇒ the posterior probability of convex is higher (which determines our percept).
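
The same comparison as a Python sketch; the numbers are invented, and only the inequality P(convex) > P(concave) matters:

    # hollow-face illusion: the prior breaks the likelihood tie
    p_video_given_convex = 1.0            # both hypotheses explain the video
    p_video_given_concave = 1.0
    p_convex, p_concave = 0.99, 0.01      # strong prior: faces are convex

    post_convex = p_video_given_convex * p_convex       # prop. to P(convex | video)
    post_concave = p_video_given_concave * p_concave    # prop. to P(concave | video)
    print(post_convex > post_concave)     # True: we perceive a convex face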

SLIDE 22
  • our prior belief that objects are convex is SO strong we can’t over-ride it, even when we know it’s wrong!

(So your brain knows Bayes’ rule even if you don’t!)

SLIDE 23

Terminology question:

  • When do we call the conditional density P(y | x) a likelihood?

A: when considered as a function of x (i.e., with y held fixed)

  • note: as a function of x it doesn’t integrate to 1.
  • What’s it called as a function of y, for fixed x?

A: a conditional distribution or sampling distribution
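
A numerical check of the “doesn’t integrate to 1” point, sketched in Python with numpy/scipy using a binomial example of our own (y heads in n flips, x the heads probability):

    import numpy as np
    from scipy.stats import binom

    n, y = 10, 7                      # 7 heads out of 10 flips
    x = np.linspace(0.0, 1.0, 1001)   # x = probability of heads

    # as a function of y with x fixed: a pmf, sums to 1
    print(binom.pmf(np.arange(n + 1), n, 0.7).sum())   # 1.0

    # as a function of x with y fixed: the likelihood;
    # it integrates to 1/(n+1), not to 1
    lik = binom.pmf(y, n, x)
    print(np.trapz(lik, x))           # ~0.0909 = 1/11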

SLIDES 24–26

independence

Definition: x, y are independent iff

    P(x, y) = P(x) P(y)

In linear algebra terms, the table of joint probabilities is the outer product of the two marginal vectors:

    P_xy = p_x p_yᵀ    (outer product)

[Figure: a joint density over x, y ∈ (−3, 3) that factors into the product of its two marginals.]
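
A quick sketch of the outer-product view in Python with numpy (the marginals are invented for illustration):

    import numpy as np

    p_x = np.array([0.2, 0.5, 0.3])    # marginal of x
    p_y = np.array([0.6, 0.4])         # marginal of y
    P_xy = np.outer(p_x, p_y)          # independent joint: P(x, y) = P(x) P(y)

    # under independence, every conditional slice equals the marginal...
    print(np.allclose(P_xy[:, 0] / P_xy[:, 0].sum(), p_x))   # True
    # ...and the joint table has rank 1, the signature of an outer product
    print(np.linalg.matrix_rank(P_xy))                        # 1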

SLIDE 27

Summary

  • marginalization (splatting)
  • conditionalization (slicing)
  • Bayes’ rule (prior, likelihood, posterior)
  • independence