Probability Intro Part II: Bayes’ Rule
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314), Spring 2016, lecture 13
Quick recap
- Random variable X takes on different values according to a probability distribution
- discrete: probability mass function (pmf)
- continuous: probability density function (pdf)
- marginalization: summing (“splatting”)
- conditionalization: “slicing”
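As a quick numerical illustration of the recap (a made-up example, not from the slides): a pmf sums to 1 over its discrete values, while a pdf integrates to 1 over its continuous range.

```python
import numpy as np

# Discrete random variable: pmf of a fair six-sided die sums to 1
pmf = np.full(6, 1 / 6)
print(pmf.sum())          # sums to 1

# Continuous random variable: standard normal pdf integrates to ~1
x = np.linspace(-6, 6, 10001)
pdf = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]
print((pdf * dx).sum())   # Riemann sum, approximately 1
```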
conditionalization (“slicing”)

P(x | y) = P(x, y) / P(y)   (“joint divided by marginal”)

[figure: joint density P(x, y); a horizontal slice at a fixed y, divided by the marginal P(y), gives the conditional P(x | y)]
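The slicing operation can be sketched numerically. The joint table below is a hypothetical discretized example (my own, not the distribution in the figure): take one row of the joint and divide by that row's marginal probability.

```python
import numpy as np

# Hypothetical discretized joint distribution P(x, y)
# (rows index y, columns index x; entries sum to 1)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.30, 0.25]])

# Marginal P(y): sum ("splat") over x
p_y = joint.sum(axis=1)

# Conditional P(x | y=1): slice the joint at y=1, divide by the marginal
p_x_given_y1 = joint[1] / p_y[1]
print(p_x_given_y1)       # a proper distribution over x: sums to 1
```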
conditional densities

[figure: slices of the joint at several values of y, each normalized by the marginal, give a family of conditional densities P(x | y)]
Bayes’ Rule

From the conditional densities:

P(x | y) = P(y | x) P(x) / P(y)

posterior = likelihood × prior / marginal probability of y (“normalizer”)
A little math: Bayes’ rule
- very simple formula for manipulating probabilities

P(B | A) = P(A | B) P(B) / P(A)

- P(B | A): conditional probability, “probability of B given that A occurred”
- P(A), P(B): probability of A, probability of B

simplified form: P(B | A) ∝ P(A | B) P(B)
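The proportionality form becomes actual posterior probabilities once we normalize by the sum over hypotheses. A minimal sketch (the function name is my own):

```python
def bayes_posterior(priors, likelihoods):
    """Posterior over hypotheses: P(B_i | A) proportional to P(A | B_i) P(B_i)."""
    unnorm = [l * p for l, p in zip(likelihoods, priors)]
    z = sum(unnorm)            # marginal probability of the data ("normalizer")
    return [u / z for u in unnorm]

# Two hypotheses with equal priors but unequal likelihoods:
# the posterior tilts toward the hypothesis with the higher likelihood
print(bayes_posterior([0.5, 0.5], [1.0, 0.5]))
```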
A little math: Bayes’ rule P(B | A) ∝ P(A | B) P(B)
Example: 2 coins
- one coin is fake: “heads” on both sides (H / H)
- one coin is standard: (H / T)
You grab one of the coins at random and flip it. It comes up “heads”. What is the probability that you’re holding the fake?
p(Fake | H) ∝ p(H | Fake) p(Fake) = (1)(½) = ½
p(Nrml | H) ∝ p(H | Nrml) p(Nrml) = (½)(½) = ¼
probabilities must sum to 1 ⇒ p(Fake | H) = ½ / (½ + ¼) = ⅔
[figure: tree diagram: start → grab fake (H/H) or normal (H/T) coin with probability ½ each, then flip; “heads” has probability (1)(½) = ½ on the fake path and (½)(½) = ¼ on the normal path]
A little math: Bayes’ rule P(B | A) ∝ P(A | B) P(B)
Example: 2 coins. Experiment #2: It comes up “tails”. What is the probability that you’re holding the fake?
p(Fake | T) ∝ p(T | Fake) p(Fake) = (0)(½) = 0
p(Nrml | T) ∝ p(T | Nrml) p(Nrml) = (½)(½) = ¼
probabilities must sum to 1 ⇒ p(Fake | T) = 0, p(Nrml | T) = 1
[figure: same coin tree diagram; “tails” is impossible on the fake path]
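Both coin experiments can be checked with the same normalization step (a sketch; the function and hypothesis names are mine):

```python
def posterior(prior, likelihood):
    """Normalize likelihood x prior over the hypotheses."""
    unnorm = {h: likelihood[h] * prior[h] for h in prior}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}

prior = {"fake": 0.5, "normal": 0.5}

# Experiment 1: the flip comes up heads -> p(fake | H) = 2/3
print(posterior(prior, {"fake": 1.0, "normal": 0.5}))

# Experiment 2: the flip comes up tails -> p(fake | T) = 0
print(posterior(prior, {"fake": 0.0, "normal": 0.5}))
```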
Is the middle circle popping “out” or “in”?
- P(image | OUT & light is above) = 1
- P(image | IN & light is below) = 1
- Image equally likely to be OUT or IN given sensory data alone

What we want to know: P(OUT | image) vs. P(IN | image)

Apply Bayes’ rule:
P(OUT | image) ∝ P(image | OUT & light above) × P(OUT) × P(light above)
P(IN | image) ∝ P(image | IN & light below) × P(IN) × P(light below)

prior: which of these is greater, P(light above) or P(light below)?
Bayesian Models for Perception

Bayes’ rule: P(B | A) ∝ P(A | B) P(B)
A formula for computing P(what’s in the world | sensory data), with B = world state and A = sensory data. (This is what our brain wants to know!)

P(world | sense data) ∝ P(sense data | world) P(world)
- Posterior: resulting beliefs about the world
- Likelihood: given by laws of physics; ambiguous because many world states could give rise to the same sense data
- Prior: given by past experience
Helmholtz: perception as “optimal inference”

Helmholtz (1821-1894): “Perception is our best guess as to what is in the world, given our current sensory evidence and our prior experience.”

P(world | sense data) ∝ P(sense data | world) P(world)
- Posterior: resulting beliefs about the world
- Likelihood: given by laws of physics; ambiguous because many world states could give rise to the same sense data
- Prior: given by past experience
Many different 3D scenes can give rise to the same 2D retinal image
The Ames Room
How does our brain decide between the two interpretations A and B?

P(image | A) and P(image | B) are equal! (both A and B could have generated this image)
Let’s use Bayes’ rule:
P(A | image) = P(image | A) P(A) / Z
P(B | image) = P(image | B) P(B) / Z
Since the likelihoods are equal, the priors decide.
Hollow Face Illusion
http://www.richardgregory.org/experiments/
Hollow Face Illusion
Hypothesis #1: face is concave. Hypothesis #2: face is convex.
P(concave | video) ∝ P(video | concave) P(concave)
P(convex | video) ∝ P(video | convex) P(convex)
(posterior ∝ likelihood × prior)
P(convex) > P(concave) ⇒ posterior probability of convex is higher (which determines our percept)
- prior belief that objects are convex is SO strong we can’t override it, even when we know it’s wrong!
(So your brain knows Bayes’ rule even if you don’t!)
Terminology question:
- When do we call the conditional P(y | x) a likelihood?
A: when considered as a function of x (i.e., with y held fixed)
- note: doesn’t integrate to 1.
- What’s it called as a function of y, for fixed x?
conditional distribution or sampling distribution
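The distinction shows up numerically. Below is a hypothetical coin-bias model (my example, not from the slides): for fixed parameter x, P(y | x) sums to 1 over outcomes y, but for a fixed outcome, the likelihood as a function of x need not sum to 1.

```python
import numpy as np

# A simple model: P(heads | x) = x for a coin with unknown bias x
biases = np.array([0.2, 0.5, 0.8])   # candidate parameter values x

# As a function of the outcome y with x fixed: a conditional
# (sampling) distribution, which sums to 1
x = 0.8
p_heads, p_tails = x, 1 - x
print(p_heads + p_tails)             # 1.0

# As a function of x with the outcome fixed (y = heads): the
# likelihood, which need not sum (or integrate) to 1
likelihood = biases                  # L(x) = P(heads | x) = x
print(likelihood.sum())              # 1.5, not a distribution over x
```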
independence

Definition: x, y are independent iff P(x, y) = P(x) P(y)

In linear algebra terms: the joint distribution, viewed as a matrix, is the outer product of the two marginals.

[figure: joint density of independent x and y; every slice has the same shape]
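The outer-product view can be checked directly with made-up marginals (a sketch, not the figure's distribution):

```python
import numpy as np

# Hypothetical marginals for two independent discrete variables
p_x = np.array([0.2, 0.3, 0.5])
p_y = np.array([0.6, 0.4])

# Independence: the joint table is the outer product of the marginals
joint = np.outer(p_y, p_x)   # rows index y, columns index x

# Summing ("splatting") the joint recovers each marginal
print(np.allclose(joint.sum(axis=0), p_x))   # True
print(np.allclose(joint.sum(axis=1), p_y))   # True
```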
Summary
- marginalization (splatting)
- conditionalization (slicing)
- Bayes’ rule (prior, likelihood, posterior)
- independence