Review: probability - PowerPoint PPT Presentation

SLIDE 1

Review: probability

  • Monty Hall, weighted dice
  • Frequentist v. Bayesian
  • Independence
  • Expectations, conditional expectations
  • Exp. & independence; linearity of exp.
  • Estimator (RV computed from sample)
  • law of large #s; bias, variance, and the tradeoff

SLIDE 2

Covariance

  • Suppose we want an approximate numeric measure of (in)dependence
  • Let E(X) = E(Y) = 0 for simplicity
  • Consider the random variable XY
  • if X, Y are typically both +ve or both -ve, E(XY) > 0
  • if X, Y are independent, E(XY) = E(X) E(Y) = 0

SLIDE 3

Covariance

  • cov(X, Y) = E[(X - E(X)) (Y - E(Y))] = E(XY) - E(X) E(Y)
  • Is this a good measure of dependence?
  • Suppose we scale X by 10: cov(10X, Y) = 10 cov(X, Y)
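A quick numeric sketch of the bullets above; the model Y = X + noise is an illustration only, chosen so X and Y tend to share sign:

```python
import random

def cov(xs, ys):
    # Sample covariance: E(XY) - E(X) E(Y), estimated from data.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mx * my

random.seed(0)
# Dependent pair: Y = X + noise, so X and Y are typically both +ve
# or both -ve, and the covariance comes out positive.
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x + random.gauss(0, 1) for x in xs]

c1 = cov(xs, ys)                    # near 1 = Var(X) for this model
c2 = cov([10 * x for x in xs], ys)  # near 10 * c1: scaling X scales cov
```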

SLIDE 4

Correlation

  • Like covariance, but controls for variance of individual r.v.s
  • cor(X, Y) = cov(X, Y) / (sd(X) sd(Y))
  • cor(10X, Y) = cor(X, Y)
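A sketch of the scale invariance, using the same simulated Y = X + noise setup (an illustration, not the slide's data):

```python
import random

def corr(xs, ys):
    # cor(X, Y) = cov(X, Y) / (sd(X) sd(Y)), estimated from data.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return c / (sx * sy)

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(50_000)]
ys = [x + random.gauss(0, 1) for x in xs]

r1 = corr(xs, ys)                    # near 1/sqrt(2) for this model
r2 = corr([10 * x for x in xs], ys)  # same value: the scale cancels
```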

SLIDE 5

Correlation & independence

  • Equal probability 1/n at each point
  • Are X and Y independent?
  • Are X and Y uncorrelated?

[Figure: scatter plot of the equally likely (X, Y) points]


SLIDE 6

Correlation & independence

  • Do you think that all independent pairs of RVs are uncorrelated?
  • Do you think that all uncorrelated pairs of RVs are independent?
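One standard counterexample for the second question (not necessarily the one on the later slides): X uniform on {-1, 0, 1} and Y = X², computed exactly:

```python
# X uniform on {-1, 0, 1}, Y = X**2: exact computation, no sampling.
support = [-1, 0, 1]
p = 1 / 3  # equal probability on each point

E_X = sum(p * x for x in support)             # 0
E_Y = sum(p * x * x for x in support)         # 2/3
E_XY = sum(p * x * (x * x) for x in support)  # E(X**3) = 0

cov = E_XY - E_X * E_Y  # 0, so X and Y are uncorrelated
# Yet they are clearly dependent: knowing X = 0 forces Y = 0,
# while P(Y = 0) is only 1/3.
```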


SLIDE 7

Proofs and counterexamples

  • For a question A ⇒ B
  • e.g., X, Y uncorrelated ⇒ X, Y independent
  • if true, usually need to provide a proof
  • if false, usually only need to provide a counterexample


SLIDE 8

Counterexamples

  • Counterexample = example satisfying A but not B
  • E.g., for "X, Y uncorrelated ⇒ X, Y independent": RVs X and Y that are uncorrelated, but not independent


SLIDE 9

Correlation & independence

  • Equal probability 1/n at each point
  • Are X and Y independent?
  • Are X and Y uncorrelated?

[Figure: scatter plot of the equally likely (X, Y) points]


SLIDE 10
Bayes Rule

  • For any X, Y, C
  • P(X | Y, C) P(Y | C) = P(Y | X, C) P(X | C)
  • Simple version (without context)
  • P(X | Y) P(Y) = P(Y | X) P(X)
  • Can be taken as definition of conditioning
  • Rev. Thomas Bayes, 1702–1761


SLIDE 11

Exercise

  • You are tested for a rare disease, emacsitis (prevalence 3 in 100,000)
  • You receive a test that is 99% sensitive and 99% specific
  • sensitivity = P(yes | emacsitis)
  • specificity = P(no | ~emacsitis)
  • The test comes out positive
  • Do you have emacsitis?
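A sketch of the Bayes-rule computation, using only the numbers on the slide:

```python
prior = 3 / 100_000   # prevalence of emacsitis
sens = 0.99           # sensitivity = P(yes | emacsitis)
spec = 0.99           # specificity = P(no | ~emacsitis)

# Total probability of a positive test:
# P(yes) = P(yes | sick) P(sick) + P(yes | healthy) P(healthy)
p_yes = sens * prior + (1 - spec) * (1 - prior)

# Bayes rule: P(emacsitis | yes) comes out to roughly 0.003,
# so a positive test still leaves you very probably healthy.
posterior = sens * prior / p_yes
```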

SLIDE 12

Revisit: weighted dice

  • Fair dice: all 36 rolls equally likely
  • Weighted: rolls summing to 7 more likely
  • Data: 1-6 2-5

SLIDE 13

Learning from data

  • Given a model class
  • And some data, sampled from a model in this class
  • Decide which model best explains the sample


SLIDE 14

Bayesian model learning

  • P(model | data) = P(data | model) P(model) / Z
  • Z = Σ over models of P(data | model) P(model)
  • So, for each model, compute: P(data | model) P(model)
  • Then: normalize by Z to get the posterior
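A minimal sketch of this recipe for a coin-flip model class, assuming the models are a discretized grid of values for P(heads) with a uniform prior; the data (5 heads, 8 tails) matches the following slides:

```python
# Candidate models: a grid of values for P(heads); uniform prior.
n_heads, n_tails = 5, 8
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)

# For each model: P(data | model) P(model).
likelihood = [p ** n_heads * (1 - p) ** n_tails for p in grid]
unnorm = [lik * pr for lik, pr in zip(likelihood, prior)]

Z = sum(unnorm)                      # P(data), summed over all models
posterior = [u / Z for u in unnorm]  # P(model | data) per grid point

# The posterior peaks near 5/13, the observed fraction of heads.
best = max(grid, key=lambda p: p ** n_heads * (1 - p) ** n_tails)
```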

SLIDE 15

Prior: uniform

[Figure: uniform prior over P(heads), from all T to all H]


SLIDE 16

Posterior: after 5H, 8T

[Figure: posterior over P(heads) after 5H, 8T]


SLIDE 17

Posterior: after 11H, 20T

[Figure: posterior over P(heads) after 11H, 20T]


SLIDE 18

Graphical models


SLIDE 19

Why do we need graphical models?

  • So far, only way we've seen to write down a distribution is as a big table
  • Gets unwieldy fast!
  • E.g., 10 RVs, each w/ 10 settings
  • Table size = 10^10 entries
  • Graphical model: way to write a distribution compactly using diagrams & numbers


SLIDE 20

Example ML problem

  • US gov’t inspects food packing plants
  • 27 tests of contamination of surfaces
  • 12-point ISO 9000 compliance checklist
  • are there food-borne illness incidents in 30 days after inspection? (15 types)

  • Q:
  • A:

SLIDE 21

Big graphical models

  • Later in course, we'll use graphical models to express various ML algorithms
  • e.g., the one from the last slide
  • These graphical models will be big!
  • Please bear with some smaller examples for now so we can fit them on the slides and do the math in our heads…


SLIDE 22

Bayes nets

  • Best-known type of graphical model
  • Two parts: a DAG and CPTs (conditional probability tables)

SLIDE 23

Rusty robot: the DAG


SLIDE 24

Rusty robot: the CPTs

  • For each RV (say X), there is one CPT specifying P(X | pa(X))


SLIDE 25

Interpreting it


SLIDE 26

Benefits

  • 11 v. 31 numbers
  • Fewer parameters to learn
  • Efficient inference = computation of marginals, conditionals ⇒ posteriors


SLIDE 27

Inference example

  • P(M, Ra, O, W, Ru) = P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W)
  • Find marginal of M, O
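A brute-force check of this marginal. The CPT numbers below are made up for illustration, since the slide's actual tables aren't reproduced in this text; only the factorization above matters:

```python
from itertools import product

# Made-up CPTs for the rusty-robot net; variable order is M, Ra, O, W, Ru
# as in the factorization on the slide.
P_M = {True: 0.9, False: 0.1}
P_Ra = {True: 0.7, False: 0.3}
P_O = {True: 0.2, False: 0.8}
P_W_T = {(True, True): 0.9, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.0}   # P(W=T | Ra, O)
P_Ru_T = {(True, True): 0.8, (True, False): 0.1,
          (False, True): 0.0, (False, False): 0.0}  # P(Ru=T | M, W)

def joint(m, ra, o, w, ru):
    pw = P_W_T[(ra, o)] if w else 1 - P_W_T[(ra, o)]
    pru = P_Ru_T[(m, w)] if ru else 1 - P_Ru_T[(m, w)]
    return P_M[m] * P_Ra[ra] * P_O[o] * pw * pru

# Marginal of (M, O): sum the joint over Ra, W, Ru.
marg = {}
for m, ra, o, w, ru in product([True, False], repeat=5):
    marg[(m, o)] = marg.get((m, o), 0.0) + joint(m, ra, o, w, ru)
# The sums over Ra, W, Ru collapse to 1, leaving P(M) P(O):
# the marginal factorizes, i.e. M ⊥ O.
```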

SLIDE 28

Independence

  • Showed M ⊥ O
  • Any other independences?
  • Didn't use the CPT numbers
  • these independences depend only on the graph structure
  • May also be "accidental" independences, from particular CPT values

SLIDE 29

Conditional independence

  • How about O, Ru?
  • Suppose we know we're not wet
  • P(M, Ra, O, W, Ru) = P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W)
  • Condition on W=F, find marginal of O, Ru
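The same brute-force style of check, again with made-up CPT numbers (the slide's actual values aren't in this text): condition on W=F and see whether the (O, Ru) marginal factorizes.

```python
from itertools import product

# Made-up CPTs for the rusty-robot net (illustration only).
P_M = {True: 0.9, False: 0.1}
P_Ra = {True: 0.7, False: 0.3}
P_O = {True: 0.2, False: 0.8}
P_W_T = {(True, True): 0.9, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.0}   # P(W=T | Ra, O)
P_Ru_T = {(True, True): 0.8, (True, False): 0.1,
          (False, True): 0.0, (False, False): 0.0}  # P(Ru=T | M, W)

def joint(m, ra, o, w, ru):
    pw = P_W_T[(ra, o)] if w else 1 - P_W_T[(ra, o)]
    pru = P_Ru_T[(m, w)] if ru else 1 - P_Ru_T[(m, w)]
    return P_M[m] * P_Ra[ra] * P_O[o] * pw * pru

# Fix W=False, sum out M and Ra, then renormalize.
cond = {}
for m, ra, o, ru in product([True, False], repeat=4):
    cond[(o, ru)] = cond.get((o, ru), 0.0) + joint(m, ra, o, False, ru)
Z = sum(cond.values())
cond = {k: v / Z for k, v in cond.items()}

p_o = {o: cond[(o, True)] + cond[(o, False)] for o in (True, False)}
p_ru = {ru: cond[(True, ru)] + cond[(False, ru)] for ru in (True, False)}
# cond[(o, ru)] equals p_o[o] * p_ru[ru]: O ⊥ Ru given W=F,
# because observing W blocks the path O → W → Ru.
```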

SLIDE 30

Conditional independence

  • This is generally true
  • conditioning on evidence can make or break independences
  • many (conditional) independences can be derived from graph structure alone
  • "accidental" ones are considered less interesting


SLIDE 31

Graphical tests for independence

  • We derived (conditional) independence by looking for factorizations
  • It turns out there is a purely graphical test
  • this was one of the key contributions of Bayes nets
  • Before we get there, a few more examples

SLIDE 32

Blocking

  • Shaded = observed (by convention)

SLIDE 33

Explaining away

  • Intuitively:

SLIDE 34

Son of explaining away


SLIDE 35

d-separation

  • General graphical test: “d-separation”
  • d = dependence
  • X ⊥ Y | Z when there are no active paths between X and Y
  • Active paths (W outside conditioning set): chains X → W → Y, X ← W ← Y, and forks X ← W → Y
  • Collider X → W ← Y is active only when W (or a descendant of W) is in the conditioning set

SLIDE 36

Longer paths

  • Node is active if: it is a non-collider outside the conditioning set, or a collider with itself (or a descendant) in the conditioning set; inactive o/w
  • Path is active if all intermediate nodes are active

SLIDE 37

Another example


SLIDE 38

Markov blanket

  • Markov blanket of C = minimal set of observations to render C independent of rest of graph
  • In a Bayes net: C's parents, children, and children's other parents


SLIDE 39

Learning Bayes nets

  M  Ra  O  W  Ru
  T  F   T  T  F
  T  T   T  T  T
  F  T   T  F  F
  T  F   F  F  T
  F  F   T  F  T

P(Ra) =
P(M) =
P(O) =
P(W | Ra, O) =
P(Ru | M, W) =


SLIDE 40

Laplace smoothing

  M  Ra  O  W  Ru
  T  F   T  T  F
  T  T   T  T  T
  F  T   T  F  F
  T  F   F  F  T
  F  F   T  F  T

P(Ra) =
P(M) =
P(O) =
P(W | Ra, O) =
P(Ru | M, W) =
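A sketch of both estimators on one entry, P(Ra), from the five rows of data above: raw counting versus add-one (Laplace) smoothing over the two outcomes T/F.

```python
# The five training examples; columns are M, Ra, O, W, Ru.
rows = [
    ("T", "F", "T", "T", "F"),
    ("T", "T", "T", "T", "T"),
    ("F", "T", "T", "F", "F"),
    ("T", "F", "F", "F", "T"),
    ("F", "F", "T", "F", "T"),
]
n = len(rows)
count = sum(1 for r in rows if r[1] == "T")  # Ra column: 2 of 5 are T

mle = count / n                  # plain counting: 2/5
laplace = (count + 1) / (n + 2)  # add-one: 3/7, pulled toward 1/2,
                                 # and never exactly 0 or 1
```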


SLIDE 41

Advantages of Laplace

  • No division by zero
  • No extreme probabilities
  • No near-extreme probabilities unless lots of evidence

SLIDE 42

Limitations of counting and Laplace smoothing

  • Work only when all variables are observed in all examples
  • If there are hidden or latent variables, a more complicated algorithm is needed; we'll cover a related method later in course
  • or just use a toolbox!

SLIDE 43

Factor graphs

  • Another common type of graphical model
  • Uses an undirected, bipartite graph instead of a DAG


SLIDE 44

Rusty robot: factor graph

[Figure: rusty-robot factor graph with factor nodes P(M), P(Ra), P(O), P(W|Ra,O), P(Ru|M,W)]


SLIDE 45

Convention

  • Don’t need to show unary factors
  • Why? They don’t affect algorithms below.

SLIDE 46

Non-CPT factors

  • Just saw: easy to convert Bayes net → factor graph
  • In general, factors need not be CPTs: any nonnegative #s allowed
  • In general, P(A, B, …) = (1/Z) × product of all factors
  • Z = sum, over all joint assignments, of the product of factors
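A minimal sketch on a tiny two-variable factor graph; the factor values are invented nonnegative numbers, as the slide allows:

```python
from itertools import product

# Binary variables A, B; one pairwise factor and one unary factor on B.
phi_ab = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi_b = {0: 1.0, 1: 3.0}

def unnorm(a, b):
    # Product of all factors at this assignment (not a probability yet).
    return phi_ab[(a, b)] * phi_b[b]

# Z sums the factor product over all joint assignments; dividing by it
# turns the factor products into a distribution.
Z = sum(unnorm(a, b) for a, b in product([0, 1], repeat=2))
P = {(a, b): unnorm(a, b) / Z for a, b in product([0, 1], repeat=2)}
```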

SLIDE 47

Ex: image segmentation


SLIDE 48

Factor graph → Bayes net

  • Possible, but more involved
  • Each representation can handle any distribution

  • Without adding nodes:
  • Adding nodes:

SLIDE 49

Independence

  • Just like Bayes nets, there are graphical tests for independence and conditional independence
  • Simpler, though:
  • Cover up all observed nodes
  • Look for a path
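The test above is plain graph reachability; a sketch on a made-up chain-shaped factor graph X - f1 - W - f2 - Y (names and structure invented for illustration):

```python
from collections import deque

# Adjacency lists: variable and factor nodes alternate.
edges = {
    "X": ["f1"], "f1": ["X", "W"],
    "W": ["f1", "f2"], "f2": ["W", "Y"],
    "Y": ["f2"],
}

def connected(src, dst, observed):
    # BFS that never enters a covered-up (observed) node.
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in edges[node]:
            if nxt not in seen and nxt not in observed:
                seen.add(nxt)
                queue.append(nxt)
    return False

dep = connected("X", "Y", observed=set())  # True: a path remains
ind = connected("X", "Y", observed={"W"})  # False: X ⊥ Y | W
```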

SLIDE 50

Independence example


SLIDE 51

Modeling independence

  • Take a Bayes net, list the (conditional) independences
  • Convert to a factor graph, list the (conditional) independences
  • Are they the same list?
  • What happened?