SLIDE 1

04/21/2005 CS673

Being Bayesian About Network Structure

A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller

SLIDE 2

Roadmap

  • Bayesian learning of Bayesian Networks
      – Exact vs. approximate learning
  • Markov Chain Monte Carlo method
      – MCMC over structures
      – MCMC over orderings
  • Experimental Results
  • Conclusions
SLIDE 3

Bayesian Networks

  • Compact representation of probability distributions via conditional independence

Qualitative part: a directed acyclic graph (DAG)
  • Nodes – random variables
  • Edges – direct influence

Quantitative part: a set of conditional probability distributions, e.g. P(A|E,B) in the network over E, B, R, A, C:

      E   B   | P(a)  P(!a)
      e   b   | 0.9   0.1
      e   !b  | 0.2   0.8
      !e  b   | 0.9   0.1
      !e  !b  | 0.01  0.99

Together they define a unique distribution in factored form:

      P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
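The factored form above can be evaluated directly. A minimal sketch: only P(A|E,B) comes from the slide's CPT; the priors for B and E and the CPTs for R and C are made-up illustrative numbers.

```python
from itertools import product

P_B = {True: 0.01, False: 0.99}   # burglary prior (assumed value)
P_E = {True: 0.02, False: 0.98}   # earthquake prior (assumed value)
P_A = {                           # P(a | E, B), from the slide's table
    (True, True): 0.9, (True, False): 0.2,
    (False, True): 0.9, (False, False): 0.01,
}
P_R = {True: 0.8, False: 0.001}   # P(r | E) (assumed values)
P_C = {True: 0.7, False: 0.05}    # P(c | A) (assumed values)

def joint(b, e, a, c, r):
    """P(B=b, E=e, A=a, C=c, R=r) in the factored form."""
    p_a = P_A[(e, b)] if a else 1.0 - P_A[(e, b)]
    p_r = P_R[e] if r else 1.0 - P_R[e]
    p_c = P_C[a] if c else 1.0 - P_C[a]
    return P_B[b] * P_E[e] * p_a * p_r * p_c

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([True, False], repeat=5))
```

The factorization is what makes the representation compact: five small tables instead of a full table with 2^5 entries.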

SLIDE 4

Why Learn Bayesian Networks?

  • Conditional independencies & graphical representation capture the structure of many real-world distributions
  • Provides insights into the domain
  • Graph structure allows “knowledge discovery”:
      • Is there a direct connection between X & Y?
      • Does X separate two “subsystems”?
      • Does X causally affect Y?
  • Bayesian Networks can be used for many tasks
      – Inference, causality, etc.
  • Examples: scientific data mining
      • Disease properties and symptoms
      • Interactions between the expression of genes
SLIDE 5

Learning Bayesian Networks

      Data + Prior Information → [Inducer] → network structure + CPTs
      (figure: the inducer outputs the E, B, R, A, C network with its CPTs, e.g. P(A|E,B))

  • The inducer needs the prior probability distribution P(B) over networks B
  • Using Bayesian conditioning, update the prior: P(B) → P(B|D)

SLIDE 6

Why Struggle for an Accurate Structure?

      (figure: the “true” structure over A, E, B, S vs. the same network with an arc added and with an arc missing)

  • Adding an arc
      – Increases the number of parameters to be fitted
      – Makes wrong assumptions about causality and domain structure
  • Missing an arc
      – Cannot be compensated for by accurate fitting of parameters
      – Also misses causality and domain structure

SLIDE 7

Score-Based Learning

  • Define a scoring function that evaluates how well a structure matches the data

      Data over E, B, A:  <Y,N,N>, <Y,Y,Y>, <N,Y,Y>, …, <N,N,N>
      (figure: candidate structures over E, B, A)

  • Search for a structure that maximizes the score
SLIDE 8

Bayesian Score of a Model

      P(G|D) = P(D|G) P(G) / P(D)

where the marginal likelihood is

      P(D|G) = ∫ P(D|G,θ) P(θ|G) dθ

(P(D|G,θ) is the likelihood; P(θ|G) is the prior over parameters)
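The integral has a closed form in simple cases. A sketch for the smallest possible model, a single binary variable with a Beta(a1, a0) parameter prior; the Beta prior is an illustrative choice here, as the slide leaves the prior unspecified.

```python
from math import lgamma

def log_beta(a, b):
    """log of the Beta function B(a, b) = Γ(a)Γ(b)/Γ(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(n1, n0, a1=1.0, a0=1.0):
    """log P(D|G) for data with n1 ones and n0 zeros:
    ∫ θ^n1 (1-θ)^n0 p(θ) dθ = B(a1+n1, a0+n0) / B(a1, a0)."""
    return log_beta(a1 + n1, a0 + n0) - log_beta(a1, a0)
```

With the uniform Beta(1,1) prior this reduces to n1! n0! / (n1+n0+1)!; for a full network the marginal likelihood is a product of such terms, one per node and parent configuration.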

SLIDE 9

Discovering Structure – Model Selection

  • Current practice: model selection
      – Pick a single high-scoring model, i.e. a G with high P(G|D)
      – Use that model to infer domain structure

SLIDE 10

Discovering Structure – Model Averaging

  • Problem with model selection:
      – A small sample size yields many high-scoring models
      – An answer based on one model is often useless
      – We want features common to many models

SLIDE 11

Bayesian Approach

  • Estimate the probability of features:
      – Edge X → Y
      – Markov edge X – Y
      – Path X → … → Y
      – ...

      P(f|D) = Σ_G f(G) P(G|D)

  (f(G) is the indicator function for feature f, e.g. the edge X → Y; P(G|D) is the Bayesian score of G)

  • Huge (super-exponential, 2^Θ(n²)) number of networks G
  • Exact learning is intractable
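For very small n the sum over all graphs can be computed exactly, which makes the formula concrete. A sketch for three binary variables: enumerate every DAG, weight it by a marginal likelihood with Beta(1,1) priors per CPT row and a uniform structure prior (both illustrative choices), and average the indicator of one edge.

```python
from itertools import product, combinations
from math import lgamma, exp

def log_family_score(child, parents, data):
    """log P(column `child` | columns `parents`) with Beta(1,1) priors."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [0, 0])
        c[row[child]] += 1
    # ∫ θ^n1 (1-θ)^n0 dθ = n1! n0! / (n1+n0+1)!  per parent configuration
    return sum(lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n0 + n1 + 2)
               for n0, n1 in counts.values())

def is_acyclic(g, n):
    done, stack = set(), set()
    def dfs(v):
        if v in stack:
            return False
        if v in done:
            return True
        stack.add(v)
        ok = all(dfs(p) for p in g[v])
        stack.remove(v)
        done.add(v)
        return ok
    return all(dfs(v) for v in range(n))

def all_dags(n):
    """Yield every DAG on n nodes as a parents dict, by trying all
    orientations (absent / i→j / j→i) per pair and filtering for acyclicity."""
    pairs = list(combinations(range(n), 2))
    for orient in product(range(3), repeat=len(pairs)):
        g = {v: [] for v in range(n)}
        for (i, j), o in zip(pairs, orient):
            if o == 1:
                g[j].append(i)   # edge i → j: i becomes a parent of j
            elif o == 2:
                g[i].append(j)   # edge j → i
        if is_acyclic(g, n):
            yield g

def edge_posterior(data, n, u, v):
    """P(edge u→v | D): exact model averaging, uniform prior over DAGs."""
    num = den = 0.0
    for g in all_dags(n):
        w = exp(sum(log_family_score(c, g[c], data) for c in range(n)))
        den += w
        if u in g[v]:            # f(G) = 1 iff u is a parent of v
            num += w
    return num / den
```

Even at n = 3 there are 25 DAGs; the super-exponential growth is why the exact sum is hopeless beyond toy sizes.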
SLIDE 12

Approximate Bayesian Learning

  • Restrict the search space to G_k, the set of graphs with indegree bounded by k
      – the space is still super-exponential
  • Find a set 𝒢 of high-scoring structures and estimate

      P(f|D) ≈ Σ_{G∈𝒢} f(G) P(G|D) / Σ_{G∈𝒢} P(G|D)

  • Hill-climbing yields a biased sample of structures
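The hill-climbing just mentioned can be sketched generically: repeatedly take the neighboring structure (one edge added, removed, or reversed) with the best score until no move improves. The `score` and `neighbors` callables are hypothetical stand-ins for a Bayesian score and the legal one-edge moves.

```python
def hill_climb(g0, score, neighbors, max_steps=1000):
    """Greedy ascent from g0; returns a local maximum of `score`."""
    g, s = g0, score(g0)
    for _ in range(max_steps):
        candidates = list(neighbors(g))
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= s:
            break                 # local maximum: no improving move
        g, s = best, score(best)
    return g
```

Restarting from many random g0's gives the set 𝒢 of high-scoring structures, but the sample is biased toward whatever local maxima the search happens to reach.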

SLIDE 13

Markov Chain Monte Carlo over Networks

  • MCMC sampling:
      – Define a Markov chain over Bayesian network structures
      – Perform a walk through the chain to obtain samples G whose distribution converges to the posterior P(G|D)
  • Possible pitfalls:
      – Still a super-exponential number of networks
      – The time for the chain to converge to the posterior is unknown
      – Islands of high posterior connected by low bridges
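The walk described above is Metropolis–Hastings: propose a single-edge change and accept with probability min(1, P(G'|D)/P(G|D)). A sketch, where `log_posterior` and `random_neighbor` are hypothetical helpers (the proposal is assumed symmetric, so the Hastings correction drops out).

```python
import random
from math import exp

def structure_mcmc(g0, log_posterior, random_neighbor, n_samples, burn_in=1000):
    """Metropolis-Hastings walk; returns samples approximately from P(G|D)."""
    g, lp = g0, log_posterior(g0)
    samples = []
    for t in range(burn_in + n_samples):
        g2 = random_neighbor(g)
        lp2 = log_posterior(g2)
        if lp2 >= lp or random.random() < exp(lp2 - lp):
            g, lp = g2, lp2       # accept the proposed move
        if t >= burn_in:
            samples.append(g)
    return samples  # P(f|D) ≈ fraction of samples with f(G) = 1
```

The "islands" pitfall shows up here directly: if two high-posterior structures are separated only by low-posterior intermediates, single-edge moves cross between them very rarely.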

SLIDE 14

Better Approach to Approximate Learning

  • Further constrain the search space:
      – Perform model averaging over the structures consistent with some known (fixed) total ordering ‹
  • Ordering of variables: X1 ‹ X2 ‹ … ‹ Xn
      – The parents of Xi must come from X1, X2, …, Xi-1
  • Intuition: the order decouples the choices of parents
      – The choice of Pa(X7) does not restrict the choice of Pa(X12)
  • Given the order, the likelihood P(D|‹) and the feature probability P(f|D,‹) can be computed efficiently in closed form
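The decoupling gives the closed form: given an ordering, each node's parent-set choice is independent, so P(D|‹) = Π_i Σ_{U ⊆ {X1..Xi-1}, |U| ≤ k} score(Xi, U). A sketch assuming a caller-supplied `family_score(x, parents)` returning P(D_x | Pa(x)=U) (the name is hypothetical) and a uniform prior over bounded parent sets.

```python
from itertools import combinations

def likelihood_given_order(order, k, family_score):
    """P(D | ordering), with indegree bounded by k: a product over nodes of
    a sum over that node's allowed parent sets."""
    total = 1.0
    for pos, x in enumerate(order):
        preds = order[:pos]               # only predecessors may be parents
        node_sum = 0.0
        for size in range(min(k, len(preds)) + 1):
            for parents in combinations(preds, size):
                node_sum += family_score(x, parents)
        total *= node_sum
    return total
```

Each node contributes only O(n^k) parent sets, so the sum-product is polynomial per ordering, in contrast with the super-exponential sum over unconstrained structures.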

SLIDE 15

Sample Orderings

We can write

      P(f|D) = Σ_‹ P(f|‹,D) P(‹|D)

Sample orderings ‹1, …, ‹n from P(‹|D) and approximate

      P(f|D) ≈ (1/n) Σ_{i=1}^n P(f|D,‹i)

MCMC sampling:
  • Define a Markov chain over orderings
  • Run the chain to obtain samples from the posterior P(‹|D)
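The chain over orderings can be sketched as a Metropolis–Hastings walk over permutations with a "swap two positions" proposal (symmetric), targeting P(‹|D) ∝ P(D|‹)P(‹). `log_order_score` is a hypothetical caller-supplied log P(D|‹) + log P(‹).

```python
import random
from math import exp

def order_mcmc(n_vars, log_order_score, n_samples, burn_in=1000):
    """MCMC over orderings; returns samples approximately from P(ordering|D)."""
    order = list(range(n_vars))
    ls = log_order_score(order)
    samples = []
    for t in range(burn_in + n_samples):
        i, j = random.sample(range(n_vars), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]  # swap two positions
        ls2 = log_order_score(proposal)
        if ls2 >= ls or random.random() < exp(ls2 - ls):
            order, ls = proposal, ls2
        if t >= burn_in:
            samples.append(order[:])
    return samples
```

Each sampled ordering ‹i then contributes P(f|D,‹i), computed in closed form, to the average on this slide; the space of n! orderings is far smaller and smoother than the space of structures.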

SLIDE 16

Experiments: Exact Posterior over Orders versus Order-MCMC
SLIDE 17

Experiments: Convergence

SLIDE 18

Experiments: Structure-MCMC – posterior correlation for two different runs

SLIDE 19

Experiments: Order-MCMC – posterior correlation for two different runs

SLIDE 20

Conclusion

  • Order-MCMC performs better than structure-MCMC
SLIDE 21

References

  • N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning Journal, 2002.
  • N. Friedman and D. Koller. NIPS 2001 Tutorial on Learning Bayesian Networks from Data.
  • N. Friedman and M. Goldszmidt. AAAI-98 Tutorial on Learning Bayesian Networks from Data.
  • D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed., MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79–119, 1997.
  • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
  • S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.