SLIDE 1

04/21/2005 CS673

Being Bayesian About Network Structure

A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller

SLIDE 2

Roadmap

  • Bayesian learning of Bayesian Networks
      – Exact vs. approximate learning
  • Markov Chain Monte Carlo method
      – MCMC over structures
      – MCMC over orderings
  • Experimental Results
  • Conclusions
SLIDE 3

Bayesian Networks

  • Compact representation of probability distributions via conditional independence

Qualitative part: a directed acyclic graph (DAG)
  • Nodes – random variables
  • Edges – direct influence

Quantitative part: a set of conditional probability distributions, e.g. P(A|E,B) in the network over E, B, R, A, C:

      E   B   | P(a)  P(!a)
      e   b   | 0.9   0.1
      e   !b  | 0.2   0.8
      !e  b   | 0.9   0.1
      !e  !b  | 0.01  0.99

Together they define a unique distribution in factored form:

      P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
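The factored form above can be evaluated directly. A minimal sketch: only P(A|E,B) comes from the slide's CPT; the priors for B and E and the CPTs for R and C are made-up illustrative numbers.

```python
from itertools import product

P_B = {True: 0.01, False: 0.99}   # burglary prior (assumed value)
P_E = {True: 0.02, False: 0.98}   # earthquake prior (assumed value)
P_A = {                           # P(a | E, B), from the slide's table
    (True, True): 0.9, (True, False): 0.2,
    (False, True): 0.9, (False, False): 0.01,
}
P_R = {True: 0.8, False: 0.001}   # P(r | E) (assumed values)
P_C = {True: 0.7, False: 0.05}    # P(c | A) (assumed values)

def joint(b, e, a, c, r):
    """P(B=b, E=e, A=a, C=c, R=r) in the factored form."""
    p_a = P_A[(e, b)] if a else 1.0 - P_A[(e, b)]
    p_r = P_R[e] if r else 1.0 - P_R[e]
    p_c = P_C[a] if c else 1.0 - P_C[a]
    return P_B[b] * P_E[e] * p_a * p_r * p_c

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([True, False], repeat=5))
```

The factorization is what makes the representation compact: five small tables instead of a full table with 2^5 entries.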

SLIDE 4

Why Learn Bayesian Networks?

  • Conditional independencies & graphical representation capture the structure of many real-world distributions
  • Provides insights into the domain
  • Graph structure allows “knowledge discovery”:
      • Is there a direct connection between X & Y?
      • Does X separate two “subsystems”?
      • Does X causally affect Y?
  • Bayesian Networks can be used for many tasks
      – Inference, causality, etc.
  • Examples: scientific data mining
      • Disease properties and symptoms
      • Interactions between the expression of genes
SLIDE 5

Learning Bayesian Networks

      Data + Prior Information → [Inducer] → network structure + CPTs
      (figure: the inducer outputs the E, B, R, A, C network with its CPTs, e.g. P(A|E,B))

  • The inducer needs the prior probability distribution P(B) over networks B
  • Using Bayesian conditioning, update the prior: P(B) → P(B|D)

SLIDE 6

Why Struggle for an Accurate Structure?

      (figure: the “true” structure over A, E, B, S vs. the same network with an arc added and with an arc missing)

  • Adding an arc
      – Increases the number of parameters to be fitted
      – Makes wrong assumptions about causality and domain structure
  • Missing an arc
      – Cannot be compensated for by accurate fitting of parameters
      – Also misses causality and domain structure

SLIDE 7

Score-Based Learning

  • Define a scoring function that evaluates how well a structure matches the data

      Data over E, B, A:  <Y,N,N>, <Y,Y,Y>, <N,Y,Y>, …, <N,N,N>
      (figure: candidate structures over E, B, A)

  • Search for a structure that maximizes the score
SLIDE 8

Bayesian Score of a Model

      P(G|D) = P(D|G) P(G) / P(D)

where the marginal likelihood is

      P(D|G) = ∫ P(D|G,θ) P(θ|G) dθ

(P(D|G,θ) is the likelihood; P(θ|G) is the prior over parameters)
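The integral has a closed form in simple cases. A sketch for the smallest possible model, a single binary variable with a Beta(a1, a0) parameter prior; the Beta prior is an illustrative choice here, as the slide leaves the prior unspecified.

```python
from math import lgamma

def log_beta(a, b):
    """log of the Beta function B(a, b) = Γ(a)Γ(b)/Γ(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(n1, n0, a1=1.0, a0=1.0):
    """log P(D|G) for data with n1 ones and n0 zeros:
    ∫ θ^n1 (1-θ)^n0 p(θ) dθ = B(a1+n1, a0+n0) / B(a1, a0)."""
    return log_beta(a1 + n1, a0 + n0) - log_beta(a1, a0)
```

With the uniform Beta(1,1) prior this reduces to n1! n0! / (n1+n0+1)!; for a full network the marginal likelihood is a product of such terms, one per node and parent configuration.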

SLIDE 9

Discovering Structure – Model Selection

  • Current practice: model selection
      – Pick a single high-scoring model, i.e. a G with high P(G|D)
      – Use that model to infer domain structure

SLIDE 10

Discovering Structure – Model Averaging

  • Problem with model selection:
      – A small sample size yields many high-scoring models
      – An answer based on one model is often useless
      – We want features common to many models

SLIDE 11

Bayesian Approach

  • Estimate the probability of features:
      – Edge X → Y
      – Markov edge X – Y
      – Path X → … → Y
      – ...

      P(f|D) = Σ_G f(G) P(G|D)

  (f(G) is the indicator function for feature f, e.g. the edge X → Y; P(G|D) is the Bayesian score of G)

  • Huge (super-exponential, 2^Θ(n²)) number of networks G
  • Exact learning is intractable
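For very small n the sum over all graphs can be computed exactly, which makes the formula concrete. A sketch for three binary variables: enumerate every DAG, weight it by a marginal likelihood with Beta(1,1) priors per CPT row and a uniform structure prior (both illustrative choices), and average the indicator of one edge.

```python
from itertools import product, combinations
from math import lgamma, exp

def log_family_score(child, parents, data):
    """log P(column `child` | columns `parents`) with Beta(1,1) priors."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [0, 0])
        c[row[child]] += 1
    # ∫ θ^n1 (1-θ)^n0 dθ = n1! n0! / (n1+n0+1)!  per parent configuration
    return sum(lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n0 + n1 + 2)
               for n0, n1 in counts.values())

def is_acyclic(g, n):
    done, stack = set(), set()
    def dfs(v):
        if v in stack:
            return False
        if v in done:
            return True
        stack.add(v)
        ok = all(dfs(p) for p in g[v])
        stack.remove(v)
        done.add(v)
        return ok
    return all(dfs(v) for v in range(n))

def all_dags(n):
    """Yield every DAG on n nodes as a parents dict, by trying all
    orientations (absent / i→j / j→i) per pair and filtering for acyclicity."""
    pairs = list(combinations(range(n), 2))
    for orient in product(range(3), repeat=len(pairs)):
        g = {v: [] for v in range(n)}
        for (i, j), o in zip(pairs, orient):
            if o == 1:
                g[j].append(i)   # edge i → j: i becomes a parent of j
            elif o == 2:
                g[i].append(j)   # edge j → i
        if is_acyclic(g, n):
            yield g

def edge_posterior(data, n, u, v):
    """P(edge u→v | D): exact model averaging, uniform prior over DAGs."""
    num = den = 0.0
    for g in all_dags(n):
        w = exp(sum(log_family_score(c, g[c], data) for c in range(n)))
        den += w
        if u in g[v]:            # f(G) = 1 iff u is a parent of v
            num += w
    return num / den
```

Even at n = 3 there are 25 DAGs; the super-exponential growth is why the exact sum is hopeless beyond toy sizes.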
SLIDE 12

Approximate Bayesian Learning

  • Restrict the search space to G_k, the set of graphs with indegree bounded by k
      – the space is still super-exponential
  • Find a set 𝒢 of high-scoring structures and estimate

      P(f|D) ≈ Σ_{G∈𝒢} f(G) P(G|D) / Σ_{G∈𝒢} P(G|D)

  • Hill-climbing yields a biased sample of structures
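The hill-climbing just mentioned can be sketched generically: repeatedly take the neighboring structure (one edge added, removed, or reversed) with the best score until no move improves. The `score` and `neighbors` callables are hypothetical stand-ins for a Bayesian score and the legal one-edge moves.

```python
def hill_climb(g0, score, neighbors, max_steps=1000):
    """Greedy ascent from g0; returns a local maximum of `score`."""
    g, s = g0, score(g0)
    for _ in range(max_steps):
        candidates = list(neighbors(g))
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= s:
            break                 # local maximum: no improving move
        g, s = best, score(best)
    return g
```

Restarting from many random g0's gives the set 𝒢 of high-scoring structures, but the sample is biased toward whatever local maxima the search happens to reach.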

SLIDE 13

Markov Chain Monte Carlo over Networks

  • MCMC sampling:
      – Define a Markov chain over Bayesian network structures
      – Perform a walk through the chain to obtain samples G whose distribution converges to the posterior P(G|D)
  • Possible pitfalls:
      – Still a super-exponential number of networks
      – The time for the chain to converge to the posterior is unknown
      – Islands of high posterior connected by low bridges
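The walk described above is Metropolis–Hastings: propose a single-edge change and accept with probability min(1, P(G'|D)/P(G|D)). A sketch, where `log_posterior` and `random_neighbor` are hypothetical helpers (the proposal is assumed symmetric, so the Hastings correction drops out).

```python
import random
from math import exp

def structure_mcmc(g0, log_posterior, random_neighbor, n_samples, burn_in=1000):
    """Metropolis-Hastings walk; returns samples approximately from P(G|D)."""
    g, lp = g0, log_posterior(g0)
    samples = []
    for t in range(burn_in + n_samples):
        g2 = random_neighbor(g)
        lp2 = log_posterior(g2)
        if lp2 >= lp or random.random() < exp(lp2 - lp):
            g, lp = g2, lp2       # accept the proposed move
        if t >= burn_in:
            samples.append(g)
    return samples  # P(f|D) ≈ fraction of samples with f(G) = 1
```

The "islands" pitfall shows up here directly: if two high-posterior structures are separated only by low-posterior intermediates, single-edge moves cross between them very rarely.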

SLIDE 14

Better Approach to Approximate Learning

  • Further constrain the search space:
      – Perform model averaging over the structures consistent with some known (fixed) total ordering ‹
  • Ordering of variables: X1 ‹ X2 ‹ … ‹ Xn
      – The parents of Xi must come from X1, X2, …, Xi-1
  • Intuition: the order decouples the choices of parents
      – The choice of Pa(X7) does not restrict the choice of Pa(X12)
  • Given the order, the likelihood P(D|‹) and the feature probability P(f|D,‹) can be computed efficiently in closed form
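The decoupling gives the closed form: given an ordering, each node's parent-set choice is independent, so P(D|‹) = Π_i Σ_{U ⊆ {X1..Xi-1}, |U| ≤ k} score(Xi, U). A sketch assuming a caller-supplied `family_score(x, parents)` returning P(D_x | Pa(x)=U) (the name is hypothetical) and a uniform prior over bounded parent sets.

```python
from itertools import combinations

def likelihood_given_order(order, k, family_score):
    """P(D | ordering), with indegree bounded by k: a product over nodes of
    a sum over that node's allowed parent sets."""
    total = 1.0
    for pos, x in enumerate(order):
        preds = order[:pos]               # only predecessors may be parents
        node_sum = 0.0
        for size in range(min(k, len(preds)) + 1):
            for parents in combinations(preds, size):
                node_sum += family_score(x, parents)
        total *= node_sum
    return total
```

Each node contributes only O(n^k) parent sets, so the sum-product is polynomial per ordering, in contrast with the super-exponential sum over unconstrained structures.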

SLIDE 15

Sample Orderings

We can write

      P(f|D) = Σ_‹ P(f|‹,D) P(‹|D)

Sample orderings ‹1, …, ‹n from P(‹|D) and approximate

      P(f|D) ≈ (1/n) Σ_{i=1}^n P(f|D,‹i)

MCMC sampling:
  • Define a Markov chain over orderings
  • Run the chain to obtain samples from the posterior P(‹|D)
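The chain over orderings can be sketched as a Metropolis–Hastings walk over permutations with a "swap two positions" proposal (symmetric), targeting P(‹|D) ∝ P(D|‹)P(‹). `log_order_score` is a hypothetical caller-supplied log P(D|‹) + log P(‹).

```python
import random
from math import exp

def order_mcmc(n_vars, log_order_score, n_samples, burn_in=1000):
    """MCMC over orderings; returns samples approximately from P(ordering|D)."""
    order = list(range(n_vars))
    ls = log_order_score(order)
    samples = []
    for t in range(burn_in + n_samples):
        i, j = random.sample(range(n_vars), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]  # swap two positions
        ls2 = log_order_score(proposal)
        if ls2 >= ls or random.random() < exp(ls2 - ls):
            order, ls = proposal, ls2
        if t >= burn_in:
            samples.append(order[:])
    return samples
```

Each sampled ordering ‹i then contributes P(f|D,‹i), computed in closed form, to the average on this slide; the space of n! orderings is far smaller and smoother than the space of structures.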

SLIDE 16

Experiments: Exact Posterior over Orders versus Order-MCMC
SLIDE 17

Experiments: Convergence

SLIDE 18

Experiments: Structure-MCMC – posterior correlation for two different runs

SLIDE 19

Experiments: Order-MCMC – posterior correlation for two different runs

SLIDE 20

Conclusion

  • Order-MCMC performs better than structure-MCMC
SLIDE 21

References

  • N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning Journal, 2002.
  • N. Friedman and D. Koller. NIPS 2001 Tutorial on Learning Bayesian Networks from Data.
  • N. Friedman and M. Goldszmidt. AAAI-98 Tutorial on Learning Bayesian Networks from Data.
  • D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed., MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79–119, 1997.
  • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
  • S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.