Parametric Models Part IV: Bayesian Belief Networks
Selim Aksoy Bilkent University Department of Computer Engineering saksoy@cs.bilkent.edu.tr
CS 551, Spring 2006
Parametric density models covered so far:
◮ Univariate or multivariate Gaussians
◮ Mixtures of Gaussians
◮ Hidden Markov Models
A Bayesian network consists of a directed acyclic graph G and a set Θ of parameters:
◮ Each node in the graph G represents a random variable, and the edges encode direct dependencies between variables.
◮ The set Θ of parameters specifies the conditional probability distribution associated with each node.
Figure 1: An example BN.
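A minimal sketch (not from the slides) of how such a network can be represented in code: each node stores its parent list and a conditional probability table (CPT) keyed by the parents' values, and the joint probability of a full assignment is the product of the local CPT entries. The two-node network used below is hypothetical.

```python
# Sketch, assuming a BN stored as {node: (parents, cpt)} with CPTs
# keyed by tuples of parent values.
def joint_probability(network, assignment):
    """Multiply each node's CPT entry given its parents' assigned values."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        parent_vals = tuple(assignment[q] for q in parents)
        p *= cpt[parent_vals][assignment[node]]
    return p

# Hypothetical two-node network a -> b, both variables binary.
network = {
    "a": ((), {(): {0: 0.7, 1: 0.3}}),
    "b": (("a",), {(0,): {0: 0.9, 1: 0.1},
                   (1,): {0: 0.4, 1: 0.6}}),
}

print(joint_probability(network, {"a": 1, "b": 1}))  # P(a=1) * P(b=1|a=1)
```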
A Bayesian network over variables x1, . . . , xn represents the joint distribution as a product of local conditional distributions:

P(x1, . . . , xn) = ∏_{i=1}^{n} P(xi | parents(xi))
Figure 2: P(a, b, c, d, e) = P(a)P(b)P(c|b)P(d|a, c)P(e|d)
Figure 3: P(a, b, c, d) = P(a)P(b|a)P(c|b)P(d|c)
Figure 4: P(e, f, g, h) = P(e)P(f|e)P(g|e)P(h|f, g)
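To make the factorization of Figure 2 concrete, here is a sketch with made-up binary CPTs (the numbers are illustrative, not from the slides). A useful sanity check is that the product of normalized CPTs sums to 1 over all joint assignments.

```python
import itertools

# Toy CPTs for P(a,b,c,d,e) = P(a) P(b) P(c|b) P(d|a,c) P(e|d).
P_a = {0: 0.6, 1: 0.4}
P_b = {0: 0.7, 1: 0.3}
P_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_d_given_ac = {(0, 0): {0: 0.5, 1: 0.5}, (0, 1): {0: 0.3, 1: 0.7},
                (1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.1, 1: 0.9}}
P_e_given_d = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c, d, e):
    return (P_a[a] * P_b[b] * P_c_given_b[b][c]
            * P_d_given_ac[(a, c)][d] * P_e_given_d[d][e])

# Any valid factorization must sum to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in itertools.product([0, 1], repeat=5))
print(round(total, 6))  # 1.0
```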
Figure 5: When y is given, x and z are conditionally independent. Think of x as the past, y as the present, and z as the future.
Figure 6: When y is given, x and z are conditionally independent. Think of y as the common cause of the two independent effects x and z.
Figure 7: x and z are marginally independent, but when y is given, they are conditionally dependent. This is called explaining away.
Figure 8: The Bayesian network for the burglar alarm example. Burglary (B) and earthquake (E) directly affect the probability of the alarm (A) going off, but whether John calls (J) or Mary calls (M) depends only on the alarm. (Russell and Norvig, Artificial Intelligence: A Modern Approach, 1995)
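A query such as P(B = true | A = true) can be answered by summing the joint over the unobserved variables. The sketch below uses the CPT values from the Russell and Norvig version of the example (they may differ from the slides' numbers).

```python
import itertools

# Burglar-alarm network: P(B), P(E), and P(A=1 | B, E).
P_B = {0: 0.999, 1: 0.001}
P_E = {0: 0.998, 1: 0.002}
P_A_given_BE = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def posterior_burglary(alarm=1):
    """P(B = 1 | A = alarm) by enumeration over B and E."""
    num = den = 0.0
    for b, e in itertools.product([0, 1], repeat=2):
        p_alarm = P_A_given_BE[(b, e)] if alarm else 1 - P_A_given_BE[(b, e)]
        p = P_B[b] * P_E[e] * p_alarm
        den += p
        if b == 1:
            num += p
    return num / den

print(posterior_burglary())  # ~0.374: the alarm raises P(B) far above its 0.001 prior
```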
Figure 9: Another Bayesian network example. The event that the grass is wet (W = true) has two possible causes: either the water sprinkler was on (S = true) or it was raining (R = true). (Russell and Norvig, Artificial Intelligence: A Modern Approach, 1995)
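Given wet grass, we can compare the two candidate causes by enumeration. The sketch below uses a commonly used set of CPT values for this example (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler/Rain → WetGrass); they are illustrative and may not match the slides exactly.

```python
import itertools

# Sprinkler network CPTs: P(C), P(S=1|C), P(R=1|C), P(W=1|S,R).
P_C = {0: 0.5, 1: 0.5}
P_S_given_C = {0: 0.5, 1: 0.1}
P_R_given_C = {0: 0.2, 1: 0.8}
P_W_given_SR = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}

def posterior(query_var):
    """P(query_var = 1 | W = 1) by enumeration over C, S, R."""
    num = den = 0.0
    for c, s, r in itertools.product([0, 1], repeat=3):
        p_s = P_S_given_C[c] if s else 1 - P_S_given_C[c]
        p_r = P_R_given_C[c] if r else 1 - P_R_given_C[c]
        p = P_C[c] * p_s * p_r * P_W_given_SR[(s, r)]  # evidence W=1 folded in
        den += p
        if {"S": s, "R": r}[query_var] == 1:
            num += p
    return num / den

print(posterior("S"))  # ~0.430
print(posterior("R"))  # ~0.708: rain is the more likely explanation
```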
Bayesian networks are used in many fields:
◮ Machine learning
◮ Statistics
◮ Computer vision
◮ Natural language processing
◮ Speech recognition
◮ Error-control codes
◮ Bioinformatics
◮ Medical diagnosis
◮ Weather forecasting

Example real-world systems:
◮ Pathfinder medical diagnosis system at Stanford
◮ Microsoft Office assistant and troubleshooters
◮ Space shuttle monitoring system at NASA Mission Control
When exact inference is intractable, approximate methods can be used:
◮ sampling (Monte Carlo) methods
◮ variational methods
◮ loopy belief propagation
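As a flavor of the sampling approach, here is a minimal rejection-sampling sketch on my own toy chain a → b → c (not a network from the slides): draw samples in topological order, discard those inconsistent with the evidence, and estimate the query from the survivors.

```python
import random

# Toy chain a -> b -> c; query P(b = 1 | c = 1).
P_a1 = 0.3
P_b1_given_a = {0: 0.2, 1: 0.9}
P_c1_given_b = {0: 0.1, 1: 0.7}

def rejection_sample(n_samples, rng):
    accepted = hits = 0
    for _ in range(n_samples):
        a = int(rng.random() < P_a1)             # sample in topological order
        b = int(rng.random() < P_b1_given_a[a])
        c = int(rng.random() < P_c1_given_b[b])
        if c == 1:                               # keep only samples matching evidence
            accepted += 1
            hits += b
    return hits / accepted

rng = random.Random(0)                           # fixed seed for reproducibility
estimate = rejection_sample(100_000, rng)
print(estimate)  # close to the exact value 0.287 / 0.346
```

Rejection sampling wastes every sample that misses the evidence, which is why likelihood weighting and MCMC are preferred when the evidence is unlikely.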
Table 1: Four cases in Bayesian network learning.

Structure | Full observability            | Partial observability
Known     | Maximum likelihood estimation | EM (or gradient ascent)
Unknown   | Search through model space    | EM + search through model space
Given m independent training cases and a network with n nodes, the log-likelihood decomposes into a sum of independent terms, one per node:

log L(Θ) = ∑_{l=1}^{m} ∑_{i=1}^{n} log P(xi^(l) | parents(xi)^(l), θi)
The maximum likelihood estimate is θ_ijk = N_ijk / N_ij, where N_ijk is the number of cases in the data in which node i is in state k while its parents are in configuration j, and N_ij = ∑_{k=1}^{ri} N_ijk.
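Maximum likelihood learning with full observability therefore reduces to counting. A sketch (my own helper, following the N_ijk notation, where N_ijk counts node i in state k with parents in configuration j):

```python
from collections import Counter

def mle_cpt(samples, node, parents):
    """Estimate theta_ijk = N_ijk / N_ij for one node from complete data.

    samples: list of dicts mapping variable name -> observed value."""
    N_ijk = Counter()
    N_ij = Counter()
    for s in samples:
        j = tuple(s[p] for p in parents)   # parent configuration
        N_ijk[(j, s[node])] += 1
        N_ij[j] += 1
    return {(j, k): N_ijk[(j, k)] / N_ij[j] for (j, k) in N_ijk}

# Hypothetical data for a node b with a single parent a:
data = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 1}]
cpt = mle_cpt(data, "b", ["a"])
print(cpt)  # {((0,), 0): 1/3, ((0,), 1): 2/3, ((1,), 1): 1.0}
```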
The Bayesian (MAP) estimate with a Dirichlet prior is θ_ijk = (N_ijk + α_ijk) / (N_ij + α_ij), where α_ij = ∑_{k=1}^{ri} α_ijk and N_ij = ∑_{k=1}^{ri} N_ijk as before.
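A sketch of this smoothed estimate for one fixed (i, j), with the Dirichlet pseudocounts α_ijk added to the observed counts N_ijk (function and variable names are my own):

```python
def map_estimate(counts, alphas):
    """theta_ijk = (N_ijk + alpha_ijk) / (N_ij + alpha_ij).

    counts: dict state k -> N_ijk; alphas: dict state k -> alpha_ijk."""
    N_ij = sum(counts.values())
    alpha_ij = sum(alphas.values())
    return {k: (counts[k] + alphas[k]) / (N_ij + alpha_ij) for k in counts}

# With uniform pseudocounts (Laplace smoothing), an unseen state no
# longer gets probability zero:
print(map_estimate({0: 0, 1: 3}, {0: 1, 1: 1}))  # {0: 0.2, 1: 0.8}
```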
[Diagram: class node w is the single parent of the feature nodes x1, x2, . . . , xn.]
Figure 10: Naive Bayesian network structure. It looks like a very simple model but it often works quite well in practice.
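The naive Bayes structure factorizes the joint as P(w, x1, . . . , xn) = P(w) ∏_i P(xi | w), so classification only needs the class prior and one per-feature CPT. A sketch with two binary features and toy numbers of my own choosing:

```python
# Toy naive Bayes model: class prior and P(xi = 1 | w) for each feature.
P_w = {0: 0.5, 1: 0.5}
P_x1_given_w = {0: 0.2, 1: 0.8}
P_x2_given_w = {0: 0.3, 1: 0.9}

def posterior(x1, x2):
    """P(w | x1, x2) via P(w) * P(x1|w) * P(x2|w), renormalized."""
    scores = {}
    for w in P_w:
        p1 = P_x1_given_w[w] if x1 else 1 - P_x1_given_w[w]
        p2 = P_x2_given_w[w] if x2 else 1 - P_x2_given_w[w]
        scores[w] = P_w[w] * p1 * p2
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

print(posterior(1, 1))  # class w=1 dominates when both features fire
```

The conditional-independence assumption among the features is rarely true, yet the resulting classifier is often competitive, which is the point of the slide's remark.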