Introduction to Artificial Intelligence: Belief networks (Chapter 15.1–2)

SLIDE 1

Introduction to Artificial Intelligence Belief networks

Chapter 15.1–2

Dieter Fox

Based on AIMA Slides © S. Russell and P. Norvig, 1998

SLIDE 2

Outline

♦ Bayesian networks: syntax and semantics
♦ Inference tasks

SLIDE 3

Belief networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions

Syntax:

  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ “directly influences”)
  • a conditional distribution for each node given its parents: P(Xi|Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT)

SLIDE 4

Example

I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:

  • Burglary → Alarm
  • Earthquake → Alarm
  • Alarm → JohnCalls
  • Alarm → MaryCalls

P(B) = .001     P(E) = .002

B   E   P(A)
T   T   .95
T   F   .94
F   T   .29
F   F   .001

A   P(J)
T   .90
F   .05

A   P(M)
T   .70
F   .01

SLIDE 5

Example


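Note: with n variables of at most d values each, ≤ k parents per node ⇒ $O(d^k n)$ numbers for the network vs. $O(d^n)$ for the full joint distribution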
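The size comparison in the note can be checked on the burglary network itself. The following is a minimal sketch, assuming all five variables are Boolean and using the parent sets implied by the example; the helper names `cpt_parameters` and `joint_parameters` are hypothetical and only serve the illustration.

```python
from math import prod

def cpt_parameters(parents, domain_sizes):
    """Count the independent numbers needed by all CPTs of a network.

    Each node needs one row per combination of parent values, and each
    row needs (domain size - 1) numbers because a row sums to 1.
    """
    total = 0
    for node, node_parents in parents.items():
        rows = prod(domain_sizes[p] for p in node_parents)  # empty product is 1
        total += rows * (domain_sizes[node] - 1)
    return total

def joint_parameters(domain_sizes):
    """Count the independent numbers in the full joint distribution."""
    return prod(domain_sizes.values()) - 1

# Parent sets of the burglary example (assumed from the factorization
# P(B) P(E) P(A|B,E) P(J|A) P(M|A)).
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}
sizes = {name: 2 for name in parents}  # assumption: every variable is Boolean

network_count = cpt_parameters(parents, sizes)  # 1 + 1 + 4 + 2 + 2
joint_count = joint_parameters(sizes)           # 2**5 - 1
print(network_count, joint_count)
```

For this small network the gap is modest (10 vs. 31 numbers); the difference becomes dramatic as n grows while k stays small.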

SLIDE 6

Semantics

“Global” semantics defines the full joint distribution as the product of the local conditional distributions:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$

e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = ?

SLIDE 7

Semantics

“Global” semantics defines the full joint distribution as the product of the local conditional distributions:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$$

e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(¬B) P(¬E) P(A|¬B ∧ ¬E) P(J|A) P(M|A)

“Local” semantics: each node is conditionally independent of its nondescendants given its parents

Theorem: Local semantics ⇔ global semantics
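To make the product formula concrete, here is a small sketch that evaluates the example entry P(J ∧ M ∧ A ∧ ¬B ∧ ¬E). It assumes the CPT values from the burglary example (P(B) = .001, P(E) = .002, and so on); the function names `pr` and `joint` are illustrative only.

```python
# CPT values as read from the burglary example (assumed here).
P_B = 0.001
P_E = 0.002
P_A = {  # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def pr(p_true, value):
    """Probability of a Boolean value, given the probability of 'true'."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Global semantics: product of each node's conditional given its parents."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# P(J, M, A, not B, not E) = P(not B) P(not E) P(A | not B, not E) P(J|A) P(M|A)
p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 6))  # 0.999 * 0.998 * 0.001 * 0.90 * 0.70, about 0.000628
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the local tables define a proper joint distribution.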

SLIDE 8

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents

[Figure: node X with parents U1 . . . Um, children Y1 . . . Yn, and the children’s other parents Z1j . . . Znj]
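The definition (parents + children + children’s parents) translates directly into a graph lookup. Below is a minimal sketch, assuming the network is stored as a dictionary from each node to its list of parents; the function name `markov_blanket` and this representation are choices made for the illustration.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node`.

    `parents` maps every node to the list of its parents. The blanket is
    the node's parents, its children, and the other parents of its children.
    """
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # the node is not part of its own blanket
    return blanket

# Burglary network structure (assumed from the earlier example).
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(markov_blanket(parents, "Burglary"))   # Alarm plus Alarm's other parent
print(markov_blanket(parents, "Alarm"))      # both causes and both callers
print(markov_blanket(parents, "JohnCalls"))  # only its parent
```

Note that Earthquake enters Burglary’s blanket only as a co-parent of Alarm, which is the “children’s parents” part of the definition.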

SLIDE 9

Constructing belief networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics

  • 1. Choose an ordering of variables X1, . . . , Xn
  • 2. For i = 1 to n

       add Xi to the network
       select parents from X1, . . . , Xi−1 such that P(Xi|Parents(Xi)) = P(Xi|X1, . . . , Xi−1)

This choice of parents guarantees the global semantics:

$$\begin{aligned}
P(X_1, \ldots, X_n) &= \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) && \text{(chain rule)} \\
&= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) && \text{(by construction)}
\end{aligned}$$

SLIDE 10

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built so far, with nodes MaryCalls, JohnCalls]

P(J|M) = P(J)?

SLIDE 11

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built so far, with nodes MaryCalls, JohnCalls, Alarm]

P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)?

SLIDE 12

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built so far, with nodes MaryCalls, JohnCalls, Alarm, Burglary]

P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)?
P(B|A, J, M) = P(B)?

SLIDE 13

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: network built so far, with nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)? Yes
P(B|A, J, M) = P(B)? No
P(E|B, A, J, M) = P(E|A)?
P(E|B, A, J, M) = P(E|A, B)?

SLIDE 14

Example

Suppose we choose the ordering M, J, A, B, E

[Figure: resulting network with nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)? Yes
P(B|A, J, M) = P(B)? No
P(E|B, A, J, M) = P(E|A)? No
P(E|B, A, J, M) = P(E|A, B)? Yes
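The yes/no answers above can be reproduced mechanically. The sketch below is one possible way to do it, not the procedure used in the lecture: it assumes the CPT values of the burglary example, builds the full joint from them, and then, for each variable in the chosen ordering, searches for a smallest set of predecessors that satisfies P(Xi|Parents(Xi)) = P(Xi|X1, . . . , Xi−1). All function names are hypothetical.

```python
from itertools import combinations, product

# CPT values as read from the burglary example (assumed here).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
NAMES = ["B", "E", "A", "J", "M"]

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

# Full joint distribution, keyed by (b, e, a, j, m) tuples.
JOINT = {
    (b, e, a, j, m): pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
                     * pr(P_J[a], j) * pr(P_M[a], m)
    for b, e, a, j, m in product([True, False], repeat=5)
}

def marginal(assignment):
    """P(assignment), where assignment maps variable names to values."""
    idx = {NAMES.index(name): value for name, value in assignment.items()}
    return sum(p for world, p in JOINT.items()
               if all(world[i] == v for i, v in idx.items()))

def cond(var, given):
    """P(var = true | given)."""
    return marginal({**given, var: True}) / marginal(given)

def is_sufficient(var, subset, preds, tol=1e-9):
    """True if P(var | preds) equals P(var | subset) for every value of preds."""
    for values in product([True, False], repeat=len(preds)):
        full = dict(zip(preds, values))
        part = {name: full[name] for name in subset}
        if abs(cond(var, full) - cond(var, part)) > tol:
            return False
    return True

def build(order):
    """For each variable, pick a smallest sufficient parent set among its predecessors."""
    parents = {}
    for i, var in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):
            found = next((list(s) for s in combinations(preds, size)
                          if is_sufficient(var, s, preds)), None)
            if found is not None:
                parents[var] = found
                break
    return parents

causal = build(["B", "E", "A", "J", "M"])
diagnostic = build(["M", "J", "A", "B", "E"])
print(causal)
print(diagnostic)
```

With the causal ordering the search recovers the original four links; with the ordering M, J, A, B, E it returns parents {M} for J, {M, J} for A, {A} for B and {A, B} for E, i.e. six links, so the same distribution needs a less compact network.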

SLIDE 15

Example: Car diagnosis

Initial evidence: engine won’t start
Testable variables (thin ovals), diagnosis variables (thick ovals)
Hidden variables (shaded) ensure sparse structure, reduce parameters

[Figure: car diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, engine won’t start]

SLIDE 16

Example: Car insurance

Predict claim costs (medical, liability, property) given data on application form (other unshaded nodes)

[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]

SLIDE 17

Inference in Bayesian networks

Instantiate some nodes (evidence nodes) and query other nodes.

P(Burglary | JohnCalls)?

SLIDE 18

Inference in Bayesian networks

Instantiate some nodes (evidence nodes) and query other nodes.

P(Burglary | JohnCalls)?

  • A burglary occurs only once every 1000 days, but John calls 50 times in 1000 days, i.e., for each burglary we receive 50 false alarms. P(Burglary | JohnCalls) = 0.016!
  • P(Burglary | JohnCalls, MaryCalls) = 0.29.
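The first figure can be retraced by hand with Bayes’ rule. The derivation below is a sketch that assumes the CPT values of the burglary example (P(B) = .001, P(E) = .002, P(A|B, E) = .95, P(A|B, ¬E) = .94, P(J|A) = .90, P(J|¬A) = .05) and uses the fact that JohnCalls depends on Burglary only through Alarm; intermediate values are rounded.

```latex
\begin{align*}
P(A \mid B) &= P(A \mid B, E)\,P(E) + P(A \mid B, \neg E)\,P(\neg E)
             = 0.95 \times 0.002 + 0.94 \times 0.998 \approx 0.940 \\
P(J \mid B) &= P(J \mid A)\,P(A \mid B) + P(J \mid \neg A)\,P(\neg A \mid B)
             \approx 0.90 \times 0.940 + 0.05 \times 0.060 \approx 0.849 \\
P(A)        &= P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B)
             \approx 0.940 \times 0.001 + 0.0016 \times 0.999 \approx 0.0025 \\
P(J)        &= P(J \mid A)\,P(A) + P(J \mid \neg A)\,P(\neg A)
             \approx 0.90 \times 0.0025 + 0.05 \times 0.9975 \approx 0.052 \\
P(B \mid J) &= \frac{P(J \mid B)\,P(B)}{P(J)}
             \approx \frac{0.849 \times 0.001}{0.052} \approx 0.016
\end{align*}
```

The value P(J) ≈ 0.052 matches the informal count of roughly 50 calls in 1000 days, almost all of them triggered without an alarm.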

SLIDE 19

Types of inference

  • 1. Diagnostic: From effects to causes

P(Burglary | JohnCalls) = 0.016

  • 2. Causal: From causes to effects

P(JohnCalls | Burglary) = 0.86

  • 3. Intercausal: between causes of common effect

P(Burglary | Alarm) = 0.376, but P(Burglary | Alarm, Earthquake) = 0.003.

  • 4. Mixed: Combinations of 1.-3.

P(Alarm | JohnCalls, ¬Earthquake) = 0.03
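All four kinds of query can be answered by brute-force enumeration of the full joint, which is feasible here because there are only 32 assignments. The following is a minimal sketch, assuming the CPT values shown in the burglary example; the function `query` is a hypothetical helper, and the numbers it produces should land close to the figures quoted above but may differ slightly in the last digit, depending on the exact CPT entries used to derive those figures.

```python
from itertools import product

# CPT values as read from the burglary example (assumed here).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
NAMES = ("B", "E", "A", "J", "M")

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Product of the local conditional distributions."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def query(target, evidence):
    """P(target = true | evidence), summing the joint over consistent worlds."""
    numerator = denominator = 0.0
    for world in product([True, False], repeat=len(NAMES)):
        values = dict(zip(NAMES, world))
        if any(values[name] != v for name, v in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(*world)
        denominator += p
        if values[target]:
            numerator += p
    return numerator / denominator

diagnostic = query("B", {"J": True})                 # effects to causes
causal = query("J", {"B": True})                     # causes to effects
intercausal = query("B", {"A": True})                # alarm alone
explained_away = query("B", {"A": True, "E": True})  # earthquake explains the alarm
mixed = query("A", {"J": True, "E": False})

for name, value in [("diagnostic", diagnostic), ("causal", causal),
                    ("intercausal", intercausal),
                    ("explained away", explained_away), ("mixed", mixed)]:
    print(f"{name}: {value:.3f}")
```

Enumeration is exponential in the number of variables, so it only illustrates what the queries mean; it is not a practical inference algorithm for larger networks.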

SLIDE 20

Inference tasks

  • Queries: compute posterior marginal P(Xi|E = e)
    e.g., P(NoGas|Gauge = empty, Lights = on, Starts = false)
  • Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome|action, evidence)
  • Value of information: which evidence to seek next?
  • Sensitivity analysis: which probability values are most critical?
  • Explanation: why do I need a new starter motor?

SLIDE 21

Compact conditional distributions

CPT grows exponentially with the number of parents
CPT becomes infinite with continuous-valued parent or child

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f

E.g., Boolean functions
    NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

E.g., numerical relationships among continuous variables

$$\frac{\partial\, \mathrm{Level}}{\partial t} = \mathrm{inflow} + \mathrm{precipitation} - \mathrm{outflow} - \mathrm{evaporation}$$

SLIDE 22

Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes

  1) Parents U1 . . . Uk include all causes (can add leak node)
  2) Independent failure probability qi for each cause alone

$$\Rightarrow\; P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$$

Cold   Flu   Malaria   P(Fever)   P(¬Fever)
F      F     F         0.0        1.0
F      F     T         0.9        0.1
F      T     F         0.8        0.2
F      T     T         0.98       0.02 = 0.2 × 0.1
T      F     F         0.4        0.6
T      F     T         0.94       0.06 = 0.6 × 0.1
T      T     F         0.88       0.12 = 0.6 × 0.2
T      T     T         0.988      0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
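The fever table can be generated from just three numbers. The sketch below assumes the failure probabilities read off the table (q = 0.6 for Cold, 0.2 for Flu, 0.1 for Malaria) and no leak node; `noisy_or_cpt` is a hypothetical helper name.

```python
from itertools import product
from math import prod

def noisy_or_cpt(failure):
    """Expand per-cause failure probabilities q_i into a full CPT.

    `failure` maps each cause to q_i, the probability that the cause, when
    present, fails to produce the effect. The result maps each tuple of
    parent values to P(effect = true | parents) = 1 - product of the q_i
    of the causes that are present.
    """
    causes = list(failure)
    cpt = {}
    for values in product([False, True], repeat=len(causes)):
        p_no_effect = prod(failure[c] for c, present in zip(causes, values) if present)
        cpt[values] = 1.0 - p_no_effect  # empty product is 1, so all-false gives 0
    return cpt

# Assumed from the table: one failure probability per cause.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
fever = noisy_or_cpt(q)

for (cold, flu, malaria), p in fever.items():
    print(cold, flu, malaria, round(p, 3), round(1.0 - p, 3))
```

Three parameters determine all eight rows, which is the linear growth stated above; a general CPT over three Boolean parents would need eight independent numbers.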
