Final Defense: Unified Prediction and Diagnosis in Engineering Systems - PowerPoint PPT Presentation



SLIDE 1

Final Defense: Unified Prediction and Diagnosis in Engineering Systems by means of Distributed Belief Networks
Robert H. Dodier
Joint Center for Energy Management, University of Colorado at Boulder

SLIDE 2

Problem Domain and Proposed Solution
Buildings contain lots of equipment, and there are lots of buildings in the world. Equipment can break down and not be noticed for a while, or simply degrade and never be noticed. Buildings could run more efficiently (cheaper, fewer resources used) and more comfortably if we could reliably and automatically query the status of the building. I propose that we use belief networks to automate prediction and diagnosis over functionally and geographically distributed systems.

SLIDE 3

Problems not solved by existing methods
Existing prediction methods: linear regression, neural networks, locally-weighted regression, first principles, ... These prediction methods don’t tell what to do about missing observations and model uncertainty.
Existing diagnostic methods: “statistical inference,” neural networks, rule-based expert systems, fuzzy logic, ... Formal problems: it is easy to invent situations in which fuzzy logic and certainty-factors logic give incorrect results because dependencies are handled incorrectly. One can’t bet on a degree of membership, a certainty factor, or a confidence level.

SLIDE 4

Unsolved problems, cont’d
Need methods that don’t have formal deficiencies. Need methods for combining empirical information with expert knowledge, and prior information with new observations. Need methods for organizing complex structures.

SLIDE 5

* Probability as an extension of logic
Classical logic works great with propositions that we consider entirely true or false. We want something similar for “iffy” propositions. Assuming that a reasoning system (i) represents uncertainty by numbers, (ii) is consistent with common sense and ordinary logic, and (iii) is internally consistent, we get a few functional equations to solve: F(x, F(y, z)) = F(F(x, y), z) and S(S(x)) = x, where F combines the plausibilities of a conjunction and S maps the plausibility of a proposition to that of its negation.

SLIDE 6

* Probability as extended logic, cont’d
The first equation yields the product rule p(A, B|C) = p(A|B, C) p(B|C), and the second yields the negation rule p(¬A|B) = 1 − p(A|B). Since every logical proposition can be built up using conjunction and negation, we’re done. This analysis (originated by R.T. Cox, 1946) shows that it’s meaningful to talk about the probability of any logical proposition. Probability is not limited to repetitive events.
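The two rules can be checked mechanically on any small discrete model. A minimal sketch in Python, with a made-up joint distribution over three binary propositions (the numbers are illustrative only):

```python
# Check the product rule p(A,B|C) = p(A|B,C) p(B|C) and the negation rule
# p(not A|B) = 1 - p(A|B) against a small made-up joint distribution.
# joint[(a, b, c)] = p(A=a, B=b, C=c); the entries sum to 1.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05,
    (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def p(**fixed):
    """Marginal probability of the assignment given by keyword arguments."""
    total = 0.0
    for (a, b, c), pr in joint.items():
        assign = {"A": a, "B": b, "C": c}
        if all(assign[k] == v for k, v in fixed.items()):
            total += pr
    return total

def cond(target, given):
    """p(target | given), both dicts of variable assignments."""
    return p(**target, **given) / p(**given)

# Product rule: p(A=1, B=1 | C=1) == p(A=1 | B=1, C=1) * p(B=1 | C=1)
lhs = cond({"A": 1, "B": 1}, {"C": 1})
rhs = cond({"A": 1}, {"B": 1, "C": 1}) * cond({"B": 1}, {"C": 1})
assert abs(lhs - rhs) < 1e-12

# Negation rule: p(A=0 | B=1) == 1 - p(A=1 | B=1)
assert abs(cond({"A": 0}, {"B": 1}) - (1 - cond({"A": 1}, {"B": 1}))) < 1e-12
```

Both identities hold for any joint distribution, which is the point of the Cox derivation.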

SLIDE 7

Interesting operations on probabilistic models
Prediction: Compute p(effects | causes). E.g., what is the energy use of an air handling unit when a heat exchanger is fouled?
Diagnosis: Compute p(causes | effects). E.g., what is the quantity of refrigerant within a chiller?
Value of information: VOI is an index of the utility of measuring some presently-unknown variable. Which variable shall we measure next?
Explanation: Find the most likely values of hidden variables, or perhaps identify the most influential variables. Explanation is not yet implemented in riso.

SLIDE 8

Belief network = relations graph + conditional probabilities
A large probabilistic model can be built up by considering only the conditional distribution of each variable given some others; then the joint distribution is just the product of all the conditionals. It’s convenient to display the model as a directed graph. A graph is easy to think about, and independence can be verified using only the graph, ignoring the numbers.

SLIDE 9

Belief network = graph + probabilities, cont’d

[Figure: directed graph over A, B, C, D, E with arcs A→B, A→C, B→D, C→D, D→E]

p(A, B, C, D, E) = p(E|D) p(D|B, C) p(C|A) p(B|A) p(A)
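The factorization can be written out directly for this five-node network. A sketch in Python with made-up conditional probability tables for binary variables (all numbers illustrative), confirming that a product of locally-normalized conditionals is itself a normalized joint distribution:

```python
from itertools import product

# Illustrative CPTs for binary A, B, C, D, E. Each entry gives
# p(child = 1 | parent values); p(child = 0 | ...) is the complement.
P_A1 = 0.6
P_B1_given_A = {0: 0.3, 1: 0.8}
P_C1_given_A = {0: 0.5, 1: 0.2}
P_D1_given_BC = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}
P_E1_given_D = {0: 0.2, 1: 0.75}

def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1

def joint(a, b, c, d, e):
    """p(A,B,C,D,E) = p(E|D) p(D|B,C) p(C|A) p(B|A) p(A)."""
    return (bern(P_E1_given_D[d], e)
            * bern(P_D1_given_BC[(b, c)], d)
            * bern(P_C1_given_A[a], c)
            * bern(P_B1_given_A[a], b)
            * bern(P_A1, a))

# The product of the conditionals sums to 1 over all 32 joint assignments.
total = sum(joint(*v) for v in product([0, 1], repeat=5))
assert abs(total - 1.0) < 1e-12
```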

SLIDE 10

Belief networks as building system models
Represent heterogeneous information. In engineering problems, variables may be discrete or continuous; distributions may be well-known or special-purpose; relations may be based on empirical data or expert knowledge. B.n.’s can handle all these kinds of information. This was not emphasized before due to limitations of existing b.n. inference algorithms.

“Blueprint” of probabilistic relations. The b.n. is separate from the code used to transfer data and compute inferences. The b.n. expresses what’s known about the building; there is no need to browse the code, read the manual, or phone the last guy who worked on the program.

SLIDE 11

Belief networks as building system models, cont’d
Hierarchical organization.

  • Natural to model functional hierarchies. E.g., equipment models are parts of a building, and buildings are parts of a campus. Use distributed belief networks for distributed systems!
  • Represent two or more alternative organizations in one belief network, e.g., grouping status variables by building and by function.

SLIDE 12

* Belief networks as building system models, cont’d
Temporal dependence. It’s easy to form a complex belief network by linking together instantaneous models, e.g., a hidden Markov model. However, the extra dependencies may make computations intractable.
Both empirical and first-principles relations. The conditional distribution of a variable can be empirical, derived from first principles, or both. “Either-or” is easy; “both” needs some work. E.g., diagnostic models: empirical data is typically available for normal operation, but failure models must be constructed mostly from prior information.

SLIDE 13

* Details: π- and λ-messages
Laws of probability require that we distinguish upstream and downstream evidence. In a simply-connected b.n. on a directed graph (called a “Bayesian network”), the posterior of X is the product of the predictive distribution πX and the likelihood function λX. The predictive distribution summarizes the π-messages coming down from parents; with no evidence, πX is the prior. The likelihood function summarizes the λ-messages coming up from children; with no evidence, λX is non-informative.

SLIDE 14

* π- and λ-messages at a typical node

[Figure: node X with parents U1, U2 and children Y1, Y2; π-messages πU1,X, πU2,X come down from the parents into pX|U1,U2, and λ-messages λY1,X, λY2,X come up from the children]

πX(x) = ∫∫ p(x|u1, u2) πU1,X(u1) πU2,X(u2) du1 du2
λX(x) = λY1,X(x) λY2,X(x)
pX|e(x) ∝ πX(x) λX(x)
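For a discrete node the update can be sketched directly; the parent integral becomes a sum. A toy Python example (one binary parent, two children; all numbers illustrative, not riso's actual code):

```python
# Toy discrete node update: node X (3 states), one binary parent U,
# two children Y1, Y2. All probabilities are illustrative.
p_X_given_U = [[0.7, 0.2, 0.1],   # p(X | U = 0)
               [0.1, 0.3, 0.6]]   # p(X | U = 1)
pi_U = [0.4, 0.6]                 # pi-message from the parent
lam_Y1 = [0.9, 0.5, 0.2]          # lambda-messages from the children
lam_Y2 = [0.3, 0.8, 0.4]          # (need not be normalized)

# Predictive distribution: sum the parent out of the conditional,
# weighted by the parent's pi-message: pi_X(x) = sum_u p(x|u) pi_U(u)
pi_X = [sum(pi_U[u] * p_X_given_U[u][x] for u in range(2)) for x in range(3)]

# Likelihood function: product of the children's lambda-messages.
lam_X = [lam_Y1[x] * lam_Y2[x] for x in range(3)]

# Posterior is proportional to pi_X * lam_X; normalize at the end.
unnorm = [pi_X[x] * lam_X[x] for x in range(3)]
z = sum(unnorm)
posterior = [w / z for w in unnorm]
```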

SLIDE 15

Details: Handling computational problems in riso
riso computes an exact result if an exact result is known for a given combination of π- and λ-messages and the node’s conditional distribution. AFAIK this is new; existing schemes are entirely exact or entirely approximate.
Otherwise, riso attempts to compute an approximation (a monotone spline or a mixture of Gaussians). riso postpones the approximation until it is needed: distributions are represented in the b.n. description in their original forms, keeping the b.n. description close to the engineering description. General cases are handled by numerical integrations.

SLIDE 16

“Distributed” meets “belief network”
Locating and connecting b.n.’s on different hosts. Each host runs some code that knows how to find a b.n. description and load the software required to run the b.n.; if necessary, the code can be loaded across the Internet. AFAIK riso is the first cross-host d.b.n. system.
Communicating π- and λ-messages between b.n.’s. The messages are probability distributions and likelihood functions. A block of data containing the necessary parameters is constructed, sent across the Internet, then reconstituted into a variable in a program.
Coping with communication failures. A host might crash, or the process running a b.n. might be killed. If a child is lost, the child is removed. If a parent is lost, the prior for that parent is substituted for any π-message.
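The parent-loss policy can be sketched in a few lines. A Python sketch with hypothetical helper names (`fetch_pi_message`, `priors`); riso's actual remote-object machinery is more involved:

```python
# Sketch of the parent-loss policy: ask a (possibly remote) parent for its
# pi-message; if the connection fails, fall back to the parent's prior.
# fetch_pi_message and priors are hypothetical stand-ins for the remote-call
# machinery; the prior below is an illustrative discrete distribution.

class ConnectionLost(Exception):
    pass

def fetch_pi_message(parent_name):
    # Stand-in for a remote call that may fail; here it always fails,
    # to exercise the fallback path.
    raise ConnectionLost(parent_name)

priors = {"weather.T_outdoor": [0.2, 0.5, 0.3]}

def pi_message_or_prior(parent_name):
    """Return the parent's pi-message, or its prior if the parent is lost."""
    try:
        return fetch_pi_message(parent_name)
    except ConnectionLost:
        return priors[parent_name]

msg = pi_message_or_prior("weather.T_outdoor")
assert msg == [0.2, 0.5, 0.3]
```

The design choice is graceful degradation: losing a parent widens the posterior (the prior is less informative than a live π-message) but never halts inference.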

SLIDE 17

* Publishing information as distributed belief networks
The b.n. approach lends itself well to time-honored software development policies: break up your problem and get the pieces to talk to each other. Probability shows that the messages should be distributions; the mechanism could be function calls or Internet data packets, and the difference is a trivial detail. Prevent wheel-reinvention by making your b.n. publicly available, e.g., a weather service or a building or equipment database. Connect many b.n.’s together to obtain summary information, e.g., a monitoring service offered by a consulting firm.

SLIDE 18

* Uncertainty in a 1st-principles model
There exists an enormous body of information about buildings in the form of DOE-2, TRNSYS, etc. first-principles models; b.n.’s should make use of this. Probabilistic operations can extend first-principles models:

  • Propagating uncertainty from model parameters
  • Taking weather variability into account
  • Concise representation of distributions over results

Relatively easy to show how uncertainties are propagated through commonly-occurring modeling equations (e.g., response factors).
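For a linear relation such as a response-factor sum, the propagation is closed-form. A Python sketch assuming independent inputs (the coefficients and input statistics below are made up for illustration):

```python
# Uncertainty propagation through a linear model y = sum_i a_i * x_i,
# e.g. a response-factor conduction sum. For independent inputs,
#   E[y]   = sum_i a_i E[x_i]
#   Var[y] = sum_i a_i^2 Var[x_i]
# Coefficients and input statistics are illustrative only.

a = [1.8, -0.9, 0.3]          # response-factor-like coefficients
means = [25.0, 24.0, 23.5]    # mean driving temperatures (deg C)
variances = [4.0, 4.0, 4.0]   # input variances (deg^2)

mean_y = sum(ai * mi for ai, mi in zip(a, means))
var_y = sum(ai ** 2 * vi for ai, vi in zip(a, variances))
```

With correlated inputs the variance picks up cross terms (a^T Σ a for input covariance Σ), but the calculation stays a matter of bookkeeping over the same modeling equations.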

SLIDE 19

Application I: Selecting electricity rates
The energy component is easy: we mostly need to estimate average electricity use. Demand is a little harder: we need maximum electricity use. Use a first-principles building model, with envelope loads modeled by a transfer function (discrete convolution). The transfer function for the mean is always stable; the variance blows up for massive walls/roofs. Compute expected cost using rate schedules and distributions over total energy use and maximum energy use. Expected cost is interesting if the distributions cross breakpoints in the schedules.
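The breakpoint effect can be made concrete with a small numerical integration: when demand D straddles a rate breakpoint, E[cost(D)] differs from cost(E[D]). A Python sketch with a made-up two-tier demand charge and a Gaussian demand distribution (all numbers illustrative):

```python
import math

# Two-tier demand charge ($/kW) with a breakpoint at 12 kW (illustrative).
def demand_cost(d_kw):
    if d_kw <= 12.0:
        return 10.0 * d_kw
    return 10.0 * 12.0 + 18.0 * (d_kw - 12.0)

# Gaussian distribution over maximum demand (illustrative mean/stddev),
# deliberately straddling the breakpoint.
mu, sigma = 12.5, 1.5
def pdf(d):
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Expected cost by trapezoid-rule integration over +/- 6 sigma.
lo, hi, n = mu - 6 * sigma, mu + 6 * sigma, 2000
h = (hi - lo) / n
xs = [lo + i * h for i in range(n + 1)]
ys = [demand_cost(x) * pdf(x) for x in xs]
expected_cost = h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])

# Because the cost curve is convex and mass sits on both sides of the
# breakpoint, E[cost(D)] exceeds cost(E[D]).
assert expected_cost > demand_cost(mu)
```

If the demand distribution sat entirely within one tier, the two quantities would coincide; that is why breakpoint-crossing distributions are the interesting case.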

SLIDE 20

Selecting electricity rates, cont’d
Simple building model:

  • Cooling loads only
  • Roof is the only envelope surface
  • Envelope heat transfer calculated with a transfer function (requires temperature and insolation models)
  • Glazing and infiltration ignored
  • Occupancy, lighting, and equipment modeled by schedule + noise

Calculations carried out for the month of July; T and I models from Denver TMY2 data.

SLIDE 21

One slice of the electrical use model

[Figure: one slice of the model, with nodes I, To, Tsol−air, Q̇o, Q̇i,gross, Q̇store, Q̇i, Q̇vent, Q̇occ, Q̇ltg+eq, Q̇total, E, and t]

Model is implemented as a d.b.n.: each slice is a b.n., 24 × 31 slices altogether.

SLIDE 22

Distributions over hourly demand and maximum demand

[Figure, left: demand (kW) vs. hours since midnight, Dec. 31. Right: probability density vs. maximum demand (kW)]

Left: One simulation of hourly demand, corresponding to one sequence of instantiations for the cut-set variables. Error bars show the uncertainty not accounted for by the cut-set variables. Right: Distributions over maximum demand in off-peak, mid-peak, and peak hours. Each is a mixture of 6 maximum-demand distributions, each of the 6 corresponding to one monthly simulation.

SLIDE 23

Application II: Sensor models, simple and otherwise
A simple sensor model takes into account sensor status and a measurement model. One can also reconcile predictions of a variable with measurements... ...and reconcile multiple measurements of the same variable... ...and reconcile evidence about a variable X from measurements of X and from measurements of a related variable Y. Sensor models are good candidates for widespread use of probabilistic models in engineering problems.

SLIDE 24

A Simple Sensor Model

[Figure: nodes X (actual variable) and S (sensor status), both parents of the observed value X̂]

The observed value X̂ is influenced by the sensor status and the observed variable. In general, the sensor states are “OK” and one or more failure states. Specify what the sensor output looks like in each sensor state. Calculate pX|X̂ for further calculations, and pS|X̂ for status diagnosis. When X̂ seems glitched, pS|X̂(“OK”) ≈ 0 and pX|X̂ ≈ pX; stagger forward with that.
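A discretized toy version of this computation in Python (all distributions made up): given priors over the sensor status S and over X, plus a model of the observed value in each status, compute pS|X̂ and pX|X̂.

```python
import math

# Toy simple sensor model. S in {"OK", "failed"}; X discretized to a grid.
# In state "OK" the sensor reads X plus Gaussian noise; in state "failed"
# it reads a flat distribution over the range. All numbers illustrative.

xs = [i * 0.5 for i in range(41)]          # grid over X: 0.0 .. 20.0
prior_X = [1.0 / len(xs)] * len(xs)        # flat prior on X
prior_S = {"OK": 0.95, "failed": 0.05}

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_obs_given(x, status, x_hat):
    if status == "OK":
        return gauss(x_hat, x, 0.5)        # reads near the true value
    return 1.0 / 20.0                      # "failed": flat over the range

def posterior(x_hat):
    # Joint weight over (S, X), then normalize and marginalize.
    w = {(s, i): prior_S[s] * prior_X[i] * p_obs_given(x, s, x_hat)
         for s in prior_S for i, x in enumerate(xs)}
    z = sum(w.values())
    p_S = {s: sum(w[(s, i)] for i in range(len(xs))) / z for s in prior_S}
    p_X = [sum(w[(s, i)] for s in prior_S) / z for i in range(len(xs))]
    return p_S, p_X

p_S, p_X = posterior(10.0)                 # plausible reading: "OK" likely
assert p_S["OK"] > 0.9

p_S_glitch, p_X_glitch = posterior(-50.0)  # absurd reading: "failed" likely
assert p_S_glitch["failed"] > 0.99
# ...and the posterior over X reverts to (roughly) the prior, as on the slide.
```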

SLIDE 25

A Sensor Model for Related Variables

[Figure: related variables X and Y, each with a sensor status (SX, SY) and an observed value (X̂, Ŷ)]

Each variable has a simple sensor model. Measurements of one variable yield information about the other, because of the relation between the variables. Each sensor model and the relation between the measured variables are all constructed separately, and the laws of probability tell how to combine the different forms of information.

SLIDE 26

* Application III: Heating Coil

[Figure: heating coil schematic, with water-side temperature Tw, flow ṁw, and heat capacity Cpw; air-side entering temperature Tdb,ent, flow ṁair, leaving temperature Tdb,lvg, and heat capacity Cpa; coil conductance UA]

SLIDE 27

* Belief network model for the heating coil

[Figure: belief network with a status/actual/observed triple for each of Tdb,ent, ṁw, ṁair, Tw, Cpa, Cpw, UA, and Tdb,lvg, plus nominal and multiplier nodes]

A distributed belief network to represent variables and their measurements. “Strange magnitude” for the nominal Tdb,lvg.

SLIDE 28

* Using MI to assess relevance of variables
We can assess the “usefulness” of measurements as mutual information,

MI(X, Y) = ∫∫ pXY(x, y) log [ pY|X(y|x) / pY(y) ] dx dy

The ratio inside the logarithm is the interesting part.

  • MI takes partial derivatives into account, like conventional sensitivity analysis...
  • ...and MI takes sensor noise into account,
  • and MI takes prior distributions into account. The informativeness of an observation is lesser/greater if the prior of the measured variable is relatively narrow/broad.
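For discrete variables the double integral becomes a double sum, using the equivalent form log[pXY(x,y) / (pX(x) pY(y))]. A Python sketch with a made-up joint distribution:

```python
import math

# Mutual information MI(X, Y) = sum_xy p(x,y) log2( p(x,y) / (p(x) p(y)) ),
# in bits, for a small made-up discrete joint distribution.
p_xy = {(0, 0): 0.30, (0, 1): 0.10,
        (1, 0): 0.15, (1, 1): 0.45}

p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

mi_bits = sum(p * math.log2(p / (p_x[x] * p_y[y]))
              for (x, y), p in p_xy.items() if p > 0)

# MI is nonnegative, and zero exactly when X and Y are independent.
assert mi_bits > 0.0
```

The two forms agree because pY|X(y|x)/pY(y) = pXY(x,y)/(pX(x) pY(y)).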

SLIDE 29

* Assessment by MI
MI was computed with X = groups of measured variables (drawn from Tw, ṁw, ṁair, Tdb,ent) and Y = Tdb,lvg. The most informative pair, triple, and quadruple of variables are shown, along with all single variables.

Variable group                       bits    2^bits
most informative quadruple (all 4)   2.57    5.95
most informative triple              2.29    4.87
most informative pair                1.66    3.16
best single variable                 1.23    2.35
single variable                      0.066   1.05
single variable                      0.056   1.04
single variable                      0.013   1.01

SLIDE 30

Application IV: Mixing Box Damper

[Figure: schematic of the fan-powered mixing box, with air streams labeled From Plant, Return Air, and To Zone]

A schematic diagram of the fan-powered mixing box.

SLIDE 31

Mixing box damper, cont’d
Three kinds of models are incorporated into the damper model:

  • Markov chain for damper status: . . . , S?[t − 1], S?[t], S?[t + 1], . . .
  • Neural net, from empirical data, for DP = F(ZTO, p0).
  • Simple sensor models for damper, pressure, and temperature measurements.

Each local model is developed separately, then pasted into the overall model. Local models could easily be reused, or existing ones replaced.

SLIDE 32

A temporal belief network for the mixing box damper

[Figure: one time slice, with actual variables p0, DP, ZTO, status variables S?, p0?, DP?, ZTO?, and observed values p̂0, D̂P, ẐTO]

One slice of a temporal belief network to represent the mixing box damper. The damper status variable S? is connected to nodes in other time slices.

SLIDE 33

Smoothing over a glitched measurement

[Figure, left: damper position and ZTO data vs. time step; right: Pr( S?[6] = “normal” ) vs. time step]

Left: Data from the damper, t = 1 through t = 10, showing a glitched measurement. Right: Pr( S?[6] = “normal” ) for the same period.
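The smoothing on this slide is a standard forward-backward pass over the status chain. A toy Python version (the transition and observation probabilities are made up for illustration, not the thesis's numbers):

```python
# Forward-backward smoothing for a two-state status chain
# S[t] in {"normal", "glitched"}, given a 0/1 "measurement looks odd" flag
# at each time step. All probabilities are illustrative.

states = ("normal", "glitched")
trans = {("normal", "normal"): 0.95, ("normal", "glitched"): 0.05,
         ("glitched", "normal"): 0.80, ("glitched", "glitched"): 0.20}
obs = {"normal": {0: 0.9, 1: 0.1},   # p(odd flag | state)
       "glitched": {0: 0.2, 1: 0.8}}
prior = {"normal": 0.9, "glitched": 0.1}

flags = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # one glitch at t = 6 (index 5)

# Forward pass: alpha[t][s] ~ p(S[t]=s, flags[0..t])
alpha = [{s: prior[s] * obs[s][flags[0]] for s in states}]
for f in flags[1:]:
    prev = alpha[-1]
    alpha.append({s: obs[s][f] * sum(prev[r] * trans[(r, s)] for r in states)
                  for s in states})

# Backward pass: beta[t][s] ~ p(flags[t+1..] | S[t]=s)
beta = [{s: 1.0 for s in states} for _ in flags]
for t in range(len(flags) - 2, -1, -1):
    beta[t] = {s: sum(trans[(s, r)] * obs[r][flags[t + 1]] * beta[t + 1][r]
                      for r in states) for s in states}

def smoothed(t):
    w = {s: alpha[t][s] * beta[t][s] for s in states}
    z = sum(w.values())
    return {s: w[s] / z for s in states}

# Pr(S = "normal") dips at the glitched step and recovers afterward.
assert smoothed(5)["normal"] < smoothed(4)["normal"]
assert smoothed(5)["normal"] < smoothed(6)["normal"]
```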

SLIDE 34

* Predictive distributions in the mixing box model

[Figure, left: probability density vs. damper position; right: probability density vs. ZTO]

Left: Posterior of DP, given ZTO = 5.649 and p̂0 = 0.3223; a Gaussian mixture with 6 components. Right: Posterior of ZTO, given DP = 4.1 and p̂0 = 0.3223; a monotone spline with 4500 support points.

SLIDE 35

Future research: Adjusting parameters in situ
When an equipment unit is manufactured, we have a good idea how it will function, from design specs and from similar units, so we can construct a fairly accurate model from prior information. But it would be nice if the model could be tuned once the unit is installed, to better predict and diagnose that particular unit. A belief network for learning (Buntine, 1994) is a model in which the equipment model parameters appear as variables.
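The "parameters as variables" idea can be sketched with a conjugate normal update: treat an equipment parameter θ as a node whose prior comes from design specs, and tighten it as in-situ observations arrive. (The numbers and the simple noise model below are illustrative; Buntine's formulation is more general.)

```python
# Sketch of in-situ tuning: parameter theta has a Gaussian prior from design
# specs; observations y_i = theta + noise update it in closed form
# (conjugate normal-normal, known noise variance). Numbers are illustrative.

prior_mean, prior_var = 50.0, 25.0        # from specs / similar units
noise_var = 4.0                           # measurement noise variance

observations = [46.1, 45.7, 46.5, 45.9]   # readings from this particular unit

mean, var = prior_mean, prior_var
for y in observations:
    # Standard Gaussian posterior update with known noise variance.
    k = var / (var + noise_var)           # gain
    mean = mean + k * (y - mean)
    var = (1.0 - k) * var

# The posterior pulls toward this unit's data and is tighter than the prior.
assert abs(mean - 46.0) < 1.0
assert var < prior_var
```

Sequential updating gives the same posterior as a batch update, which is what makes this attractive for equipment that reports data over time.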

SLIDE 36

* A distributed belief network to infer equipment type

[Figure: equipment-type variable with parameters a, b, c, . . . , linked to past time slices and future slices]

SLIDE 37

A distributed belief network to jump-start new installations

[Figure: factory-level parameters ã, b̃, c̃ (“At the Factory”) linked to site-specific parameters a1, b1, c1 (Site 1) and a2, b2, c2 (Site 2)]

SLIDE 38

* Working around intractability
Belief networks for learning subsume various hacks. We should study the b.n.’s first, then make shortcuts and approximations as necessary to get acceptable computational speed. Hacking first obscures the goal.
Special-case code will usually be much faster than general-purpose code like riso. Draw a belief network, then design the code from the computations that need to happen. Example: in b.n.’s for learning, every new observation invalidates almost all previous computations. Pick an acceptable time delay for updates; accumulate observations for that long before recomputing inferences.

SLIDE 39

Take-Home Message: Information Fusion
Express relations between variables by conditional distributions. Treat uncertainty in observations, parameters, and hypotheses symmetrically. The laws of probability permit incremental development: conditional probability models don’t need to be revised when additional nodes are added elsewhere in the network or sub-networks are merged; also, some calculations aren’t disturbed by new nodes.
“Think locally, compute globally,” and “Think once, compute often.”

SLIDE 40

Take-Home Message: Information Fusion, cont’d
Belief networks are especially useful when one has prior information about the qualitative structure or connectivity of a set of variables. This applies well to building systems, which have lots of functional and geographical organization.
The weakness/strength of b.n.’s is that the belief network provides only a framework; in a particular application, there’s a lot of work to be done to fill in the framework.
