Analysis of Survival Times Using Bayesian Networks, by Helge Langseth



SLIDE 1

Analysis of Survival Times Using Bayesian Networks

Helge Langseth

Presented at ESREL ’98, Trondheim, Norway, 16-19 June 1998

NTNU

SLIDE 2

Two types of statistical models

Neyman categorises statistical models into two groups:

  • Interpolating models: used merely to capture rough effects in the data
  • Explorative models: used to explore the underlying process that generates the data we have observed

SLIDE 3

Scope

With a database as a starting point, we want to build an explorative model to pinpoint how to reduce the rate of critical failures in a system's components. Our main goal is to build a model that helps us understand how the covariates contribute to the system’s survival times.

SLIDE 4

The History of Graphical Models

  • Graphical models in statistics can be dated back to Wright’s notation in 1921.
  • The computational complexity did, however, leave Bayesian Networks neglected for 60 years.
  • In the 1980s, effective algorithms for exact calculations on graphs, and later computer-intensive methods like Markov chain Monte Carlo, brought Bayesian Networks back into the light, and onto the "Top 5 Statistical Buzzwords of the Week".

SLIDE 5

Bayesian Networks

Figure: example Bayesian network with the nodes Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour

SLIDE 6

Conditional Independence

Figure: the same network, with the nodes Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour

Cancer is independent of Age and Gender given Exposure To Toxic and Smoking
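
The independence statement above can be checked numerically on a toy version of the network. The sketch below is only an illustration: it assumes the edges Age → Exposure, Gender → Smoking and (Exposure, Smoking) → Cancer, and every probability except the 5% / 1% exposure figures is a made-up placeholder. The helper names (`joint`, `conditional`) are hypothetical.

```python
from itertools import product

VARS = ("age", "male", "exposure", "smoking", "cancer")

# Assumed toy parameters; only the exposure values (5% / 1%) appear in the slides.
P_AGE = 0.6                                   # P(age in (25, 65)), assumed
P_MALE = 0.5                                  # assumed
P_EXPOSURE = {True: 0.05, False: 0.01}        # P(exposure | age)
P_SMOKING = {True: 0.3, False: 0.2}           # P(smoking | male), assumed
P_CANCER = {(True, True): 0.10, (True, False): 0.04,
            (False, True): 0.03, (False, False): 0.005}  # P(cancer | exposure, smoking), assumed


def bern(p_true, value):
    """Probability of a boolean value given P(value is True)."""
    return p_true if value else 1.0 - p_true


def joint(age, male, exposure, smoking, cancer):
    """Joint probability as a product of the local node tables."""
    return (bern(P_AGE, age) * bern(P_MALE, male)
            * bern(P_EXPOSURE[age], exposure)
            * bern(P_SMOKING[male], smoking)
            * bern(P_CANCER[(exposure, smoking)], cancer))


def conditional(target, given):
    """P(target | given) by brute-force enumeration of all 32 states."""
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if any(world[k] != v for k, v in given.items()):
            continue
        p = joint(**world)
        den += p
        if all(world[k] == v for k, v in target.items()):
            num += p
    return num / den


# Conditioning on Age and Gender as well does not change the answer ...
full = conditional({"cancer": True},
                   {"age": True, "male": False, "exposure": True, "smoking": True})
reduced = conditional({"cancer": True}, {"exposure": True, "smoking": True})

# ... although Cancer is not independent of Age when nothing else is known.
by_age_in = conditional({"cancer": True}, {"age": True})
by_age_out = conditional({"cancer": True}, {"age": False})
```

Here `full` and `reduced` agree up to rounding, while `by_age_in` and `by_age_out` differ, which is the distinction between conditional and marginal independence.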

SLIDE 7

“Fundamental Theorem”

Every multidimensional statistical distribution function can be represented by a Bayesian Network.

$$
f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid x_1, x_2, \ldots, x_{i-1}) = \prod_{i=1}^{n} f(x_i \mid \text{"all predecessors"})
$$

Figure: nodes numbered 1, 2, 3, 4, ..., n
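
The chain-rule factorisation on this slide holds for any joint distribution, which a small numerical check makes concrete. The sketch below draws an arbitrary joint distribution over three binary variables (the seed and the number of variables are arbitrary choices) and rebuilds each joint probability as a product of conditionals given all predecessors.

```python
from itertools import product
import random

random.seed(0)

# An arbitrary joint distribution over (x1, x2, x3), each in {0, 1}.
states = list(product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}


def marginal(prefix):
    """P(x_1..x_k = prefix), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for s, p in joint.items() if s[:k] == prefix)


def chain_rule(state):
    """prod_i f(x_i | x_1..x_{i-1}), each factor a ratio of prefix marginals."""
    result = 1.0
    for i in range(len(state)):
        result *= marginal(state[:i + 1]) / marginal(state[:i])
    return result


max_error = max(abs(chain_rule(s) - joint[s]) for s in states)
```

`max_error` is zero up to floating-point rounding. A Bayesian network gains its compactness when some of these predecessors can be dropped from the conditioning sets.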

SLIDE 8

Nodes are Probability Tables

Figure: the network over Gender, Smoking, Age, Exposure To Toxic, Cancer, Serum Calcium and Lung Tumour, shown with the following probability table:

Exposed To Toxic Material | Age in (25, 65) | Age not in (25, 65)
True                      | 5%              | 1%
False                     | 95%             | 99%
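
A node table of this kind maps naturally onto a nested dictionary. The sketch below stores the percentages from the slide and, purely as an assumption for illustration, adds a prior of 60% / 40% over the two age groups (the slides give no such prior) to show how the table is used.

```python
# P(Exposed | Age group), using the percentages from the slide.
cpt_exposed = {
    "in_25_65":     {True: 0.05, False: 0.95},
    "not_in_25_65": {True: 0.01, False: 0.99},
}

# Hypothetical prior over the parent node Age (not given in the slides).
p_age = {"in_25_65": 0.6, "not_in_25_65": 0.4}


def check_cpt(cpt):
    """Every column (one per parent state) must sum to one."""
    return all(abs(sum(col.values()) - 1.0) < 1e-12 for col in cpt.values())


def marginal_exposed():
    """P(Exposed = True), summing the parent out."""
    return sum(p_age[a] * cpt_exposed[a][True] for a in p_age)


def posterior_age(age):
    """P(Age = age | Exposed = True) by Bayes' rule."""
    return p_age[age] * cpt_exposed[age][True] / marginal_exposed()


print(check_cpt(cpt_exposed), marginal_exposed(), posterior_age("in_25_65"))
```

Under the assumed prior, the marginal exposure probability is 0.6 × 0.05 + 0.4 × 0.01 = 0.034.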

SLIDE 9

Where do the Networks come from?

Situation:

We want to build a model to analyse a multidimensional vector X.

Aid:

To do so, we have N i.i.d. realisations of X, namely x1, …, xN, and/or a domain expert.

Unknowns:

  • The network structure
  • The parameters in the local node tables
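
When the structure is taken as known and the data are discrete, one common way to fill in a local node table is by relative frequencies. The sketch below assumes that approach; the slides do not say which estimator was used, and the function name `fit_cpt` and the records are hypothetical.

```python
from collections import Counter


def fit_cpt(records, child, parents):
    """Relative-frequency estimate of P(child | parents) from a list of dicts."""
    joint_counts = Counter()
    parent_counts = Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        joint_counts[(key, r[child])] += 1
        parent_counts[key] += 1
    return {(key, value): n / parent_counts[key]
            for (key, value), n in joint_counts.items()}


# Made-up observations, for illustration only.
records = [
    {"age_25_65": True, "exposed": True},
    {"age_25_65": True, "exposed": False},
    {"age_25_65": True, "exposed": False},
    {"age_25_65": True, "exposed": False},
    {"age_25_65": False, "exposed": False},
    {"age_25_65": False, "exposed": False},
]

cpt = fit_cpt(records, child="exposed", parents=["age_25_65"])
# Keys are ((parent values...), child value), e.g. ((True,), True).
```

Parent configurations or child values that never occur in the data get no entry, which is one reason prior knowledge from a domain expert is useful alongside the counts.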

SLIDE 10

Generating Networks:

  • Initialize Network

repeat

  • Propose some Change to the structure
  • Fit Parameters to the new structure
  • Evaluate the new network according to some measure (like BIC, AIC, MDL)
  • If the New network is Better than the previous, then Keep the Change

until Finished
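
The loop above can be sketched as a greedy search over single-edge changes. This is one possible instantiation under stated assumptions, not necessarily the procedure used in the talk: all variables are discrete, parameters are fitted by relative frequencies, the score is BIC, and the search starts from the empty network. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def bic(data, parents):
    """BIC of a discrete network: maximised log-likelihood minus a size penalty."""
    n = len(data)
    score = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
        marg = Counter(tuple(r[p] for p in pa) for r in data)
        # With relative-frequency parameters the log-likelihood is a sum over counts.
        score += sum(c * math.log(c / marg[key]) for (key, _), c in joint.items())
        levels = len({r[node] for r in data})
        n_configs = 1
        for p in pa:
            n_configs *= len({r[p] for r in data})
        score -= 0.5 * math.log(n) * (levels - 1) * n_configs
    return score


def is_acyclic(parents):
    """Depth-first search along parent links; a revisit of an open node is a cycle."""
    state = {}

    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "open":
            return False
        state[v] = "open"
        ok = all(visit(p) for p in parents[v])
        state[v] = "done"
        return ok

    return all(visit(v) for v in parents)


def learn_structure(data):
    nodes = sorted(data[0])
    parents = {v: () for v in nodes}            # initialise: empty network
    best = bic(data, parents)
    improved = True
    while improved:                             # repeat ... until finished
        improved = False
        for a, b in permutations(nodes, 2):     # propose: toggle the edge a -> b
            pa = set(parents[b]) ^ {a}
            candidate = {**parents, b: tuple(sorted(pa))}
            if not is_acyclic(candidate):
                continue
            score = bic(data, candidate)        # fit parameters and evaluate
            if score > best + 1e-9:             # keep the change only if better
                parents, best, improved = candidate, score, True
    return parents, best


# Toy data: y copies x, while z varies independently of both.
toy = [{"x": i % 2, "y": i % 2, "z": (i // 2) % 2} for i in range(40)]
learned, learned_score = learn_structure(toy)
print(learned)
```

On the toy data the search keeps the single edge x → y and rejects every edge into or out of z, because the BIC penalty outweighs a likelihood gain of zero. A greedy search like this can stop in a local optimum, so the result may depend on the order in which changes are proposed.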

SLIDE 11

Bayesian Networks are used in:

  • “Expert systems”, mostly in medical domains (e.g. the MUNIN system)
  • Decision support systems (e.g. for NASA)
  • Analysis of dynamic systems (e.g. speech recognition, the BAT-Mobile)

SLIDE 12

Bayesian Networks, Summary:

  • An estimate of the multidimensional density
  • Easy to understand for non-statisticians (e.g. a domain expert)
  • The representation is optimized for tasks like
      – Prediction
      – Classification
      – Decision support
  • Can incorporate prior domain knowledge:
      – “Top down” analysis: expert knowledge
      – “Bottom up” analysis: data-driven system verification

SLIDE 13

Reliability Analysis

  • Data-set: 219 gas turbines with 2921 failures and 300 censored survival times from the OREDA-IV database
  • Each failure is described by ten covariates, e.g. System Type, Manufacturer, Actual/Planned PM, ...
  • We have special interest in Time To Fail and Failure Severity (Critical or Degraded)
  • Problem to solve: “How can we reduce the frequency of critical failures?”

SLIDE 14

Generated Network

Figure: generated network with the nodes Actual PM, Planned PM, System Code, Manufacturer, Operating Mode, Subunit, Design Class, Location, Installation Code, Environment, Severity Class and Time to Fail

SLIDE 15

“Clique” Graph

Figure: clique graph with the nodes Environment, Location, System, Subunit, TTF, PM and Severity

Grouped nodes:

  • Location: Installation Code, Location
  • PM: Planned PM, Actual PM
  • System: System Code, Operating Mode, Manufacturer
  • Environment: Installation Code, Environment, System Code, Design Class

SLIDE 16

Model Verification

Figure: Bayesian network and Cox regression results (axis ticks from 1000 to 5000)

SLIDE 17

Conclusions

  • We have generated a Bayesian Network to analyse a data-set from the OREDA IV database.
  • The Bayesian Network enabled both qualitative and quantitative analysis of the data-set.
  • To verify the calculations, the numerical results were compared to those found by Cox regression. The results of the two methods were at the same level.