Bayesian cluster detection via adjacency modelling, Craig Anderson

Apr 14, 2023

Bayesian cluster detection via adjacency modelling Craig Anderson University of Technology Sydney Bayes on the Beach 2015 Acknowledgements Co-authors Dr Duncan Lee (University of Glasgow). Dr Nema Dean (University of Glasgow). Funding

SLIDE 1

Bayesian cluster detection via adjacency modelling

Craig Anderson

University of Technology Sydney

Bayes on the Beach 2015

SLIDE 2

Acknowledgements

Co-authors

Dr Duncan Lee (University of Glasgow). Dr Nema Dean (University of Glasgow).

Funding

Carnegie Trust for the Universities of Scotland. ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).


SLIDE 3

Motivation

Want to model respiratory admissions in Glasgow, Scotland. Glasgow is a city with many health inequalities. Need a model which accounts for these inequalities. Standard spatial modelling techniques are unsuitable.


SLIDE 4

Glasgow case study


SLIDE 5

The Glasgow effect

Glasgow has the lowest life expectancy in the UK (73 for men, 78.5 for women). One in four children will not live beyond 65. Epidemiologists call this the ‘Glasgow effect’. Huge health inequalities within the city. Life expectancy ranges from 59 in Parkhead to 80 in Jordanhill & Kelvinside.


SLIDE 6

Why do we need spatial modelling?

Disease risk often varies across a geographical region. Nearby areas tend to have more in common than those further apart. Model structure must account for this. Identifying high-risk areas is the first step to fixing health issues.


SLIDE 7

Respiratory data

Case study of respiratory hospital admissions in Greater Glasgow & Clyde Health Board. Region divided into 271 non-overlapping ‘Intermediate Geographies’ (IGs). Each IG has roughly the same population. Number of admissions in each IG is recorded ($Y_i$). Also compute expected number of admissions ($E_i$).
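A minimal sketch of how the observed and expected counts combine into the standardised incidence ratio, SIR = $Y_i / E_i$. The counts below are invented for illustration and are not the Glasgow data; the function name `sir` is likewise hypothetical.

```python
# Hypothetical observed (Y_i) and expected (E_i) admission counts for four areas.
# These numbers are made up for illustration only.
observed = [120, 45, 80, 200]
expected = [100.0, 90.0, 80.0, 125.0]


def sir(y, e):
    """Return the standardised incidence ratio Y_i / E_i for each area."""
    return [yi / ei for yi, ei in zip(y, e)]


ratios = sir(observed, expected)
for area, r in enumerate(ratios, start=1):
    print(f"area {area}: SIR = {r:.2f}")
```

An SIR above 1 means more admissions were observed than expected; below 1 means fewer.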


SLIDE 8

SIR ($Y_i/E_i$) for Glasgow respiratory admissions

[Figure: SIR values, scale 0.2 to 2.0]

SLIDE 9

Modelling risk

Disease risk commonly modelled with a Poisson GLM. Random effect included to account for spatial variation.

$$Y_i \mid E_i, R_i \sim \text{Poisson}(E_i R_i), \quad i = 1, \ldots, n$$

$$\ln(R_i) = x_i^T \beta + \phi_i$$

$x_i^T \beta$ is covariate information.

$\phi_i$ is the random effect term for area $i$.
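As a sketch of what this model implies, the log-likelihood of the counts can be evaluated directly from the Poisson form, with mean $E_i R_i$ and $\ln(R_i) = x_i^T \beta + \phi_i$. The function name `poisson_log_lik` and the toy inputs are assumptions for illustration.

```python
import math


def poisson_log_lik(y, e, x, beta, phi):
    """Log-likelihood of Y_i ~ Poisson(E_i * R_i) with ln(R_i) = x_i^T beta + phi_i."""
    total = 0.0
    for yi, ei, xi, phi_i in zip(y, e, x, phi):
        # Linear predictor on the log-risk scale.
        log_risk = sum(xij * bj for xij, bj in zip(xi, beta)) + phi_i
        mu = ei * math.exp(log_risk)  # Poisson mean E_i * R_i
        # Poisson log-pmf: y*log(mu) - mu - log(y!)
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total


# Hypothetical toy data: two areas, one covariate each.
y = [3, 7]
e = [2.5, 6.0]
x = [[0.1], [-0.3]]
log_lik = poisson_log_lik(y, e, x, beta=[0.5], phi=[0.0, 0.2])
print(log_lik)
```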


SLIDE 10

Conditional autoregressive model

Simplest CAR model is the intrinsic model (Besag et al, 1991).

$$\phi_i \mid \phi_{-i} \sim N\left( \frac{\sum_{j=1}^{n} w_{ij}\phi_j}{\sum_{j=1}^{n} w_{ij}},\ \frac{\tau^2}{\sum_{j=1}^{n} w_{ij}} \right)$$

$\phi_{-i}$ is a vector of all random effects except $\phi_i$. $w_{ij} = 1$ if $i$ and $j$ are neighbours, 0 otherwise. $\tau^2$ is a conditional variance term.
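The conditional mean and variance in the intrinsic CAR prior are simple to compute: the mean is the average of the neighbouring random effects, and the variance shrinks as the number of neighbours grows. A minimal sketch, assuming a binary neighbourhood matrix stored as nested lists; the function name and the four-area example are hypothetical.

```python
def icar_conditional(i, phi, W, tau2):
    """Mean and variance of phi_i given all other random effects (intrinsic CAR)."""
    n_neighbours = sum(W[i])
    if n_neighbours == 0:
        raise ValueError("area has no neighbours; the conditional is undefined")
    mean = sum(W[i][j] * phi[j] for j in range(len(phi))) / n_neighbours
    return mean, tau2 / n_neighbours


# Hypothetical map: four areas in a chain 0-1-2-3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
phi = [0.4, -0.2, 0.1, 0.3]
mean, var = icar_conditional(1, phi, W, tau2=0.5)
print(mean, var)
```

The error raised for an area without neighbours reflects the division by zero in the formula, which is the situation the localised prior later in the talk is designed to avoid.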

SLIDE 11

Fitted model for Glasgow respiratory admissions

[Figure: fitted values, scale 0.2 to 2.0]

SLIDE 12

Drawbacks of standard CAR models

Assumes constant spatial smoothness across the study region. In reality, risk varies differently across the region. Extreme values are smoothed towards the mean, contrary to the aim of identifying high-risk areas. Prefer a method which allows more flexible smoothing.


SLIDE 13

Alternative smoothing approach

One approach is to introduce ‘boundaries’ in the risk surface. Areas separated by boundaries are not smoothed. We want to identify ‘closed’ boundaries which fully enclose a group of areal units. This approach involves grouping together similar neighbouring areas, i.e. clustering. Allows identification of clusters of high (or low) disease risk.


SLIDE 14

Agglomerative Hierarchical Clustering

1. Initially consider every object to be a ‘singleton’ cluster of size 1.
2. Evaluate a dissimilarity measure for each possible pair.
3. Merge together the two most similar clusters.
4. Return to step 2 and repeat until all clusters are merged.


SLIDE 15

Spatial Agglomerative Hierarchical Clustering

1. Initially consider every object to be a ‘singleton’ cluster of size 1.
2. Evaluate a dissimilarity measure for each possible pair.
3. Merge together the two most similar neighbouring clusters.
4. Return to step 2 and repeat until all clusters are merged.
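The four steps can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the dissimilarity is taken to be the absolute difference between cluster means (the slides do not say which measure was used), and the names `spatial_agglomerative`, `values` and `neighbours` are hypothetical.

```python
def spatial_agglomerative(values, neighbours):
    """Return every partition produced by repeatedly merging adjacent clusters.

    values: one number per area (e.g. an SIR).
    neighbours: set of frozenset({i, j}) pairs of adjacent areas.
    """
    clusters = {i: [i] for i in range(len(values))}  # step 1: singletons
    history = [sorted(clusters.values())]

    def mean(members):
        return sum(values[m] for m in members) / len(members)

    def adjacent(a, b):
        # Two clusters are neighbours if any pair of their areas is adjacent.
        return any(frozenset((i, j)) in neighbours
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Step 2: dissimilarity for each pair of neighbouring clusters
        # (assumed here: absolute difference in cluster means).
        candidates = [(abs(mean(clusters[a]) - mean(clusters[b])), a, b)
                      for a in clusters for b in clusters
                      if a < b and adjacent(a, b)]
        if not candidates:  # disconnected map: nothing left to merge
            break
        # Step 3: merge the two most similar neighbouring clusters.
        _, a, b = min(candidates)
        clusters[a] = sorted(clusters[a] + clusters.pop(b))
        history.append(sorted(clusters.values()))  # step 4: repeat
    return history


# Hypothetical map: five areas in a chain 0-1-2-3-4.
values = [0.5, 0.6, 1.8, 1.6, 0.7]
neighbours = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (3, 4)]}
history = spatial_agglomerative(values, neighbours)
for partition in history:
    print(len(partition), partition)
```

In this toy example area 4 has a value close to areas 0 and 1 but is not adjacent to them, so it can only merge with the cluster containing area 3. Treating every pair of areas as neighbours recovers the non-spatial algorithm.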


SLIDE 16

Application to Glasgow SIR data

[Figure: four panels showing the 5, 10, 20 and 30 cluster structures; scale 0.2 to 2.0 in each panel]

SLIDE 17

Selecting the best cluster structure

The algorithm produces n possible cluster structures. Need to find a way to select the most suitable structure. Subjective methods could be used in some examples, but not ideal in general. Need to find an objective method to select a structure.


SLIDE 18

Incorporating into a model

Need an approach which incorporates the choice of cluster structure into the model. Our cluster structures have a natural ordering $(C_1, \ldots, C_n)$. Use this ordering to include the number of clusters as a model parameter. Induce clustering via correlation structure of random effects.


SLIDE 19

Altering the correlation structure

Change the neighbourhood matrix to account for clusters. $w_{ij} = 1$ if $i, j$ are neighbours AND lie in the same cluster, $w_{ij} = 0$ otherwise. But this could produce singleton clusters with no neighbours. Use localised CAR prior proposed by Lee et al (2014). Includes global random effect $\phi_*$.


SLIDE 20

Lee et al (2014)

Extend neighbourhood matrix as follows:

$$\begin{pmatrix} W & w_* \\ w_*^T & 0 \end{pmatrix}$$

$$w_* = (w_{1*}, \ldots, w_{n*})$$

$w_{i*} = 1$ if area $i$ has at least one neighbour in a different cluster. $w_{i*} = 0$ otherwise.
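A minimal sketch of how a cluster structure changes the neighbourhood matrix: links between areas in different clusters are cut, and each area that loses a link is flagged in $w_*$ so that it connects to the global random effect instead. The function name, the label encoding and the four-area example are assumptions for illustration.

```python
def cluster_neighbourhood(W, labels):
    """Restrict a binary neighbourhood matrix W to within-cluster links.

    Returns (W_hat, w_star, W_ext), where W_ext is the (n+1) x (n+1)
    matrix with w_star appended as the last row and column.
    """
    n = len(W)
    # Keep a link only if the two areas are neighbours AND share a cluster.
    W_hat = [[1 if W[i][j] == 1 and labels[i] == labels[j] else 0
              for j in range(n)] for i in range(n)]
    # Flag areas with at least one neighbour in a different cluster.
    w_star = [1 if any(W[i][j] == 1 and labels[i] != labels[j]
                       for j in range(n)) else 0
              for i in range(n)]
    W_ext = [W_hat[i] + [w_star[i]] for i in range(n)] + [w_star + [0]]
    return W_hat, w_star, W_ext


# Hypothetical map: chain 0-1-2-3, split into clusters {0, 1} and {2, 3}.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
labels = [0, 0, 1, 1]
W_hat, w_star, W_ext = cluster_neighbourhood(W, labels)
for row in W_ext:
    print(row)
```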


SLIDE 21

Lee et al (2014)

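Localised CAR (LCAR) prior takes the form:

$$\phi_i \mid \phi_{-i} \sim N\left( \frac{\sum_{j=1}^{n} \hat{w}_{ij}\phi_j + \hat{w}_{i*}\phi_*}{\sum_{j=1}^{n} \hat{w}_{ij} + \hat{w}_{i*} + \epsilon},\ \frac{\tau^2}{\sum_{j=1}^{n} \hat{w}_{ij} + \hat{w}_{i*} + \epsilon} \right)$$

$$\phi_* \mid \phi_{-*} \sim N\left( \frac{\sum_{j=1}^{n} \hat{w}_{j*}\phi_j}{\sum_{j=1}^{n} \hat{w}_{j*} + \epsilon},\ \frac{\tau^2}{\sum_{j=1}^{n} \hat{w}_{j*} + \epsilon} \right)$$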
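The two conditionals can be evaluated directly once the cluster-restricted weights are known. The sketch below assumes the weights are stored as nested lists; the function names and the default value of `eps` are assumptions, since the slides do not state which value of $\epsilon$ was used.

```python
def lcar_conditional(i, phi, phi_star, W_hat, w_star, tau2, eps=1e-3):
    """Mean and variance of phi_i given the other effects under the LCAR prior."""
    denom = sum(W_hat[i]) + w_star[i] + eps
    numer = sum(W_hat[i][j] * phi[j] for j in range(len(phi)))
    numer += w_star[i] * phi_star  # link to the global random effect
    return numer / denom, tau2 / denom


def lcar_global_conditional(phi, w_star, tau2, eps=1e-3):
    """Mean and variance of the global effect phi_* given the area effects."""
    denom = sum(w_star) + eps
    numer = sum(w * p for w, p in zip(w_star, phi))
    return numer / denom, tau2 / denom


# Hypothetical weights: chain 0-1-2-3 split into clusters {0, 1} and {2, 3}.
W_hat = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
w_star = [0, 1, 1, 0]
phi = [0.4, -0.2, 0.1, 0.3]

print(lcar_conditional(1, phi, 0.0, W_hat, w_star, tau2=0.5))
print(lcar_global_conditional(phi, w_star, tau2=0.5))
```

Because $\epsilon$ is added to every denominator, the conditional stays well defined even for an area with no within-cluster neighbours and no link to $\phi_*$.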


SLIDE 22

Model

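Random effect model:

$$Y_i \mid E_i, R_i \sim \text{Poisson}(E_i R_i)$$

$$\ln(R_i) = \beta_0 + \phi_i$$

$$\phi \sim \text{LCAR}(\hat{W})$$

$$\hat{W} \sim \text{Discrete}(W_1, \ldots, W_n;\ \pi_1, \ldots, \pi_n)$$

$$\pi_j = \frac{\exp(-j\theta)}{\sum_{i=1}^{n} \exp(-i\theta)}$$

$$\theta \sim \text{Uniform}(0, 1)$$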
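The prior weights $\pi_j$ are a normalised geometric sequence in $j$, so they are easy to tabulate for a given $\theta$. A minimal sketch, with the function name `cluster_prior` chosen for illustration:

```python
import math


def cluster_prior(n, theta):
    """Prior weights pi_j proportional to exp(-j * theta), for j = 1, ..., n."""
    weights = [math.exp(-j * theta) for j in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Compare the two ends of the Uniform(0, 1) range for a small n.
for theta in (0.0, 1.0):
    pi = cluster_prior(5, theta)
    print(theta, [round(p, 3) for p in pi])
```

At $\theta = 0$ every candidate structure is equally likely a priori; as $\theta$ increases, more prior mass moves to structures with a small index $j$.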


SLIDE 23

Posterior density of $\hat{W}$

[Figure: Density of Cluster Choices. x-axis: Number of Clusters (17 to 35); y-axis: Density (0.00 to 0.25)]

SLIDE 24

Glasgow application

[Figure: Glasgow application, scale 0.2 to 2.0]

SLIDE 25

Discussion

Presented a method which allows for more localised smoothing. Picks out geographical clusters of disease risk. Lots of applications in epidemiology and public health. Could help identify factors which are causing high-risk clusters.


SLIDE 26

Potential future work

Single-stage model which doesn’t require prior clustering. Alternative spatial correlation structures. Consider applications to different forms of spatial data. Develop spatio-temporal disease clustering methodology.
