Identifying the Food and Location Source of Large-Scale Outbreaks of Foodborne Disease


Identifying the Food and Location Source of Large-Scale Outbreaks of Foodborne Disease. ABIGAIL HORN, Postdoctoral Scientist, German Federal Institute for Risk Assessment (BfR)


slide-1
SLIDE 1

ABIGAIL HORN

Postdoctoral Scientist, German Federal Institute for Risk Assessment (BfR)

BfR Co-authors: Matthias Filter, Marcel Fuhrmann, Annemarie Käsbohrer, Armin Weiser

Other Co-authors: Hanno Friedrich (KLU), Andreas Balster (KLU), Elena Polozova (MIT)

IAFP 2018 | Technical Session I – Modeling and Risk Assessment

AUGUST 9, 2018

Identifying the Food and Location Source of Large-Scale Outbreaks of Foodborne Disease

slide-2
SLIDE 2

Motivation: Foodborne Disease Outbreaks

Impact of 2011 sprout E. coli/EHEC outbreak:

  • 4321 illnesses (3816 in Germany)
  • 54 deaths
  • 16 countries with cases
  • 9 weeks to identify source

Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M., an der Heiden, M., et al (2011). “Epidemic profile of Shiga-toxin producing Escherichia coli O104: H4 outbreak in Germany.” New England Journal of Medicine, 365(19), 1771-1780.

slide-3
SLIDE 3

May 2 Outbreak Identified

  • CDC. E. coli Germany outbreak update. cdc.gov. Archived from the original on 27 June 2011.
  • Bundesinstitut für Risikobewertung (BfR). “Fenugreek seeds with high probability for EHEC O104: H4 responsible outbreak” 30 June 2011.
  • Frank, Christina, et al. "Epidemic profile of Shiga-toxin–producing Escherichia coli O104: H4 outbreak in Germany." New England Journal of Medicine 365.19 (2011): 1771-1780.

Could cases have been averted?

Date of illness onset

slide-4
SLIDE 4

May 2 Outbreak Identified

June 10 RKI confirms sprouts as food source

  • CDC. E. coli Germany outbreak update. cdc.gov. Archived from the original on 27 June 2011.
  • Bundesinstitut für Risikobewertung (BfR). “Fenugreek seeds with high probability for EHEC O104: H4 responsible outbreak” 30 June 2011.
  • Frank, Christina, et al. "Epidemic profile of Shiga-toxin–producing Escherichia coli O104: H4 outbreak in Germany." New England Journal of Medicine 365.19 (2011): 1771-1780.

Could cases have been averted?

Date of illness onset

slide-5
SLIDE 5

May 2 Outbreak Identified

June 10 RKI confirms sprouts as food source

June 30 BfR confirms organic farm in Uelzen as location source

  • CDC. E. coli Germany outbreak update. cdc.gov. Archived from the original on 27 June 2011.
  • Bundesinstitut für Risikobewertung (BfR). “Fenugreek seeds with high probability for EHEC O104: H4 responsible outbreak” 30 June 2011.
  • Frank, Christina, et al. "Epidemic profile of Shiga-toxin–producing Escherichia coli O104: H4 outbreak in Germany." New England Journal of Medicine 365.19 (2011): 1771-1780.

Could cases have been averted?

Date of illness onset

slide-6
SLIDE 6

Food Supply Network Model

Identifying the Location Source

slide-7
SLIDE 7

Food Supply Network Model

Identifying the Location Source

Illness Case Report Data

Reported Cases

slide-8
SLIDE 8

Food Supply Network Model

Identifying the Location Source

Illness Case Report Data

Reported Cases Top Ranked Source

Source Localization Algorithm

slide-9
SLIDE 9

Food Supply Network Model

Identifying the Location Source

Illness Case Report Data

Reported Cases Top Ranked Source

Source Localization Model

[Figure: Illustrative PMF and Source Ranking. x-axis: Source ID, in rank order 5, 3, 15, 9, 8, 4, 1, 13, 12, 11, 2, 6, 10, 7, 14; y-axis: probability, 0.00 to 0.30]

slide-10
SLIDE 10

German food supply network model

slide-11
SLIDE 11

German spatial commodity flow model

  • Agriculture: 402 regions
  • Processing: 402 regions
  • Trade: 225 individual warehouses
  • Consumption: 402 regions
  • Imports: 50 import countries
  • Exports: 50 export countries

Balster, Andreas and Hanno Friedrich (2018). "Dynamic freight flow modelling for risk evaluation in food supply." In: Transportation Research Part E. https://doi.org/10.1016/j.tre.2018.03.002

slide-12
SLIDE 12

51 commodity groups: industrial interactions

Balster, Andreas and Hanno Friedrich (2018). "Dynamic freight flow modelling for risk evaluation in food supply." In: Transportation Research Part E. https://doi.org/10.1016/j.tre.2018.03.002

slide-13
SLIDE 13

Inputs: Spatial production & consumption data

  • Production of sugar beets: 30,000,000 t/year
  • Production of sugar: 5,000,000 t/year
  • Production of confectionaries: 3,000,000 t/year

[Maps: production density in t/km² (legend: ≤ 10, ≤ 100, ≤ 200, ≤ 500, > 500); diagram labels: … 100% 30% …]

slide-14
SLIDE 14

Modeling process: Estimating links

Modeling links:

1) Between categories of food:

Research to identify industrial interactions. E.g. sugar, milk products, eggs, grains → production of confectionaries

2) Between regions:

Estimate flows using gravity model

$$T_{ij} = A_i B_j O_i D_j \exp(-\beta \, d_{ij})$$

>> Calibration necessary: German Federal Transport Master Plan data

51 commodity groups, 59 groups of actors, 452 regions
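The gravity model above can be sketched in a few lines. The slide does not define the symbols, so this is only an illustration under assumptions: it assumes the standard doubly constrained form, in which O_i is the amount produced in region i, D_j the amount demanded in region j, and the balancing factors A_i and B_j are found by iterating until row and column totals match. The function name, the toy numbers, and the value of beta are hypothetical and are not taken from the talk.

```python
import numpy as np

def gravity_flows(O, D, dist, beta, iters=500, tol=1e-12):
    """Doubly constrained gravity model: T_ij = A_i B_j O_i D_j exp(-beta d_ij).

    Assumes sum(O) == sum(D). A_i and B_j are balancing factors found by
    alternating updates (iterative proportional fitting).
    """
    O = np.asarray(O, dtype=float)
    D = np.asarray(D, dtype=float)
    f = np.exp(-beta * np.asarray(dist, dtype=float))  # distance deterrence
    A = np.ones_like(O)
    B = np.ones_like(D)
    for _ in range(iters):
        A_new = 1.0 / (f @ (B * D))          # makes row sums match O
        B_new = 1.0 / (f.T @ (A_new * O))    # makes column sums match D
        converged = (np.max(np.abs(A_new - A)) < tol
                     and np.max(np.abs(B_new - B)) < tol)
        A, B = A_new, B_new
        if converged:
            break
    return (A * O)[:, None] * (B * D)[None, :] * f

# Hypothetical two-region example (tonnes per year, distances in km).
O = np.array([100.0, 50.0])
D = np.array([80.0, 70.0])
dist = np.array([[10.0, 50.0],
                 [40.0, 5.0]])
T = gravity_flows(O, D, dist, beta=0.05)
```

In this sketch beta is simply given; in the talk it is the quantity that needs calibration against the German Federal Transport Master Plan data.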

slide-15
SLIDE 15

Identifying the outbreak location source

slide-16
SLIDE 16

Network Source Localization Problem Statement

Assume

  • Network model
  • Probabilistic model of transmission process

Outbreak process

  • At time ts a single source s* is contaminated, all others susceptible. Contamination spreads through the network resulting in cases of illness

Observe reports of illness

  • At a subset of network node locations

Objective

  • Estimate the source location given the locations of observations of illness, based on the network structure and the transmission process

slide-17
SLIDE 17

Features of Existing Source Identification Approaches (And Literature Gap)

Features of existing source identification approaches:

(1) Assumes SI(R) model/status
(2) Assumes complete observations
(3) Ignores weights
(4) Only shortest paths
(5) Only dominant paths
(6) Assumes observation times

Source identification methodology of existing work (each X marks one of the features above; the column positions of the marks are not preserved here):

  • Rumor centrality (14): X X X
  • Betweenness centrality (15): X X X
  • Eigenvector centrality (16,17): X X
  • Message passing (18): X X
  • Belief propagation (19): X X
  • Gaussian (20,21): X X
  • Four-metric (22): X X X
  • Monte Carlo (22): X X X
  • Jordan centrality (24,25): X X
  • Effective Distance (26,27): X

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

slide-18
SLIDE 18

Distinguishing features of foodborne disease transmission

1) A transport (diffusion), not epidemiological (contagion), transmission process
2) Observations are sparse: most nodes not observable
3) Observations are far from the source
4) Similar path lengths
5) Multiple candidate paths

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

slide-19
SLIDE 19

Example: Multiple candidate paths

Uelzen: Frankfurt

Uelzen: Paderborn

Shorter distances → High probability “designated” path

slide-20
SLIDE 20

Example: Multiple candidate paths

Uelzen: Frankfurt

Longer distances → Multiple similar probability paths

slide-21
SLIDE 21

Random Walk Contamination Transmission Model

Random Walk model of contamination diffusion through the network

$$P(X_{n+1} = j \mid X_n = i) = p_{ij}$$

The transition probabilities $p_{ij}$ are defined in terms of the weights as

$$p_{ij} = \frac{w_{ij}}{\sum_{j} w_{ij}} \in [0, 1], \quad i \neq j$$

The Markov transition probabilities, taken together, define the Markov transition matrix for the process,

$$P = \begin{bmatrix} P_Q & P_R \\ \mathbf{0} & I_R \end{bmatrix}$$

  • $P_Q$: transitions between transient nodes
  • $P_R$: sub-matrix representing transitions from transient to absorbing nodes
  • $I_R$: sub-matrix representing absorption

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137
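As a rough illustration of the construction on this slide, the sketch below row-normalizes a weight matrix into transition probabilities and splits it into the transient-to-transient and transient-to-absorbing blocks. It makes an assumption the slide does not state: any node with no outgoing weight is treated as absorbing. The function names and the four-node toy network are hypothetical.

```python
import numpy as np

def transition_matrix(W):
    """Row-normalize non-negative weights w_ij into probabilities p_ij.

    Assumption for this sketch: nodes with zero outgoing weight are
    absorbing and get a self-transition probability of 1.
    """
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.zeros_like(W)
    np.divide(W, row_sums, out=P, where=row_sums > 0)
    absorbing = row_sums.ravel() == 0
    P[absorbing, absorbing] = 1.0  # sets the diagonal entries of absorbing nodes
    return P, absorbing

def partition(P, absorbing):
    """Return (P_Q, P_R): transient-to-transient and transient-to-absorbing blocks."""
    transient = ~absorbing
    P_Q = P[np.ix_(transient, transient)]
    P_R = P[np.ix_(transient, absorbing)]
    return P_Q, P_R

# Hypothetical network: node 0 ships to 1 and 2, both ship to node 3 (absorbing).
W = np.array([[0.0, 2.0, 2.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0, 0.0]])
P, absorbing = transition_matrix(W)
P_Q, P_R = partition(P, absorbing)
```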

slide-22
SLIDE 22

Maximum Likelihood Source Estimator

1. Use network structure to filter the feasible source set $\Omega$
2. Determine the probability that any feasible source $s \in \Omega$ is the true source s*, given the observation set $\Theta$

Probabilistically optimal solution:

  • Bayesian inference:

$$P(s^* = s \mid \Theta) = \frac{P(s^* = s)\, P(\Theta \mid s^* = s)}{P(\Theta)}$$

  • Maximum likelihood estimator:

$$\hat{s} = \arg\max_{s \in \Omega} P(s^* = s)\, P(\Theta \mid s^* = s)$$

Here $P(s^* = s)$ is the prior probability and $P(\Theta \mid s^* = s)$ comes from the transmission model.

slide-23
SLIDE 23

Source Estimator: Main Result

Source estimator:

$$\hat{s} = \arg\max_{s \in \Omega} \prod_{o \in O} \left[ (I - P_Q)^{-1} P_R \right]_{so}$$

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

slide-24
SLIDE 24

Source Estimator: Main Result

Source estimator:

$$\hat{s} = \arg\max_{s \in \Omega} \prod_{o \in O} \left[ (I - P_Q)^{-1} P_R \right]_{so}$$

  • $\prod_{o \in O}$: considering all contaminated nodes $o$
  • $[\,\cdot\,]_{so}$: probability of traveling through the network from $s$ to $o$

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

slide-25
SLIDE 25

Source Estimator: Main Result

Source estimator:

$$\hat{s} = \arg\max_{s \in \Omega} \prod_{o \in O} \left[ (I - P_Q)^{-1} P_R \right]_{so}$$

  • $\prod_{o \in O}$: considering all contaminated nodes $o$
  • $[\,\cdot\,]_{so}$: probability of traveling through the network from $s$ to $o$

Contribution: Accounts for all possible combinations of paths of travel through the network

  → Increases accuracy in non-tree-like networks
  → The probabilistically exact (maximum likelihood) solution

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137
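A minimal numerical sketch of this estimator, under stated assumptions: observed cases are treated as independent, so the likelihood for a candidate source is a product of absorption probabilities; every transient node is treated as a feasible source; and the prior is uniform unless one is passed in. The function name and the small matrices are hypothetical and are not taken from the preprint.

```python
import numpy as np

def rank_sources(P_Q, P_R, observed, prior=None):
    """Rank candidate sources by likelihood of the observed absorbing nodes.

    observed: indices (columns of P_R) of absorbing nodes with reported cases,
              repeated once per case.
    Returns (pmf over transient nodes, node indices sorted best first).
    """
    n = P_Q.shape[0]
    # B[s, o] = [(I - P_Q)^{-1} P_R]_{so}: probability a walk from s ends at o.
    B = np.linalg.solve(np.eye(n) - P_Q, P_R)
    with np.errstate(divide="ignore"):
        loglik = np.log(B[:, observed]).sum(axis=1)
        if prior is not None:
            loglik = loglik + np.log(np.asarray(prior, dtype=float))
    pmf = np.zeros(n)
    finite = np.isfinite(loglik)
    if finite.any():
        w = np.exp(loglik[finite] - loglik[finite].max())  # stable normalization
        pmf[finite] = w / w.sum()
    return pmf, np.argsort(-pmf)

# Hypothetical network: transient nodes 0, 1, 2 and absorbing nodes a (0), b (1).
P_Q = np.array([[0.0, 0.0, 0.5],
                [0.0, 0.0, 0.2],
                [0.0, 0.0, 0.0]])
P_R = np.array([[0.5, 0.0],
                [0.0, 0.8],
                [0.5, 0.5]])
# Two reported cases at a and one at b.
pmf, ranking = rank_sources(P_Q, P_R, observed=[0, 0, 1])
```

Working in log space avoids underflow when the number of reported cases is large, which matters for outbreaks with hundreds or thousands of cases.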

slide-26
SLIDE 26

Evaluation: 2011 EHEC Outbreak

Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M., an der Heiden, M., et al (2011). “Epidemic profile of Shiga-toxin producing Escherichia coli O104: H4 outbreak in Germany.” New England Journal of Medicine, 365(19), 1771-1780.

Impact of 2011 sprout E. coli/EHEC outbreak:

  • 4321 illnesses (3816 in Germany)
  • 54 deaths
  • 16 countries with cases
  • 9 weeks to identify source

slide-27
SLIDE 27

2011 EHEC outbreak: Source Identification Results

Outbreak Week    Our model: Rank of source location
1                38
2                3
3                2
4                2
5                1
6                1
7                1
8                1
9                1
10               1
11               2

[Figure legend: markers for “with our model” and “actual traceback date”]

slide-28
SLIDE 28

2011 EHEC outbreak: PMF Results by Map

Week 2 Week 3 Week 4 Week 5 Week 6

Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

slide-29
SLIDE 29

Identifying the food vector source of the outbreak

slide-30
SLIDE 30

Identifying the food vector source

Which food network + transmission model best supports the observed outbreak data?

…For which food network do we detect the highest “signal”?

slide-31
SLIDE 31

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

slide-32
SLIDE 32

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

PMF resulting from applying source identification algorithm to EHEC case data

slide-33
SLIDE 33

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

PMF resulting from applying source identification algorithm to EHEC case data

More structure in pmf values → Stronger “signal” of outbreak data + network agreement

slide-34
SLIDE 34

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

PMF resulting from applying source identification algorithm to EHEC case data

Faster descent in pmf values → Stronger “signal” of outbreak data + network agreement

slide-35
SLIDE 35

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

PMF resulting from applying source identification algorithm to EHEC case data

Faster descent in pmf values → Stronger “signal” of outbreak data + network agreement

slide-36
SLIDE 36

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

More “information” in the pmf distribution → Stronger “signal” of outbreak data + network match

How to measure the signal strength? → Compare rank-ordered pmf to uniform distribution

[Figure: PMF resulting from applying source identification algorithm to EHEC case data. Legend: traceback pmf over sources, EHEC case data; uniform pmf]

slide-37
SLIDE 37

For each food item network, find the probability distribution (pmf) resulting from applying the source identification estimator

Analyze PMF to measure “signal strength”

More “information” in the pmf distribution → Stronger “signal” of outbreak data + network match

How to measure the signal strength? → Compare rank-ordered pmf to uniform distribution to measure the amount of structure or information

[Figure: PMF resulting from applying source identification algorithm to EHEC case data. Legend: traceback pmf over sources, EHEC case data; uniform pmf]

slide-38
SLIDE 38

Measuring “signal strength” using Mean Absolute Error metric

Measure the “signal strength” by comparing the pmf resulting from outbreak data to a uniformly distributed pmf

Signal strength metric: Mean Absolute Error (MAE)

$$\text{Signal Strength} = \frac{1}{|\Omega|} \sum_{s \in \Omega} \left| P_{TB}(s) - P_{unif}(s) \right|$$

Horn, A., Friedrich, H., Balster, A., Polozova, E., Fuhrmann, M., Weiser, A., Kaesbohrer, A., and Filter, M. “Identifying Large-Scale Outbreaks of Foodborne Disease,” in preparation.
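The MAE metric is simple enough to compute directly. The sketch below takes a traceback pmf over the feasible sources, compares it with the uniform pmf over the same sources, and then picks the food network with the largest value. The function name is hypothetical, and the two pmfs are made-up numbers for illustration, not results from the EHEC data.

```python
import numpy as np

def signal_strength(pmf):
    """Mean absolute error between a traceback pmf and the uniform pmf."""
    p = np.asarray(pmf, dtype=float)
    p = p / p.sum()                          # guard against unnormalized input
    uniform = np.full_like(p, 1.0 / p.size)  # P_unif(s) = 1 / |Omega|
    return np.abs(p - uniform).mean()

# Hypothetical traceback pmfs for two candidate food networks.
traceback_pmfs = {
    "vegetables": [0.7, 0.1, 0.1, 0.1],
    "eggs": [0.3, 0.3, 0.2, 0.2],
}
strengths = {name: signal_strength(p) for name, p in traceback_pmfs.items()}
best_network = max(strengths, key=strengths.get)
```

Because MAE compares against the uniform pmf over the same source set, the order of the sources does not affect the value; rank-ordering the pmf only matters for plotting.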

slide-39
SLIDE 39

MAE allows us to discriminate between networks

Analyze PMF to measure “signal strength”

[Figure legend: traceback pmf over sources, EHEC case data, vegetable network; uniform pmf]

slide-40
SLIDE 40

MAE allows us to discriminate between networks

Analyze PMF to measure “signal strength”

More “structure” in the traceback pmf distribution → Steeper decay, first to cross uniform

MAE will always choose the traceback pmf function further from uniform pmf

[Figure legend: traceback pmf over sources, EHEC case data, vegetable network; traceback pmf over sources, EHEC case data, egg network; uniform pmf]

slide-41
SLIDE 41

Comparison Across Network Structures

[Figure: Signal Strength as a function of Number of Cases of Illness Reported. x-axis: Number of Cases; y-axis: Normalized Signal Strength. Annotation: PMFs computed at 500 illnesses]

slide-42
SLIDE 42

What does a signal vs. no signal look like?

Simulated Outbreak → Single true source

Randomly Sampled Cases → No single source

slide-43
SLIDE 43

What does a signal vs. no signal look like?

Simulated Outbreak → Visible Signal

Randomly Sampled Cases → No Signal

[Figures: Signal Strength vs. Number of Cases for each scenario; annotation: True network match]

slide-44
SLIDE 44

Evaluation: 2011 EHEC Outbreak

Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M., an der Heiden, M., et al (2011). “Epidemic profile of Shiga-toxin producing Escherichia coli O104: H4 outbreak in Germany.” New England Journal of Medicine, 365(19), 1771-1780.

Impact of 2011 sprout E. coli/EHEC outbreak:

  • 4321 illnesses (3816 in Germany)
  • 54 deaths
  • 16 countries with cases
  • 9 weeks to identify source

slide-45
SLIDE 45

EHEC 2011: Food Vector Identification Results

Strongest signal → best network match: Vegetables Network

[Figure: Signal Strength vs. Number of Cases; annotation: 1000 illnesses]

Horn, A., Friedrich, H., Balster, A., Polozova, E., Fuhrmann, M., Weiser, A., Kaesbohrer, A., and Filter, M. “Identifying Large-Scale Outbreaks of Foodborne Disease,” in preparation.

slide-46
SLIDE 46

EHEC 2011: Food Vector Identification Results

Strongest signal → best network match: Vegetables Network

[Figure: Signal Strength vs. Number of Cases; annotations: 100 illnesses, 1000 illnesses]

slide-47
SLIDE 47

Case Study: Listeria 13a/54 Contamination Through Pork Belly Product (2012 – 2015)

slide-48
SLIDE 48

Listeria 13a/54 + Pork Belly Outbreak, 2012 – 2015

Outbreak Timeline

Epidemic Characteristics:

  • 77 Cases
  • 3.5 year timeline

Hypothesized Food Sources:

Meats:

  • Preserved meats (e.g. sausage, salami)
  • Cooked sausages (e.g. Lyoner, bologna)
  • Meat cuts: Pork, minced meat, chicken

Dairy: Milk, butter, yogurt

Vegetables: Head lettuce, radishes, tomatoes

Wilking, H., Vagen, S., Stark, K., Frank, C., Prager, R., Flieger, A., Tietze, E., Halbedel, S., Al-Dahouk, S., Kleta, S. (2015). Lagebericht zum Ausbruch von Listeriose in Süddeutschland, 2012-2015. Internal report.

slide-49
SLIDE 49

Listeria 13a/54: Network Identification Results

[Figure: Signal Strength vs. Number of Cases]

True network match: Processed beef and pork products (refrigerated)

Able to distinguish correct food group

Horn, A., Friedrich, H., Balster, A., Polozova, E., Fuhrmann, M., Weiser, A., Kaesbohrer, A., and Filter, M. “Identifying Large-Scale Outbreaks of Foodborne Disease,” in preparation.

slide-50
SLIDE 50

Summary

1) Exact food supply network data across companies does not exist; we model it in spatially aggregated form using publicly available data

2) Food networks are structurally unique; it is important to consider these unique features in the source identification problem

3) The spatial model is aggregated and estimated, but case studies suggest it may be sufficient to identify the food vector of an outbreak, for various outbreak scenarios

4) More generally: We propose a completely novel set of methods to answer the question “which food item caused this outbreak” >> Extensions to other research problems

slide-51
SLIDE 51

Ongoing/Future Work

Application specific:

1) Characterizing accuracy
   > Simulate outbreaks using independent model/data source
2) Further developing network model
   > Implementation, improvements, and extensions at BfR
   > Developing in other country frameworks: USA…

Method specific:

1) Analyze spatial dimension of PMF
2) Extending signal identification methods to other research problems: detecting outbreak, optimal response time

slide-52
SLIDE 52

Project Materials

Source localization

  • Horn, A., Friedrich, H. “Locating the Source of Large-scale Diffusion of Foodborne Contamination,” preprint. https://arxiv.org/abs/1805.03137

Network model

  • Balster, Andreas and Hanno Friedrich (2018). "Dynamic freight flow modelling for risk evaluation in food supply." In: Transportation Research Part E. https://doi.org/10.1016/j.tre.2018.03.002

Food item identification

  • Horn, A., Friedrich, H., Balster, A., Polozova, E., Fuhrmann, M., Weiser, A., Kaesbohrer, A., and Filter, M. “Identifying Large-Scale Outbreaks of Foodborne Disease,” in preparation.

slide-53
SLIDE 53

Funding Sources

slide-54
SLIDE 54

Thank You!!!

Questions?

abbylhorn@alum.mit.edu www.Abigail-Horn.com