
slide-1
SLIDE 1

Estimating Environmental Exposure using Cell Tower Data

Owais Gilani, Bucknell University; Michael Kane, Yale University; Simon Urbanek, AT&T Labs Research

slide-2
SLIDE 2

Outline

Motivation: Why should we care about environmental exposure?
Background: How is exposure modeled now?
Approach: What are we doing to improve it?
Results: A demo showing community-level exposure that incorporates human mobility.
Conclusion: Mobility is an important next step in exposure modeling.

slide-3
SLIDE 3

Why should we care about environmental exposure?

slide-4
SLIDE 4

Exposure is linked to asthma

Over 8% of the U.S. population has asthma. In 2015 there were 1.5 million emergency room visits for asthma, and 3,615 people died of it. Asthmatics are 40% more likely to have acute episodes on high-pollution days.

slide-5
SLIDE 5

Exposure is linked to cancer

"... outdoor air pollution is a 'Category 1', or definite, cause of cancer." 41% of Americans will be diagnosed with cancer, and 21% will die from it. Environmental pollution causes at least 2% of cases.

slide-6
SLIDE 6

Exposure is linked to poor fetal development

Air pollution significantly increases the risk of low birth weight, which can lead to lifelong damage to health. Cutting pollution in London to the WHO guideline level would prevent an estimated 300 to 350 babies per year from being born with low birth weight. Globally, 90% of children are exposed to air pollution above WHO guidelines.

slide-7
SLIDE 7

Pollution Types

Ozone: higher concentrations in summer; formed from emissions from cars, industrial facilities, etc.
Fine particulate matter (PM2.5): higher concentrations in winter; linked to increased mortality from lung cancer and heart disease.
Nitrogen oxides (NOx): produced by fuel combustion; highest concentrations on roads, but also emitted by energy production.

slide-8
SLIDE 8

How is exposure modeled now?

slide-9
SLIDE 9

First, you model pollution

Of the U.S. pollution monitoring sites, 12 are in CT. For each census tract, get the temperature, wind speed, wind direction, and the minimum distance to primary and secondary roads. Then model the pollution for each census tract.

slide-10
SLIDE 10

```r
library(spTimer)

x_train <- read.csv("pollution_data_train.csv")
x_test <- read.csv("pollution_data_test.csv")

# Gaussian process model fit on the training sites.
pfit <- spT.Gibbs(formula = ozone.ppb ~ Temp2 + WindSpeed2 + minDistPrim + minDistSec,
                  data = x_train, model = "GP",
                  coords = ~ Longitude + Latitude)

# Spatial prediction at the held-out sites.
preds <- predict(pfit, newdata = x_test, newcoords = ~ Longitude + Latitude)

# Compare modeled results with actuals.
spT.validation(x_test$ozone.ppb, c(preds$Median))
```

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23

How do we model exposure?

We have an estimate of the amount of ozone in the air at any location and population estimates for any census tract. We can multiply the amount of pollution in each census tract by the number of people in that tract (that is, weight each tract's ozone level by its population) to get the distribution of average exposure in the state of CT.
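A minimal sketch of this stationary calculation, assuming tract-level ozone predictions and population counts are already available (the tract names and values below are made up): weight each tract's ozone by its population to get a population-weighted mean and a weighted empirical CDF of exposure across residents.

```r
# Hypothetical modeled ozone (ppb) and population for three census tracts.
tract_ozone <- c(A = 40, B = 45, C = 50)
tract_pop   <- c(A = 1000, B = 3000, C = 1000)

# Population-weighted mean exposure, assuming everyone stays in their home tract.
static_mean <- weighted.mean(tract_ozone, tract_pop)

# Weighted empirical CDF: share of residents whose tract ozone is <= q.
weighted_ecdf <- function(values, weights) {
  o  <- order(values)
  v  <- values[o]
  cw <- cumsum(weights[o]) / sum(weights)
  function(q) {
    idx <- findInterval(q, v)  # number of sorted values <= q
    ifelse(idx == 0, 0, cw[pmax(idx, 1)])
  }
}

exposure_cdf <- weighted_ecdf(tract_ozone, tract_pop)
exposure_cdf(c(39, 44, 45, 60))  # returns 0, 0.2, 0.8, 1
```

The weighted CDF avoids expanding every resident into a separate row, which matters when tract populations are in the thousands.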

slide-24
SLIDE 24
slide-25
SLIDE 25

Does anyone see problems with this?

slide-26
SLIDE 26

How does cell data help?

slide-27
SLIDE 27
slide-28
SLIDE 28

Our approach

For each user: get the sequence of tower check-ins and their durations. Find the user's daily exposure based on their locations. Then find the distribution of exposures for any census tract.
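A minimal sketch of the last step, assuming each user's daily exposure has already been computed and each user has been attributed to a single census tract (the `user_exposure` data frame and its columns are hypothetical, and the rule for attributing a user to a tract is an assumption not specified here): split the exposures by tract and summarize each tract's distribution with quartiles.

```r
# Hypothetical per-user daily exposures (ppb) and attributed census tracts.
user_exposure <- data.frame(
  user     = c("u1", "u2", "u3", "u4", "u5"),
  tract    = c("A", "A", "A", "B", "B"),
  exposure = c(41, 44, 47, 50, 52)
)

# Quartiles of the exposure distribution within each tract.
tract_q <- lapply(split(user_exposure$exposure, user_exposure$tract),
                  quantile, probs = c(0.25, 0.5, 0.75))
tract_q$A
```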

slide-29
SLIDE 29

A note on anonymity from the CDC website

Data obtained from the National Center for Health Statistics: Compressed Mortality, Multiple Cause of Death, Linked Birth / Infant Death records and Natality, are also covered by the following policy: The Public Health Service Act (42 U.S.C. 242m(d)) provides that the data collected by the National Center for Health Statistics (NCHS) may be used only for the purpose for which they were obtained; any effort to determine the identity of any reported cases, or to use the information for any purpose other than for statistical reporting and analysis, is against the law. Therefore users will:

Use these data for statistical reporting and analysis only.
For sub-national geography, do not present or publish death or birth counts of 9 or fewer or rates based on counts of nine or fewer (in figures, graphs, maps, tables, etc.).
Make no attempt to learn the identity of any person or establishment included in these data.
Make no disclosure or other use of the identity of any person or establishment discovered inadvertently and advise the Director, NCHS of any such discovery.
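If an analogous small-count rule is applied to tract-level results (an assumption; the policy quoted above covers NCHS birth and death data), a helper like the hypothetical `publishable_table()` below could blank out counts of 9 or fewer, and any rates based on them, before anything is published.

```r
# Hypothetical helper: suppress counts <= threshold and the rates derived from them.
publishable_table <- function(tract, count, population, threshold = 9) {
  small <- count <= threshold
  data.frame(
    tract         = tract,
    count         = ifelse(small, NA, count),
    rate_per_1000 = ifelse(small, NA, 1000 * count / population)
  )
}

publishable_table(c("A", "B"), count = c(5, 20), population = c(1000, 4000))
```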

slide-30
SLIDE 30
slide-31
SLIDE 31

Research Questions

How much does ozone exposure vary among CT commuters when we don't assume people are stationary?
Where do we see the largest difference in ozone exposure between the two models?
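To illustrate the second question, here is a sketch with made-up per-tract summaries from the two models: take the difference between the mobility-based and stationary estimates for each tract and report the tract with the largest absolute difference. The vectors `static_exposure` and `mobile_exposure` are hypothetical.

```r
# Hypothetical per-tract exposure summaries (ppb) from the two models.
static_exposure <- c(A = 44, B = 48, C = 51)  # stationary-population model
mobile_exposure <- c(A = 45, B = 52, C = 50)  # mobility-based model

# Align by tract name, then difference the two models.
diff_by_tract <- mobile_exposure[names(static_exposure)] - static_exposure
largest <- names(which.max(abs(diff_by_tract)))
largest
```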
slide-32
SLIDE 32

What is a CT commuter?

An AT&T user with a device connected to the network who is in CT for the entire day and checks into at least 3 towers within a 500 m buffer of a primary or secondary road.
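The definition can be expressed as a per-user filter. This sketch assumes a hypothetical check-in table in which each row already carries flags for whether the tower is in CT (`in_ct`) and whether it lies within the 500 m road buffer (`near_road`). Computing those flags, and reading "in CT for the entire day" as "every check-in is in CT", are assumptions.

```r
# Hypothetical check-in table: one row per tower check-in.
checkins <- data.frame(
  user      = c("u1", "u1", "u1", "u2", "u2", "u2", "u3", "u3"),
  tower     = c("t1", "t2", "t3", "t1", "t1", "t4", "t1", "t5"),
  in_ct     = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  near_road = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# TRUE if every check-in is in CT and at least `min_towers` distinct
# near-road towers were visited.
is_commuter <- function(d, min_towers = 3) {
  all(d$in_ct) && length(unique(d$tower[d$near_road])) >= min_towers
}

ok <- vapply(split(checkins, checkins$user), is_commuter, logical(1))
commuters <- names(ok)[ok]
commuters
```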

slide-33
SLIDE 33

Are these really only "commuters"?

slide-34
SLIDE 34

Procedure

Split the data by user/device. For each check-in, get the tower's location and the time spent on the tower (in seconds). Join with the modeled exposure at the tower's latitude, longitude, and time.
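A minimal base-R sketch of this split-and-join step, assuming modeled ozone has been precomputed for each tower and hour. The `tower`, `hour`, `seconds`, and `ozone` columns are hypothetical, joining on a tower ID stands in for joining on the tower's latitude and longitude, and summarizing daily exposure as a time-weighted average is an assumption.

```r
# Hypothetical check-ins: user, tower, hour of day, and seconds spent on the tower.
checkins <- data.frame(
  user    = c("u1", "u1", "u2"),
  tower   = c("t1", "t2", "t1"),
  hour    = c(8, 9, 8),
  seconds = c(1800, 5400, 3600)
)

# Hypothetical modeled ozone (ppb) at each tower location and hour.
exposure <- data.frame(
  tower = c("t1", "t1", "t2", "t2"),
  hour  = c(8, 9, 8, 9),
  ozone = c(40, 42, 50, 54)
)

# Join each check-in with the exposure at its tower and hour.
joined <- merge(checkins, exposure, by = c("tower", "hour"))

# Split on users and compute a time-weighted average exposure per user.
user_exposure <- vapply(split(joined, joined$user),
                        function(d) sum(d$ozone * d$seconds) / sum(d$seconds),
                        numeric(1))
user_exposure
```

A cumulative measure would keep `sum(d$ozone * d$seconds)` without dividing by the total time.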

slide-35
SLIDE 35

What does typical code look like?

slide-36
SLIDE 36

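```r
library(sp)     # SpatialPointsDataFrame, CRS, spTransform, %over%
library(rgdal)  # readOGR (rgdal was retired from CRAN in 2023; sf::st_read() is the modern replacement)

# Build a function that maps longitude/latitude vectors to census tract IDs.
census_tract_gen <- function(tract_shapefile, tract_id,
                             projection = CRS(paste("+proj=utm +zone=18 +ellps=WGS84 +datum=WGS84",
                                                    "+units=m +no_defs"))) {
  # UTM zone 18N covers most of Connecticut. Project the tracts once, not on every call.
  shf.proj <- spTransform(tract_shapefile, projection)
  function(lon, lat) {
    x <- cbind(lon, lat)
    x.dat <- data.frame(id = seq_len(nrow(x)))
    xx <- SpatialPointsDataFrame(x, x.dat)
    proj4string(xx) <- CRS("+init=epsg:4326")  # input points are WGS84 lon/lat
    xx.proj <- spTransform(xx, projection)
    # Attributes of the tract polygon containing each point (NA if outside all tracts).
    xx.tract <- xx.proj %over% shf.proj
    as.character(xx.tract[, tract_id])
  }
}

ct <- readOGR("ShapeFiles/CT Census Tracts/", "tl_2016_09_tract")
census_tract <- census_tract_gen(tract_shapefile = ct, tract_id = "GEOID")
```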
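The `%over%` call does the real work in the function above: for each projected point it returns the attributes of the tract polygon containing it, or `NA` if none does. To show the underlying idea without any spatial packages, here is a simplified even-odd (ray casting) point-in-polygon lookup over hypothetical rectangular tracts. Unlike sp, it ignores holes, multipart polygons, and points lying exactly on a boundary.

```r
# Even-odd rule: count how many polygon edges a ray cast to the right of
# (x, y) crosses; an odd count means the point is inside the ring.
point_in_polygon <- function(x, y, px, py) {
  n <- length(px)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (py[i] > y) != (py[j] > y)
    if (crosses &&
        x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical tracts as rectangles in projected coordinates (meters).
tracts <- list(
  T1 = list(x = c(0, 1000, 1000, 0),       y = c(0, 0, 1000, 1000)),
  T2 = list(x = c(1000, 2000, 2000, 1000), y = c(0, 0, 1000, 1000))
)

# Return the first tract containing the point, or NA if none does.
lookup_tract <- function(x, y) {
  hit <- vapply(tracts, function(p) point_in_polygon(x, y, p$x, p$y), logical(1))
  if (any(hit)) names(tracts)[hit][1] else NA_character_
}

lookup_tract(500, 500)
```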

slide-37
SLIDE 37

So what are the results?

slide-38
SLIDE 38

Conclusions

The absolute cumulative difference is up to 0.05 PPH. Even though ozone has relatively low spatial variation, assuming no spatial variation introduces significant bias.