Estimating Environmental Exposure using Cell Tower Data
Owais Gilani, Bucknell University; Michael Kane, Yale University; Simon Urbanek, AT&T Labs Research
Motivation: Why should we care about environmental exposure?
Background: How is exposure done now?
Approach: What are we doing to improve it?
Results: A demo showing community-level exposure incorporating human mobility.
Conclusion: Mobility is an important next step in exposure modeling.
Over 8% of the U.S. population has asthma. 1.5 million emergency room visits in 2015. In 2015 3,615 people died of asthma. Asthmatics are 40% more likely to have acute episodes
"... outdoor air pollution is a 'Category 1'- or definite cause of cancer." 41% of Americans will be diagnosed with cancer and 21% will die from it. Environmental pollution causes at least 2% of cases.
Air pollution significantly increases the risk of low birth weight in babies, leading to lifelong damage to health. Cutting pollution to that guideline would prevent 300-350 babies a year from being born with low weight in London. Globally, 90% of children are exposed to air pollution above WHO guidelines.
Ozone: Higher concentrations in summer. Generated by cars, industrial facilities, etc.
Fine particulate matter (PM 2.5): Higher concentrations in winter. Increased mortality from lung cancer and heart disease.
Nitrogen oxides (NOx): From fuel combustion. Highest concentration on roads, but also from energy production.
Of the U.S. pollution monitoring sites, 12 are in CT. For each census tract, get temperature, wind speed, wind direction, and the minimum distance to primary and secondary roads. Model the pollution for each of the census tracts.
library(spTimer)

# Load the training data and the held-out test data.
x_train <- read.csv("pollution_data_train.csv")
x_test <- read.csv("pollution_data_test.csv")

# Gaussian Process model, fitted on the training data.
pfit <- spT.Gibbs(formula = ozone.ppb ~ Temp2 + WindSpeed2 + minDistPrim + minDistSec,
                  data = x_train, model = "GP",
                  coords = ~ Longitude + Latitude)

# Spatial Prediction.
preds <- predict(pfit, newdata = x_test, newcoords = ~ Longitude + Latitude)

# Compare modeled results with actuals.
spT.validation(x_test$ozone.ppb, c(preds$Median))
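To make the comparison of modeled and actual values concrete, here is a minimal base-R sketch of two common error summaries, root mean squared error and mean absolute error. The helper names `rmse` and `mae` and the numbers are hypothetical and chosen for illustration; this is not a reimplementation of the validation step in the slide.

```r
# Hypothetical helpers for comparing observed and predicted values.
# Missing values are dropped pairwise before computing each summary.
rmse <- function(observed, predicted) {
  ok <- !is.na(observed) & !is.na(predicted)
  sqrt(mean((observed[ok] - predicted[ok])^2))
}

mae <- function(observed, predicted) {
  ok <- !is.na(observed) & !is.na(predicted)
  mean(abs(observed[ok] - predicted[ok]))
}

# Made-up ozone values (ppb), for illustration only.
observed  <- c(31, 42, 38, NA)
predicted <- c(30, 40, 41, 35)

errors <- c(RMSE = rmse(observed, predicted), MAE = mae(observed, predicted))
print(errors)
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a rough sense of whether the error is dominated by a few poorly predicted locations.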
We have: Estimate of the amount of ozone in the air at any location. Population estimates for any census tract We can multiply the amount of pollution by the number
average exposure in the state of CT.
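One way to read the stationary calculation is as a population-weighted average of the tract-level concentrations. The sketch below assumes that reading; the data frame, its column names, and the values are hypothetical.

```r
# Hypothetical tract-level inputs: modeled ozone and population per tract.
tracts <- data.frame(
  tract_id   = c("A", "B", "C"),
  ozone_ppb  = c(30, 40, 50),
  population = c(1000, 3000, 1000)
)

# Population-weighted mean: each tract's concentration counts in
# proportion to the number of people assumed to live there.
pop_weighted_exposure <- function(concentration, population) {
  sum(concentration * population) / sum(population)
}

avg_exposure <- pop_weighted_exposure(tracts$ozone_ppb, tracts$population)
print(avg_exposure)
```

This treats every resident as staying in their home tract all day, which is the assumption the mobility data is meant to relax.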
For a user: Get the sequence of tower check-ins and their duration. Find the daily user exposure based on his/her location. Then, find the distribution of exposures for any census tract
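The per-user steps above can be sketched in base R as a time-weighted average over check-ins, followed by a per-tract summary. The column names, the `home_tract` assignment, and the values are assumptions made for illustration; the slide does not say how users are assigned to tracts.

```r
# Hypothetical check-in records: seconds on a tower and the modeled ozone there.
checkins <- data.frame(
  user       = c("u1", "u1", "u2", "u2", "u3"),
  home_tract = c("T1", "T1", "T1", "T1", "T2"),
  seconds    = c(3600, 10800, 7200, 7200, 14400),
  ozone_ppb  = c(20, 40, 30, 50, 35),
  stringsAsFactors = FALSE
)

# Daily exposure per user: ozone at each tower weighted by time spent there.
daily_exposure <- function(df) {
  per_user <- split(df, df$user)
  data.frame(
    user       = names(per_user),
    home_tract = vapply(per_user, function(u) u$home_tract[1], character(1)),
    exposure   = vapply(per_user,
                        function(u) sum(u$seconds * u$ozone_ppb) / sum(u$seconds),
                        numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

daily <- daily_exposure(checkins)

# Distribution of user exposures within each tract, summarized by quartiles.
tract_distribution <- lapply(split(daily$exposure, daily$home_tract),
                             quantile, probs = c(0.25, 0.5, 0.75))
```

Weighting by seconds means a brief check-in near a busy road contributes less than a long stay at a cleaner location.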
Data obtained from the National Center for Health Statistics: Compressed Mortality, Multiple Cause of Death, Linked Birth / Infant Death records and Natality, are also covered by the following policy: The Public Health Service Act (42 U.S.C. 242m(d)) provides that the data collected by the National Center for Health Statistics (NCHS) may be used only for the purpose for which they were obtained; any effort to determine the identity of any reported cases, or to use the information for any purpose other than for statistical reporting and analysis, is against the
Use these data for statistical reporting and analysis only. For sub-national geography, do not present or publish death or birth counts of 9 or fewer or rates based on counts of nine or fewer (in figures, graphs, maps, table, etc.). Make no attempt to learn the identity of any person or establishment included in these data. Make no disclosure or other use of the identity of any person or establishment discovered inadvertently and advise the Director, NCHS of any such discovery.
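As an illustration of the small-count rule, the sketch below masks sub-national counts of 9 or fewer before they are tabulated or mapped. The function name, the "Suppressed" label, and the example counts are hypothetical, and this is not an official NCHS procedure; in particular, it does not handle cases where a masked cell could be recovered from row or column totals.

```r
# Replace counts at or below the threshold with a label before publishing.
# Missing counts are left as NA rather than labeled.
suppress_small_counts <- function(counts, threshold = 9, label = "Suppressed") {
  out <- as.character(counts)
  out[!is.na(counts) & counts <= threshold] <- label
  out
}

# Made-up tract-level counts, for illustration only.
tract_counts <- data.frame(
  tract  = c("T1", "T2", "T3", "T4"),
  deaths = c(0, 9, 10, 125),
  stringsAsFactors = FALSE
)
tract_counts$deaths_published <- suppress_small_counts(tract_counts$deaths)
print(tract_counts)
```

Rates derived from suppressed counts would need the same treatment, since the rule also covers rates based on counts of nine or fewer.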
How much does ozone exposure vary in CT commuters when we don't assume people are stationary? Where do we see the largest difference in
An AT&T user with a device that is connected to the network. Someone who is in CT for the entire day. Someone who checks into at least 3 towers within a 500m buffer of a primary or secondary road.
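The criteria above can be sketched as a filter over check-in records. This assumes all criteria must hold together, that the distance from each tower to the nearest primary or secondary road has been precomputed, and that "at least 3 towers" means three distinct towers; the column names `dist_to_road_m` and `in_ct` and the values are hypothetical.

```r
# Hypothetical check-ins: one row per user/tower visit.
checkins <- data.frame(
  user           = c("u1", "u1", "u1", "u2", "u2", "u2", "u3", "u3", "u3"),
  tower          = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  dist_to_road_m = c(100, 200, 300, 100, 200, 900, 50, 60, 70),
  in_ct          = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep users who stay in CT for every check-in and who visit at least
# `min_towers` distinct towers lying within `buffer_m` meters of a road.
eligible_users <- function(df, min_towers = 3, buffer_m = 500) {
  per_user <- split(df, df$user)
  keep <- vapply(per_user, function(u) {
    near_road <- u$dist_to_road_m <= buffer_m
    all(u$in_ct) && length(unique(u$tower[near_road])) >= min_towers
  }, logical(1))
  names(keep)[keep]
}

commuters <- eligible_users(checkins)
print(commuters)
```

Here `u2` is dropped because only two of its towers fall inside the buffer, and `u3` is dropped because one check-in is outside CT.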
Split on users/devices. For each user check-in, get the tower's location and the time spent on the tower (in seconds). Join with the exposure at the tower's latitude, longitude, and time.
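A minimal base-R sketch of this join, assuming exposure has already been modeled per tower and hour. The three tables, their column names, and the values are hypothetical; the actual data layout is not described in the slides.

```r
# Hypothetical inputs.
checkins <- data.frame(
  user    = c("u1", "u1", "u2"),
  tower   = c("t1", "t2", "t1"),
  hour    = c(8, 9, 8),
  seconds = c(1200, 2400, 3600),
  stringsAsFactors = FALSE
)
towers <- data.frame(
  tower = c("t1", "t2"),
  lon   = c(-72.68, -72.92),
  lat   = c(41.76, 41.31),
  stringsAsFactors = FALSE
)
exposure <- data.frame(
  tower     = c("t1", "t2", "t1", "t2"),
  hour      = c(8, 8, 9, 9),
  ozone_ppb = c(30, 32, 35, 40),
  stringsAsFactors = FALSE
)

# Attach each tower's location to the check-ins, then the modeled
# exposure for that tower and hour.
joined <- merge(checkins, towers, by = "tower")
joined <- merge(joined, exposure, by = c("tower", "hour"))

# Split on users so that each user's records can be processed independently.
by_user <- split(joined, joined$user)
```

Because `merge` performs an inner join by default, check-ins without a matching tower or exposure record are silently dropped, so comparing `nrow(joined)` with `nrow(checkins)` is a useful sanity check.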
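library(rgdal)  # also attaches sp, which provides the spatial classes used below

# Build a function that maps (lon, lat) points to census tract IDs.
# Connecticut lies mostly in UTM zone 18, so that is the default projection.
census_tract_gen <- function(tract_shapefile, tract_id,
    projection = CRS(paste("+proj=utm +zone=18 +ellps=WGS84 +datum=WGS84",
                           "+units=m +no_defs"))) {
  # Project the tract polygons once, rather than on every lookup.
  shf.proj <- spTransform(tract_shapefile, projection)
  function(lon, lat) {
    x <- cbind(lon, lat)
    x.dat <- data.frame(id = 1:nrow(x))
    xx <- SpatialPointsDataFrame(x, x.dat)
    # Input coordinates are assumed to be WGS84 longitude/latitude.
    proj4string(xx) <- CRS("+init=epsg:4326")
    xx.proj <- spTransform(xx, projection)
    # Point-in-polygon overlay: one row of tract attributes per point.
    xx.tract <- xx.proj %over% shf.proj
    as.character(xx.tract[, tract_id])
  }
}

ct <- readOGR("ShapeFiles/CT Census Tracts/", "tl_2016_09_tract")
census_tract <- census_tract_gen(tract_shapefile = ct, tract_id = "GEOID")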