Cyberinfrastructure Tools for Precision Agriculture in the 21st Century: PowerPoint Presentation

SLIDE 1

Cyberinfrastructure Tools for Precision Agriculture in the 21st Century

Michela Taufer, The University of Tennessee Knoxville

SLIDE 2

Contributors and collaborators

  • U. Delaware: Ricardo Llamas, Mario Guevara, and Rodrigo Vargas
  • UTK: Danny Rorabaugh, Kae Suarez, Leobardo Valera, Ria Patel, and David Icove

  • ORNL: Jimmy Landmesser
  • UIUC: Craig Willis and Victoria Stodden

SLIDE 3

Sponsors and supporters

  • NSF OAC 1854312 CIF21 DIBBs: PD: Cyberinfrastructure Tools for Precision Agriculture in the 21st Century (PIs: Taufer and Vargas)
  • NSF OAC 1941443 EAGER: Reproducibility and Cyberinfrastructure for Computational and Data-Enabled Science (PIs: Stodden and Taufer)

  • IBM Shared University Research (SUR) Award
  • NSF XSEDE JetStream: Allocations EAR180011 and TRA180041

Many thanks to Jeremy Fischer, IU

SLIDE 4

Multiscale computational modeling

[Figure: scientific scales in time and length, and the software ecosystem]

M. Stan, Materials Today, 12, 2009, 20-28; https://ajw-group.mit.edu/multiscale-modeling-clays

SLIDE 5

Multiscale data modeling (MSDM)

[Figure: scientific scales in time (sec, hour, day) and length (cm, m, km); the matching software ecosystem is marked with a question mark]

SLIDE 6

Hidden (forgotten?) software ecosystem

"Only a small fraction of real-world ML systems is composed of the ML code." D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, et al. Hidden Technical Debt in Machine Learning Systems.

SLIDE 7

Hidden (forgotten?) software ecosystem

SLIDE 8

Feature extraction

Linear regression: map ligands into a 3-D point representation

[Figure: protein-ligand docking]

  • M. Taufer, T. Estrada, and T. Johnston. Algorithms for In Situ Data Analytics of Next Generation Molecular Dynamics Workflows. In: Numerical algorithms for high-performance computational science, issue of Philosophical Transactions A, 2019.
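The slide does not spell out how a ligand becomes a single 3-D point. One plausible reading, assumed here, fits a plane z = a·x + b·y + c to the ligand's atom coordinates by least squares and uses the coefficients (a, b, c) as the point. `ligand_to_point` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def ligand_to_point(coords):
    """Map one ligand conformation to a 3-D point (a, b, c).

    Assumption (hypothetical helper): fit the plane z = a*x + b*y + c to the
    atom coordinates by ordinary least squares and return its coefficients.
    """
    coords = np.asarray(coords, dtype=float)  # shape (n_atoms, 3)
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    # Design matrix with columns x, y, 1 for the linear model.
    design = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs  # array([a, b, c])


# Atoms lying exactly on z = 2x - y + 3 recover (a, b, c) = (2, -1, 3).
atoms = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
print(np.round(ligand_to_point(atoms), 6))
```

Because each conformation shrinks to three numbers, many docking poses can be clustered in situ with a cheap 3-D method.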

SLIDE 9

Feature extraction

Linear regression: map ligands into a 3-D point representation
Numerical analyses: map secondary structures into eigenvalues

[Figure panels: protein-ligand docking; protein folding]
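A minimal sketch of "map secondary structures into eigenvalues". It assumes the matrix in question is the pairwise C-alpha distance matrix of a fragment, which the slide does not specify. `structure_eigenvalues` is a hypothetical helper.

```python
import numpy as np


def structure_eigenvalues(ca_coords):
    """Hypothetical helper: map a secondary-structure fragment to eigenvalues.

    Assumption: build the symmetric matrix of pairwise C-alpha distances and
    return its eigenvalues sorted from largest to smallest.
    """
    pts = np.asarray(ca_coords, dtype=float)         # shape (n_residues, 3)
    diff = pts[:, None, :] - pts[None, :, :]          # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # (n, n) distance matrix
    eigvals = np.linalg.eigvalsh(dist)                # symmetric -> real, ascending
    return eigvals[::-1]                              # largest first


# Toy fragment: four residues spaced 3.8 angstroms apart along a line.
fragment = [(3.8 * i, 0.0, 0.0) for i in range(4)]
print(structure_eigenvalues(fragment))
```

Distances do not change when a fragment is rotated or translated, so neither do the eigenvalues. This makes them a compact, orientation-free summary of the fragment's shape.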

SLIDE 10

Feature extraction

Linear regression: map ligands into a 3-D point representation
Numerical analyses: map secondary structures into eigenvalues
Deep learning: map both secondary and tertiary structures into tensors

[Figure panels: protein-ligand docking; protein folding; protein engineering]
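One common tensor encoding that a deep model can consume is a contact map. Near-diagonal entries reflect secondary structure, and off-diagonal entries reflect tertiary contacts. This is a sketch under assumptions: the slide does not say which tensor the authors use, and the 8 angstrom cutoff is an assumed, typical threshold. `contact_map_tensor` is hypothetical.

```python
import numpy as np


def contact_map_tensor(ca_coords, cutoff=8.0):
    """Hypothetical helper: encode a structure as a (1, n, n) float tensor.

    Channel 0 is a binary contact map: 1.0 where two C-alpha atoms are closer
    than `cutoff` angstroms (assumed threshold), else 0.0. The diagonal is 1.
    """
    pts = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    contacts = (dist < cutoff).astype(np.float32)
    return contacts[np.newaxis, :, :]  # add a channel axis for a CNN


fragment = [(3.8 * i, 0.0, 0.0) for i in range(5)]
tensor = contact_map_tensor(fragment)
print(tensor.shape)  # (1, 5, 5)
```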

SLIDE 11

Hidden (forgotten?) software ecosystem

SLIDE 12

Data collection at the edge

Point field measurements

SLIDE 13

Data collection at the edge

Point field measurements and remote sensor measurements

SLIDE 14

Challenges in MSDM

  • Design and implement robust and sustainable software ecosystems
  • Combine analytics and computing across heterogeneous platforms (i.e., HPC, cloud, and edge computing)
  • Build trust in results through reproducibility, replicability, and transparency (RRT)

SLIDE 15

Relevance of soil moisture data

  • Satellite-borne remote sensing technology
    ▪ Infrared to radio
    ▪ Active and passive

Applications: precision agriculture and environmental sciences

SLIDE 16

Workflows for precision agriculture

[Figure: workflow diagram]
  • Data generation: coarse-grained, incomplete multiscale data, namely weather data (NOAA), landscape surface (DSM), and soil moisture (ESA-CCI)
  • Data analytics: A4MD analytics (representations + algorithms) for data prediction
  • Computation: fine-grained, complete data, namely weather data (NOAA), landscape surface (DSM), and fine-grained soil moisture
  • Application and data feedback: soil moisture leveraged for environmental sciences and precision agriculture

SLIDE 17

Collaborators: Rodrigo Vargas's group (UD)
Platform: NSF XSEDE Jetstream
NSF OAC 1854312 CIF21 DIBBs: PD: Cyberinfrastructure Tools for Precision Agriculture in the 21st Century

Design and implement a software ecosystem for precision agriculture

SLIDE 18

Data analytics for soil moisture

[Figure: workflow diagram]
  • Data generation: satellite and sensors, providing coarse-grained, incomplete data, namely weather data (NOAA), landscape surface (DSM), and soil moisture (ESA-CCI)
  • Data analytics: A4MD analytics (representations + algorithms) for data prediction
  • Computation: fine-grained, complete data, namely weather data (NOAA), landscape surface (DSM), and fine-grained soil moisture
  • Application and data feedback: soil moisture leveraged for environmental sciences and precision agriculture

SLIDE 19

Challenge 1: incomplete soil moisture data (I)

(Liu et al. 2011 HESS, Liu et al. 2012 RSE)

Visualization example of the ESA Climate Change Initiative (ESA-CCI) Soil Moisture database, with a coarse pixel size of 27 × 27 km

Satellites collect raster data across the surface of the Earth

SLIDE 20
[Figure: Dec. 2000 average soil moisture (m³/m³)]

ESA-CCI soil moisture database, http://www.esa-soilmoisture-cci.org

Causes of missing data:

  • snow/ice cover
  • frozen surface
  • dense vegetation
  • extremely dry surface

Challenge 1: incomplete soil moisture data (II)
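Averaging gappy daily rasters into a monthly map, as in the Dec. 2000 average shown above, has to skip missing retrievals rather than treat them as zero. This is a minimal sketch. It assumes gaps are encoded as NaN; the real ESA-CCI files use their own fill values and flags. `monthly_average` is a hypothetical helper.

```python
import warnings

import numpy as np


def monthly_average(daily_rasters):
    """Average a stack of daily soil-moisture rasters that contain gaps.

    Missing retrievals (snow/ice, frozen or extremely dry surface, dense
    vegetation) are assumed to be NaN. Pixels with no valid day stay NaN.
    Also returns how many valid days each pixel had.
    """
    stack = np.asarray(daily_rasters, dtype=float)  # shape (days, rows, cols)
    with warnings.catch_warnings():
        # nanmean warns on all-NaN pixels; that case is expected here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(stack, axis=0)
    valid_days = np.sum(~np.isnan(stack), axis=0)
    return mean, valid_days


nan = np.nan
days = [
    [[0.20, nan], [0.30, nan]],
    [[0.30, nan], [nan, nan]],
]
mean, valid = monthly_average(days)
print(mean)
print(valid)
```

Pixels that never receive a valid value remain gaps. Filling them is exactly Challenge 1.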

SLIDE 21

Challenge 2: coarse-grained soil moisture data (I)

Original resolution: 27 km × 27 km
Desired resolution: 1 km × 1 km

Image source: McPherson et al., Using coarse-grained occurrence data to predict species distributions at finer spatial resolutions: possibilities and limitations, Ecological Modelling 192:499–522, 2006.
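The jump from 27 km to 1 km is a modeling problem, not a resampling step. Each coarse pixel covers 27 × 27 = 729 fine cells, and naive upsampling gives all 729 the same value. The sketch below shows that trivial baseline; `naive_upsample` is a hypothetical helper, not part of the authors' workflow.

```python
import numpy as np


def naive_upsample(coarse, factor=27):
    """Repeat each coarse pixel into a factor x factor block of fine cells.

    Trivial baseline, not a downscaling model: every 1 km cell inherits its
    parent 27 km value, so no sub-pixel variability is recovered.
    """
    coarse = np.asarray(coarse, dtype=float)
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


coarse = np.array([[0.10, 0.25],
                   [0.32, 0.18]])
fine = naive_upsample(coarse)
print(fine.shape)                       # (54, 54)
print(len(np.unique(fine[:27, :27])))   # 1
```

Recovering real variation inside each block requires extra fine-grained covariates, such as terrain parameters.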

SLIDE 22

Challenge 2: coarse-grained soil moisture data (II)

[Figure: original ESA CCI product (m³ m⁻³, mean 2013) at 27 × 27 km spatial resolution, and the same field at 15 × 15 km spatial resolution]

  • M. Guevara, M. Taufer, and R. Vargas. Gap-Free Annual Soil Moisture Global across 15km Grids: 1991-2016. Earth System Science Data, 2019.

SLIDE 23

Integration of multiscale data: from satellites …


R Llamas, M Guevara, D Rorabaugh, M Taufer, R Vargas. Spatial Gap-Filling of ESA CCI Satellite-Derived Soil Moisture based on Geostatistical Techniques and Multiple Regression. Remote Sensing, 2020.

[Figure: region of interest; satellite data]

SLIDE 24

[Figure: terrain parameters; Global Historical Climatology Network (GHCN) and other local data (field measurements); region of interest; satellite data]

… to terrain, climate, and weather data

SLIDE 25

Example of terrain parameters: water wetness index

Shaw et al., 2016 GRL, Moore 2012, Geomorphology.
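The slide does not give a formula. The sketch assumes the widely used topographic wetness index, TWI = ln(a / tan β). Here a is the specific catchment area (upslope area per unit contour width) and β is the local slope. The floor on tan β is an assumption added so that flat cells do not divide by zero.

```python
import math


def wetness_index(specific_catchment_area, slope_rad, min_tan=1e-6):
    """Topographic wetness index, TWI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (m^2/m).
    slope_rad: local slope angle in radians.
    min_tan: assumed floor on tan(beta) so flat cells stay finite.
    """
    tan_beta = max(math.tan(slope_rad), min_tan)
    return math.log(specific_catchment_area / tan_beta)


# A cell draining 100 m^2/m on a 10 % slope (tan(beta) = 0.1):
print(round(wetness_index(100.0, math.atan(0.1)), 3))  # 6.908
```

Cells with large catchments on gentle slopes get high values, which is why the index is a useful proxy for where soil stays wet.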

SLIDE 26

SOMOSPIE: SOil MOisture SPatial Inference Engine

[Figure: SOMOSPIE modules]
  • Data collection: satellite data
  • Data storage: <lat., long., sm> tuples
  • Region selection (e.g., by ecoregion)
  • Feature extraction
  • ML-based software suite: kNN, KKNN, HYPPO, RF
  • Predictions
  • Analysis tools: predictions compared with observations

  • D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, and M. Taufer. SOMOSPIE: A Modular SOil MOisture SPatial Inference Engine based on Data Driven Decisions. eScience 2019.

SLIDE 27

SOMOSPIE: SOil MOisture SPatial Inference Engine

[Figure: SOMOSPIE modules, with terrain parameters added]
  • Data collection: satellite data and terrain parameters
  • Data storage: <lat., long., sm> tuples plus terrain features <x1, x2, … , xn>
  • Region selection (e.g., by ecoregion)
  • Feature extraction
  • ML-based software suite: kNN, KKNN, HYPPO, RF
  • Predictions
  • Analysis tools: predictions compared with observations

SLIDE 28

Region selection: format of regions of interest

  • ("NEON", "Mid Atlantic")
  • ("CEC", "8.5.1")
  • ("BOX", "-77_-75_37_40")
  • ("STATE", "Delaware")
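Of the four region formats, the ("BOX", …) form can be applied directly to coordinates. The other three need external boundary files. This sketch assumes the four underscore-separated numbers are min longitude, max longitude, min latitude, and max latitude, an order consistent with the Delaware-area example. It also assumes samples are (lat, lon, sm) tuples. `parse_box` and `select_region` are hypothetical helpers, not SOMOSPIE's API.

```python
def parse_box(spec):
    """Parse "lon_min_lon_max_lat_min_lat_max" (assumed order) into floats."""
    lon_min, lon_max, lat_min, lat_max = (float(v) for v in spec.split("_"))
    if lon_min > lon_max or lat_min > lat_max:
        raise ValueError(f"malformed box: {spec!r}")
    return lon_min, lon_max, lat_min, lat_max


def select_region(points, region):
    """Keep (lat, lon, sm) tuples that fall inside a ("BOX", ...) region."""
    kind, spec = region
    if kind != "BOX":
        # NEON, CEC, and STATE regions need polygon boundaries; not sketched.
        raise NotImplementedError("only BOX regions are sketched here")
    lon_min, lon_max, lat_min, lat_max = parse_box(spec)
    return [p for p in points
            if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max]


samples = [(39.2, -75.6, 0.31), (38.0, -76.5, 0.27), (42.0, -75.5, 0.22)]
print(select_region(samples, ("BOX", "-77_-75_37_40")))
```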

[Figure: each region of interest plotted by longitude and latitude]

SLIDE 29

Algorithmic solutions: ML-based software suite

KKNN:
  ▪ Use local data
  ▪ Compute k and the distance kernel automatically, using cross-validation
  ▪ Compute weighted means with the kernel (many values)
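A minimal sketch of the KKNN idea: a kernel-weighted mean of the k nearest samples, with k chosen by leave-one-out cross-validation. Several details here are assumptions and may differ from SOMOSPIE's implementation, which also tunes the kernel. The kernel is fixed to triangular, and distances are scaled by the (k+1)-th neighbor's distance, following the usual kknn formulation. All names are hypothetical.

```python
import math


def triangular(u):
    """Triangular kernel: weight 1 at distance 0, falling to 0 at u = 1."""
    return max(0.0, 1.0 - abs(u))


def kknn_predict(train, query, k):
    """Kernel-weighted mean of the k nearest samples to `query`.

    train: list of (features, soil_moisture) pairs, features as a tuple.
    Distances are scaled by the distance to the (k+1)-th neighbor (assumed),
    so len(train) must exceed k.
    """
    ranked = sorted(train, key=lambda s: math.dist(s[0], query))
    scale = math.dist(ranked[k][0], query) + 1e-12
    neighbors = ranked[:k]
    weights = [triangular(math.dist(x, query) / scale) for x, _ in neighbors]
    total = sum(weights)
    if total == 0.0:  # guard against all-zero weights
        return sum(y for _, y in neighbors) / k
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / total


def choose_k(train, candidates):
    """Pick k by leave-one-out cross-validation on mean squared error."""
    def loo_mse(k):
        errors = []
        for i, (x, y) in enumerate(train):
            rest = train[:i] + train[i + 1:]
            errors.append((kknn_predict(rest, x, k) - y) ** 2)
        return sum(errors) / len(errors)
    return min(candidates, key=loo_mse)


# Toy 1-D transect: (position,) -> soil moisture.
train = [((0.0,), 0.10), ((1.0,), 0.20), ((2.0,), 0.30),
         ((3.0,), 0.40), ((4.0,), 0.50)]
k = choose_k(train, candidates=[1, 2, 3])
print(k, kknn_predict(train, (2.5,), 2))
```

Each prediction uses only nearby samples, so the model yields "many values": one local weighted mean per query point.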

Surrogate-based model (SBM):
  ▪ Use all sampled data
  ▪ Use regression to generate one single polynomial model
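A minimal sketch of the SBM idea: one global polynomial fitted to all samples by least squares. The full monomial basis and the default degree 2 are assumptions; the slide does not state the basis or degree. `poly_features` and `fit_sbm` are hypothetical helpers.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(X, degree):
    """Intercept plus every monomial of the input columns up to `degree`."""
    X = np.asarray(X, dtype=float)
    columns = [np.ones(len(X))]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_sbm(X, y, degree=2):
    """Fit ONE polynomial to all sampled data by least squares."""
    design = poly_features(X, degree)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return lambda X_new: poly_features(X_new, degree) @ coeffs


# Samples of a known surface y = 1 + 2*x0 - x1 + 0.5*x0*x1 on a 3 x 3 grid.
X = np.array([(a, b) for a in range(3) for b in range(3)], dtype=float)
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
model = fit_sbm(X, y, degree=2)
print(model(np.array([[3.0, 1.0]])))
```

Unlike KKNN, the whole region shares a single set of coefficients. That makes the model compact but unable to adapt to local variation.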

Random Forest (RF):
  ▪ Compute the mean of the predictions of 500 regression trees

SLIDE 30

Algorithmic solutions: ML-based software suite

KKNN:
  ▪ Use local data
  ▪ Compute k and the distance kernel automatically, using cross-validation
  ▪ Compute weighted means with the kernel (many values)

Surrogate-based model (SBM):
  ▪ Use all sampled data
  ▪ Use regression to generate one single polynomial model

Random Forest (RF):
  ▪ Compute the mean of the predictions of 500 regression trees

HYPPO (Hybrid Piecewise Polynomial Modeling):
  ▪ Use local data
  ▪ Determine the local polynomial degree using cross-validation
  ▪ Use regression to generate a local polynomial model (many polynomial models)
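A sketch of the three HYPPO steps, simplified to one covariate: take local data, pick the polynomial degree by cross-validation, and fit a local polynomial. The neighborhood size k = 8, the candidate degrees 0 to 3, and leave-one-out as the cross-validation scheme are all assumptions. `hyppo_predict` is a hypothetical helper, not the HYPPO code.

```python
import numpy as np


def loo_error(x, y, degree):
    """Leave-one-out mean squared error of a degree-`degree` polynomial."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coeffs = np.polyfit(x[mask], y[mask], degree)
        errs.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
    return float(np.mean(errs))


def hyppo_predict(x, y, query, k=8, degrees=(0, 1, 2, 3)):
    """Predict at `query` with a local polynomial whose degree is chosen by CV.

    (1) keep the k samples nearest to the query, (2) pick the degree with the
    lowest leave-one-out error on them, (3) fit that degree and evaluate it.
    Returns (prediction, chosen_degree).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nearest = np.argsort(np.abs(x - query))[:k]
    xl, yl = x[nearest], y[nearest]
    usable = [d for d in degrees if d + 2 <= k]  # each LOO fit needs > d points
    best = min(usable, key=lambda d: loo_error(xl, yl, d))
    return float(np.polyval(np.polyfit(xl, yl, best), query)), best


x = np.linspace(0, 10, 41)
print(hyppo_predict(x, x ** 2, 5.1))
```

Because the degree is chosen per neighborhood, smooth areas can use low-degree fits while curved areas get higher degrees. This is the "many polynomial models" the slide refers to.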

[Figure: example of local polynomial degrees (1 to 4) assigned to subregions]

SLIDE 31

Computational solutions: Jupyter + XSEDE Jetstream

SLIDE 32

Computational solutions: Jupyter + XSEDE Jetstream

SLIDE 33

Computational solutions: Jupyter + XSEDE Jetstream

SLIDE 34

Use case I: from 27 × 27 km to 1 × 1 km

[Figure: original satellite data (27 × 27 km) and Random Forest fine-grained predictions (1 × 1 km) of soil moisture, plotted by longitude and latitude]

Fine-grained modeling of the Mid-Atlantic region in April 2017:

  • Terrain parameters: Elevation, Slope, and Wetness Index

Level III Ecoregions of the Continental United States (CEVLv3)
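Slope is derived from elevation. A standard way to compute it on a gridded DEM uses central-difference gradients. This sketch assumes square cells of equal size in x and y. SOMOSPIE's own terrain-parameter preparation may use a GIS tool instead, and `slope_degrees` is a hypothetical helper.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Slope of each DEM cell in degrees, from central-difference gradients.

    dem: 2-D array of elevations (m); cell_size: grid spacing (m), assumed
    equal in x and y. np.gradient uses one-sided differences at the edges.
    """
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# A plane rising 1 m per 1 m cell eastward has a 45-degree slope everywhere.
dem = np.tile(np.arange(5, dtype=float), (4, 1))
print(slope_degrees(dem, cell_size=1.0))
```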

SLIDE 35

Use case I: from 27 × 27 km to 1 × 1 km

[Figure: original satellite data, HYPPO fine-grained predictions, and HYPPO polynomial degrees (0 to 3), each plotted by longitude and latitude]

Fine-grained modeling of the Mid-Atlantic region in April 2017:

  • Terrain parameters: Elevation, Slope, and Wetness Index

SLIDE 36

Use case II: local-scale predictions at 1 × 1 m resolution

[Figure: predictions]

SLIDE 37

Use case II: local-scale predictions at 1 × 1 m resolution

[Figure: predictions and uncertainties]

SLIDE 38

Collaborator: David Icove's group (UTK)
Platform: Tellico cluster (IBM Power9 system), supported by a 2019 IBM Shared University Research (SUR) Award

Combine computing and analytics: integration of soil moisture predictions into simulations of controlled (or prescribed) burns

SLIDE 39

Soil moisture data for simulating controlled burn

[Figure: workflow diagram]
  • Data generation: coarse-grained, incomplete multiscale data, namely weather data (NOAA), landscape surface (DSM), and soil moisture (ESA-CCI)
  • Data analytics: A4MD analytics (representations + algorithms) for data prediction
  • Computation: fine-grained, complete data, namely weather data (NOAA), landscape surface (DSM), and fine-grained soil moisture
  • Application and data feedback: soil moisture leveraged for controlled or prescribed burns

SLIDE 40
  • Simulation of the 2016 Gatlinburg wildfire
  • Software:
    ▪ Fire Dynamics Simulator (FDS): large-eddy simulation (LES) for low-speed flows
  • Platform:
    ▪ IBM Power9 cluster at UTK
  • Simulation specs:
    ▪ 120 m × 120 m × 100 m domain
    ▪ 5 frames/sec temporal resolution

Elephant in the room: the soil moisture

Soil moisture is missing in FDS

[Figure: fire-starting area]

SLIDE 41

FDS simulations: soil moisture layer at 1 × 1 m

SLIDE 42

FDS simulations: soil moisture layer at 1 × 1 m

SLIDE 43

FDS simulations: soil moisture layer at 1 × 1 m

SLIDE 44

FDS simulations: soil moisture layer at 1 × 1 m

SLIDE 45

Collaborator: Victoria Stodden's group (UIUC)
Platform: NSF XSEDE Jetstream
NSF OAC 1941443 EAGER: Reproducibility and Cyberinfrastructure for Computational and Data-Enabled Science

Build trust in results through reproducibility, replicability, and transparency

SLIDE 46

Leveraging other NSF projects: Whole Tale

  • Building an open platform for computational reproducibility
  • Create and publish executable research objects ("Tales")
  • Simplify the process of creating and verifying reproducible computational artifacts for scientific discovery

Easy-to-access cloud-based computing environments
Transparent access to research data
Export and publish executable research objects

This material is based upon work supported by the National Science Foundation under Grant No. OAC-1541450

SLIDE 47
SLIDE 48

Capturing metadata
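A small illustration of the kind of metadata that supports replicating a run: software environment, run parameters, and a digest of the input data. This is a sketch only. The field names are hypothetical and do not reflect the Whole Tale or SOMOSPIE metadata formats.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def capture_metadata(params, input_bytes):
    """Build a provenance record for one run (hypothetical field names).

    Records the interpreter, OS, run parameters, and a SHA-256 digest of the
    input so a later run can check that it used the same data.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": params,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


data = b"lat,lon,sm\n39.2,-75.6,0.31\n"
record = capture_metadata({"region": ["BOX", "-77_-75_37_40"], "model": "RF"}, data)
print(json.dumps(record, indent=2))
```

Storing the record as JSON next to the results keeps it both human-readable and easy to compare across runs.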

SLIDE 49

Plug into SOMOSPIE on GitHub to enable replicability of results

SLIDE 50

https://github.com/TauferLab/SOMOSPIE/releases/latest