A Unified Introduction to Predictive Model Building for Undergraduate Researchers (ICTCM virtual conference, ictcm.com | #ICTCM)

32nd International Conference on Technology in Collegiate Mathematics

ictcm.com | #ICTCM

VIRTUAL CONFERENCE

A Unified Introduction to Predictive Model Building for Undergraduate Researchers

Hasthika Rupasinghe, Lasanthi Watagoda, Alan Arnholt

Appalachian State University

ICTCM 2020

Hasthika Rupasinghe and Lasanthi Watagoda A Unified Introduction to Model Building ICTCM 2020 1 / 14

Outline

1. Problem
2. Our approach
3. Classroom trials
4. Structure
     • Guided Lab I: Data Cleaning
     • Guided Lab II: Linear Model Fitting
     • Guided Lab III: Non-Linear Model Fitting

Problems

  • Undergraduate textbooks rarely combine detailed explanations of the algorithms researchers use to build predictive models with directions for implementing those algorithms in software.
  • One challenge instructors face when using a standard text is providing activities that mimic a data scientist’s experience: the data sets that accompany standard texts are generally clean and ready to analyze.
  • A second challenge is the plethora of R packages that implement the numerous statistical learning algorithms, each with its own syntax.

Our approach

This work presents a unified introduction to building supervised prediction models with the caret package and provides guided labs in which readers:

  • question the integrity of a data set and correct data entries to create a “clean” data set;
  • build several models while making minimal changes to the R syntax; and
  • practice reproducible research.
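caret puts many modelling packages behind one front end: its `train()` function selects the algorithm through the `method` argument (for example `method = "lm"`), so switching models means changing little more than that string. The sketch below mimics that design in plain Python so the idea is visible outside R. It is an analogy, not caret: `train`, `FITTERS`, `fit_lm`, and `fit_knn` are hypothetical names, and only two simple one-predictor models are included.

```python
from statistics import mean


def fit_lm(x, y):
    """Least squares for one predictor; returns predict(x_new) = a + b * x_new."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # zero if all x are equal (not handled)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return lambda x_new: a + b * x_new


def fit_knn(x, y, k=3):
    """k-nearest-neighbour regression: predict the mean y of the k closest x values."""
    pairs = list(zip(x, y))

    def predict(x_new):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x_new))[:k]
        return mean(yi for _, yi in nearest)

    return predict


# Hypothetical caret-like registry: method name -> fitting function.
FITTERS = {"lm": fit_lm, "knn": fit_knn}


def train(x, y, method, **tuning):
    """Single entry point; only `method` (and optional tuning values) changes."""
    if method not in FITTERS:
        raise ValueError(f"unknown method: {method!r}")
    return FITTERS[method](x, y, **tuning)


# Usage: the call is identical apart from the method string.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
lm_model = train(x, y, method="lm")
knn_model = train(x, y, method="knn", k=3)
print(lm_model(6), knn_model(6))
```

Adding another model means registering one more fitting function while the calling code stays the same, which mirrors the goal of building several models with minimal changes to the syntax.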

Note:

Instructors:

The material in this article is suitable for use in classes where the instructors have advanced degrees in statistics and experience using R in the classroom.

Students:

Must have some knowledge of linear regression models (for Lab II) and classification models (for Lab III).

Classroom tested

The guided labs have been used in two undergraduate courses. In both, the students were already using R and R Markdown and had been exposed to ggplot2.

  • Data Science II (STT 3860), the course in which students used the guided project, also has as prerequisites:
      - a standard undergraduate (non-calculus-based) introductory statistics course, and
      - a data visualization and management course (Data Science I, STT 2860).
  • Statistical Data Analysis II (STT 3851) has Statistical Data Analysis I (STT 3850) as a prerequisite.

Structure

The guided labs are hosted on RStudio Cloud and on GitHub:

  • Questioning and Cleaning the bodyfat data lab: GitHub repository; rstudio.cloud project
  • Linear models with the bodyfat data lab: GitHub repository; rstudio.cloud project
  • Non-linear models with the bodyfat data lab: GitHub repository; rstudio.cloud project

Instructor manual

Instructors may email hasthika@appstate.edu, lasanthi@appstate.edu, or arnholtat@appstate.edu to obtain an instructor version of the labs.

Data

Boston Data

The Boston data set from the MASS package written by Ripley (2019) is used to illustrate various steps in predictive model building.

BodyFat Data

We use the data set provided in the article “Fitting Percentage of Body Fat to Simple Body Measurements” by Johnson (1996).

Lab I: Questioning and Cleaning the Body Fat Data

Guided Lab I: Data Cleaning

https://rstudio.cloud/project/1164604

The purpose of this activity is to have the reader critically question, evaluate, and clean the original BodyFat data.
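One way a reader might begin questioning the BodyFat data is to check each record against a known physical relationship. The sketch below assumes the data set stores body density D (g/cm³) alongside percent body fat, and that the latter was derived with Siri's equation, %BF = 495/D - 450. It flags records that violate that relationship or fall outside a plausible range. The function names, the 1-percentage-point tolerance, and the sample values are illustrative assumptions, not part of Lab I.

```python
# Hypothetical integrity check for body fat records; a sketch, not the Lab I code.
# Assumption: each record is (body density D in g/cm^3, recorded percent body fat),
# and the recorded value was computed with Siri's equation: %BF = 495 / D - 450.


def siri_body_fat(density):
    """Percent body fat implied by body density under Siri's equation."""
    return 495.0 / density - 450.0


def flag_suspect_records(records, tolerance=1.0):
    """Return indices of (density, recorded_fat) pairs that look wrong.

    A record is flagged when its density is not positive, its recorded body
    fat is outside the open interval (0, 100), or the recorded value differs
    from Siri's prediction by more than `tolerance` percentage points.
    """
    flagged = []
    for i, (density, recorded_fat) in enumerate(records):
        if density <= 0 or not 0 < recorded_fat < 100:
            flagged.append(i)
        elif abs(recorded_fat - siri_body_fat(density)) > tolerance:
            flagged.append(i)
    return flagged


# Made-up example values, chosen only to exercise each rule.
sample = [(1.05, 21.4), (1.05, 30.0), (1.10, 0.0), (1.02, 35.3)]
print(flag_suspect_records(sample))
```

Flagged rows are candidates for inspection, not automatic deletion; deciding whether and how to correct them is the kind of critical evaluation the lab asks for.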

Lab II: Fitting Linear Regression Models to Body Fat Data

Guided Lab II: Linear Model Fitting

https://rstudio.cloud/project/323646

The purpose of this activity is to have the reader create several regression models that predict body fat from some or all of the body measurements (explanatory variables) in the Body Fat data.

Lab III: Fitting Non-Linear Regression Models to Body Fat Data

Guided Lab III: Non-Linear Model Fitting

https://rstudio.cloud/project/1169242

The purpose of this activity is to have the reader create several non-linear regression models that predict body fat from some or all of the body measurements (explanatory variables) in the Body Fat data.

Thank You!

Contact Information

Names: Hasthika Rupasinghe, Lasanthi Watagoda, and Alan Arnholt
Titles: Dr., Dr., and Dr.
Institution: Appalachian State University, Boone, NC

Emails: hasthika@appstate.edu, lasanthi@appstate.edu and arnholtat@appstate.edu