Reproducibility 1: Good practice (Workshop 3)



SLIDE 1

Reproducibility 1: Good practice

Workshop 3

SLIDE 2

Aim

In this session you will practise creating reproducible work. Success in this will allow you to easily recycle code for the assessment.

Objectives

By following the slides and applying the techniques to the workshop examples, the successful student will be able to:

  • explain the importance and principles of reproducible research
  • follow good programming practice in terms of directory structure, code formatting and design, variable naming and comments
  • design reproducible analyses and evaluate their success in doing so

SLIDE 3

Rationale: Extension of scientific good practice

  • Repeatable: protocol, lab book
  • Reproducible: scripting

Diagram labels:

  • Explanatory variables: choose / set / manipulate
  • Experiments (tests of ideas)
  • Response variables: measure
  • Experimental design, Analyse, Visualise, Interpret and report

SLIDE 4

Extension of scientific good practice

Lab book for computational work

  • Readability: could future you, or others, understand what you did and why? Could you repeat it?
  • Reproducibility: could you (or others) recreate everything from data import to results communication?
  • Reproducibility plus: could you track the development of reproducible work?

SLIDE 5

Reproducibility

  • Is best practice, important in research collaboration and mandatory in many industry settings
  • Will likely become mandatory for science publication and funding
  • Will ultimately make your life much easier
  • Requires time, diligence and practice
  • Some reproducibility is better than none
  • Has ‘impact’

SLIDE 6

Reproducibility continuum

In order of increasing reproducibility:

  • Organise your files
  • Script everything
  • Organisation and Comments
  • Code: formatting and style
  • Code: ‘algorithmically’ / ‘algebraically’
  • Use RStudio, R Markdown, knitr
  • Collaboration and Version control: Git and GitHub
  • Public repositories of protocols, raw data, and source code

SLIDE 7

Reproducibility continuum

The same continuum as on slide 6, annotated with the labels: Required here, Today, Next week.

SLIDE 8

Organise your files
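
The slide gives only the heading, so the following is a minimal sketch of one possible layout rather than a prescribed one. The folder names and the project name are hypothetical, and the project is created under tempdir() only so that the example can be run without touching real files.

```r
# Hypothetical project root; tempdir() is used only so the sketch runs
# anywhere. In practice this would be the folder for your own work.
project_dir <- file.path(tempdir(), "workshop3_project")

# Assumed folder names, chosen for illustration only.
folders <- c("data_raw", "data_processed", "scripts", "figures")

# Create each folder (and the project root) if it does not exist yet.
for (folder in folders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created.
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```

Keeping raw data apart from processed data and output means a script can always be rerun from the untouched originals.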

SLIDE 9

Reproducibility: Script everything

Write everything down. Already introduced: getting started in every practical

1. Start RStudio.
2. Make a new script file called workshop1.R
3. Set your working directory (to the script file location?) in the script.
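
Step 3 can be sketched as the opening lines of workshop1.R. The folder used here is a stand-in built from tempdir() so that the example runs anywhere; the name work_dir is hypothetical, and in a real script the path would be wherever the script file is saved.

```r
# workshop1.R
# First lines of the script: record where the work lives.

# Hypothetical location; replace with the folder that holds this script.
# tempdir() stands in for a real path so the sketch runs anywhere.
work_dir <- file.path(tempdir(), "workshop1")
dir.create(work_dir, showWarnings = FALSE)

setwd(work_dir)   # set in the script, not through the menus
getwd()           # confirm where R will now look for files
```

Because the setwd() call is written in the script, anyone rerunning the file can see (and adjust) the location the analysis assumed.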

SLIDE 10

Reproducibility: Organisation and Comments

For .R scripts:

  • Use plain text format (easy in RStudio)
  • Divide the script into sections
  • Use comments extensively
  • Use space extensively
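
As an illustration of these points, here is a short sketch of a sectioned, commented script. The data values and variable names are made up for the example. Ending a comment line with four or more dashes is one common way to mark sections, and RStudio treats such lines as foldable section headers.

```r
# ---------------------------------------------------------------
# Workshop 3: example script layout (illustrative only)
# ---------------------------------------------------------------

# ---- Data ------------------------------------------------------
# Made-up widths in mm, standing in for an imported data file.
widths <- c(10.2, 11.5, 9.8, 12.1)

# ---- Summary ---------------------------------------------------
# Mean and standard deviation of the widths.
mean_width <- mean(widths)
sd_width   <- sd(widths)

# ---- Report ----------------------------------------------------
# Print the summary rounded to one decimal place.
round(c(mean = mean_width, sd = sd_width), 1)
```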

SLIDE 11

Reproducibility: Code

“The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code”

Formatting and style

  • Most important: be consistent
  • Spaces after commas, around operators (except :)
  • Indentation of blocks, layout
  • Limit width

Formatted:

names(pigeon)[1] <- "interorbital"
hist(pigeon$interorbital,
     xlim = c(8, 14),
     main = NULL,
     xlab = "Width (mm)",
     ylab = "Number of pigeons",
     col = "grey")

Unformatted:

names(pigeon)[1]<-"interorbital"
hist(pigeon$interorbital,xlim=c(8,14),main=NULL,xlab="Width (mm)",ylab="Number of pigeons",col="grey")

References: http://yihui.name/formatR/ Yihui Xie (2016); http://adv-r.had.co.nz/Style.html
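
The formatR package cited on this slide can tidy code automatically. The sketch below is one hedged way to try it on the unformatted example: the code is held as text and only parsed, never run, because the pigeon data frame is not defined here. The variable names messy and tidy are hypothetical, and the width.cutoff value of 60 is an arbitrary choice.

```r
# Unformatted code from the slide, held as text so it is not run here
# (the pigeon data frame is not defined in this sketch).
messy <- c(
  'names(pigeon)[1]<-"interorbital"',
  'hist(pigeon$interorbital,xlim=c(8,14),main=NULL,xlab="Width (mm)",ylab="Number of pigeons",col="grey")'
)

# formatR is an add-on package, so only use it if it is installed.
if (requireNamespace("formatR", quietly = TRUE)) {
  # output = FALSE returns the result instead of printing it directly.
  tidy <- formatR::tidy_source(text = messy, width.cutoff = 60,
                               output = FALSE)
  cat(tidy$text.tidy, sep = "\n")
} else {
  message("formatR is not installed; install.packages(\"formatR\") adds it.")
}
```

An automatic formatter is a convenience, not a substitute for the habit of writing consistently formatted code in the first place.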

SLIDE 12

Reproducibility: Code

Emulate others!

Formatting and style

Naming

“There are only two hard things in Computer Science: cache invalidation and naming things.”

  • Files: localisation.R
  • Variables: lowercase, meaningful, use _ between words, e.g. max_value

Most important: be consistent
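
A small sketch of the naming advice follows. The measurements are invented purely to illustrate the point, and all variable names are hypothetical.

```r
# Hypothetical measurements, invented purely to illustrate naming.
widths_mm <- c(10.2, 11.5, 9.8, 12.1)

# Unclear: a single letter and a mixed-case name say nothing about content.
x <- max(widths_mm)
Val2 <- mean(widths_mm)

# Clearer: lowercase, meaningful, words joined with underscores.
max_width_mm <- max(widths_mm)
mean_width_mm <- mean(widths_mm)
```

Both pairs hold the same values; only the second pair tells a reader what those values are.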

SLIDE 13

Reproducibility: Code

‘Algorithmically’ / ‘algebraically’ / ‘superparametrically’

Code which expresses the structure of the problem/solution

> sum(3, 5, 6, 7, 8) / 5
[1] 5.8
> (3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2
[1] 14.8

> x <- c(3, 5, 6, 7, 8)
> aver <- sum(x) / length(x)
> sum((x - aver)^2)
[1] 14.8
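
One possible next step, offered as a sketch rather than as part of the original slide, is to wrap the second version in a function so that the same calculation can be reused on any numeric vector. The function name sum_of_squares is hypothetical.

```r
# Sum of squared deviations from the mean, written so that nothing
# depends on the particular values or on how many there are.
sum_of_squares <- function(x) {
  aver <- sum(x) / length(x)
  sum((x - aver)^2)
}

# The values from the slide; the result agrees with the 14.8 shown there.
sum_of_squares(c(3, 5, 6, 7, 8))

# The same code works unchanged on different data.
sum_of_squares(c(2, 4, 4, 6))
```

If the data change, the first version on the slide needs the mean, the count and every term retyped, whereas this version needs no edits at all.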

SLIDE 14

Citing packages

  • Packages should be cited, and R helps with that too:

> citation("MASS")
To cite the MASS package in publications use:

  Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0

> citation("ggplot2")
To cite ggplot2 in publications, please use:

  H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
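
The same function also covers R itself, and the result can be exported for a reference manager. The following is a minimal sketch; the variable name r_citation is hypothetical, and the MASS call is guarded because the package may not be installed on every machine.

```r
# With no package name, citation() returns the citation for R itself.
r_citation <- citation()
print(r_citation)

# The same information as a BibTeX entry, ready to paste into a .bib file.
toBibtex(r_citation)

# Record the R version alongside the citation.
R.version.string

# Package citations work the same way when the package is installed.
if (requireNamespace("MASS", quietly = TRUE)) {
  print(citation("MASS"))
}
```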

SLIDE 15

Reproducibility continuum

The same continuum as on slide 6, annotated with the label: Required here.