1
Reproducibility 1: Good practice (Workshop 3)
2
Aim
In this session you will practice creating reproducible work. Your success in achieving this will allow you to easily recycle code for the assessment.
Objectives
By following the slides and applying the techniques to the workshop examples, the successful student will be able to:
- explain the importance and principles of reproducible research
- follow good programming practice in terms of directory structure, code formatting and design, variable naming and comments
- design reproducible analyses and evaluate their success in doing so
Rationale: Extension of scientific good practice
3
Reproducible: scripting
Repeatable: protocol, lab book
- Explanatory variables: choose / set / manipulate
- Experiments (tests of ideas)
- Response variables: measure
- Experimental design, analyse, visualise, interpret and report
4
Extension of scientific good practice
Lab book for computational work
- Readability: Could future you or others understand what you did and why? Could you repeat it?
- Reproducibility: Could you (or others) recreate everything from data import to results communication?
- Reproducibility plus: Could you track the development of reproducible work?
5
Reproducibility
- Is best practice, important in research collaboration and mandatory in many industry settings
- Will likely become mandatory for science publication and funding
- Will ultimately make your life much easier
- Requires time, diligence and practice
- Some reproducibility is better than none
- Has ‘impact’
6
Reproducibility continuum
In order of increasing reproducibility:
- Organise your files
- Script everything
- Organisation and Comments
- Code: formatting and style
- Code: ‘algorithmically’ / ‘algebraically’
- Use RStudio, R Markdown, knitr
- Collaboration and Version control: Git and GitHub
- Public repositories of protocols, raw data, and source code
7
Reproducibility continuum
In order of increasing reproducibility:
- Organise your files
- Script everything
- Organisation and Comments
- Code: formatting and style
- Code: ‘algorithmically’ / ‘algebraically’
- Use RStudio, R Markdown, knitr
- Collaboration and Version control: Git and GitHub
- Public repositories of protocols, raw data, and source code
Labels on the slide: Required here / Today / Next week
8
Organise your files
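One possible layout, sketched in R. The folder names (data_raw, data_processed, scripts, figures) are illustrative assumptions, not a structure prescribed by these slides, and the project is built inside a temporary directory so that nothing in your own files is touched.

```r
# Hypothetical project layout; the folder names are assumptions for illustration.
project_dir <- file.path(tempdir(), "workshop3")
sub_dirs <- c("data_raw", "data_processed", "scripts", "figures")

# Create the project folder and each subfolder (no error if one already exists).
for (d in file.path(project_dir, sub_dirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the subfolders that now exist, relative to the project folder.
created <- list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
print(created)
```

Keeping raw data apart from anything a script writes makes it harder to overwrite the original measurements by accident.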
9
Reproducibility: Script everything
Write everything down. Already introduced: getting started in every practical
1. Start RStudio.
2. Make a new script file called workshop1.R
3. Set your working directory (to script file location?) - in the script
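The steps above might look like this at the top of workshop1.R. This is a minimal sketch: tempdir() stands in for the folder that holds the script, an assumption made only so the example runs anywhere. In practice you would give the path to your own folder.

```r
# workshop1.R
# Sketch of the first lines of a script.

# Assumed folder for illustration: a temporary directory stands in for
# the folder that holds this script. Replace it with your own path.
work_dir <- tempdir()

# Set the working directory inside the script so the step is recorded.
# setwd() returns the previous directory, kept here in case it is needed.
old_dir <- setwd(work_dir)

# Confirm where R will now read and write files.
print(getwd())
```

Setting the directory in the script, rather than through the menus, means the step is written down with everything else.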
10
Reproducibility: Organisation and Comments
For .R scripts
- Use plain text format (easy in RStudio)
- Divide the script into sections
- Use comments extensively
- Use space extensively
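The points above could be applied as in the sketch below. The widths are invented values used only to make the script runnable, and the section names are one possible choice. A comment line ending in four or more dashes is treated by RStudio as a section heading, which makes long scripts easier to navigate.

```r
# Title:   Hypothetical example analysis
# Purpose: Show one way to divide a script into commented sections.

# Data ----------------------------------------------------------------
# Invented widths (mm), used only to make this sketch runnable.
width <- c(10.2, 11.5, 9.8, 12.1, 10.9)

# Summarise -----------------------------------------------------------
# Mean and standard deviation of the widths.
mean_width <- mean(width)
sd_width <- sd(width)

# Report --------------------------------------------------------------
# Print both summaries, rounded to two decimal places.
cat("Mean width (mm):", round(mean_width, 2), "\n")
cat("SD of width (mm):", round(sd_width, 2), "\n")
```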
11
Reproducibility: Code
“The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code”
Formatting and style
- Most important: be consistent
- Spaces after commas, around operators (except :)
- Indentation of blocks, layout
- Limit width
Unformatted:
names(pigeon)[1]<-"interorbital"
hist(pigeon$interorbital,xlim=c(8,14),main=NULL,xlab="Width (mm)",ylab="Number of pigeons",col="grey")
Formatted:
names(pigeon)[1] <- "interorbital"
hist(pigeon$interorbital,
     xlim = c(8, 14),
     main = NULL,
     xlab = "Width (mm)",
     ylab = "Number of pigeons",
     col = "grey")
http://yihui.name/formatR/ Yihui Xie (2016)
http://adv-r.had.co.nz/Style.html
12
Reproducibility: Code
Emulate others!
Formatting and style
Naming
“There are only two hard things in Computer Science: cache invalidation and naming things.”
- Files: localisation.R
- Variables: lowercase, meaningful, use _ between words, e.g. max_value
- Most important: be consistent
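A small sketch of the naming advice. The name widths is a hypothetical choice and the numbers are placeholders; the only point is the contrast between the two halves.

```r
# Hard to follow: the names say nothing about the contents.
a <- c(3, 5, 6, 7, 8)
b <- max(a)

# Easier to follow: lowercase, meaningful, words joined with "_".
widths <- c(3, 5, 6, 7, 8)   # hypothetical name and placeholder values
max_value <- max(widths)

# Both versions compute the same thing; only the readability differs.
print(identical(b, max_value))
```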
13
Reproducibility: Code
‘Algorithmically’ / ‘algebraically’ / ‘superparametrically’
Code which expresses the structure of the problem/solution
Hard-coded:
> sum(3, 5, 6, 7, 8) / 5
[1] 5.8
> (3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2
[1] 14.8
Expressing the structure:
> x <- c(3, 5, 6, 7, 8)
> aver <- sum(x) / length(x)
> sum((x - aver)^2)
[1] 14.8
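One way to take the same idea a step further is to wrap the calculation in a function, so that it works for any numeric vector without retyping. The function name sum_sq and the second data set y are hypothetical, introduced only for this sketch.

```r
# Sum of squared deviations from the mean, for any numeric vector.
# sum_sq is a hypothetical name chosen for this sketch.
sum_sq <- function(x) {
  aver <- sum(x) / length(x)
  sum((x - aver)^2)
}

# The data from the slide.
x <- c(3, 5, 6, 7, 8)
ss_x <- sum_sq(x)
print(ss_x)

# Same function, different data: nothing needs retyping.
y <- c(2, 4, 4, 6)   # hypothetical values
ss_y <- sum_sq(y)
print(ss_y)

# Cross-check against the built-in variance: SS = var * (n - 1).
print(all.equal(ss_x, var(x) * (length(x) - 1)))
```

The cross-check against var() is a cheap way to confirm that hand-written code agrees with a built-in function.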
Citing packages
- Packages should be cited – and R helps with that too:
> citation("MASS")
To cite the MASS package in publications use:
  Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
> citation("ggplot2")
To cite the ggplot2 in publications, please use:
  H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
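The same function also covers R itself, and the result can be converted to BibTeX for a reference manager. A minimal sketch using only packages that ship with R, so it should run on any installation; the exact text printed depends on your R version.

```r
# Citation for R itself (no package name needed).
r_cite <- citation()
print(r_cite)

# Citation for a specific package; "stats" ships with every R installation.
stats_cite <- citation("stats")

# Convert the R citation to BibTeX for use in a reference manager.
r_bibtex <- toBibtex(r_cite)
print(r_bibtex)
```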
14