By Olga Dethlefsen
If it is not reproducible, it is not scientific. A few words on reproducibility and a short intro to knitr using RStudio.
Automated software execution, version control systems, software frameworks and containers, literate programming and workflow management systems, to name just a few: reproducible research is finally entering the bioinformatics world. And rightly so! The ability of an independent person to duplicate a study is among the main principles of the scientific method. In bioinformatics, this translates simply into being able to obtain the same results given access to the original data and computational methods, e.g. scripts.
Advantages? Plenty. Reproducibility provides insight into all computational procedures. It empowers others to build upon methods and results. It is essential if bioinformatics findings are to be taken seriously and accepted as scientific observations. Challenges? Many. The rapidly changing bioinformatics world, with its abundance of data formats, technologies, tools, pipelines and complex interdisciplinary solutions, does not make striving for reproducibility easy. Nor does the pressure to publish research while it is still relevant leave much room for ensuring that results are reproducible.
While a lot of work indeed lies ahead of us before reproducibility standards can be set, there are already good practices to choose from for those wanting to do bioinformatics right. Literate programming, that is, interspersing text, executable code and output within the same document, is a good start. As with anything in bioinformatics, there is more than one way to work with these dynamic reports. R Markdown and Jupyter notebooks have been gaining popularity thanks to their ease of use and rich functionality. For those who use R on a daily basis and want precise control over the document when writing journal articles, reports, theses or books, LaTeX together with the knitr R package offers a professional way to go.
It is not news, but it is an unfortunate fact, that computational findings often cannot be reproduced and so fall short of scientific standards. Hence any effort to promote reproducibility is welcome. And why not start with one's own work by using literate programming? For those interested in knitr, one way to get going is shown below. For everyone else out there passionate about bioinformatics: please help us make bioinformatics a true scientific discipline by ensuring reproducibility.
By default, RStudio is set up to weave .Rnw files using the older Sweave rather than knitr. Open RStudio and set "Weave Rnw files using" to knitr under RStudio -> Preferences -> Sweave (Tools -> Global Options -> Sweave on Windows and Linux).
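You may prefer not to change the global setting. RStudio also honours a per-file "magic comment" placed on the very first line of an .Rnw file. That comment tells RStudio to weave that particular file with knitr. Below is a minimal sketch: the document body is only a placeholder, and the comment line is what matters.

```latex
% !Rnw weave = knitr
\documentclass{article}
\begin{document}
Hello, knitr!
\end{document}
```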
Open a new file by selecting File -> New File -> R Sweave.
This opens a new file with the .Rnw extension, which stands for "R noweb". Noweb is a literate programming tool created by Norman Ramsey. Its input mixes program source code and documentation in so-called chunks, which are either documentation chunks or code chunks.
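By default, RStudio opens an empty document of the article class consisting of three lines: \documentclass{article}, \begin{document} and \end{document}.
Type any LaTeX content between the \begin{document} and \end{document} lines, for example a title, a section heading and a few sentences of text.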
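For example, a minimal report could look like the following. The title, author and text are placeholders to replace with your own.

```latex
\documentclass{article}

\title{My first knitr report}
\author{Your Name}

\begin{document}

\maketitle

\section{Introduction}
This report is written in \LaTeX{} and will be woven with knitr.

\end{document}
```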
If you are used to LaTeX, you will recognize the commands immediately. The good news is that you can use the full functionality of LaTeX in a .Rnw file. And if you are new to LaTeX, well, there is plenty to learn. A good place to start is the documentation and tutorials listed on the LaTeX Project website.
Press the "Compile PDF" button at the top of the editor window, or alternatively choose File -> Compile PDF. If the file has not been saved yet, you are prompted to name it and save it to a location of your choice. Compiling starts the process of "weaving" the .Rnw file: the R code is executed, and the code and its output are inserted into the LaTeX document. This is what makes dynamic reports possible, since any change in the data or analyses is reflected automatically the next time the report is compiled. A successful run first produces a .tex file containing the evaluated R code and its output. Second, a .pdf file is created from the .tex file and, with the default RStudio settings, shown in a preview window. PDF not shown? Check the PDF preview options under RStudio -> Preferences -> Sweave.
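The same two steps can also be run from the R console, which is handy when scripting a build. Below is a minimal sketch. It assumes that the knitr package and a LaTeX distribution are installed, and that the file is saved as report.Rnw (a hypothetical name) in the working directory.

```r
library(knitr)

# Step 1: weave report.Rnw into report.tex; knit() runs the R chunks,
# inserts their output and returns the path of the .tex file
tex_file <- knit("report.Rnw")

# Step 2: compile the .tex file into report.pdf and remove auxiliary files
tools::texi2pdf(tex_file, clean = TRUE)

# Alternatively, knit2pdf() performs both steps in a single call
knit2pdf("report.Rnw")
```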
There was no R code to weave in the above example, but it was a good way to check whether LaTeX and RStudio are set up correctly. Now we are ready to embed code chunks with R code into the document.
Code chunks begin with <<>>= (optionally with a chunk label and chunk options between the angle brackets) and end with @ on a line of its own.
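A chunk with a label and an option might look like this. The label summary-stats and the values are purely illustrative. Everything outside the <<>>= and @ lines remains ordinary LaTeX.

```latex
<<summary-stats, echo=TRUE>>=
# Ordinary R code goes here; echo=TRUE shows the code in the report
x <- c(2.1, 3.4, 1.9, 4.2)
mean(x)
@
```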
The body of a chunk is ordinary R code. For example, a standard linear regression analysis can be written exactly as it would be in an R script.
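As an illustration, the chunk below fits a simple linear model with lm() to R's built-in cars data set (car speed and stopping distance). The choice of data set and the chunk label are assumptions made for this example.

```latex
<<linear-regression>>=
# Model stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
# Print coefficients, R-squared and other summary statistics
summary(fit)
@
```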
When the document is compiled, the chunk's code and its printed output appear in the .pdf report in much the same way as output printed to the R console.
Plots can be generated within code chunks as usual in R; knitr captures each figure and includes it in the document.
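For example, the following chunk draws a scatter plot of the cars data with the fitted regression line. fig.width and fig.height (in inches) and fig.cap are standard knitr chunk options, while the specific values and caption text are arbitrary choices for this sketch. With fig.cap set, the plot is placed in a captioned LaTeX figure.

```latex
<<speed-vs-dist, fig.width=5, fig.height=4, fig.cap="Stopping distance versus speed">>=
# Refit the model so that this chunk is self-contained
fit <- lm(dist ~ speed, data = cars)
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit, col = "red")  # add the fitted regression line
@
```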
The same goes for tables. The xtable package, for example, can be used to control how a table is displayed.
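Below is a minimal sketch, assuming the xtable package is installed. xtable() turns the coefficient matrix of the regression model into a LaTeX table. The chunk option results='asis' tells knitr to insert the printed LaTeX code as is, rather than as verbatim R output. The caption and the number of digits are illustrative.

```latex
<<coef-table, results='asis'>>=
library(xtable)
fit <- lm(dist ~ speed, data = cars)
# Convert the model's coefficient matrix into a LaTeX table with a caption
tab <- xtable(summary(fit)$coefficients,
              caption = "Linear regression of stopping distance on speed",
              digits = 3)
print(tab)  # emits \begin{table} ... \end{table}
@
```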
The advantage of using LaTeX and knitr is that there are very few limits to what can be done. One can use all the advanced LaTeX functionality and control the output of each chunk in great detail. The knitr chunk options are the same ones used in R Markdown, which also relies on knitr, and you can read more about them on the knitr website. Minimal examples to get going can also be found there.
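Options can be set chunk by chunk, or once for the whole document with opts_chunk$set() in an early setup chunk. In the sketch below, the particular options and values are only an example. include=FALSE runs the chunk without showing its code or output in the report.

```latex
<<setup, include=FALSE>>=
library(knitr)
# Defaults applied to every later chunk unless overridden in its header
opts_chunk$set(echo = TRUE,          # show the R code
               fig.width = 5,        # figure width in inches
               fig.height = 4,       # figure height in inches
               fig.align = "center", # centre figures on the page
               cache = FALSE)        # re-run every chunk on each compile
@
```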
P.S. The tutorial was written using