knitr and R markdown from .Rnw, .Rmd and .R files

22 Jan 2017

Sweave, knitr and R markdown are three iterations of a simple tool that lets you generate reports containing R code, which is executed automatically and included in the output report alongside any figures it produces. The canonical case is generating (LaTeX-based) PDFs from R, although other programming languages and output formats are supported as well.

The easiest way to install the tool(s) in R is by running

install.packages("rmarkdown")

This automatically installs knitr as a dependency.

While the report generation process (also known as ‘knitting’ or rendering) is pretty much the same in all cases, there are three different syntaxes/markup styles to choose from when writing your report and embedding the R code, the choice of which will depend on your needs and preferences:

.Rnw

.Rnw files are essentially LaTeX files interspersed with R code chunks which get executed when you convert them to .tex files. (If you wish to turn your report into a standalone PDF, your source file needs to have a full LaTeX preamble etc.) Using the .Rnw format not only lets you keep full control over the formatting, you can also generate .tex files that contain no preamble, which you can then \include{} in bigger multi-file LaTeX documents you might be working on.

The R code chunk syntax in .Rnw files looks as follows:

<<chunkname, fig.height=5, fig.width=5>>=
# R code goes here
samples <- rnorm(30)
hist(samples) # draw histogram
@

All options (like fig.height) that can be passed to a code chunk are documented here. If you provide a figure caption using fig.cap then any figures are automatically given LaTeX labels based on the chunk names that you can use to refer to them from elsewhere in your document, such as in Figure~\ref{fig:chunkname}. You can also easily include the values from R variables that were set by code in any preceding code chunk in your LaTeX document, e.g. to display the value of the first sample generated above you simply put the expression \Sexpr{samples[1]} where you want the value to be printed in your LaTeX document.

Calling

library(knitr)
knit('filename.Rnw')

will execute the code chunks and produce an output file filename.tex, alongside any figures that are created by the code (which by default are put in a figure/ subdirectory).
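To make the pieces above concrete, here is a minimal sketch that writes a tiny .Rnw file to a temporary directory and knits it. The file name demo.Rnw, the chunk name histchunk and the caption text are made up for illustration; only knit() and the chunk options are taken from knitr itself.

```r
library(knitr)

# Work in a temporary directory so nothing lands next to real files.
workdir <- tempfile("rnw-demo-")
dir.create(workdir)
old_wd <- setwd(workdir)

# A hypothetical minimal .Rnw document: LaTeX with one named chunk, a
# fig.cap (so that knitr emits a fig:histchunk label) and one \Sexpr{}.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<histchunk, fig.height=5, fig.width=5, fig.cap='Histogram of the samples'>>=",
  "samples <- rnorm(30)",
  "hist(samples)",
  "@",
  "Figure~\\ref{fig:histchunk} starts with \\Sexpr{round(samples[1], 2)}.",
  "\\end{document}"
)
writeLines(rnw_lines, "demo.Rnw")

# knit() returns the name of the generated .tex file.
tex_file <- knit("demo.Rnw", quiet = TRUE)
tex <- readLines(tex_file)

# Show where the automatically generated figure label ended up.
print(grep("label{fig:histchunk}", tex, fixed = TRUE, value = TRUE))

setwd(old_wd)
```

Turning the resulting .tex file into a PDF is then an ordinary LaTeX run, which this sketch deliberately leaves out.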

For more elaborate examples of what you can do in code chunks see the knitr demo page (as well as the different code engines for how to automatically run code chunks written in other languages such as Python).

.Rmd

.Rmd files work similarly to the .Rnw files above, except that you write your basic file in Markdown syntax rather than straight LaTeX, and include larger code chunks using the syntax

```{r chunkname, fig.height=5, fig.width=5}
# R code goes here
samples <- rnorm(30)
hist(samples) # draw histogram
```

and smaller, in-line code references (a la \Sexpr{} above) by inserting `r samples[1]`.

Calling

library(rmarkdown)
render('filename.Rmd')

will by default simply execute the code chunks and render the whole report to a .html output file, but the output format can be controlled in two ways: the first is to pass the requested output format (see here for supported formats) to render() via the output_format option.
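As a rough sketch of that first way, the snippet below writes a small .Rmd file to the temporary directory and renders it with an explicit output_format. The file name demo.Rmd and its contents are made up for illustration; the code fence is assembled with strrep() only so that the example can itself sit inside a fenced block.

```r
library(rmarkdown)

fence <- strrep("`", 3)  # the three backticks that delimit an Rmd chunk

# A hypothetical minimal .Rmd file: one chunk plus one inline expression.
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "# Demo report",
  "",
  paste0(fence, "{r histchunk, fig.height=5, fig.width=5}"),
  "samples <- rnorm(30)",
  "hist(samples)",
  fence,
  "",
  "The first sample is `r round(samples[1], 2)`."
), rmd_file)

# render() needs pandoc, so only attempt it when pandoc can be found.
out_file <- NULL
if (pandoc_available()) {
  # Swap in "pdf_document" or "word_document" to get other formats.
  out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(out_file)
}
```

Instead of a format name you can also pass a format object such as html_document(toc = TRUE), which lets you set format-specific options directly in the call.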

The other, more powerful way is to include a YAML header like this in the .Rmd file, which allows you to control not only the output format but also many other properties of the generated report:

---
title: "The title of my report: so good"
author: "Author name(s)"
output: pdf_document
---

render() relies on pandoc to convert the knitted Markdown into the final output format, and some formats need further dependencies on top of that (PDF output, for instance, requires a LaTeX installation). Many examples and further information on R markdown and the available rendering engines (as well as other functionality such as bibliography generation) can be found on the R markdown website.

Caveat: the default LaTeX report style used by the pdf_document output format makes use of a couple of LaTeX packages which are not necessarily part of every LaTeX installation. In my case I still needed to install the texlive-framed and texlive-titling packages to be able to build the pdf reports (without fiddling with knitr to use different styles, which is also possible). If any .sty files are missing, render() will simply spit out the corresponding LaTeX error messages.
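If you would rather check for missing style files before rendering, something along these lines might help. It is only a sketch: it assumes a TeX Live style installation with the kpsewhich tool on the PATH, and the two file names framed.sty and titling.sty are my guess at what the packages mentioned above provide.

```r
# Style files to look for (assumed names, derived from the package names).
sty_files <- c("framed.sty", "titling.sty")

# Returns the full path of a style file, "" if kpsewhich cannot find it,
# or NA if kpsewhich itself is not installed.
find_sty <- function(sty) {
  if (!nzchar(Sys.which("kpsewhich"))) return(NA_character_)
  res <- suppressWarnings(
    system2("kpsewhich", sty, stdout = TRUE, stderr = FALSE)
  )
  if (length(res) == 0) "" else res[1]
}

found <- vapply(sty_files, find_sty, character(1))
print(found)
```

An empty string in the result points at a style file you probably still need to install.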

.R

Creating R markdown reports straight from .R files kind of turns the whole idea of it upside down: instead of writing markup documents that contain R code chunks, in this case you are simply writing an R code file, with additional Markdown or LaTeX that you put in the comments of your R code. This format is primarily useful when you’re writing a report that has a lot of code with relatively little description in between, and when you want to be able to do stuff like load the entire R script using source() (in which case the markup will simply be ignored because it’s commented out). The general rule is that the content of comment lines that start with #' (note the apostrophe) will be rendered as markdown, while normal comments starting simply with # are considered part of the code chunks.

In this format code chunks don’t have to be started (or named) explicitly, although options can still be passed to individual chunks by using the #+ name, options... syntax, e.g.

#' ---
#' title: The YAML header also has to be commented like this
#' ---

# This comment is just a normal R comment and will be rendered in the
# output report as part of the code, i.e. this won't look good: $x_i$

#' This one here on the other hand will be rendered in LaTeX: $x_i$

#+ chunkname, fig.cap="Here's a damn fine histogram of some data"
hist(rnorm(30))

#+ dummychunktohidethecomments, echo=FALSE
# because of echo=FALSE, any comments or code until the start of the
# next chunk are not shown in the output document

Report creation works in the same way as with the .Rmd format, i.e.

library(rmarkdown)
render('filename.R')

and can be controlled by all the same YAML header options.
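To see what happens to such a script under the hood, you can ask knitr to do just the conversion step. As far as I can tell render() hands .R files to knitr::spin(), so the sketch below calls spin() directly on a few made-up lines and prints the resulting R markdown source without executing any of it.

```r
library(knitr)

# A hypothetical script, given as lines of text instead of a file.
script <- c(
  "#' This line becomes *markdown* text, including math: $x_i$",
  "",
  "#+ histchunk, fig.cap='A histogram of some data'",
  "# a normal comment that stays inside the code chunk",
  "hist(rnorm(30))"
)

# knit = FALSE stops after the conversion, so no R code is run here;
# with text = ... the converted document is returned as a string.
rmd <- spin(text = script, knit = FALSE, format = "Rmd")
cat(rmd, sep = "\n")
```

Looking at the converted source is a handy way to debug a script whose chunk options or #' lines do not end up where you expected them.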
