
pkgpurl facilitates R package authoring using a literate programming approach. The main idea is to write all of the R source code in R Markdown files (Rmd/*.Rmd), which allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format. The main objective of pkgpurl is to provide a standardized way to compile the bare .R files from the prose-enhanced and thus more human-oriented .Rmd files.

The concept this package implements originates from Yihui Xie. See his blog post Write An R Package Using Literate Programming Techniques for more details; it’s definitely worth reading. This package’s function pkgpurl::purl_rmd() is just a less cumbersome alternative to the Makefile approach he outlines.

Pros and cons

The R Markdown format provides several advantages over the bare R source format when developing an R package:

👍 Mix Markdown and Code

It allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format instead of having to rely on # comments only. In general, this should encourage you to actually record code-accompanying information, because you’re able to use the full spectrum of Pandoc’s Markdown syntax like inline formatting, lists, tables, quotations or math¹.

It is especially powerful in combination with the Visual R Markdown feature introduced in RStudio 1.4, which – in addition to the visual editor – offers a feature whose utility can hardly be overestimated: Pandoc Markdown canonicalization (on file save²). For example, it allows paragraphs to be wrapped automatically at the desired line width, or a minimal, sloppily written pipe table to be normalized automatically into a beautifully formatted and actually readable one.

The relevant editor options which adjust the canonical Markdown generation can either be set

  • per .Rmd file, e.g.

    ---
    editor_options:
      markdown:
        wrap: 160
        references:
          location: section
        canonical: true
    ---
  • or per project in the usual PACKAGE_NAME.Rproj file, e.g.

    MarkdownWrap: Column
    MarkdownWrapAtColumn: 160
    MarkdownReferences: Section
    MarkdownCanonical: Yes

    (I’d recommend setting them per project, so they apply to the whole package including any .Rmd vignettes.)

👍 All your code in a single, well-structured file

The traditional recommendation for keeping an overview of your package’s R source code is to split it across multiple files. The popular (and very useful) book R Packages gives the following advice:

If it’s very hard to predict which file a function lives in, that suggests it’s time to separate your functions into more files or reconsider how you are naming your functions and/or files.

I think this is just ridiculous.

Instead, I encourage you to keep all your code (as far as possible) in a single file Rmd/PACKAGE_NAME.Rmd and structure it according to the rules described here, which even allows the pkgdown Reference: index to be automatically in sync with the source code structure. As a result, you re-organize (and thus most likely improve) your package’s code structure whenever you intend to improve the pkgdown reference – and vice versa. For a basic example, see this very package’s main source file.

Keeping all code in a single file frees you from the traditional hassle of finding a viable (but in the end still unsatisfactory) way to organize your R source code across multiple files. Of course, there are still good reasons to move code into separate files in certain situations, and nothing stops you from doing so. You can also exclude whole .Rmd files from purling using the .nopurl.Rmd filename suffix.

👍 Improved overview and navigation

You can rely on RStudio’s code outline to easily navigate through longer .Rmd files. IMHO it provides significantly better usability than the code section standard of .R files. It makes it easy to find your way around source files that are thousands of lines long.

RStudio’s Go to File/Function shortcut works the same for .Rmd files as it does for .R files.

👍 Improved visual clarity

If you use RStudio or any other editor with proper R Markdown syntax highlighting, you will probably appreciate the added visual clarity that comes from putting individual functions/code parts in separate R code chunks. This also facilitates creating a meaningful document structure (in Markdown) alongside the actual R source code.

👍 Easily toggle code inclusion

You can put development-only code which never lands in the generated R source files (and thus the R package) in separate code chunks with the chunk option purl = FALSE. This turns out to be very convenient in certain situations.

For example, this is a good way to reproducibly document the generation of cleaned versions of exported data as well as internal data. This avoids having to outsource the code to separate files under data-raw/ and add the directory to .Rbuildignore, i.e. there is no need to use usethis::use_data_raw(). Instead, you just set purl = FALSE for the relevant code chunk(s). You can (and should) still use usethis::use_data() (optionally with overwrite = TRUE) to generate the files under data/ holding external package data as well as the R/sysdata.rda file (using internal = TRUE) holding internal package data.
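To see the effect of purl = FALSE in isolation, the following minimal sketch purls a throwaway .Rmd file with knitr::purl() directly. It assumes that knitr is installed and that pkgpurl::purl_rmd() treats the chunk option the same way plain knitr does. The file contents and the add_one() function are made up for illustration.

```r
# Build a throwaway .Rmd file with one regular chunk and one development-only chunk.
# The fence is assembled via strrep() only to avoid nesting literal code fences in this example.
fence <- strrep("`", 3L)
path_rmd <- tempfile(fileext = ".Rmd")
path_r <- tempfile(fileext = ".R")

writeLines(text = c("# Demo",
                    "",
                    paste0(fence, "{r}"),
                    "add_one <- function(x) x + 1",
                    fence,
                    "",
                    paste0(fence, "{r, purl = FALSE}"),
                    "print(add_one(1))",
                    fence),
           con = path_rmd)

# Extract the R code; `documentation = 0L` requests pure code without any prose
knitr::purl(input = path_rmd,
            output = path_r,
            documentation = 0L,
            quiet = TRUE)

purled_code <- readLines(path_r)
```

Under these assumptions, purled_code should contain the add_one() definition but not the print() call from the development-only chunk.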

👍 Easily toggle styler

If you use styler to auto-format your code globally by setting knitr::opts_chunk$set(tidy = "styler"), you can still opt out on a per-chunk basis by setting tidy = FALSE. This gives pleasant flexibility.

Unfortunately, there are also a few notable drawbacks of the R Markdown format:

👎 Additional workflow step

The pkgpurl approach to writing R packages in the R Markdown format introduces one additional step at the very beginning of typical package development workflows: running pkgpurl::purl_rmd() to generate the R/*.gen.R files from the original Rmd/*.Rmd sources before documenting/checking/testing/building the package. Given sufficient user demand, this could probably be integrated into devtools’ functions in the future, so that no additional action has to be taken by the user when relying on RStudio’s built-in package building infrastructure.
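The order of steps can be sketched as a small wrapper function. This is only an illustration: rebuild_pkg() is a hypothetical helper that is part of neither pkgpurl nor devtools, and it assumes that the working directory is the package root and that devtools is installed.

```r
# Hypothetical convenience wrapper (not part of pkgpurl or devtools).
# Assumes the working directory is the root of a package written in the R Markdown format.
rebuild_pkg <- function() {

  # additional step: regenerate the `R/*.gen.R` files from the `Rmd/*.Rmd` sources
  pkgpurl::purl_rmd()

  # usual workflow: update the roxygen2 documentation, then check the package
  devtools::document()
  devtools::check()
}
```

Calling rebuild_pkg() from the package root would then run the purl step before the usual devtools steps.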

For the time being, it’s recommended to set up a custom shortcut³ for one or both of pkgpurl::purl_rmd() and pkgpurl::process_pkg(), which are registered as RStudio add-ins.

👎 Differing setup

Setting up a new project to write an R package in the R Markdown format differs slightly from the classic approach. A suitable convenience function like create_rmd_package() to set up all the necessary parts could probably be added to usethis in the future.

For the time being, you can use my ready-to-go R Markdown Package Development Template as a starting point for creating new R packages in the R Markdown format.

👎 Unwieldy debugging

Debugging can be a bit more laborious since line numbers in warning and error messages always refer to the generated R/*.gen.R file(s), not the underlying Rmd/*.Rmd source code file(s). If need be, you first have to look up the line numbers in the R/*.gen.R file(s) to understand which function / code parts cause the issue in order to know where to fix it in the Rmd/*.Rmd source(s).

👎 Missing roxygen2 auto-completion

Unlike in .R files, RStudio currently doesn’t support auto-completion of roxygen2 tags in .Rmd files, and its Reflow Comment command doesn’t work properly on them. These are known issues which will hopefully be resolved in the near future.

Installation

To install the latest development version of pkgpurl, run the following in R:

if (!("remotes" %in% rownames(installed.packages()))) {
  install.packages(pkgs = "remotes",
                   repos = "https://cloud.r-project.org/")
}

remotes::install_gitlab(repo = "rpkg.dev/pkgpurl")

Usage

The (function) reference is found here.

Package configuration

Some of pkgpurl’s functionality is controlled via package-specific global configuration which can either be set via R options or environment variables (the former take precedence). This configuration includes:

| Description | R option | Environment variable | Default value |
|---|---|---|---|
| Whether or not to add a copyright notice at the beginning of the generated .R files as recommended by e.g. the GNU licenses. The notice consists of the name and description of the program and the word Copyright (C) followed by the release years and the name(s) of the copyright holder(s), or if not specified, the author(s). The year is always the current year. All the other information is extracted from the package’s DESCRIPTION file. | pkgpurl.add_copyright_notice | R_PKGPURL_ADD_COPYRIGHT_NOTICE | |
| Whether or not to add a license notice at the beginning of the generated .R files as recommended by e.g. the GNU licenses. The license is determined from the package’s DESCRIPTION file and currently only the AGPL-3.0-or-later license is supported. | pkgpurl.add_license_notice | R_PKGPURL_ADD_LICENSE_NOTICE | |
| Whether or not to overwrite pkgdown’s reference index in the configuration file _pkgdown.yml with an auto-generated one based on the main input file as described in pkgpurl::gen_pkgdown_ref(). | pkgpurl.gen_pkgdown_ref | R_PKGPURL_GEN_PKGDOWN_REF | |
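As a minimal sketch, the configuration can be set in either of the two ways shown below. The helper resolve_config() is hypothetical and not part of pkgpurl; it only mimics the precedence rule stated above (R option first, then environment variable). How pkgpurl actually parses environment variable values is an assumption here.

```r
# Way 1: set the R option for the current session (takes precedence)
options(pkgpurl.gen_pkgdown_ref = TRUE)

# Way 2: set the environment variable, e.g. for non-interactive use
Sys.setenv(R_PKGPURL_GEN_PKGDOWN_REF = "FALSE")

# Hypothetical helper (not part of pkgpurl) mimicking the documented precedence:
# R option first, then environment variable, then a fallback value
resolve_config <- function(option_name,
                           env_var_name,
                           fallback = NULL) {

  option_value <- getOption(option_name)

  if (!is.null(option_value)) {
    return(option_value)
  }

  env_value <- Sys.getenv(env_var_name,
                          unset = NA_character_)

  if (!is.na(env_value)) {
    # assumption: the environment variable holds a string that `as.logical()` understands
    return(as.logical(env_value))
  }

  fallback
}

# Both are set, so the R option wins
from_option <- resolve_config(option_name = "pkgpurl.gen_pkgdown_ref",
                              env_var_name = "R_PKGPURL_GEN_PKGDOWN_REF")

# Unset the R option again, so the environment variable is consulted instead
options(pkgpurl.gen_pkgdown_ref = NULL)
from_env_var <- resolve_config(option_name = "pkgpurl.gen_pkgdown_ref",
                               env_var_name = "R_PKGPURL_GEN_PKGDOWN_REF")
```

Setting the R option in a project-level .Rprofile or the environment variable in .Renviron would make the configuration persistent; which of the two fits better depends on your setup.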

Development

R Markdown format

This package’s source code is written in the R Markdown file format to facilitate practices commonly referred to as literate programming. It allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format instead of having to rely on # comments only.

All the .gen.R suffixed R source code found under R/ is generated from the respective R Markdown counterparts under Rmd/ using pkgpurl::purl_rmd()⁴. Always make changes only to the .Rmd files – never the .R files – and then run pkgpurl::purl_rmd() to regenerate the R source files.

Coding style

This package borrows many of the Tidyverse design philosophies. The R code adheres to the principles specified in the Tidyverse Design Guide wherever possible and is formatted according to the Tidyverse Style Guide (TSG) with the following exceptions:

  • Line width is limited to 160 characters, double the limit proposed by the TSG (80 characters is ridiculously little given today’s high-resolution widescreen monitors).

    Furthermore, the preferred style for breaking long lines differs. Instead of wrapping directly after an expression’s opening bracket as suggested by the TSG, we prefer two fewer line breaks and align subsequent lines within the expression with the position right after its opening bracket:

    # TSG proposes this
    do_something_very_complicated(
      something = "that",
      requires = many,
      arguments = "some of which may be long"
    )
    
    # we prefer this
    do_something_very_complicated(something = "that",
                                  requires = many,
                                  arguments = "some of which may be long")

    This results in less vertical and more horizontal spread of the code and better readability in pipes.

  • Usage of magrittr’s compound assignment pipe-operator %<>% is desirable⁵.

  • Usage of R’s right-hand assignment operator -> is not allowed⁶.

  • R source code is not split over several files as suggested by the TSG but instead is (as far as possible) kept in the single file Rmd/pkgpurl.Rmd which is well-structured thanks to its Markdown support.

As far as possible, these deviations from the TSG plus some additional restrictions are formally specified in pkgpurl::default_linters, which is (by default) used in pkgpurl::lint_rmd(), which in turn is the recommended way to lint this package.
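The two operator rules from the list above can be illustrated with a short sketch. It assumes the magrittr package is installed; the variable names are made up for illustration.

```r
library(magrittr)

x <- c(3, 1, 2)

# compound assignment pipe: pipes `x` into `sort()` and assigns the result back to `x`
x %<>% sort()

# equivalent spelling without the compound assignment pipe
y <- c(3, 1, 2)
y <- y %>% sort()

# right-hand assignment, which the style rules above do not allow:
# c(3, 1, 2) %>% sort() -> z
```

Both x and y end up holding the sorted vector; the compound form just avoids repeating the variable name.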