BioConductor

Introduction

BioConductor is a repository of R packages that facilitates rigorous and reproducible analysis of data from current and emerging biological assays.

BioConductor is published in releases: a set of packages is released together and is intended to be compatible only with a specific version of R. This contrasts with CRAN, where packages are added continuously and are not tied to a particular version of R. BioConductor also comes with its own installation tool, BiocManager::install().

Problem Statement

Given the structural differences between the BioConductor and CRAN repositories, working with both at the same time is not straightforward. This affects the installation of BioConductor packages, especially when using renv, as well as publishing apps and documents that use BioConductor packages to Connect.

This document details various scenarios for working with BioConductor and its packages:

  • Installing BioConductor packages via BiocManager::install(), install.packages(), and renv
  • Publishing content to Connect that makes use of BioConductor packages

Prerequisites

The following prerequisites must be met for the instructions below to work:

  • A version of R installed
  • CRAN repository configured and reported by options()$repos
  • BiocManager package installed, either at user level or as part of the system default packages
  • Throughout this document, any reference to “R profile” means that the setting can be placed either in Rprofile.site (global setting) or in .Rprofile (user- or project-level setting).

Installing and working with packages

Public CRAN and BioConductor repositories

The BioConductor way

As per BioConductor, any BioConductor and CRAN package can be installed via

BiocManager::install("PackageName")

Publishing Shiny apps that use BioConductor packages to Connect is not possible with this setup. BiocManager::install() adds the BioConductor repositories only for the duration of the install process, so during publishing rsconnect has no knowledge of BioConductor.
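The situation described above can be checked before publishing. The following is a minimal sketch, assuming that BioConductor repositories appear in the repos option under names starting with "BioC" (the names used by BiocManager::repositories(), such as BioCsoft and BioCann). The helper name has_bioc_repos is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if the given repository vector contains entries
# whose names start with "BioC"; defaults to the session's repos option.
has_bioc_repos <- function(repos = getOption("repos")) {
  any(grepl("^BioC", names(repos)))
}

# CRAN-only configuration
has_bioc_repos(c(CRAN = "https://cloud.r-project.org"))

# Configuration that also lists a BioConductor repository
has_bioc_repos(c(
  BioCsoft = "https://bioconductor.org/packages/3.14/bioc",
  CRAN     = "https://cloud.r-project.org"
))
```

Calling has_bioc_repos() without arguments in a session that only used BiocManager::install() is expected to report that no BioConductor repositories are configured.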

The CRAN way

By default, install.packages() is restricted to CRAN repositories. BioConductor packages can be installed via install.packages() after setting

options(repos=c(BiocManager::repositories()))

in the R profile.

Note

The above setting works with any R version and always uses the most recent BioConductor release compatible with your R version. Users who want to use a specific BioConductor release need to pass it as the version parameter of BiocManager::repositories(), e.g. version="3.13".
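A pinned release could be configured as in the following sketch of an R profile entry. The guard with requireNamespace() is an addition for illustration, so that R still starts when BiocManager is not installed. Note that BiocManager may reject a release that does not match the running R version, so the version string here is only the example value from the note.

```r
# R profile sketch (Rprofile.site or .Rprofile)
if (requireNamespace("BiocManager", quietly = TRUE)) {
  # Pin the BioConductor release; "3.13" is the example version from the note
  options(repos = BiocManager::repositories(version = "3.13"))
}
```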

Publishing Shiny apps that use BioConductor packages to Connect works with this setup.

Using renv

The general workflow is described on the renv webpage. By default, renv::init() only picks up packages from CRAN. To make it also use BioConductor packages, add bioconductor=TRUE as a parameter, i.e.

renv::init(bioconductor=TRUE)

which will use the most recent BioConductor release compatible with your R version.

Note

To use a different BioConductor release, replace TRUE with the BioConductor version string, e.g. bioconductor="3.13".
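Put together, a project pinned to a specific release might be set up as sketched below. The package Biobase is only an example, and the "bioc::" prefix is renv's syntax for requesting a package from BioConductor. Treat this as an outline of the workflow, not a complete recipe.

```r
# Run once in the project directory; "3.13" is the example release from the note
renv::init(bioconductor = "3.13")

# Install an example BioConductor package into the project library
renv::install("bioc::Biobase")

# Record the installed packages and the BioConductor release in renv.lock
renv::snapshot()
```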

Any renv project initialised in this way can be restored with renv::restore(), which uses only the information in renv.lock. The BioConductor version is recorded in renv.lock, e.g.

  "Bioconductor": {
    "Version": "3.14"
  },
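To see which release a project is tied to, the entry shown above can be read programmatically. The following is a minimal sketch, assuming the jsonlite package is installed. The helper name lockfile_bioc_version is hypothetical, and the demonstration uses a small stand-in lockfile written to a temporary file, not a real project.

```r
# Hypothetical helper: return the BioConductor release recorded in a lockfile,
# or NA if the lockfile has no "Bioconductor" entry.
lockfile_bioc_version <- function(path = "renv.lock") {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  version <- lock[["Bioconductor"]][["Version"]]
  if (is.null(version)) NA_character_ else version
}

# Demonstration with a minimal stand-in lockfile
demo_lock <- tempfile(fileext = ".lock")
writeLines(
  '{"R": {"Version": "4.1.2"}, "Bioconductor": {"Version": "3.14"}}',
  demo_lock
)
lockfile_bioc_version(demo_lock)
```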

Publishing Shiny apps that use BioConductor packages to Connect only works if you also add

options(repos=c(BiocManager::repositories()))

to your R profile.

Public Package Manager

The Public Package Manager is a service provided by Posit that mirrors both the CRAN and BioConductor repositories. It provides time-based snapshots for CRAN, similar to MRAN, and also offers package binaries for many Enterprise Linux distributions. In addition, repository URLs can be made immutable against any future change in package metadata for maximum reproducibility.

To use CRAN and BioConductor packages from Public Package Manager, you need to point both the CRAN repo definition and the BioConductor mirror to Public Package Manager via

options(repos=c(CRAN="https://packagemanager.posit.co/cran/latest"))
options(BioC_mirror = "https://packagemanager.posit.co/bioconductor")

in your R profile. The above will make the latest package versions from CRAN and BioConductor available.
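To confirm that both settings took effect, the repositories derived by BiocManager can be inspected in an interactive session. This is a sketch under the assumption that BiocManager reads the BioC_mirror option when building its repository URLs. The check is meant to be run by hand and is not part of the R profile.

```r
# Settings from the R profile, repeated here so the sketch is self-contained
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))
options(BioC_mirror = "https://packagemanager.posit.co/bioconductor")

# Interactive check: list the repository URLs BiocManager derives from these options
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::repositories())
}
```

If the settings are picked up, the listed BioConductor and CRAN URLs are expected to point at packagemanager.posit.co.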

Use of time-based CRAN snapshots

Time-based snapshots can be used for increased reproducibility, especially in environments where users do not use renv to fix their R package versions. When a time-based snapshot is set, any package installation that does not request a specific package version installs the most recent version available at the date of the snapshot. Snapshots and their URLs can be selected by clicking on a calendar date in the section “Repository URL” of the “Setup” page for CRAN and selecting “Freeze”. For the snapshot of Nov 26th, 2021, the URL is “https://packagemanager.posit.co/cran/2021-11-26”.
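Based on the URL pattern of the example above, a snapshot URL can also be built in code. The following is a minimal sketch that assumes the date is simply appended to the CRAN base URL in YYYY-MM-DD form, as in the example. The helper name ppm_snapshot_url is hypothetical.

```r
# Hypothetical helper: build a time-based snapshot URL for a given date.
ppm_snapshot_url <- function(date,
                             base = "https://packagemanager.posit.co/cran") {
  date <- as.Date(date)  # fails early if the date cannot be parsed
  paste0(base, "/", format(date, "%Y-%m-%d"))
}

# Use the snapshot of Nov 26th, 2021 as the CRAN repository
options(repos = c(CRAN = ppm_snapshot_url("2021-11-26")))
getOption("repos")
```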

If time-based CRAN snapshots are used, it is advisable to set the date to a time when the BioConductor version compatible with the R version (see https://bioconductor.org/about/release-announcements/#release-versions) was released, to ensure compatibility.

Immutability/Lock of CRAN Package Data

In addition to plain time-based snapshots, the package data available for a given date can be locked against future changes by selecting “Lock Package Data” in the “Repository URL” section of the “Setup” page for the CRAN repo. For Nov 26th, 2021, the URL is “https://packagemanager.posit.co/cran/2021-11-26+MTo2NTMyOTYwOzhBNzEyRTVE”.

This feature ensures reproducibility even if changes are made in Package Manager, for example a change to the database schema. Further information can be found in the Package Manager Admin Guide.

Private Package Manager

A private Package Manager offers the greatest flexibility. The only required change compared to using public Package Manager is the hostname.

Additional capabilities come into play via the creation of custom CRAN-like repositories that can mix BioConductor releases with latest or time-based snapshots of CRAN.

The basic idea is to create a repo that is subscribed to both an appropriate BioConductor release and CRAN. Additional package sources (e.g. local or git-based) can be subscribed to this repo, too. The benefit of this solution is that the newly created repo can be used with snapshots.

More details on this approach are in the Package Manager admin guide and in the Package Manager Quickstart.

Once such a setup is in place, only this custom repo needs to be set in the repo definition, using the same approach as outlined for public Package Manager. Packages can then be installed with install.packages() or renv::install(); BiocManager::install() is no longer needed.

For a custom repo named “bioconductor-3.14” that contains both latest CRAN and BioConductor release 3.14, the appropriate repo setting would be

options(repos=c(pRSPM="https://hostname-of-private-rspm/bioconductor-3.14/latest"))

where hostname-of-private-rspm corresponds to the DNS name of your local/private Package Manager.
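With this setting in place, BioConductor and CRAN packages are installed the same way. The sketch below uses the placeholder hostname from above, so it only runs once that is replaced with a real Package Manager address. Biobase and ggplot2 are example packages.

```r
# Single combined repository; the hostname is a placeholder
options(repos = c(pRSPM = "https://hostname-of-private-rspm/bioconductor-3.14/latest"))

# Both calls resolve against the same combined repository
install.packages("Biobase")  # example BioConductor package
install.packages("ggplot2")  # example CRAN package
```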

Summary

  • Working with BioConductor packages for code development is possible
    • In the absence of Package Manager, for all scenarios described.
    • If public Package Manager is used, BioC_mirror additionally needs to be set and pointed to the URL of the BioConductor repository on Package Manager.
    • If a private Package Manager with a “CRAN-like” repository that includes both CRAN and BioConductor is used, only this single combined repository needs to be defined in the R profile.
  • Publishing to Connect is possible
    • In the absence of Package Manager, only when using the CRAN way or renv, i.e. when the BioConductor repositories are defined persistently in the R profile.
    • In the presence of Package Manager, for any of the described uses.