Managing Packages within Posit Team

Managing open source packages for data science work spans several different environments and, often, teams. For that reason, deeply understanding package management is difficult.

This page is designed to help teams figure out how to manage the packages in their environment with a minimum of conceptual learning.

If you want a deeper conceptual treatment, see our environments section and our two webinars on package management.

Package Management Overview

Packages are installed from repositories to libraries.

With Posit Team, there are three components to managing packages:

  1. An IT/admin configures Package Manager as the centralized package repository for Posit Team (or chooses not to)
  2. An IT/admin configures the default package settings on Workbench
  3. Individual data scientists manage the package libraries for their particular projects

Flow chart showing Admin configuring repositories, then Admin setting server defaults then Data Scientist managing libraries

Most teams find that adopting this model simplifies package management for admins and data scientists alike, including in environments with strong security and validation requirements.

1. Configuring Repositories

Package Manager, one of the components of Posit Team, is the repository for R and Python packages used by Workbench and Connect.

A private Package Manager instance is required to run Posit Team when:

  • The environment is offline or air-gapped so Workbench and Connect will not have direct internet access to public Package Manager
  • Packages must be validated into the environment
  • Data scientists are developing private packages for internal use

In most organizations, Package Manager is configured and administered by an IT/admin who has SSH access to the server. In some teams, an IT/admin sets up the Package Manager server and data scientists are responsible for managing the actual package sets present.

Package Manager can host one or more repositories that include public CRAN packages and private packages, as well as Bioconductor and PyPI repositories. Many organizations are unsure which repository configuration is right for them. The flow chart below walks through the decisions that determine the best configuration.

Flow chart for configuring repositories (Admin). Decision points: Environment offline? Need to validate packages? Private packages? Different package sets per R version? Outcomes: Public Package Manager, Use Private Package Manager, Configure local package source, Single full CRAN Repo, Single Curated CRAN Repo, Multiple Curated CRAN repos. Final step: Package Manager Configuration.

2. Set Workbench Defaults

Setting a Default Repository on Workbench

Once Package Manager is configured, server admins should set it as the default repository (or repositories) on Workbench.

flowchart for setting default repositories in Workbench.

For more information on how to set the default repository in Workbench, please see this article.
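As a rough illustration of what "setting a default repository" produces, the sketch below renders two configuration fragments: a line for /etc/rstudio/rsession.conf and the contents of a system-wide pip.conf. The server address packages.example.com, the repository paths, and the use of the r-cran-repos setting are assumptions made for this example; check the Workbench admin guide and copy the real URLs from your own Package Manager before using anything like this.

```python
import configparser
import io

# Hypothetical Package Manager address; replace with your own server.
PPM = "https://packages.example.com"


def rsession_conf_line(cran_url: str) -> str:
    """Build a line for rsession.conf that sets the default CRAN repository.

    Assumes the setting is named r-cran-repos; confirm in the admin guide.
    """
    return f"r-cran-repos={cran_url}"


def pip_conf_text(index_url: str) -> str:
    """Build the contents of a pip.conf that sets the default package index."""
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# The repository paths below are illustrative, not taken from a real server.
r_line = rsession_conf_line(f"{PPM}/cran/latest")
pip_text = pip_conf_text(f"{PPM}/pypi/latest/simple")

print(r_line)
print(pip_text)
```

Generating the fragments from one base URL keeps the R and Python settings pointed at the same server, which is the main thing to get right here.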

Installing Base Package Sets

Admins frequently ask whether they can install base package sets for all users.

This is possible, but is usually unnecessary.

Once an appropriate default repository is configured, the standard R install.packages and Python pip install commands will install from the correct repositories.

The main reason to install a base package set is to reduce duplicate package installs across users. Package sizes tend to be modest, so this is rarely an issue in practice. [1]

Should your organization decide to do server-wide package installs, run the standard install commands for R and Python with sudo. Packages must be installed separately for each version of R and Python.

flowchart for installing base packages for all users. If it is a requirement, admin installs the packages with sudo for each R and Python version.

For example, after SSH-ing into the server, an admin could start R with sudo

$ sudo /opt/R/3.6.2/bin/R

followed by

> install.packages("my-pkg")

In Python, this would be done directly with the pip utility

$ sudo /opt/python/3.7.3/bin/pip install my-pkg
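Because packages must be installed once per interpreter version, it can help to list every command before running any of them. The sketch below is a dry run only: it looks for interpreters under /opt/R and /opt/python (the layout used in the commands above, assumed here to hold one directory per version) and prints the matching install commands without executing them. The function name install_commands is made up for this example.

```python
from pathlib import Path


def install_commands(package: str, r_root: Path, python_root: Path) -> list[str]:
    """Build one install command per R and Python version found on disk."""
    commands = []
    for r_bin in sorted(r_root.glob("*/bin/R")):
        # R -e evaluates a single expression without an interactive session.
        commands.append(f"sudo {r_bin} -e 'install.packages(\"{package}\")'")
    for pip_bin in sorted(python_root.glob("*/bin/pip")):
        commands.append(f"sudo {pip_bin} install {package}")
    return commands


# Dry run: print the commands so an admin can review them first.
for command in install_commands("my-pkg", Path("/opt/R"), Path("/opt/python")):
    print(command)
```

On a machine without those directories the loop prints nothing, which makes the script safe to try anywhere.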

3. Manage Libraries

Once admins have properly configured default repositories on Workbench, normal package installs should just work.

Increasingly, data scientists snapshot and restore libraries on a per-project basis, which isolates each project's dependencies from every other project's.

Project-level isolation can be achieved with the renv package in R and with virtualenv in Python. [2]

Flow chart for managing project libraries

This virtual environment workflow makes it easy to create isolated environments on a per-project basis without doing repetitive package installs, and also allows for easy sharing of project dependencies across data scientists. For more information on how it works, see this page.
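A minimal sketch of the Python side of this workflow, using the standard library's venv module, is shown here. The temporary directory stands in for a real project folder, and with_pip=False is chosen only so the example runs quickly and offline. In R, the corresponding steps are renv::init(), renv::snapshot(), and renv::restore().

```python
import tempfile
import venv
from pathlib import Path

# A throwaway directory stands in for a project folder.
project = Path(tempfile.mkdtemp(prefix="demo-project-"))
env_dir = project / ".venv"

# with_pip=False keeps the sketch fast and offline; a real project would
# normally use with_pip=True so packages can be installed into the environment.
venv.create(env_dir, with_pip=False)

# pyvenv.cfg is the marker file that identifies a virtual environment.
print((env_dir / "pyvenv.cfg").is_file())
```

Once pip is available in the environment, recording dependencies with pip freeze and restoring them with pip install -r requirements.txt plays the role that snapshot and restore play in renv.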

Frequently Asked Questions

These are common questions from IT/Admins about configuring package management for Posit Team.

What about system requirements?

Most R and Python packages have no system dependencies other than the language itself.

However, some packages depend on separate external libraries.

One of the benefits of using Workbench instances is that these system libraries only have to be installed a few times rather than on each user’s laptop.

Package Manager provides a list of required system libraries and install commands at both the package and the repository level.

To see the requirements for an individual package, search for the package in the search bar.

To see the requirements for a whole repository, click on the setup tab for that repository and scroll down. Choose your OS to get the relevant install commands.

gif of getting system requirements for a repo


What if I need to validate packages into my environment?

Package Manager allows for the creation of curated package sets, which can be validated before they are made available to users. Details on how this works are in the Package Manager admin guide.

Admins then often wonder how to lock users into those package sets.

The best way to accomplish this is to

  1. Set the right default repository in Workbench
  2. Disallow access to other repositories as needed

Fully disallowing access to public repositories can be accomplished via networking rules. Workbench itself cannot stop users from changing their repositories, but networking rules can prevent any other repository from being reached.

It is also possible to prevent users from changing the installation repository in the RStudio Pro GUI by setting allow-r-cran-repos-edit=0 in /etc/rstudio/rsession.conf.
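A small check like the following can confirm that the setting is actually present. It is a sketch that assumes rsession.conf consists of simple key=value lines with optional # comments; the sample file contents, including the r-cran-repos line and the example.com address, are hypothetical.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def repos_locked(text: str) -> bool:
    """Return True when editing repositories in the GUI is switched off."""
    return parse_conf(text).get("allow-r-cran-repos-edit") == "0"


# Hypothetical file contents, for illustration only.
sample = """
# /etc/rstudio/rsession.conf
r-cran-repos=https://packages.example.com/cran/latest
allow-r-cran-repos-edit=0
"""

print(repos_locked(sample))
```

To check a real server, read the file with Path("/etc/rstudio/rsession.conf").read_text() and pass the result to repos_locked.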

What if my organization requires offline/air-gapped operations?

This is one of the reasons Package Manager exists.

Workbench and Connect need access to a package repository to install packages. There are no other internet connectivity requirements for the products themselves.

Diagram of Posit Team networking


If possible, we recommend allowing outbound access from Package Manager to the Posit sync service, which makes syncing new packages easier. Where that is not possible, Package Manager includes utilities for fully offline operation.

Are there special considerations for Connect deployments?

When a deployment to Connect occurs, a package manifest is generated, whether implicitly (push-button publishing) or explicitly (Git- or API-backed publishing). This manifest captures the current state of the packages in the deploy environment, including the repository each package was originally installed from.
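To see which repositories a deployment will rely on, the manifest can be inspected before publishing. The sketch below assumes an R content manifest shaped like a JSON object with a "packages" map whose entries carry a "Repository" field; that layout, the sample contents, and the example.com address are assumptions for illustration, so compare them against a manifest generated in your own environment.

```python
import json

# Hypothetical, heavily trimmed manifest; real files carry many more fields.
manifest_text = """
{
  "packages": {
    "dplyr": {"Repository": "https://packages.example.com/cran/latest"},
    "ggplot2": {"Repository": "https://packages.example.com/cran/latest"}
  }
}
"""


def repositories(manifest: dict) -> dict[str, str]:
    """Map each package name to the repository recorded for it."""
    packages = manifest.get("packages", {})
    return {name: entry.get("Repository", "") for name, entry in packages.items()}


repos = repositories(json.loads(manifest_text))
for name, url in sorted(repos.items()):
    print(name, url)
```

Any package whose recorded repository is not reachable from the Connect server is a likely source of deployment failures.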

If deployments are happening exclusively from Workbench, usually no further configuration is needed.

If deployments are happening from desktop environments, it is worthwhile to configure the RPackageRepository setting to point to a binary package repository for the server’s OS on Package Manager.

Any system libraries that need to be installed on Workbench will need to be installed on the Connect server as well.

Eagle-eyed users may notice that Connect uses packrat, renv’s predecessor, to deploy R packages to Connect. This build of packrat is heavily customized, and using renv to manage project environments is entirely consistent with Connect’s deployment process.

How do I get the repository URL from Package Manager?

Getting the right repository URL from Package Manager is a four-step process.

  1. Select the correct repo and navigate to the Setup tab.
  2. Switch the repo type from Source to Binary and select the correct OS.
  3. If needed, choose the date for the snapshot.
  4. Copy the URL from the box.
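The choices in these steps (repository, source or binary, OS, snapshot date) each end up as a segment of the URL. The sketch below shows how they might combine, assuming a layout modeled on the URLs that Public Package Manager displays, with a __linux__ segment followed by a distribution name for Linux binaries. The function repo_url and the example.com address are hypothetical, and the URL shown in the Setup tab remains the authoritative value.

```python
from typing import Optional


def repo_url(
    base: str,
    repo: str,
    snapshot: str = "latest",
    distro: Optional[str] = None,
) -> str:
    """Assemble a repository URL from the choices made in the Setup tab.

    distro=None gives a source URL; a distribution name gives a Linux
    binary URL. The path layout is an assumption, not a documented contract.
    """
    parts = [base.rstrip("/"), repo]
    if distro:
        parts += ["__linux__", distro]
    parts.append(snapshot)
    return "/".join(parts)


# Binary repository for a hypothetical server, latest snapshot.
print(repo_url("https://packages.example.com", "cran", distro="jammy"))
# Source repository pinned to a dated snapshot.
print(repo_url("https://packages.example.com", "cran", snapshot="2024-01-15"))
```

Pinning the snapshot date is what makes package installs reproducible, since the same URL keeps serving the same package versions.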

gif of choosing repository



Footnotes


  1. It used to be the case that binary R packages were unavailable on Linux, so package installs took a very long time on Workbench and Connect and there were many compile-time package dependencies. Now that both public and private Package Manager make these binaries available, these issues are much reduced.

  2. There are many virtual environment managers in Python, and you should use the one that is standard for your organization. If your organization doesn’t have a standard, we have seen virtualenv/venv work for many.