Managing Packages within Posit Team
Managing open source packages for data science work spans several different environments and, often, teams. For that reason, deeply understanding package management is difficult.
This page is designed to help teams figure out how to manage the packages in their environment with a minimum of conceptual learning.
For those who want a deeper conceptual treatment, please check out our environments section, and our two webinars on package management.
Package Management Overview
Packages are installed from repositories to libraries.
With Posit Team, there are three components to managing packages:
- An IT/admin configures Package Manager as the centralized package repository for Posit Team (or chooses not to)
- An IT/admin configures the default package settings on Workbench
- Individual data scientists manage the package libraries for their particular projects
Most teams find that adopting this model simplifies package management for admins and data scientists alike, including in environments with strong security and validation requirements.
1. Configure Repositories
Package Manager, one of the components of Posit Team, is the repository for R and Python packages used by Workbench and Connect.
A private Package Manager instance is required to run Posit Team successfully when:
- The environment is offline or air-gapped, so Workbench and Connect cannot reach the public Package Manager over the internet
- Packages must be validated into the environment
- Data scientists are developing private packages for internal use
In most organizations, Package Manager is configured and administered by an IT/admin who has SSH access to the server. In some teams, an IT/admin sets up the Package Manager server and data scientists are responsible for managing the actual package sets present.
Package Manager can host one or more repositories that include public CRAN packages and private packages, as well as Bioconductor and PyPI repositories. Many organizations are unsure which repository configuration is right for them. The flow chart below is designed to help teams choose.
2. Set Workbench Defaults
Setting a Default Repository on Workbench
Once Package Manager is configured, server admins should set it as the default repository (or repositories) on Workbench.
For details on setting the default repository in Workbench, please see this article.
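One way an admin might sanity-check the Python side of this default is to read the pip configuration and confirm that it points at Package Manager. The sketch below is illustrative only. It assumes the default was set through an index-url entry in the [global] section of a pip configuration file (commonly /etc/pip.conf on Linux), the function name configured_index_url is hypothetical, and the URL is a made-up placeholder rather than a real server.

```python
import configparser


def configured_index_url(pip_conf_text):
    """Return the index-url from pip configuration text, or None if unset."""
    # Disable interpolation so that any '%' in a URL is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pip_conf_text)
    # pip reads its default package index from the [global] section.
    return parser.get("global", "index-url", fallback=None)


# Hypothetical configuration content; replace the URL with the one copied
# from your own Package Manager instance.
example_conf = """
[global]
index-url = https://packagemanager.example.com/pypi/latest/simple
"""

print(configured_index_url(example_conf))
```

In practice the text would come from the configuration file on the Workbench server. A result of None suggests that pip is still falling back to its built-in default index.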
Installing Base Package Sets
Admins frequently ask whether they can install base package sets for all users.
This is possible, but is usually unnecessary.
Once an appropriate default repository is configured, the standard R install.packages and Python pip install commands will install from the correct repositories.
The main reason to install a base package set is to reduce duplicate package installs across users. Package sizes tend to be modest, so this is rarely an issue in practice.[1]
Should your organization decide to do server-wide package installs, they can be done with standard installs in both R and Python, run as a user with sudo privileges. Packages must be installed separately for each version of R and Python.
For example, after SSH-ing into the server, an admin could start R with:
$ sudo /opt/R/3.6.2/bin/R
followed by:
> install.packages("my-pkg")
In Python, this would be done directly with the pip utility:
$ sudo /opt/python/3.7.3/bin/pip install my-pkg
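Because packages must be installed once per interpreter version, an admin with several Python versions may find it useful to generate the install command for each one. The following is a minimal sketch, assuming that every version lives under /opt/python/<version>/bin/pip as in the example above. The function name pip_install_commands is hypothetical, and the sketch only builds and prints the commands; it does not run them.

```python
from pathlib import Path


def pip_install_commands(package, python_root="/opt/python"):
    """Build one 'sudo pip install' command per Python version under python_root."""
    commands = []
    root = Path(python_root)
    if not root.is_dir():
        # Nothing to do if the assumed install root does not exist.
        return commands
    # Directories are visited in lexicographic order of their paths.
    for version_dir in sorted(root.iterdir()):
        pip_path = version_dir / "bin" / "pip"
        if pip_path.exists():
            commands.append(["sudo", str(pip_path), "install", package])
    return commands


# Print the commands so they can be reviewed before anyone runs them.
for command in pip_install_commands("my-pkg"):
    print(" ".join(command))
```

Printing rather than executing keeps the sketch safe to try on any machine, and gives the admin a chance to review each command before running it with elevated privileges.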
3. Manage Libraries
Once admins have properly configured default repositories on Workbench, normal package installs should just work.
Increasingly, data scientists are snapshotting and restoring libraries on a per-project basis, which allows for project-level dependency isolation.
Project-level isolation can be achieved with the renv package in R and with virtualenv in Python.[2]
This virtual environment workflow makes it easy to create isolated environments on a per-project basis without doing repetitive package installs, and also allows for easy sharing of project dependencies across data scientists. For more information on how it works, see this page.
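To make the idea of a per-project library concrete, here is a minimal sketch using the venv module from the Python standard library. The helper name create_project_env and the .venv directory name are illustrative choices, not something this page or Posit prescribes. The sketch creates the environment in a temporary directory so that it is safe to run anywhere.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create an isolated virtual environment in <project_dir>/.venv."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch fast and offline; in real use you would
    # typically pass with_pip=True so packages can be installed into the env.
    venv.create(env_dir, with_pip=False, system_site_packages=False)
    return env_dir


with tempfile.TemporaryDirectory() as project:
    env = create_project_env(project)
    # pyvenv.cfg marks the directory as a virtual environment.
    print((env / "pyvenv.cfg").exists())
```

In a real project, a common convention is to record the installed packages with pip freeze and to restore them elsewhere with pip install -r requirements.txt, which plays a role loosely comparable to snapshotting and restoring with renv in R.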
Frequently Asked Questions
These are common questions from IT/admins about configuring package management for Posit Team.
What about system requirements?
Most R and Python packages have no system dependencies other than the language itself.
However, some packages depend on separate external libraries.
One of the benefits of using Workbench instances is that these system libraries only have to be installed a few times rather than on each user’s laptop.
Package Manager provides a list of required system libraries and install commands at both the package and the repository level.
To see the requirements for an individual package, search for the package in the search bar.
To see the requirements for a whole repository, click on the setup tab for that repository and scroll down. Choose your OS to get the relevant install commands.
What if I need to validate packages into my environment?
Package Manager allows for the creation of curated package sets, which can be validated before they are made available to users. Details on how this works are in the Package Manager admin guide.
Admins then often wonder how to lock users into those package sets.
The best way to accomplish this is to:
- Set the right default repository in Workbench
- Disallow access to other repositories as needed
Fully disallowing access to public repositories can be accomplished via networking rules. There is no way to disallow Workbench users from changing their repositories, but networking rules can prevent those repositories from being accessed.
It is also possible to prevent users from changing the installation repository in the RStudio Pro GUI by setting allow-r-cran-repos-edit = 0 in /etc/rstudio/rsession.conf.
What if my organization requires offline/air-gapped operations?
This is one of the reasons Package Manager exists.
Workbench and Connect need access to a package repository to install packages. There are no other internet connectivity requirements for the products themselves.
If possible, we recommend allowing outbound access from Package Manager to the Posit sync service, which makes syncing new packages easier. Package Manager also includes utilities for fully offline operation.
Are there special considerations for Connect deployments?
When a deployment to Connect occurs, a package manifest is generated, whether implicitly (push-button publishing) or explicitly (git- or API-backed publishing). This manifest captures the current state of the packages in the deploy environment, including the original installation repository.
If deployments are happening exclusively from Workbench, usually no further configuration is needed.
If deployments are happening from desktop environments, it is worthwhile to configure the RPackageRepository setting to point to a binary package repository for the server’s OS on Package Manager.
Any system libraries that need to be installed on Workbench will need to be installed on the Connect server as well.
Eagle-eyed users may notice that Connect uses packrat, renv’s predecessor, to deploy R packages to Connect. This build of packrat is heavily customized, and using renv to manage project environments is entirely consistent with Connect’s deployment process.
How do I get the repository URL from Package Manager?
Getting the right repository URL from Package Manager is a four-step process.
- Select the correct repo and navigate to the Setup tab.
- Switch the repo type from Source to Binary and select the correct OS.
- If needed, choose the date for the snapshot.
- Copy the URL from the box.
Footnotes
[1] It used to be the case that binary R packages were unavailable on Linux, so package installs took a very long time on Workbench and Connect and there were many compile-time package dependencies. Now that both public and private Package Manager make these binaries available, these issues are much reduced.
[2] There are many virtual environment managers in Python, and you should use the one that is standard for your organization. If your organization doesn’t have a standard, we have seen virtualenv/venv work for many.