Using AWS Managed File Systems with Workbench

In multi-node configurations of Workbench, shared storage is a requirement for users’ Linux home directories and a convenience for users’ shared data. Commonly, this shared storage is provided by a mounted NFS volume or by AWS EFS (see Using Amazon EFS with Posit Team). AWS FSx is another managed file storage service on AWS and an alternative to EFS. It offers Linux-compatible file systems such as Lustre and OpenZFS. Workbench is compatible with any file system that supports extended POSIX ACLs.

This article summarizes considerations and provides benchmarking results for AWS FSx for Lustre and AWS FSx OpenZFS against AWS EFS to assist teams in evaluating these options. See Using Amazon EFS (Elastic File System) with Posit Team for EFS performance testing and configuration recommendations.

Overall, sharing user data and home directories worked well with AWS FSx for Lustre and AWS FSx OpenZFS, and performance was comparable to EFS. Like EFS, AWS FSx OpenZFS is not compatible with Workbench’s Project Sharing functionality because it lacks support for access control lists (ACLs). FSx for Lustre does support extended POSIX ACLs, but Project Sharing does not work on it because of a suspected issue in Lustre that is currently being diagnosed. Details on the specific testing of each file storage system follow below.

Comparison table (rows: Mounted NFS, AWS FSx Lustre; columns: Performance, Provide shared user data, Provide shared home directories, Compatible with Workbench Project Sharing).

Our Test Environment

For testing, we used fsbench, an open source file system performance benchmarking tool, to evaluate FSx as a shared file system provider for Workbench implementations.

Our Workbench architecture for this benchmarking used two EC2 instances of type t2.large. FSx for Lustre and FSx OpenZFS were configured as follows:

| Type | AWS FSx Lustre | AWS FSx OpenZFS |
| --- | --- | --- |
| Deployment Type | Persistent | Persistent |
| Storage Type | SSD | SSD |
| Throughput | 50 MB/s | 64 MB/s |
| Storage Capacity | 1.2 TiB | 1.2 TiB |
| Other | Lustre Version: 2.10 | Provisioned IOPS: Automatic, Deployment Type: Single AZ |

Using FSx to provide user data in Workbench

One reason you may want to use FSx is to serve user data in Workbench. User data in Workbench is any data that individuals need to complete their specific tasks. This can include files that would otherwise be shared through a shared file system (Google Drive, SharePoint, etc.), or any data that is not accessed through a database. You may use FSx mounts to provide external storage space for user data; for this use case, Workbench performance depends on the FSx settings. If you encounter any issues while trying to access data on FSx with this setup, please reach out to AWS Support.

Using FSx to provide shared home directories in Workbench

Another reason you may want to use FSx is to provide user home directories, especially in a High Availability configuration of Workbench, where shared user home directories across the cluster are a requirement. Workbench uses the user home directory as the location for all configuration and project files. The home directory is essential for managing user configuration files and maintaining a consistent experience across the cluster. In this scenario, Workbench performance depends on the kind of shared storage that backs the home directories. Our testing showed that FSx for Lustre and FSx OpenZFS performed similarly when installing R packages and when reading and writing files.
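
Because performance depends on the storage behind the home directories, it can help to confirm which file system actually backs a given path on each node. The following is a minimal sketch, not part of our test setup: the function names `parse_mounts` and `fs_type_for` are hypothetical, and it assumes a Linux host where `/proc/mounts` lists one mount per line with the mount point and file system type in the second and third fields. It does not handle mount points that contain escaped spaces.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing `path`.

    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type in parse_mounts(mounts_text):
        # Compare on a trailing slash so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None
```

On a Workbench node, this could be called as `fs_type_for("/home", open("/proc/mounts").read())`. A Lustre client mount reports the type `lustre`, while volumes mounted over NFS report `nfs` or `nfs4`.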

Project Sharing in Workbench

Project Sharing is a feature of Workbench that enables users to work together on RStudio Projects. To use Project Sharing, the directories hosting the projects to be shared must be on a volume that supports Access Control Lists (ACLs). Workbench uses ACLs to grant collaborators access to shared projects; ordinary file permissions are not modified.

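Based on our findings, Project Sharing does not work on EFS, FSx OpenZFS, or FSx for Lustre. EFS and OpenZFS lack support for ACLs. Lustre does support extended POSIX ACLs, but Project Sharing still fails on it because of the suspected Lustre issue described earlier.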
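
The ACL requirement can be checked on a candidate volume before enabling Project Sharing. The sketch below is one possible probe, not the procedure we used in testing. It assumes the `setfacl` command (from the Linux `acl` tools) is installed and that the directory is writable; the function name `probe_acl_support` is hypothetical. It only exercises POSIX ACLs, and a passing result does not guarantee that Project Sharing will work, as the Lustre findings show.

```python
import os
import shutil
import subprocess
import tempfile


def probe_acl_support(directory):
    """Try to set a POSIX ACL entry on a scratch file in `directory`.

    Returns 'supported', 'unsupported', or 'unknown' (setfacl not installed).
    """
    setfacl = shutil.which("setfacl")
    if setfacl is None:
        return "unknown"

    # Create the scratch file on the volume under test.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # Add a named-user entry for the current uid with read permission.
        entry = f"u:{os.getuid()}:r"
        result = subprocess.run(
            [setfacl, "-m", entry, path],
            capture_output=True,
            text=True,
        )
        # setfacl exits non-zero when the file system rejects the ACL.
        return "supported" if result.returncode == 0 else "unsupported"
    finally:
        os.remove(path)
```

For example, `probe_acl_support("/home")` run on a Workbench node reports whether the volume mounted at that path accepted the ACL entry.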

Further work

The setup we used for this testing was minimal and might not reflect the architecture (EC2 instance types, FSx throughput, etc.) that you are considering. We highly encourage you to perform your own benchmarking and acceptance testing. You can use fsbench, the same tool we used in this effort.
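
As a quick sanity check before a full fsbench run, a small timing script can show whether a mount behaves roughly as expected. The following is an illustrative sketch only. It is not fsbench and does not reproduce our tests; the function name `time_small_file_io` and its default file count and size are assumptions chosen for the example.

```python
import os
import tempfile
import time


def time_small_file_io(directory, count=100, size=4096):
    """Write, then read back, `count` files of `size` bytes under `directory`.

    Returns elapsed seconds for each phase and the total bytes read.
    """
    payload = os.urandom(size)
    # The scratch directory and its files are removed on exit.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        paths = [os.path.join(scratch, f"probe_{i}.bin") for i in range(count)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # push the data out to the file system
        write_seconds = time.perf_counter() - start

        # Reads may be served from the local page cache, so treat the
        # read timing as a lower bound rather than a storage measurement.
        start = time.perf_counter()
        bytes_read = 0
        for path in paths:
            with open(path, "rb") as handle:
                bytes_read += len(handle.read())
        read_seconds = time.perf_counter() - start

    return {
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
        "bytes_read": bytes_read,
    }
```

Running `time_small_file_io("/home")` on each candidate mount gives a rough side-by-side comparison. Numbers from a script like this are only indicative and should not replace a proper benchmark on your target architecture.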