Using AWS Managed File Systems with Workbench
In multi-node configurations of Workbench, shared storage is a requirement for users’ Linux home directories and a convenience for users’ shared data. Commonly, this shared storage is provided by a mounted NFS volume or by AWS EFS (see Using Amazon EFS with Posit Team). AWS FSx is another managed file storage service on AWS and an alternative to EFS. It offers Linux-compatible file systems such as Lustre and ZFS. Note that Workbench is fully compatible with any file system that supports extended POSIX ACLs.
This article summarizes considerations and provides benchmarking results for AWS FSx for Lustre and AWS FSx OpenZFS against AWS EFS to assist teams in evaluating these options. See Using Amazon EFS (Elastic File System) with Posit Team for EFS performance testing and configuration recommendations.
Overall, AWS FSx for Lustre and AWS FSx OpenZFS worked well for sharing user data and home directories, and the results were comparable to EFS. Like EFS, AWS FSx OpenZFS is not compatible with Workbench’s Project Sharing functionality because it lacks support for access control lists (ACLs). FSx for Lustre does support extended POSIX ACLs, but Project Sharing does not work on it due to a suspected issue in Lustre that is currently being diagnosed. Details on the testing of each file storage system follow below.
File system | Performance | Provide shared user data | Provide shared home directories | Compatible with Workbench Project Sharing |
---|---|---|---|---|
Mounted NFS | ✅ | ✅ | ✅ | ✅ |
AWS EFS | ✅ | ✅ | ✅ | ❌ |
AWS FSx Lustre | ✅ | ✅ | ✅ | ❌ |
AWS FSx OpenZFS | ✅ | ✅ | ✅ | ❌ |
Our Test Environment
For testing, we used an open source file system performance benchmarking tool called fsbench to evaluate FSx as a shared file system provider for RStudio implementations.
Our Workbench architecture for this benchmarking used two EC2 instances of type t2.large. The FSx for Lustre and FSx OpenZFS file systems were configured as follows:
Type | AWS FSx Lustre | AWS FSx OpenZFS |
---|---|---|
Deployment Type | Persistent | Persistent |
Storage Type | SSD | SSD |
Throughput | 50 MB/s | 64 MB/s |
Storage Capacity | 1.2 TiB | 1.2 TiB |
Other | Lustre Version: 2.10 | Provisioned IOPS: Automatic, Deployment Type: Single AZ |
Using FSx to provide user data in Workbench
One reason you may want to use FSx is to serve user data in Workbench. User data in Workbench is any data that individuals need to complete their specific tasks. This can include files shared through a file-sharing service (Google Drive, SharePoint, etc.), or any data that is not accessed through a database. You may use FSx mounts to provide external storage space for user data. For this use case, Workbench performance depends on your FSx settings. If you encounter any issues while trying to access data on FSx with this setup, please reach out to AWS Support.
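When several mounts are in play, it can help to confirm which file system actually backs a given data or home directory. The following is a minimal sketch, not part of Workbench or FSx tooling: `filesystem_type` is a hypothetical helper that parses text in the Linux `/proc/mounts` format, and the host names and mount points in `SAMPLE_MOUNTS` are made-up placeholders. It assumes FSx for Lustre mounts report type `lustre` and NFS-served mounts report `nfs` or `nfs4`.

```python
import os

# Hypothetical /proc/mounts content; the device names are placeholders.
SAMPLE_MOUNTS = """\
/dev/nvme0n1p1 / ext4 rw,relatime 0 0
fs-0123.fsx.us-east-1.amazonaws.com@tcp:/abcdef /home lustre rw,flock 0 0
fs-0456.fsx.us-east-1.amazonaws.com:/fsx /data nfs4 rw,relatime 0 0
"""


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point,
    file system type, options, dump, pass. Mount points containing
    spaces (escaped as \\040 there) are not handled in this sketch.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

On a real node you would pass the live table instead of the sample, for example `filesystem_type("/home/alice", open("/proc/mounts").read())`, where `/home/alice` stands in for one of your own paths.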
Project Sharing in Workbench
Project Sharing is a feature of Workbench that enables users to work together on RStudio Projects. To use Project Sharing, the directories hosting the projects to be shared must be on a volume that supports Access Control Lists (ACLs). Workbench uses ACLs to grant collaborators access to shared projects; ordinary file permissions are not modified.
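Based on our findings, Project Sharing does not work on EFS, FSx OpenZFS, or FSx for Lustre. EFS and FSx OpenZFS lack the required ACL support. FSx for Lustre supports extended POSIX ACLs, but Project Sharing still fails on it due to a suspected Lustre issue that is being diagnosed.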
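Before evaluating Project Sharing on a volume, you may want a quick probe of whether the mount accepts POSIX ACL entries at all. The sketch below is an illustration under assumptions, not how Workbench itself checks: `supports_posix_acls` is a hypothetical helper, and it assumes the `setfacl` command from the Linux `acl` package is installed. The `run` parameter exists only so the check can be exercised without a real mount.

```python
import os
import subprocess
import tempfile


def supports_posix_acls(directory, run=subprocess.run):
    """Return True if a POSIX ACL entry can be set on a file in `directory`.

    Assumes the `setfacl` CLI is on PATH. The probe file is always removed.
    """
    fd, probe = tempfile.mkstemp(prefix="acl-probe-", dir=directory)
    os.close(fd)
    try:
        # Try to grant the current user an explicit rw ACL entry.
        result = run(
            ["setfacl", "-m", f"u:{os.getuid()}:rw", probe],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
    finally:
        os.remove(probe)
```

A call such as `supports_posix_acls("/home/alice")` (a placeholder path) returns `False` when the file system rejects the ACL entry. A `True` result is necessary but not sufficient: as noted above, FSx for Lustre supports extended POSIX ACLs and still did not work with Project Sharing in our testing.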
Further work
The setup we used for this testing was minimal and might not reflect the architecture (EC2 instance types, FSx throughput, etc.) that you are considering. We strongly encourage you to perform your own benchmarking and acceptance testing. You can use fsbench, the same tool we used in this effort.
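As a first smoke test before a full fsbench run, a small timing script can show whether a mount behaves roughly as expected. The sketch below is independent of fsbench and does not reproduce its tests: `time_small_files` is a hypothetical helper, and the file count and size are arbitrary defaults. It only times writing and reading back many small files in a directory on the mount under test.

```python
import os
import tempfile
import time


def time_small_files(directory, count=200, size=4096):
    """Write, then read back, `count` files of `size` bytes in `directory`.

    Returns (write_seconds, read_seconds, bytes_read). Files are created
    in a temporary subdirectory that is removed afterwards.
    """
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        paths = [os.path.join(workdir, f"f{i:05d}.bin") for i in range(count)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push data to storage
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        bytes_read = 0
        for path in paths:
            with open(path, "rb") as fh:
                bytes_read += len(fh.read())
        read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds, bytes_read
```

Running it against both a local directory and the shared mount gives a rough relative comparison. Read timings taken right after the writes may be served from the client cache, so treat them as optimistic.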