Architecture for RStudio on SageMaker
This article uses several terms for the components of “RStudio on SageMaker.” To clarify:
RStudio or RStudio IDE refers to the Integrated Development Environment (IDE) for code development.
Posit Workbench is the centralized, multi-user, professionally licensed application that launches IDEs.
RStudio on SageMaker is the implementation of Posit Workbench within a SageMaker Domain that makes RStudio IDE sessions available to SageMaker users.
You may see references to RStudio Workbench or RStudio Server Pro in AWS documentation. These are former names for Posit Workbench and are being phased out as AWS updates its documentation to reflect the product and company rebranding.
Architecture Overview
Amazon SageMaker provides a managed environment for machine learning in which Amazon handles the architecture and infrastructure management. RStudio on SageMaker brings the RStudio IDE into this managed environment, so its architecture differs from that of a Posit Workbench installation on your own, self-managed infrastructure. See Differences from Workbench on Self-Managed Infrastructure below for the specific implementation differences.
The diagram below provides an overview of the SageMaker Domain and the Posit Workbench implementation within it. Note that Posit Workbench within SageMaker launches only RStudio sessions; SageMaker Studio is responsible for launching Jupyter sessions.
SageMaker Architecture Key Components
Amazon SageMaker Domain
RStudio runs within a SageMaker Domain. The Posit Workbench integration guide, Enable RStudio on AWS SageMaker, describes how to enable RStudio for a Domain.
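As a rough illustration of what enabling RStudio involves at the API level, the following sketch builds the request parameters for the boto3 SageMaker `create_domain` call with RStudio turned on. It assumes IAM authentication and an existing VPC and subnets; the function name `rstudio_domain_params` and every name, ID, and ARN are hypothetical placeholders. Prerequisites such as the Posit license and IAM policies are not covered here, and the function only assembles a dictionary, so it runs without AWS credentials.

```python
# Sketch: request parameters for boto3's SageMaker create_domain call with
# RStudio enabled. All names, IDs, and ARNs are hypothetical placeholders.
from typing import Any


def rstudio_domain_params(
    domain_name: str,
    execution_role_arn: str,
    domain_execution_role_arn: str,
    vpc_id: str,
    subnet_ids: list[str],
) -> dict[str, Any]:
    """Return keyword arguments for create_domain with RStudio turned on."""
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # assumption: IAM auth rather than SSO (IAM Identity Center)
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        # Defaults inherited by every user profile created in the Domain.
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            # Grant users standard (non-admin) access to RStudio.
            "RStudioServerProAppSettings": {
                "AccessStatus": "ENABLED",
                "UserGroup": "R_STUDIO_USER",
            },
        },
        # Role assumed by the Domain-level Posit Workbench (RStudioServerPro) app.
        "DomainSettings": {
            "RStudioServerProDomainSettings": {
                "DomainExecutionRoleArn": domain_execution_role_arn,
            },
        },
    }


if __name__ == "__main__":
    params = rstudio_domain_params(
        domain_name="rstudio-example",
        execution_role_arn="arn:aws:iam::111122223333:role/ExampleExecutionRole",
        domain_execution_role_arn="arn:aws:iam::111122223333:role/ExampleDomainRole",
        vpc_id="vpc-0example",
        subnet_ids=["subnet-0example"],
    )
    print(params["DomainSettings"])
    # With AWS credentials configured, the request would be sent with:
    # boto3.client("sagemaker").create_domain(**params)
```

Setting `UserGroup` to `R_STUDIO_ADMIN` instead would give the default user profile administrative rights in RStudio.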
Amazon Elastic File System (EFS)
An EFS volume is created automatically in the SageMaker Domain the first time a user onboards to Amazon SageMaker. Each user has a home directory on this EFS volume that persists across sessions, and users can access their home directory from any RStudio or SageMaker Studio session. See Manage Your Amazon EFS Storage Volume in SageMaker Studio for more details.
Posit Workbench and Session EC2s
Posit Workbench runs on a persistent EC2 instance. Workbench uses the Launcher to start RStudio sessions on on-demand EC2 instances. Sessions are launched with either a default container image or a custom image that an administrator has attached to the SageMaker Domain. Users choose the instance type for each session. Because every session can access the user's EFS-backed home directory, a user can start work on one instance type and resume it on another if their resource needs change.
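The administrator step of attaching a custom image to RStudio sessions can be sketched at the API level. Assuming the image and an app image config have already been registered in SageMaker, this hypothetical helper, `attach_rsession_image_params`, builds boto3 `update_domain` parameters that list the image under `RSessionAppSettings`; the helper name and all IDs and names are placeholders, and the code only assembles a dictionary.

```python
# Sketch: request parameters for boto3's SageMaker update_domain call that
# attach a custom image to RStudio sessions. Names and IDs are hypothetical.
from typing import Any


def attach_rsession_image_params(
    domain_id: str, image_name: str, app_image_config_name: str
) -> dict[str, Any]:
    """Return keyword arguments for update_domain listing one custom RSession image."""
    return {
        "DomainId": domain_id,
        "DefaultUserSettings": {
            # RSessionAppSettings governs the images offered to RStudio sessions.
            "RSessionAppSettings": {
                "CustomImages": [
                    {
                        # Assumed to be registered already (create_image / create_image_version).
                        "ImageName": image_name,
                        # Assumed to exist already (create_app_image_config).
                        "AppImageConfigName": app_image_config_name,
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    params = attach_rsession_image_params("d-example", "example-r-image", "example-r-config")
    print(params["DefaultUserSettings"])
    # With AWS credentials configured:
    # boto3.client("sagemaker").update_domain(**params)
```

In this sketch the `CustomImages` list holds only the new image; an administrator who wants to keep previously attached images would presumably need to include them in the list as well.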
SageMaker Studio and Session EC2s
SageMaker Studio runs on its own EC2 instance and can launch additional SageMaker Studio sessions, including Jupyter sessions, on separate EC2 instances.
Differences from Workbench on Self-Managed Infrastructure
The implementation of RStudio on SageMaker has notable differences from a Posit Workbench implementation that is installed on your own, self-managed infrastructure.
Specifically:
Amazon manages the infrastructure, configuration files, default container image, Workbench version, and available R versions.
RStudio on SageMaker launches only RStudio IDE sessions; the other IDEs supported by Workbench (JupyterLab, Jupyter Notebook, and VS Code) are not enabled.
Project Sharing in RStudio is not currently supported by RStudio on SageMaker.
Workbench Jobs are not currently supported by RStudio on SageMaker.
There is currently no direct way to mount external file systems in the SageMaker Domain.