Architecture for RStudio on SageMaker

A Note on Nomenclature

This article uses several related terms for the elements of “RStudio on SageMaker.” To clarify:

RStudio or RStudio IDE refers to the Integrated Development Environment (IDE) for code development.

Posit Workbench is the centralized, multi-user, professionally-licensed application that launches IDEs.

RStudio on SageMaker is the implementation of Posit Workbench within a SageMaker Domain in which RStudio IDE sessions are made available to SageMaker users.

You may see references to RStudio Workbench or RStudio Server Pro in AWS documentation. These are previous names for Posit Workbench, and are being phased out as product and company rebranding updates are made in AWS documentation.

Architecture Overview

Amazon SageMaker provides a managed environment for machine learning in which Amazon handles architecture and infrastructure management. RStudio on SageMaker brings the RStudio IDE to that managed environment. Its architecture differs from that of a Posit Workbench implementation installed on your own, self-managed infrastructure. See Differences from Workbench on Self-Managed Infrastructure below for specific implementation differences.

The diagram below provides an overview of the SageMaker Domain and the Posit Workbench implementation within it. Note that Posit Workbench within SageMaker launches only RStudio sessions; SageMaker Studio is responsible for launching Jupyter sessions.

flowchart LR

    subgraph vpc["AWS Virtual Private Cloud (VPC)"]

        subgraph sagemakerDomain["AWS SageMaker Domain"]
            
            subgraph sagemakerStudioEC2["EC2 (SageMaker Studio)"]
                sagemakerStudio("SageMaker Studio")
            end
            
            subgraph workbenchEC2["EC2 (Posit Workbench)"]
                workbench("Posit Workbench")
                launcher("Launcher")
                workbench---launcher
            end

            efs[["\n\nAWS Elastic File System (EFS)\n\n\n"]]


            subgraph ec2SpecC ["EC2 (e.g. T3 Large)"]
                sagemakerStudioSession1("SageMaker Studio Session #1")
                sagemakerStudioSession2("SageMaker Studio Session #2")
            end
            
            subgraph ec2SpecA ["EC2 (e.g. R5 Large)"]
                rstudioIdeSession1("RStudio IDE Session #1")
                rstudioIdeSession2("RStudio IDE Session #2")
            end

            
            subgraph ec2SpecB ["EC2 (e.g. T3 Medium)"]
                rstudioIdeSession3("RStudio IDE Session #3")
            end

            efs-.-sagemakerStudioEC2
            efs-.-workbenchEC2
            efs-.-ec2SpecA
            efs-.-ec2SpecB
            efs-.-ec2SpecC
            sagemakerStudio---sagemakerStudioSession1
            sagemakerStudio---sagemakerStudioSession2
            launcher---rstudioIdeSession1
            launcher---rstudioIdeSession2
            launcher---rstudioIdeSession3

        end
    end

    classDef ec2Class fill:#c6c7cc
    classDef server fill:#FAEEE9,stroke:#ab4d26
    classDef product fill:#447099,stroke:#213D4F,color:#F2F2F2
    classDef session fill:#7494B1,color:#F2F2F2,stroke:#213D4F
    classDef element fill:#C2C2C4,stroke:#213D4F
    
    class sagemakerStudioEC2,workbenchEC2,ec2SpecA,ec2SpecB,ec2SpecC server
    class workbench,launcher product
    class rstudioIdeSession1,rstudioIdeSession2,rstudioIdeSession3 session
    class efs,sagemakerStudio,sagemakerStudioSession1,sagemakerStudioSession2 element
    
    style vpc fill:#f6f6f7
    style sagemakerDomain fill:#f6f6f7,stroke-dasharray: 5 5

SageMaker Architecture Key Components

AWS SageMaker Domain

RStudio is installed within a SageMaker Domain. The Posit Workbench integration document Enable RStudio on AWS SageMaker describes how to enable RStudio.

AWS Elastic File System (EFS)

An EFS volume is automatically created in the SageMaker Domain the first time a user onboards to Amazon SageMaker. Each user has a home directory on the EFS, and the home directory is persistent across sessions. Users can access their home directory from any RStudio or SageMaker Studio session. See Manage Your Amazon EFS Storage Volume in SageMaker Studio for more details.
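
Because the home directory lives on EFS, a file written there in one session is visible in a later session. The Python sketch below illustrates the idea under stated assumptions: that the process's home directory (`Path.home()`) is the EFS-backed directory, and that a small JSON file is an acceptable way to carry state. The file name `analysis_state.json` and the helper names `save_state` and `load_state` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical file name used only for this illustration.
DEFAULT_NAME = "analysis_state.json"


def save_state(state: dict, home: Optional[Path] = None, name: str = DEFAULT_NAME) -> Path:
    """Write `state` as JSON under the home directory and return the file path."""
    target = (home or Path.home()) / name
    target.write_text(json.dumps(state))
    return target


def load_state(home: Optional[Path] = None, name: str = DEFAULT_NAME) -> Optional[dict]:
    """Return the previously saved state, or None if no file exists yet."""
    target = (home or Path.home()) / name
    if not target.exists():
        return None
    return json.loads(target.read_text())
```

A session could call `save_state({"step": 3})` before it ends, and a later session could call `load_state()` to pick up where the earlier one stopped.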

Posit Workbench and Session EC2s

Posit Workbench runs on a persistent EC2 instance. Through the Launcher, Workbench starts RStudio sessions on on-demand EC2 instances. Each session uses either a default container image or a custom image that an administrator has attached to the SageMaker Domain. Users select the instance type for each session. Because the EFS-backed home directory is accessible from every session, work started on one instance type can be resumed on a different instance type if resource needs change.
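
One way to see the split between the persistent Workbench server and the on-demand session instances is to group a Domain's apps by type. The sketch below assumes the app records have the shape returned by the SageMaker `ListApps` API (for example through boto3's `list_apps`), where the `RStudioServerPro` app type corresponds to the Workbench server and `RSessionGateway` to an individual RStudio session. It also assumes each session record carries a `ResourceSpec` with an `InstanceType`, and falls back to `"unknown"` otherwise. The helper name `summarize_rstudio_apps` and the sample records are hypothetical.

```python
from collections import defaultdict

SERVER_APP_TYPE = "RStudioServerPro"   # the persistent Posit Workbench server
SESSION_APP_TYPE = "RSessionGateway"   # one on-demand RStudio IDE session


def summarize_rstudio_apps(apps):
    """Split app records into Workbench server names and session names per instance type."""
    servers = []
    sessions_by_instance = defaultdict(list)
    for app in apps:
        app_type = app.get("AppType")
        if app_type == SERVER_APP_TYPE:
            servers.append(app["AppName"])
        elif app_type == SESSION_APP_TYPE:
            # Assumption: the record includes ResourceSpec.InstanceType.
            instance = app.get("ResourceSpec", {}).get("InstanceType", "unknown")
            sessions_by_instance[instance].append(app["AppName"])
        # Other app types (for example Jupyter-related ones) are ignored here.
    return servers, dict(sessions_by_instance)


# Hypothetical sample records. Real ones might come from something like
# boto3.client("sagemaker").list_apps(DomainIdEquals=<your domain id>)["Apps"].
sample_apps = [
    {"AppType": "RStudioServerPro", "AppName": "default"},
    {"AppType": "RSessionGateway", "AppName": "session-1",
     "ResourceSpec": {"InstanceType": "ml.t3.medium"}},
    {"AppType": "RSessionGateway", "AppName": "session-2",
     "ResourceSpec": {"InstanceType": "ml.r5.large"}},
    {"AppType": "JupyterServer", "AppName": "default"},
]

servers, sessions = summarize_rstudio_apps(sample_apps)
```

With the sample data, `servers` holds the single Workbench server app and `sessions` maps each instance type to the RStudio sessions running on it; the Jupyter app is skipped because SageMaker Studio, not Workbench, manages it.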

Screenshot of Posit Workbench in SageMaker

SageMaker Studio and Session EC2s

SageMaker Studio runs on its own EC2 instance. It can launch new SageMaker Studio sessions, including Jupyter sessions, into additional EC2 instances.

SageMaker Studio running in SageMaker

Differences from Workbench on Self-Managed Infrastructure

The implementation of RStudio on SageMaker has notable differences from a Posit Workbench implementation that is installed on your own, self-managed infrastructure.

Specifically:

  • Amazon manages the infrastructure, configuration files, default container image, version of Workbench, and versions of R available.

  • RStudio on SageMaker launches only RStudio IDE sessions; the other IDEs supported by Workbench (i.e., JupyterLab, Jupyter Notebook, and VS Code) are not enabled.

  • Project Sharing in RStudio is not currently supported by RStudio on SageMaker.

  • Workbench Jobs are not currently supported by RStudio on SageMaker.

  • There is currently no direct way to mount external file systems to the SageMaker Domain.