5. Monitoring

In this section you will learn:

How to enable and configure Workbench server monitoring endpoints
How to monitor server status and performance
How to integrate Workbench logs and metrics into external monitoring tools
How to set user/group resource limits

Server Monitoring Endpoints

Workbench provides two endpoints that facilitate server health and performance monitoring:

Server health check endpoint
Admin dashboard endpoint

These endpoints must be enabled in the rserver.conf file.

Server Health Check Monitoring

The health check endpoint is used to monitor server status. Enable this health check with the server-health-check-enabled configuration setting in the rserver.conf file. For example:

# /etc/rstudio/rserver.conf
server-health-check-enabled=1

After enabling this setting, restart the server and visit:

http://<workbench-address>/health-check

If Workbench is running, the default output looks like this:

active-sessions: 0
idle-seconds: 43
cpu-percent: 0.0
memory-percent: 7.8
swap-percent: 0.0
load-average: 0.1
license-status: Activated
license-days-left: 249
license-allow-product-usage: 1

You can customize the output to return an alternate format, such as XML or JSON, which external monitoring systems can commonly parse.
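If you keep the default plain-text output, an external script can still read it. The following is a minimal sketch, assuming the response body uses the "key: value" layout shown above; the function name parse_health_check is hypothetical and not part of Workbench.

```python
def parse_health_check(body: str) -> dict:
    """Parse the default 'key: value' health-check output into a dict.

    Numeric values become floats; anything else (for example the
    license status) is kept as a string.
    """
    metrics = {}
    for line in body.splitlines():
        if ":" not in line:
            continue  # skip blank or unexpected lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        try:
            metrics[key] = float(value)
        except ValueError:
            metrics[key] = value
    return metrics


# Sample body copied from the default output shown in this section.
SAMPLE = """\
active-sessions: 0
idle-seconds: 43
cpu-percent: 0.0
memory-percent: 7.8
swap-percent: 0.0
load-average: 0.1
license-status: Activated
license-days-left: 249
license-allow-product-usage: 1
"""

metrics = parse_health_check(SAMPLE)
print(metrics["memory-percent"], metrics["license-status"])
```

In practice the body would come from an HTTP request to the health-check URL rather than from a string constant.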

Admin Dashboard

The Admin Dashboard presents an administrator’s view of the Workbench server. In addition to giving access to some administrative tasks such as locking users and session management, the dashboard provides performance and log data including:

Monitoring of active sessions and their CPU and memory utilization
Historical usage data for individual server users (session time, memory, CPU, logs)
Historical server statistics (CPU, memory, active sessions, system load)
Searchable server log (view all messages or just those for individual users)

The dashboard must be enabled before you can use it. In the rserver.conf file, set admin-enabled=1. For example:

# /etc/rstudio/rserver.conf
admin-enabled=1

After enabling this setting, restart the server and visit:

http://<workbench-address>/admin

You can optionally specify groups that can access the dashboard.

RRD Logs for Server Resource Usage

Workbench monitors per-user and system-wide resource usage. By default, this information is written to a set of RRD (Round Robin Database) files, which supply the data shown in the Admin Dashboard.

The RRD format can be read by RRDtool, a Linux utility designed for time series data such as network bandwidth, temperatures, or CPU load. RRD data is stored in a circular buffer, so its storage footprint remains constant over time.
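To see why a circular buffer keeps storage constant, consider this minimal sketch. It is an illustration of the idea only, not RRDtool's actual file format (which also keeps several archives consolidated at different resolutions); the RingBuffer class is hypothetical.

```python
from collections import deque


class RingBuffer:
    """Fixed-size store: once full, each new sample overwrites the oldest."""

    def __init__(self, slots: int):
        # deque with maxlen discards the oldest item automatically
        self._samples = deque(maxlen=slots)

    def record(self, timestamp: int, value: float) -> None:
        self._samples.append((timestamp, value))

    def samples(self) -> list:
        return list(self._samples)


buf = RingBuffer(slots=3)
for t, load in enumerate([0.1, 0.4, 0.2, 0.9, 0.3]):
    buf.record(t, load)

# Only the three most recent samples remain.
print(buf.samples())  # [(2, 0.2), (3, 0.9), (4, 0.3)]
```

No matter how long the server runs, the buffer never holds more than the configured number of slots.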

RRD files are stored, by default, in /var/lib/rstudio-server/monitor/rrd. System-wide monitoring data requires about 20 MB of disk space, plus about 3.5 MB for each user. If you have a large number of users, you can specify an alternate volume for monitoring data with the monitor-data-path configuration setting.
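A quick way to size the monitoring volume is to apply the two figures above. This is a rough sketch that assumes the approximate 20 MB and 3.5 MB values hold for your installation; the function name is hypothetical.

```python
SYSTEM_MB = 20.0    # approximate system-wide footprint, from this section
PER_USER_MB = 3.5   # approximate per-user footprint, from this section


def estimate_rrd_storage_mb(user_count: int) -> float:
    """Estimate total RRD disk usage in MB for a given number of users."""
    if user_count < 0:
        raise ValueError("user_count must be non-negative")
    return SYSTEM_MB + PER_USER_MB * user_count


for users in (10, 100, 1000):
    print(users, "users:", estimate_rrd_storage_mb(users), "MB")
```

For example, 1,000 users works out to roughly 3.5 GB, which is the scale at which a dedicated volume becomes worth considering.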

External Monitoring

Data from the Workbench server monitoring endpoints and logs can be shipped to an external monitoring tool. An external monitoring tool can check a server’s health, aggregate server metrics and logs for performance monitoring, and send alerts based on events or custom thresholds that you define.
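Threshold-based alerting usually reduces to comparing each reported metric against a limit. The sketch below shows that comparison in isolation; the metric readings and limits are made-up values for illustration, and a real monitoring tool would supply its own rule syntax.

```python
def breached_thresholds(metrics: dict, thresholds: dict) -> list:
    """Return the names of metrics whose value exceeds its threshold.

    Metrics without a configured threshold, and thresholds for metrics
    that were not reported, are ignored.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


# Hypothetical reading and limits, for illustration only.
reading = {"cpu-percent": 93.5, "memory-percent": 7.8, "swap-percent": 0.0}
limits = {"cpu-percent": 90.0, "memory-percent": 85.0}

print(breached_thresholds(reading, limits))  # ['cpu-percent']
```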

Metrics and log aggregators import data using a standard protocol. Workbench natively sends metrics to a Graphite service using the Carbon protocol. This is a push-based model: Workbench pushes metrics to the central monitoring service. Read more on configuring Workbench for Graphite in the Using Graphite section of the Admin Guide.
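Workbench handles the push itself, so you do not need to write any code for this. Purely to show what "push-based" means on the wire, here is a sketch of Carbon's plaintext format, where each data point is one line containing a metric path, a value, and a Unix timestamp. The metric path below is hypothetical and is not a name Workbench emits; 2003 is Carbon's conventional plaintext port.

```python
import socket
import time


def carbon_line(path: str, value: float, timestamp=None) -> str:
    """Format one data point in Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def push(line: str, host: str, port: int = 2003) -> None:
    """Send one formatted line to a Carbon listener (not called here)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric path, with a fixed timestamp so the output is stable.
line = carbon_line("workbench.example.cpu_percent", 0.0, timestamp=1700000000)
print(line, end="")  # workbench.example.cpu_percent 0.0 1700000000
```

The sender initiates every connection, which is the key difference from a pull-based system where the monitoring service scrapes an endpoint on a schedule.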

As an alternative to Graphite, Prometheus offers a pull-based approach to monitoring. Workbench does not natively export to Prometheus; however, the Prometheus project offers a Graphite Exporter that converts Graphite data into a Prometheus endpoint. Read more on configuring Workbench for Prometheus monitoring in this support article.

Many external metrics and log aggregators use either a Graphite or a Prometheus service. Use whichever service your organization already has in place. A non-comprehensive, unranked list of providers includes:

Datadog
Nagios
Prometheus and Grafana
SolarWinds
Zabbix

Controlling Resource Utilization

Performance monitoring is a valuable practice in administering a production server. In some cases, administrators may also place guardrails around user behavior to avoid system resource constraints or unavailability due to greedy consumption, runaway processes, or unexpected events.

Guardrails around resource utilization can be established at the Linux server level and via Workbench configuration.

Server-Level Controls

Server-level controls place guardrails around Workbench itself. From this level, you can limit the CPU and memory that Workbench, and therefore user session processes, can consume. This is done by setting limits on the systemd service unit.

Refer to the systemd documentation for your Linux distribution for guidance on this configuration.

Workbench Session Controls

Global, group, and user profiles can be defined to control the behavior of Workbench sessions and to bound their system resource usage.

When sessions are running local to the Workbench server, profiles are defined in /etc/rstudio/profiles. You can provide specifications for:

Version of R used
CPU affinity (i.e., which set of cores the session should be bound to)
Scheduling priority (i.e., nice value)
Resource limits (maximum memory, processes, open files, etc.)
R session timeouts (amount of idle time that triggers session suspension)
R session kill timeouts (amount of idle time after which a session is destroyed and cleaned up)
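One way to picture how global, group, and user profiles combine is as layered overrides, with more specific levels replacing values from more general ones. The sketch below is a conceptual model only: the precedence order, the "*" and "@group" keys, and the setting names are all assumptions made for illustration, so check the Admin Guide for the actual file syntax and rules.

```python
def resolve_profile(user: str, groups: list, profiles: dict) -> dict:
    """Merge settings for a user: global first, then groups, then user.

    Later layers override earlier ones (assumed precedence).
    """
    settings = dict(profiles.get("*", {}))           # global defaults
    for group in groups:
        settings.update(profiles.get("@" + group, {}))  # group overrides
    settings.update(profiles.get(user, {}))          # user overrides
    return settings


# Hypothetical profiles; keys and values are illustrative only.
PROFILES = {
    "*": {"session-timeout-minutes": 60, "max-memory-mb": 2048},
    "@powerusers": {"max-memory-mb": 8192, "nice": -10},
    "jsmith": {"session-timeout-minutes": 120},
}

resolved = resolve_profile("jsmith", ["powerusers"], PROFILES)
print(resolved)
```

Here the hypothetical user jsmith ends up with the group's memory limit and scheduling priority but a personal session timeout.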

Profiles can still be used in distributed environments, with a few considerations.

In a load-balanced architecture, resource limits are applied to a session after it has been assigned to a server in the cluster.
If sessions are launched via the Job Launcher (i.e., when starting sessions with Kubernetes or Slurm), scheduling and priority should be configured in the Job Launcher. See User and Group Profiles with Kubernetes or Slurm Configuration for more information.

Monitoring Lab

🚀 Launch the exercise environment!

In the exercise environment you will get experience:

Enabling Workbench server health checks
Reviewing the Admin Dashboard and logs
Setting session controls
