Deploying Data for Content On Connect

Apps and reports on Connect usually rely on one or more data sets. Depending on the resources your organization has and the use case, it might make sense to store that data in a database, a flat file like a csv, or some other kind of interface.

This article will give an overview in terms of how to update data for an app or report on Connect.

Summary

Data Method Recommended use case
Data in the App Bundle Data is updated as often or less often than the app code and doesn’t need to be shared across projects.
Database Gold standard for data storage and access.
Pins Lightweight datasets, ephemeral data, models.
Blob Storage Golden standard for storing large amounts of unstructured data.
Directory on Connect Typically the method of last resort for unstructured data as it requires SSH access to the server for setup and often requires code changes between development and deployment. For very large files, it may be the only option.
System integrations Case by case basis depending on the system, typically through a package that is a wrapper for that systems API.

Data in the App Bundle

If your data lives inside your app directory and is only updated as often as the app code, you can upload your data with your code and access it with a relative file path.

Data in the Connect Bundle

In this configuration, your code would access the data at the same relative path regardless of whether the code is being developed locally or is deployed to Connect.

The main limitation to including the data in the Connect bundle is that you must deploy everything (data and code) at the same time.

Note

If you update the data more often than the app code, use a different method.

Data Outside the App Bundle

If your data is updated more frequently than your app, the data can live in a database, in a pin on Connect, in a separate directory on Connect, or be accessed using other means.

Database

Apps, reports, and APIs on Connect can pull data from a live connection to a database.

Live Database Connection

If you’re publishing an app or API, you might want all of the data pulled in at start up time or you might want a live connection that will pull data as input is received from the user.1 Either of these patterns can work.

Pulling Data from a Database on Connect

If you are pulling data directly from a database into a Shiny app, you’ll want to consider how you will protect your credentials. We have recommendations for these topics and more within our database best practices pages.

Pins

Pins is an R package that allows for easy storage and retrieval of data, models, and other R objects. Pins can be a good choice when you don’t have write access to a database or when the data you’re trying to save is something like a model that won’t fit nicely into most databases.

Using a Pin on Connect

You can create a pin with the pins::pin_write command and retrieve the data with pins::pin_read. A major benefit of pins is that your code won’t have to change at all when you deploy – the read and write commands will work in both the IDE during development and on Connect.

The pins page has more details on how to use pins.

Here is an example of how to use pins with either a Shiny app or Plumber API.

Blob Storage

Blob storage is the golden standard for storing unstructured data. It provides more options for access from developers compared to the challenges with file mounts experienced with traditional file storage. It comes with advantages over traditional file storage with low latency, integration with backup systems, and overall typically reduced cost.

Connection to your Blob storage is through forming the connection in your R or Python code, typically facilitated with a package. Common Blob storage providers include:

Similar to when managing connections to a database you’ll want to consider how you will protect your credentials. We have recommendations within our securing credentials page.

Directory on Connect

Content on Connect can use the server’s file system to store data. This option is usually a last resort because it requires SSH access to the server for setup, and often requires code changes between the IDE and Connect. For very large files, it may be the only option.

Content running on Connect is sandboxed within the server, so you must use an absolute path to access data, and must manually ensure that the relevant directory has the proper read/write permissions.2

Using Data from Persistent Storage on Connect

One potential difficulty in using data from persistent storage is that the data path will probably change between the RStudio IDE and Connect, unless the directory is in the same location on both computers. You can use the config package to have different paths in the development and deployed environment.

Other Methods for Accessing Data

Data could be stored in a separate system accessible through an underlying API (Application Programming Interface). In this case authentication/access is handled on a system by system basis.

Developers can interact directly with an API using packages such as httr and jsonlite in R or requests in Python. Developers can also build and deploy their own APIs for handling integrations, for example using Plumber in R or Flask in Python, that can then be deployed and accessed on Connect.

For some APIs, in order to make them more accessible, the interface is bundled as a package for developers to use. Some examples include Microsoft365R (see our additional writeup here), spotifyr, rtweet, googlesheets4, rdrop2, aws.s3, and many more. For up to date information on the best practices for how to use a package see it’s documentation.

Special Considerations for Shiny Apps

Pull Data on a Scheduler

Shiny apps work entirely on a “pull” model, so Shiny will need to check if the data is updated, as opposed to the new data “pushing” itself into the app.

If you want to write a general data-pulling function, shiny::reactivePoll allows you to periodically check a resource for changes and run an arbitrary function if it has. You can also use shiny::invalidateLater to invalidate a reactive on a schedule.

There are also several useful functions that are specially designed to schedule data refreshes. If you’re using a pin, pins::pin_reactive allows you to check for updates to a pin on a schedule. Similarly, shiny::reactiveFileReader allows you to check for updates to a file on persistent storage.

Speedy data loading

Regardless of how and where you load data, faster is always better. Some tips to speed load times for Shiny apps:

Do less! Reducing the amount of data transmitted is the best way to reduce load times. Often, pre-aggregating data will prove sufficient for quick data loads.

Use Shiny Scoping + Connect Settings Content outside a Shiny app’s server block (i.e. in global) will only be loaded once per process on Connect. Setting Connect’s Min Processes to 1 or greater will ensure that data is loaded before the first user arrives.3

Footnotes

  1. How to close the connection when the content closes will depend on the content type. For example, you can register onStop calls in Shiny, or an exit hook in Plumber.↩︎

  2. The user who runs the content will need access. By default this is the rstudio-connect user. You can check which user a piece of content will run as under the Access tab on the content in Connect under Who runs this content on the server.↩︎

  3. Reactive content only runs inside Shiny server blocks. If you want to pre-load data and have the data respond to user input or reactive polling, you can load the data in the global context and then update it using the global assignment operator (<<-).↩︎