Dev/Test/Prod with Posit Team

An analysis project often leads into a second phase, in which one or more data products are developed. A data product could be a dashboard, report, API, or ETL process. It takes the insights gathered during the analysis phase and makes them available to stakeholders on a permanent basis.

Unlike data analysis, which is experimental by nature, a data product has to work consistently whenever it is consumed. This means that the code for the data product needs to be developed in a more formal manner. Development can occur in three basic stages:

The product is developed and tested by the developer
One, or a few, stakeholders test the product for functionality
The product is made available to all stakeholders

Each of these stages occurs in separate environments, respectively referred to as:

Development
Testing
Production

After the product is successfully tested in each stage, the code is then promoted to the next stage.
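The promotion rule above can be pictured as a small state machine. The following is only a sketch: the `Stage` enum and the `promote` function are hypothetical names used for illustration and are not part of any Posit product.

```python
from enum import Enum


class Stage(Enum):
    """The three stages, in promotion order."""
    DEVELOPMENT = 1
    TESTING = 2
    PRODUCTION = 3


def promote(current: Stage, tests_passed: bool) -> Stage:
    """Return the next stage, or raise if promotion is not allowed."""
    if not tests_passed:
        raise ValueError(f"Cannot promote from {current.name}: tests have not passed")
    if current is Stage.PRODUCTION:
        raise ValueError("Production is the final stage")
    # Stages are numbered consecutively, so the next stage is value + 1.
    return Stage(current.value + 1)
```

The point of the sketch is that code only ever moves one stage forward, and only after the tests for its current stage have passed.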

Code promotion in Posit Team

flowchart  BT

subgraph Dev
    dev(Posit Workbench)
end

g(Hosted Git)

subgraph Test
    test(Posit Connect)
end

subgraph Prod
    prod(Posit Connect)
end

Dev <--> g
Dev --> Test
g --> Prod
g --> Test

classDef server fill:#FAEEE9,stroke:#ab4d26
classDef product fill:#447099,stroke:#213D4F,color:#F2F2F2
classDef session fill:#7494B1,color:#F2F2F2,stroke:#213D4F
classDef element fill:#C2C2C4,stroke:#213D4F

class Dev,Test,Prod server
class dev,test,prod session
class g element

Development

With Posit Team, code development and testing are done within two of our products. As illustrated in this section’s diagram, development and unit testing happen in Workbench. Only developers, such as data scientists or data analysts, need access to Workbench. They perform unit testing of the product before making it available to other stakeholders.

Testing

Once the data product is ready for review, the developer deploys it to Connect. The stakeholders who are responsible for making sure that the product works as expected are then able to access it via Connect. This is called user acceptance testing (UAT).

The data product may depend on external assets, such as databases or shared drives. It is important to make sure that these are still accessible to the data product once it is deployed to Connect. This is called integration testing.
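One way to check that external assets remain reachable is a small script that the deployed product, or a companion job, runs on Connect. The following is a minimal sketch using only the Python standard library. The function name `check_assets` is hypothetical, and a successful TCP connection only shows that a host is reachable, not that credentials or queries will work.

```python
import socket
from pathlib import Path


def check_assets(paths, endpoints, timeout=3.0):
    """Return a dict mapping each asset to True (reachable) or False.

    paths:     file or directory paths, e.g. a mounted shared drive
    endpoints: (host, port) pairs, e.g. a database server
    """
    results = {}
    for path in paths:
        results[str(path)] = Path(path).exists()
    for host, port in endpoints:
        key = f"{host}:{port}"
        try:
            # Open and immediately close a TCP connection to the endpoint.
            with socket.create_connection((host, port), timeout=timeout):
                results[key] = True
        except OSError:
            results[key] = False
    return results
```

For example, `check_assets(["/mnt/shared/sales"], [("db.example.com", 5432)])` would report on one shared drive and one database host. Both names are placeholders, not values from this article.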

Production

After all testing is completed, the data product is made available to all stakeholders for consumption. In some cases, such as when the data product is a script that performs data transformation or ETL, the last stage also includes scheduling how often the script runs. These steps are completed within Connect.

Deployment with Connect

There are a few ways to deploy content to Connect. By deployment, we mean moving the code, its dependent files, and the metadata about R and/or Python and about the packages that the data product uses. To learn about the available options for deploying to Connect, see our article on deployments.
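To make the three parts of a deployment concrete, the sketch below collects them into one dictionary: the files, the language version, and the packages. This structure is an illustration only and is not the format that Connect or its deployment tools use. The function name `describe_bundle` is hypothetical.

```python
import hashlib
import platform
from pathlib import Path


def describe_bundle(project_dir, packages):
    """Build an illustrative description of what a deployment carries.

    project_dir: directory holding the code and its dependent files
    packages:    mapping of package name to version, supplied by the caller
    """
    root = Path(project_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # A checksum lets the receiving side detect changed files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            files[path.relative_to(root).as_posix()] = digest
    return {
        "files": files,
        "python_version": platform.python_version(),
        "packages": dict(packages),
    }
```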

Package Manager

Here are two scenarios in which using Package Manager is needed for a successful promotion of code:

Some organizations do not allow servers to have access to the Internet, so actions such as patching and upgrades are performed offline. This is called an air-gapped environment. In such an environment, Workbench and Connect are not able to download packages on demand. Package Manager allows someone in the enterprise to download CRAN packages manually and then perform the update offline. Package Manager becomes the source of packages for the other two products.
Many organizations use a combination of Workbench and RStudio Desktop, the open-source desktop version of RStudio. The package sources available to software running on someone’s laptop can differ from those available to a central server. Using Package Manager ensures that both are able to access the exact same packages.
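In both scenarios, each machine is configured to use Package Manager as its package source. For Python, that can be done through pip's configuration file. The sketch below renders such a file. The URL is a placeholder, and the path layout of an actual Package Manager repository is an assumption that should be checked against your own server. The function name `pip_config_for` is hypothetical.

```python
import configparser
import io


def pip_config_for(repo_url):
    """Render pip configuration text that uses repo_url as the package index."""
    # interpolation=None keeps any '%' characters in the URL literal.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": repo_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder URL: replace with the repository URL of your own server.
EXAMPLE_URL = "https://packagemanager.example.com/pypi/latest/simple"
print(pip_config_for(EXAMPLE_URL))
```

Placing the same configuration on laptops and on the servers means that all of them resolve packages from the same source. In R, the equivalent setting is the `repos` option.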

flowchart  BT

g(Hosted Git)


subgraph Prod
    prod(Posit Connect)
end

subgraph Test
    test(Posit Connect)
end

subgraph Dev
    dev(Posit Workbench)
end


rspm(Posit Package Manager)


Dev <--> g
Dev --> Test
g --> Prod
g --> Test
rspm -.-> Prod
rspm -.-> Dev
rspm -.-> Test

classDef server fill:#FAEEE9,stroke:#ab4d26
classDef product fill:#447099,stroke:#213D4F,color:#F2F2F2
classDef session fill:#7494B1,color:#F2F2F2,stroke:#213D4F
classDef element fill:#C2C2C4,stroke:#213D4F

class Dev,Test,Prod server
class dev,test,prod,rspm session
class g element

Server Environments

Minimal

We recommend that each component of Posit Team be installed in its own, independent server environment. Server environment here refers to a single server, or a cluster of multiple servers, such as those used to provide High Availability. There should be at least three server environments: one each for Workbench, Connect, and Package Manager. In this mode, the Test and Production stages occur in the same server environment.

flowchart  BT

dev(Posit Workbench)
test(/e29aefd4-ebb2-44f6-849c-ec8d8e66f170)
prod(/production-app)

subgraph Dev
    dev
end

rspm(Posit Package Manager)
g(Hosted Git)

subgraph Connect[Posit Connect]

    subgraph Test
        test
    end
    subgraph Prod
        prod
    end
    
end


Dev<-->g
Dev-->Test
g-->Prod
g-->Test
rspm-.->Prod
rspm-.->Dev
rspm-.->Test

classDef server fill:#FAEEE9,stroke:#ab4d26
classDef product fill:#447099,stroke:#213D4F,color:#F2F2F2
classDef session fill:#7494B1,color:#F2F2F2,stroke:#213D4F
classDef element fill:#C2C2C4,stroke:#213D4F

class Dev,Prod,Test server
class dev,test,prod,rspm,Connect session
class g element

Separate Test and Production

A preferable setup may be to have separate server environments for Test and Production. This ensures that the resources needed to serve data products that are already in Production are not impacted by ongoing tests. Another reason to have separate server environments is to limit who can publish data products to Production. For example, the developer is able to deploy a data product to the Test server environment, but needs to request that IT deploy the final product to the Production server. That ensures that no changes are made to the official version of the data product that were not fully tested and approved.
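The publishing restriction in the example above amounts to a simple policy table. The sketch below shows the idea only. The role and environment names are hypothetical, and in practice the restriction would be enforced by the permissions configured on each server, not by code like this.

```python
# Which roles may publish to which server environment (illustrative policy).
PUBLISH_POLICY = {
    "test": {"developer", "it"},
    "production": {"it"},
}


def can_publish(role, environment):
    """Return True if the role is allowed to publish to the environment."""
    return role in PUBLISH_POLICY.get(environment, set())
```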

flowchart  BT

g(Hosted Git)


subgraph Prod
    prod(Posit Connect)
end

subgraph Test
    test(Posit Connect)
end

subgraph Dev
    dev(Posit Workbench)
end


rspm(Posit Package Manager)


Dev <--> g
Dev --> Test
g --> Prod
g --> Test
rspm -.-> Prod
rspm -.-> Dev
rspm -.-> Test

classDef server fill:#FAEEE9,stroke:#ab4d26
classDef product fill:#447099,stroke:#213D4F,color:#F2F2F2
classDef session fill:#7494B1,color:#F2F2F2,stroke:#213D4F
classDef element fill:#C2C2C4,stroke:#213D4F

class Dev,Prod,Test server
class dev,test,prod,rspm session
class g element

Testing server upgrades

Eventually, the servers themselves will need to be patched or upgraded. For example, the Posit software installed on the server may need to be upgraded. Before upgrading the servers used for code development and deployment, it is a good idea to test the changes in a separate server environment. These are called staging servers. They are meant to mirror the servers that are in regular use. The staging servers are used infrequently, and usually only IT and perhaps some developers have access to them. They are meant only to confirm that software upgrades were successful.

Appendix

Why not a cron job inside Workbench?

There are cases when an R or Python script needs to run on a regular basis for the foreseeable future. It is very common for those scripts to grow over time, both in number and in importance. Depending on a single developer to run all of the scripts becomes a problem. The solution is to automate the scripts.

Please be aware that at this point, those scripts are no longer considered to be “in-development.” When the enterprise, or a team in the enterprise, depends on these scripts to run on a regular and consistent basis, they are Production scripts. As such, they should be moved to Connect.

There is also a practical reason to move the scripts to Connect. A cron job depends on the same user, the same version of R or Python, and the same package versions being available every time the script runs. Connect handles all the dependencies and the scheduling in a safe and consistent manner.
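To see what a cron job implicitly depends on, it can help to record the interpreter and package versions present when the script last ran successfully. The following is a minimal sketch using the Python standard library. The function name `snapshot_environment` is hypothetical, and this is not how Connect tracks dependencies internally.

```python
import platform
from importlib import metadata


def snapshot_environment(package_names):
    """Record the interpreter version and installed versions of the named packages."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": platform.python_version(), "packages": versions}
```

If any value in the snapshot changes between runs, for example after a server upgrade, the script may behave differently even though its own code did not change.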

Connect isolates each data product that is deployed to it, so there are no issues with some data products using one version of a given package, while other data products use a different version of the same package. Connect makes sure that no package version collision exists.