Get Started

Explore production-grade data science

Welcome to Solutions!

This site is full of resources to help your data science or analytics team mature and thrive. This Getting Started page is an introduction to the concepts and philosophy that guide what it means for data science to be production-grade.

What is “Production-Grade?” data science?

Your goal is to deliver value from your analytics work.

Creating an analysis that is correct and useful isn’t enough to deliver value.

Production-grade data science is about taking your work - whether an app, a publication, or an API - and making sure that it’s available and accessible those who need it when they need it.

What is Production-grade

Production-grade means that your infrastructure and workflows keep your data science work:

  • Correct: the data product works as expected
  • Available: unplanned outages are rare or nonexistent
  • Safe: data, functionality, and code are all kept safe from unauthorized users or unintended alteration
  • Snappy: fast response times, ability to predict needed capacity for expanded traffic
  • Sturdy: design and test to minimize the likelihood that changes will break things

Considerations in a production-grade environment

As the one aiming to have impact with your data science products, there are a number of key considerations as you work in a production-grade environment as you go from development to deployment.

Write Effective Code

Anything goes as you’re developing your data science product for the first time – making discoveries and exploring the data.

But when that prduct is shared and scaled, efficiency, optimization, and ease of updating become much more important. This section will help you understand how to write code that accomplishes your data science goals and is production-grade.

Connect to Data Sources and Systems

It’s impossible to do analytics without connecting to outside data sources and systems.

This section includes advice on how to secure credentials, access data while maintaing permissions in deployment, and make analytics work available to other tools and systems.

Manage Packages and Reproduce Environments

Most of the work in R and Python is done via open source packages and libraries that extend the base language.

But managing package versions and dependencies can be a struggle.

How can you make sure code will just work when you put it into production, or come back to it months from now, or add a collaborator?

This section will help you design a package management strategy in a partnership between data scientists, IT Admins, and Security teams.

Secure Access

Putting data science into production requires a focus on security. Do you have the tooling to make sure only the right people can access or alter your projects? And that changes will only happen intentionally?

This section will help you figure out how to control access to data, applications, and deployed content.

Implement Operational Patterns

Good workflows are essential to seamless collaboration and deployment of data science products.

This section will include advice on the workflows and motions that are essential to a maturing team. These include best practices with version control and operational workflows like dev-test-prod, deploying with CI/CD, and auditing and monitoring content.

Choose the Right Architecture

One component of production-grade architecture is right-sizing the underlying infrastructure for the current need, preparing for future growth, integrating with external systems, managing workloads, and meeting requirements for availability.

This section will help you identify best practices for choosing the infrastructure to underly your open source data science platform.

Data science tooling in a production-grade environment

There are three distinct needs in a data science environment:

  1. Developers need a place to write code.
  2. Developers need a place to deploy projects for running and sharing.
  3. Developers and admins need a package source to distribute and manage the packages used to support the data science projects.

The three Posit professional products match these three needs:

  1. Development happens in Workbench.
  2. Deployment happens to Connect.
  3. Package Manager allows admins to manage packages and developers to use them.

In unison as the Posit Team bundle, or individually implemented, this tooling is uniquely suited to address the development motions and key considerations of open source data science in enterprise environments.