Python with Jupyter Notebooks
Publishing Jupyter Notebooks
You can publish Jupyter notebooks to Connect. The Jupyter Notebook extension for Connect (rsconnect-jupyter) allows you to publish Jupyter notebooks with the press of a button. Once published on Connect, these notebooks can be scheduled for updates or refreshed on demand.
Jupyter end-to-end flow and best practices
Jupyter Notebooks were designed to be run in a single-user environment, where the user can also act in the role of administrator. As a result, notebooks do not have built-in support for:
- managing virtual environments
- managing package installation
- easily accessing a terminal
JupyterLab is a more fully-featured IDE and as such can be easier to use (for example, it provides readier access to a terminal). However, Posit currently offers no JupyterLab extensions, so all publishing from JupyterLab must be done via the rsconnect-python CLI.
Below, we outline one possible path for working with Jupyter Notebooks that provides you, the developer, with a consistent, isolated and reproducible environment that works well on Workbench and simplifies publishing to Connect.
Creating a new project
We first create a directory to house our project and virtual environment.
To do this, we use the Jupyter terminal:
Remember to replace <PROJECT-NAME> with an appropriate name for your project. Avoid using spaces in the directory names, as this can cause problems with registering your new Jupyter kernel in later steps.
$ mkdir <PROJECT-NAME>
$ cd <PROJECT-NAME>
Next, create a virtual environment for your project. In this example, we’re using venv as the name of our virtual environment. You can use any name you prefer.
$ /opt/python/3.9.7/bin/python -m venv venv
Then activate your virtual environment.
$ . ./venv/bin/activate
Now register your virtual environment as a Jupyter kernel.
(venv)$ python -m pip install ipykernel
(venv)$ python -m ipykernel install --name "<PROJECT-NAME>" --user
You can now install additional packages using pip if you’d like. Alternatively, you can begin working inside a notebook.
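If you want to confirm that a given interpreter really belongs to the virtual environment before or after registering the kernel, a small check like the one below may help. It is only a sketch: the function names in_virtual_environment and describe_interpreter are hypothetical, and it assumes the environment was created with the standard venv module, as in the steps above.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Summarize which interpreter is running and whether it is inside a venv."""
    location = "inside" if in_virtual_environment() else "outside"
    return f"{sys.executable} ({location} a virtual environment)"


print(describe_interpreter())
```

Running this with the activated environment's python, or in a notebook cell using the new kernel, should report a path under your project's venv directory.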
Using Jupyter Notebooks
In the Jupyter UI, navigate to your project folder and create a new notebook using your newly registered kernel. If your kernel does not show up, you may need to refresh the page in your browser.
Ensure the Notebook is saved to the project directory you created earlier.
You may now use Jupyter as normal.
If you need to install additional packages, you must either return to the command line, navigate to the project directory, activate the virtual environment, and install using pip, or install within the notebook using the following commands:
import sys
!{sys.executable} -m pip install numpy
Take care never to use !pip install package as this will use the system pip and not the one associated with your virtual environment. This can result in packages being installed to your user environment instead of the virtual environment, or in some cases, packages failing to install altogether.
Installing in this way ensures that packages go to the appropriate environment. Remember, however, that any such package installation commands should be removed from your notebook before publishing to Connect.
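The same idea can be written in plain Python, without the ! shell escape, by invoking pip as a module of the interpreter that is running the kernel. This is a minimal sketch; pip_install_command and pip_install are hypothetical helper names, not part of any Posit or Jupyter API.

```python
import subprocess
import sys


def pip_install_command(*packages: str) -> list[str]:
    """Build a pip command tied to the interpreter running this kernel."""
    # sys.executable is the Python binary behind the current kernel, so
    # "-m pip" resolves to the pip that belongs to the same environment.
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Install packages into the kernel's environment; raises if pip fails."""
    subprocess.run(pip_install_command(*packages), check=True)
```

For example, pip_install("numpy") would run the same command as the notebook cell shown earlier.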
Publishing to Connect
There are two options for publishing to Connect:
- Use the push-button deployment in the Workbench-hosted Jupyter Notebook. Push the “publish” button and follow the on-screen prompts.
- Install the rsconnect-python package and use the rsconnect command line tool. In your virtual environment, you can run rsconnect --help for more info.
Checking your project into version control
Version control (for example, git) is an essential part of all good software development. In order to allow your collaborators to restore the virtual environment, you need to check in a requirements.txt file.
This file can be created in one of two ways:
- Return to the terminal, navigate to the project directory, activate the virtual environment and then run
pip freeze > requirements.txt
- Within the Jupyter Notebook, create a cell that contains the following:
import sys
!{sys.executable} -m pip freeze > requirements.txt
Ensure the requirements.txt file exists before removing this cell.
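One way to check that the file was actually written before deleting the cell is sketched below. The helper name requirements_ready is hypothetical, and the check only verifies that the file exists and is not blank; it does not validate the contents.

```python
from pathlib import Path


def requirements_ready(path: str = "requirements.txt") -> bool:
    """Return True when the requirements file exists and is not blank."""
    target = Path(path)
    # An empty file usually means the freeze command ran in the wrong
    # environment or its output was not captured.
    return target.is_file() and target.read_text().strip() != ""


print(requirements_ready())
```

The default path is relative, so this assumes the notebook's working directory is the project directory.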
Managing Jupyter kernels
Over time, you may build up lots of available Jupyter kernels.
These can be managed from the command line. For example, to list all the available kernels:
$ jupyter kernelspec list
Or to remove an old unused kernel:
$ jupyter kernelspec remove <KERNEL-NAME>
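When deciding which kernels to keep, it can be useful to see which interpreter each one launches. The sketch below reads each kernel.json directly and reports the first argv entry. It assumes user-level kernels live under ~/.local/share/jupyter/kernels (the layout that --user registration typically produces on Linux); kernel_interpreters is a hypothetical helper name.

```python
import json
from pathlib import Path


def kernel_interpreters(kernels_dir: Path) -> dict[str, str]:
    """Map each kernel directory name to the interpreter in its kernel.json."""
    interpreters = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        argv = spec.get("argv", [])
        if argv:
            # The first argv entry is the binary used to start the kernel.
            interpreters[spec_file.parent.name] = argv[0]
    return interpreters


# Assumed location of user-level kernels; adjust for your system.
user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
if user_kernels.is_dir():
    for name, interpreter in kernel_interpreters(user_kernels).items():
        print(f"{name}: {interpreter}")
```

A kernel whose interpreter path does not sit inside its project's virtual environment is a likely candidate for removal and re-registration.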
Notes and acknowledgements
Some of the content in this document was adapted from other sources:
- Some of the info on working with different kernels came from the RStudio Workbench documentation.
- Additional background on Jupyter Notebook environments and methods for installing packages inside of Jupyter Notebooks was adapted from a blog post by Jake VanderPlas.
- Further information on using virtual environments as kernels in Jupyter Notebooks was obtained from a blog post by Nikolai Janakiev.
- Results and advice from a study of 1.4 million Jupyter Notebooks.
Additional debugging info
The easiest mistake to make in the processes above is to add the new kernel from outside of the virtual environment.
The following examples show the kernel.json files for two kernels registered from the same location. The first shows a kernel registered from outside a virtual environment and the second from within.
Notice how the first argv entry differs in each example.
From outside the virtual environment, we have captured the main Python installation instead of the virtual environment version:
$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jpy-test2/kernel.json
{
  "argv": [
    "/opt/python/3.9.6/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "jpy-test2",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
And from inside the virtual environment, we correctly capture the path to the Python binary within the environment:
$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jupyter-test/kernel.json
{
  "argv": [
    "/usr/home/mark.sellors/jupyter-test/venv/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "jupyter-test",
  "language": "python",
  "metadata": {
    "debugger": true
  }