Python is awesome because you can do anything. In this case, I’m going to show you how to configure a Jupyter notebook with integrated Plotly support to draw interactive graphs. This is very useful whenever you need to analyse some data.
Introduction
In case you don’t know it, the Jupyter project exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages. It all started with Python and R, but nowadays it is compatible with almost any popular language.
I’m going to focus on how to create a Jupyter notebook for Python using a Docker image. This will allow you to replicate the environment wherever you want with a single command and keep all the dependencies in a single place.
One of the key things is that I’m going to explain how to integrate the Plotly library into the notebook, so we will be able to create interactive plots that can be exported with the notebook itself. This creates an amazing tool to share reports within your organization.
Docker environment
Finding the images is easy. If you google it, you will find the repository jupyter/docker-stacks, which contains the images that you need. But what do you need? The Jupyter environment is growing fast and sometimes it is hard to know what we are looking for. The best place to start, as always, is the documentation page. Skipping the examples that show you how to create a notebook with a single command, I’m going to jump directly to the section “Selecting an image” to understand the differences and pick what we need. Go to the link to read about the images and then come back here. Keep in mind that I’m looking for a Python environment with at least Pandas and Plotly installed.
The image “jupyter/base-notebook” could be a good option, but it is still missing useful dependencies that “jupyter/minimal-notebook” provides. “jupyter/r-notebook” is discarded; we want Python. “jupyter/scipy-notebook” contains everything we need except Plotly. It even contains more things than we need, like seaborn. The following images diverge further from our needs, so I’m going to use jupyter/minimal-notebook and install the remaining dependencies myself. Let’s create a Docker container.
The Base Image
The documentation warns that the images are regularly updated with backward-incompatible changes. For this reason, it is recommended to pin a constant Docker tag instead of relying on the latest reference. You can go directly to Docker Hub and find the current latest tag to start a new project. At the time of writing, that tag is “jupyter/minimal-notebook:ad3574d3c5c7”. Let’s download it and create a minimal notebook.
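A single docker run along these lines will do it (a minimal sketch using the tag mentioned above; swap in whatever tag is current):

```bash
docker run --rm -it -p 8888:8888 jupyter/minimal-notebook:ad3574d3c5c7
```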
When it finishes, you will have the notebook running at http://localhost:8888. Check it out. The notebook will ask for the token that has been printed in the terminal; just copy and paste it.
Now you can create a new Python notebook by clicking on the “New” button and selecting “Python 3”. This will open a new window and you will be ready to go. Try typing a few Python commands and pressing Shift+Enter. This will execute the command and move to the next cell.
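For example, something as simple as this:

```python
# Any quick expression will do; the result is printed under the cell
a = 21 * 2
a
```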
This is cool, but we need a lot of dependencies. If you try to run import pandas, it will complain. To install the dependencies, we need a Dockerfile and a requirements.txt.
The Dockerfile
It is time to create a folder somewhere in your system to put everything in. I would recommend keeping it under version control with git. The first thing you should create is a Dockerfile and a requirements.txt with the following content.
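A minimal sketch of both files (the base tag is the one from above, and the requirements list contains just what this how-to needs):

```dockerfile
FROM jupyter/minimal-notebook:ad3574d3c5c7

# Install the Python dependencies listed in requirements.txt
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```

```text
# requirements.txt
pandas
plotly
```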
Now we can give it a try by building and running the container.
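From the folder containing the Dockerfile, something like this:

```bash
docker build -t jupyter .
docker run --rm -it -p 8888:8888 jupyter
```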
tip
--rm deletes the container on exit.
-it creates an interactive tty. It is not necessary, but I like to run containers with it when I’m exploring.
-p 8888:8888 exposes port 8888 to my localhost.
jupyter is the name I gave to the image when building it.
This will print a series of messages in the terminal. You should look for the one displaying the localhost URL and the token. If you click the link, it will open the browser with your new Jupyter notebook environment.
[I 09:32:25.749 NotebookApp] The Jupyter Notebook is running at:
[I 09:32:25.749 NotebookApp] http://75fcb2e2b029:8888/?token=da7ef85e9afb1568794b3c222f633c144ba9c010f9de26b2
[I 09:32:25.749 NotebookApp] or http://127.0.0.1:8888/?token=da7ef85e9afb1568794b3c222f633c144ba9c010f9de26b2
So it is done! We can start playing with it.
Enable JupyterLab (Optional)
Looks like the world is always moving, and now we have the JupyterLab web interface instead of the classic Jupyter Notebook. We can easily enable it by adding an environment variable at the end of the Dockerfile. It is still a little bit unstable, but I think it will improve quickly.
I’ve also added the extensions that are essential for me directly in the Dockerfile.
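A sketch of those extra lines (JUPYTER_ENABLE_LAB is the switch supported by the docker-stacks start script; the extension names are examples and depend on your JupyterLab version, so adjust them):

```dockerfile
# Switch the default UI from the classic notebook to JupyterLab
ENV JUPYTER_ENABLE_LAB=yes

# Example extensions; names and versions depend on the JupyterLab release
RUN jupyter labextension install @jupyterlab/toc jupyterlab_vim
```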
From my point of view, this is a must just for the vim extension.
Setting up Plotly
By default, Plotly should work without any extra setup. The problem is that on some occasions you have to change the renderer configuration. This is really well explained in the Plotly Renderers documentation. You can use the following snippet to check your default renderer and to modify it.
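Something like this (pio.renderers is the entry point described in that documentation):

```python
import plotly.io as pio

# Show which renderer is currently used by default
print(pio.renderers.default)

# Change it if the plots do not show up, e.g. force the classic notebook renderer
pio.renderers.default = "notebook"
```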
It’s time to run some sample code to see if we can draw a plot. Just copy and paste this into a cell.
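A minimal example along the lines of the official docs:

```python
import plotly.graph_objects as go

fig = go.Figure(
    data=[go.Bar(x=["a", "b", "c"], y=[1, 3, 2])],
    layout=go.Layout(title=go.layout.Title(text="A Figure Specified By A Graph Object")),
)
fig.show()
```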
You should see something like this, but with a different style.
Dict syntax vs graph_objects
In the previous example, the figure is built using graph_objects. This is not a bad idea, but for simple cases, I prefer to create the dict myself. It is basically the same idea. Let’s compare.
tip
Quoting the docs [Creating and Updating Figures]:
The goal of plotly.py is to provide a pleasant Python interface for creating figure specifications for display in the Plotly.js JavaScript library. In Plotly.js, a figure is specified by a declarative JSON data structure, and so the ultimate responsibility of plotly.py is to produce Python dictionaries that can be serialized into a JSON data structure that represents a valid figure.
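As a sketch, the two cells below produce equivalent figures: the first builds the dict by hand, the second uses graph_objects.

```python
import plotly.graph_objects as go
import plotly.io as pio

# Plain dict: you write the figure specification yourself
fig_dict = {
    "data": [{"type": "bar", "x": ["a", "b", "c"], "y": [1, 3, 2]}],
    "layout": {"title": {"text": "A Figure Specified By A Dict"}},
}
pio.show(fig_dict)

# graph_objects: the same figure built with typed objects
fig = go.Figure(
    data=[go.Bar(x=["a", "b", "c"], y=[1, 3, 2])],
    layout=go.Layout(title=go.layout.Title(text="A Figure Specified By A Graph Object")),
)
fig.show()
```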
If you want to get the json representation of a graph_objects figure, there is a to_json method that does what you expect. There is also an option to convert the dict into a graph_objects figure by using go.Figure and passing the dictionary as a parameter.
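For instance, reusing fig and fig_dict from the comparison above:

```python
# JSON representation of a graph_objects figure
print(fig.to_json())

# Build a graph_objects figure from a plain dict
fig_from_dict = go.Figure(fig_dict)
```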
Working with Pandas
The next big question is: how do we draw pandas dataframes? Thankfully, we have Python list comprehensions that make things much easier. First, let’s fetch some data. I’ve chosen the Airbnb Barcelona data from here. Loading the data is straightforward thanks to pandas. The second instruction parses the price values into floats to allow calculations.
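A sketch of that loading code (the csv name is whatever file you downloaded from the dataset):

```python
import pandas as pd

# Load the Airbnb Barcelona listings
df = pd.read_csv("listings.csv")

# Parse prices like "$1,200.00" into floats so we can run calculations on them
df["price"] = (
    df["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
```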
Okay, now let’s draw the mean price of a room based on the number of rooms. I like to wrap my plots in a function, but this is just personal preference.
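A minimal sketch of what I mean (bedrooms is the column I use as the number of rooms; adjust the names to your dataset):

```python
import plotly.graph_objects as go

def plot_mean_price(df):
    # Mean price grouped by number of rooms
    mean_price = df.groupby("bedrooms")["price"].mean()
    fig = {
        "data": [{"type": "bar",
                  "x": mean_price.index.to_list(),
                  "y": mean_price.to_list()}],
        "layout": {"title": {"text": "Mean price by number of rooms"}},
    }
    go.Figure(fig).show()

plot_mean_price(df)
```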
This produces an ugly plot, but let’s focus on just the data for now.
This is really easy to follow: you just have to put in the columns that you want to draw, and it works! Probably you will find yourself wanting to add categories, so let’s do it. This is where list comprehensions are essential. Let’s include a category by the cancellation policy.
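A sketch of that version (cancellation_policy is another column from the same dataset):

```python
import plotly.graph_objects as go

def trace_for(df, policy):
    # Filter the dataframe by the given policy and aggregate it
    subset = df[df["cancellation_policy"] == policy]
    mean_price = subset.groupby("bedrooms")["price"].mean()
    return {"type": "bar",
            "name": policy,  # identify the trace by its policy
            "x": mean_price.index.to_list(),
            "y": mean_price.to_list()}

def plot_mean_price_by_policy(df):
    # One trace per unique cancellation policy, built with a list comprehension
    traces = [trace_for(df, policy)
              for policy in df["cancellation_policy"].unique()]
    layout = {"title": {"text": "Mean price by rooms and cancellation policy"}}
    go.Figure({"data": traces, "layout": layout}).show()

plot_mean_price_by_policy(df)
```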
This may look complicated, but the idea behind it is really simple. We get the unique policy values and create a trace for each one. To select the appropriate data, we filter the dataframe by the given policy. To identify it, we assign the policy value as the trace name.
With this technique, you can draw almost anything in a dataframe using Plotly. If you are used to working with big datasets, you may notice that working this way is not efficient. The good news is that you will hit Plotly’s limitations before you have to think of a better way of generating the plot.
Export figure as json
To draw the figures in the blog, I’ve exported the figures in json format and used the plot shortcode. This is cool because it allows me to generate complex structures, like the category groupby from the previous example, and display them here in a dynamic plot with just a copy/paste operation.
The first thing is to import the json package and modify the plot code like this.
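A sketch of the modified function:

```python
import json

def plot_mean_price_json(df):
    mean_price = df.groupby("bedrooms")["price"].mean()
    fig = {
        "data": [{"type": "bar",
                  # to_list() turns the pandas objects into plain lists
                  "x": mean_price.index.to_list(),
                  "y": mean_price.to_list()}],
        "layout": {"title": {"text": "Mean price by number of rooms"}},
    }
    print(json.dumps(fig))

plot_mean_price_json(df)
```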
The to_list() call is important because, without it, the generated json still contains the raw dataframe data. With plain lists, json.dumps just works, and the output is a plain json document that you can copy wherever you need it.
Using this json body in the plot shortcode generates the graph above.
This is not the only way of generating json. The plotly.io package has a pio.to_json function that is much more robust. It handles the dataframes without the to_list calls and also includes the style configuration. The only drawback is that the output is much bigger. This is the plot generated from that json.
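For reference, a quick sketch of that alternative, reusing fig_dict from the comparison above:

```python
import plotly.graph_objects as go
import plotly.io as pio

fig = go.Figure(fig_dict)   # any figure or dict works
print(pio.to_json(fig))     # full json, including the style/template configuration
```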
docker-compose recipe
And to finish this how-to, I’m going to leave a repository with a docker-compose file configured to create this environment. This way, you can easily recreate the environment wherever you want. It also includes a volume under the work folder to ensure that your notebooks survive when you stop the container.
Just keep in mind that this is a basic template. You will need to modify it to include credentials, other libraries, more extensions… whatever you want.
Now just run this, and you are ready to code!
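From the folder containing the compose file:

```bash
docker-compose up
```

For reference, a minimal compose file along the lines described above could look like this (the service name and the volume path are my assumptions; /home/jovyan/work is the default work folder in the docker-stacks images):

```yaml
version: "3"
services:
  jupyter:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - ./work:/home/jovyan/work
```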
Conclusion
If you are planning to understand some data, you should start with an environment like this. With Pandas, it is really easy to transform data. With Plotly, you can create amazing interactive plots. And if you need anything else, remember that you are using Python and everything is already implemented in Python.
Thank you for reading and feel free to leave a comment below; I will be really happy to hear from you.