Interactive Data Visualization in Python Using Bokeh
There are two types of data visualizations: exploratory and explanatory. Explanatory analysis is what happens when you have something specific you want to show an audience. The aim of explanatory visualizations is to tell stories - they’re carefully constructed to surface key findings.
Exploratory analysis, on the other hand, is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may simply be delving into the data to determine what might be interesting about it. Exploratory visualizations “create an interface into a dataset or subject matter… they facilitate the user exploring the data, letting them unearth their own insights: findings they consider relevant or interesting.”
In a previous series of posts on exploratory data analysis (EDA) - EDA 1, EDA 2, EDA 3 and EDA 4 - we covered static plotting in Python using major libraries like matplotlib, seaborn, plotnine, and pandas. plotnine is an implementation of a grammar of graphics in Python, based on the ggplot2 library in R. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.
In this article, we will focus on EDA using interactive plots. More often than not, exploratory visualizations are easier when they are interactive!
Although there are a few libraries in Python that can help us make interactive plots, I find bokeh and holoviews to be the only ones that can cover most use cases. Others like plotly and pygal seem to be too specific, and mpld3 is no longer being actively maintained.
bokeh provides fundamental building blocks for making interactive plots, following the grammar of graphics. holoviews, on the other hand, uses bokeh as a back-end to provide high-level APIs for making plots. All of these interactive plots can be viewed in a browser and are aided by corresponding JavaScript files.
Embedding bokeh Plots in Web Pages
In order to incorporate bokeh figures in a web page, you will first need to include the following JS files in your page:
The “-widgets” files are only necessary if your document includes bokeh widgets. Similarly, the “-tables” files are only necessary if you are using Bokeh data tables in your document.
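If you are unsure which files match your installed bokeh version, one way to check is to ask bokeh itself. The sketch below assumes the default CDN resources are what you want to load; the variable name `js_urls` is only for illustration.

```python
from bokeh.resources import CDN

# CDN.js_files lists the JavaScript URLs that match the installed bokeh version.
# Each URL can go into a <script src="..."></script> tag in your page.
js_urls = list(CDN.js_files)

for url in js_urls:
    print(url)
```

The exact set of files printed depends on your bokeh version, so treat the output as a starting point rather than a fixed list.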
Then you can use bokeh.embed.components() to generate the relevant code for your plots. This function returns a <script> that contains the data for your plot, together with an accompanying <div> tag that the plot view is loaded into. These tags can be used in HTML documents however you like:
from bokeh.plotting import figure
from bokeh.embed import components

plot = figure()
plot.circle([1, 2], [3, 4])
script, div = components(plot)
<script> will look something like:
<div> will look something like:
<div class="bk-root" id="9574d123-9332-4b5f-96cc-6323bef37f40"></div>
There will be one <div> for each of your plots, and each should be placed where you want the plot to appear. The <script> section should be placed in a typical place - the bottom of the <body> section for late loading.
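To make the placement concrete, here is a minimal sketch of how the pieces could be assembled into one page. The page layout, the heading text and the `page` variable are my own assumptions for illustration; only components() and the CDN resources come from bokeh.

```python
from bokeh.embed import components
from bokeh.plotting import figure
from bokeh.resources import CDN

plot = figure()
plot.scatter([1, 2], [3, 4], size=10)
script, div = components(plot)

# CDN.render_js() produces the <script src="..."> tags for the bokeh JS files.
# The <div> goes where the plot should appear; the data <script> goes at the
# bottom of <body> so it loads after the rest of the page.
page = f"""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    {CDN.render_js()}
  </head>
  <body>
    <h1>A hypothetical page with one plot</h1>
    {div}
    {script}
  </body>
</html>"""

print(len(page), "characters of HTML generated")
```

Writing `page` to a file such as `embedded_plot.html` and opening it in a browser should display the plot, provided the CDN is reachable.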
Bokeh has built-in support for various types of interactions (such as pan, wheel zoom, box zoom, reset and save) on all plots. Additionally, all of these interactions can be customized.
In the following sections, we will look at a few major types of interactions that are typically required in an exploratory plot.
Visualization of high-dimensional data is a pretty common task in data science projects. The two most common algorithms to project high-dimensional data to 2-dimensional space are t-SNE and UMAP. The scikit-learn and umap-learn Python libraries provide neat implementations of these algorithms.
In this post, as an example, we will use the Fashion-MNIST data to look at its t-SNE and UMAP embeddings. We can first load the data from the PyTorch library. We will load only the training data and save both images and labels in a pandas dataframe.
We will also create a randomized permutation of indices so that we can access random elements of our data.
rndperm = np.random.permutation(df.shape[0])
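As a small illustration of how such a permutation can be used, the sketch below builds a hypothetical stand-in dataframe (random pixel values and labels, not the real Fashion-MNIST data) and picks a random subset of rows from it. The column names and `n_samples` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the image dataframe: 100 rows, 4 pixel columns, 1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 256, size=(100, 4)),
    columns=[f"pixel{i}" for i in range(4)],
)
df["label"] = rng.integers(0, 10, size=100)

# Permute the row positions 0..len(df)-1, then take the first few as a random sample.
rndperm = np.random.permutation(df.shape[0])
n_samples = 20
subset = df.iloc[rndperm[:n_samples]]

print(subset.shape)
```

Because the permutation contains every row position exactly once, slicing it never selects the same row twice.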
Now, we can calculate the t-SNE and UMAP features. For faster computation, we will use only 7000 random samples.
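A minimal sketch of the t-SNE step with scikit-learn is shown below. It runs on a small random matrix as a hypothetical stand-in for the flattened images, and the name `tsne_results` as well as the parameter values are my assumptions. The UMAP class from umap-learn exposes a similar fit_transform() interface, so that step would look much the same.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for flattened image pixels: 200 samples, 50 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))

# Project to 2 dimensions; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
tsne_results = tsne.fit_transform(features)

print(tsne_results.shape)
```

The result has one row per sample and two columns, which map directly to the x and y coordinates of a scatter plot.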
Now, we can use the resulting t-SNE and UMAP arrays (such as umap_results) to make bokeh plots. In particular, we will use scatter plots to compare the two embeddings. To make more sense of the data, it would be great if hovering over a point could show the corresponding image. We will enable that using a customized HoverTool() tool. The constructor for the HoverTool object takes a tooltips option in the form of HTML code that represents what is shown when one hovers over a point. The data is provided as arrays in the ColumnDataSource, and is referenced by prefixing column names with the ‘@’ symbol.
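The sketch below shows the general shape of such a hover tool on a tiny made-up dataset. The column names (`x`, `y`, `label`, `image`) and the image paths are hypothetical; in practice the `image` column could hold URLs or data URIs for the actual pictures.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical data: 2-d coordinates, a class label and an image location per point.
source = ColumnDataSource(data=dict(
    x=[0.1, 0.5, 0.9],
    y=[0.3, 0.8, 0.2],
    label=["T-shirt", "Sneaker", "Bag"],
    image=["img/0.png", "img/1.png", "img/2.png"],
))

# The tooltip is plain HTML; "@column" is replaced by that column's value
# for the point under the cursor.
hover = HoverTool(tooltips="""
<div>
  <img src="@image" height="56" width="56" alt="@label">
  <span style="font-size: 12px;">@label</span>
</div>
""")

p = figure(tools="pan,wheel_zoom,box_select,reset", title="Hover sketch")
p.add_tools(hover)
p.scatter("x", "y", size=10, source=source)
```

Passing `p` to bokeh's show() would open the plot in a browser, where hovering over a point renders the HTML snippet with that point's values filled in.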
Notice how the two plots are linked: if you select some points in one, the corresponding points are highlighted in the other!
Also notice the trick used in the right plot of UMAP embeddings to move the legend outside the plot. circle() has an option for legend; however, this leads to the legend being shown inside the main plot region!
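One common way to place a legend outside the plot area is to build a Legend object by hand and attach it with add_layout(). I am not claiming this is exactly the trick used in the plot above; it is a sketch on made-up data with hypothetical class names.

```python
from bokeh.models import Legend, LegendItem
from bokeh.plotting import figure

p = figure(title="Legend outside the plot area")

# Draw the glyphs without any legend option and keep the renderers.
r_a = p.scatter([1, 2, 3], [4, 5, 6], size=8, color="navy")
r_b = p.scatter([1, 2, 3], [6, 5, 4], size=8, color="firebrick")

# Build the legend manually and attach it to the right-hand side panel,
# so it does not cover any points inside the plot region.
legend = Legend(items=[
    LegendItem(label="class A", renderers=[r_a]),
    LegendItem(label="class B", renderers=[r_b]),
])
p.add_layout(legend, "right")
```

The second argument of add_layout() can also be "left", "above" or "below" if another side suits the layout better.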
We have already seen above how one can enable linked plots. In this example, I want to highlight a different kind of linking. It is often desirable to link pan or zoom actions across many plots. All that is needed to enable this feature is to share range objects between the plots.
In particular, we want to look at the effect of the average number of rooms per dwelling (RM), the per capita crime rate by town (CRIM) and the pupil-teacher ratio (PTRATIO) on the sales price of houses. We can visualize such a correlation with scatter plots of each of these variables against price.
In the following plot, with the “pan” tool selected (the first one, which looks like a +-shaped anchor symbol), if you drag any one of the plots along the x-axis, all of the others will move too! This is enabled by sharing the x_range across all plots. Notice that, similar to the previous plot, we can still do selection across all three plots since all plots share a common data source.
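The following sketch shows both kinds of linking on randomly generated numbers (a hypothetical stand-in for the housing data). I assume here that price is on the x-axis of every plot, which is what makes a shared x_range meaningful; the column names are also assumptions.

```python
import numpy as np
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical stand-in for the housing data.
rng = np.random.default_rng(0)
source = ColumnDataSource(data=dict(
    price=rng.uniform(5, 50, size=100),
    rm=rng.uniform(3, 9, size=100),
    crim=rng.uniform(0, 90, size=100),
    ptratio=rng.uniform(12, 22, size=100),
))

tools = "pan,wheel_zoom,box_select,reset"

p1 = figure(tools=tools, x_axis_label="price", y_axis_label="RM")
p1.scatter("price", "rm", source=source)

# Reusing p1.x_range links panning and zooming along the x-axis.
p2 = figure(tools=tools, x_range=p1.x_range, x_axis_label="price", y_axis_label="CRIM")
p2.scatter("price", "crim", source=source)

p3 = figure(tools=tools, x_range=p1.x_range, x_axis_label="price", y_axis_label="PTRATIO")
p3.scatter("price", "ptratio", source=source)

# Sharing one ColumnDataSource links selections across the three plots.
grid = gridplot([[p1, p2, p3]])
```

Passing `grid` to bokeh's show() displays the three plots side by side. Sharing `y_range` in the same way would link vertical panning as well.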
In the final example on types of interactive plots, I want to highlight a very different type of desired interaction - filtering/selecting data on the fly and updating the plot accordingly! I also want to show plots of maps and geo locations in bokeh. I will be using the San Francisco Crime dataset to showcase this.
Let us first download the file from the above link and load it as a dataframe:
Now, we want to look at the crime rate on different days of the week. We want to view this interactively. Users can choose All or a particular day, and our plot should show the distribution of crime for that day (or all days) on the map. I will be using the Google Maps API for displaying the map. You can get your own API key at this link.
We will use the Select bokeh widget to let users choose the day of the week to visualize. Notice the use of the callback method to implement the interaction between our widget and plot.
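Here is a minimal sketch of the widget-plus-callback pattern. To keep it self-contained, it uses a plain figure instead of a Google Maps plot, a handful of made-up records, and a CustomJS callback that runs in the browser; all names (`records`, `full`, `shown`) are hypothetical.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Select
from bokeh.plotting import figure

# Hypothetical incident records: location plus day of week.
records = dict(
    lon=[-122.42, -122.41, -122.40, -122.43],
    lat=[37.77, 37.78, 37.76, 37.79],
    day=["Monday", "Tuesday", "Monday", "Sunday"],
)
full = ColumnDataSource(data={k: list(v) for k, v in records.items()})   # never changes
shown = ColumnDataSource(data={k: list(v) for k, v in records.items()})  # what is drawn

p = figure(title="Incidents", x_axis_label="longitude", y_axis_label="latitude")
p.scatter("lon", "lat", size=10, source=shown)

select = Select(title="Day of Week", value="All",
                options=["All", "Monday", "Tuesday", "Sunday"])

# JavaScript executed in the browser whenever the dropdown value changes:
# copy the matching rows from `full` into `shown`, which redraws the plot.
callback = CustomJS(args=dict(full=full, shown=shown), code="""
    const day = cb_obj.value;
    const out = {lon: [], lat: [], day: []};
    for (let i = 0; i < full.data.day.length; i++) {
        if (day === "All" || full.data.day[i] === day) {
            out.lon.push(full.data.lon[i]);
            out.lat.push(full.data.lat[i]);
            out.day.push(full.data.day[i]);
        }
    }
    shown.data = out;
    shown.change.emit();
""")
select.js_on_change("value", callback)

layout = column(select, p)
```

Because the filtering happens in JavaScript, the resulting page works as a standalone HTML file without a running Python server, which is also why all the data has to be shipped to the browser up front.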
You can choose the Day of Week in the Selection dropdown menu at the top and see how the plot updates itself. For this example, I have restricted the data to only 50 entries for each day to keep the JS file small.
High level bokeh plots using holoviews Library
By now you might have noticed that bokeh provides only low-level APIs for plotting. Theoretically, this enables us to plot any kind of complex plot. However, for common day-to-day plots like box plots, histograms etc., we need to write a lot of code! Luckily, the holoviews library comes to our rescue!
As a final example, I will show simple box and histogram plots on the Boston Housing data using holoviews.
First, we want to look at the distribution of prices for houses with different numbers of rooms. We will first need to bin the number of rooms using the cut() method of pandas. Then we can make a Box (Whisker) plot.
Notice that plotting was just a single line in holoviews! Furthermore, we could get the corresponding bokeh figure from it and apply any modifications available in bokeh. This makes it easy to use as well as quite customizable.
To illustrate making histogram plots, we can take a look at the overall distribution of house prices. We will first calculate the histogram using numpy and then plot it using holoviews.
We get an interactive histogram plot with a single line of code!
Similar to the box plot, we customized it by getting the corresponding bokeh figure. We have touched only the simplest of plots using holoviews. As their web page shows, you can make pretty complex interactive figures quite easily!
Hopefully, this was enough to convince you to start using interactive plots for some of your EDA. Go through the APIs of bokeh and holoviews to find additional details. Bokeh also provides a nice tutorial for new users. If you have any questions regarding any type of plot, feel free to leave a comment below!