Statistical Plots

In addition to the plots available via the plot interface, hvPlot makes a number of more sophisticated, statistical plots available that are modelled on pandas.plotting. To explore these, we will load the iris and stocks datasets from Bokeh:

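import pandas as pd
import hvplot.pandas  # noqa: F401  (adds the .hvplot accessor to pandas objects)

from bokeh.sampledata import iris, stocks

# Rebind the name to the DataFrame of iris measurements (this shadows the module)
iris = iris.flowers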

Scatter Matrix

When working with multi-dimensional data, it is often difficult to understand the relationship between all the different variables. A scatter_matrix makes it possible to visualize all of the pairwise relationships in a compact format. hvplot.scatter_matrix is closely modelled on pandas.plotting.scatter_matrix:

hvplot.scatter_matrix(iris, c="species")

Compared to a static Seaborn/Matplotlib-based plot, here it is easy to explore the data interactively thanks to Bokeh’s linked zooming, linked panning, and linked brushing (using the box_select and lasso_select tools).

Parallel Coordinates

Parallel coordinate plots provide another way of visualizing multivariate data. hvplot.parallel_coordinates() provides a simple API to create such a plot, modelled on the API of pandas.plotting.parallel_coordinates():

hvplot.parallel_coordinates(iris, "species")

The plot quickly clarifies the relationship between different variables, highlighting the difference of the “setosa” species in the petal width and length dimensions.

Andrews Curves

Another similar approach is to visualize the dimensions using Andrews curves, which are constructed by generating a Fourier series from the features of each observation, visualizing the aggregate differences between classes. The hvplot.andrews_curves() function provides a simple API to generate Andrews curves from a dataframe, closely matching the API of pandas.plotting.andrews_curves():

hvplot.andrews_curves(iris, "species")

Once again we can see the significant difference of the setosa species. However, unlike the parallel coordinate plot, the Andrews plot does not give any real quantitative insight into the features that drive those differences.
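
To make the Fourier-series construction concrete, here is a minimal sketch of the classical Andrews curve definition, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., evaluated for a single observation. The helper name andrews_curve and the sample values are illustrative assumptions, and hvPlot's internal implementation may differ in detail:

```python
import math


def andrews_curve(features, t):
    """Evaluate the Andrews curve of one observation at angle t (radians).

    Hypothetical helper for illustration; not part of hvPlot or pandas.
    """
    # The first feature contributes a constant term x1 / sqrt(2).
    value = features[0] / math.sqrt(2)
    # The remaining features alternate between sin and cos terms,
    # and the harmonic increases by one after every sin/cos pair.
    for i, x in enumerate(features[1:]):
        harmonic = i // 2 + 1
        if i % 2 == 0:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


# Illustrative observation with four features, in the order
# sepal length, sepal width, petal length, petal width.
observation = [5.1, 3.5, 1.4, 0.2]

# Sample the curve at nine evenly spaced points on [-pi, pi].
ts = [-math.pi + 2 * math.pi * k / 8 for k in range(9)]
curve = [andrews_curve(observation, t) for t in ts]
```

Each row of the dataframe yields one such curve, so every feature is mixed into every point of the curve. That mixing is why the plot separates classes well but does not show which individual feature is responsible.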

Lag Plot

Lastly, for the analysis of time series, hvPlot offers a so-called lag plot, implemented by the hvplot.lag_plot() function and modelled on the matching pandas.plotting.lag_plot() function.

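As an example, we will compare the closing stock prices of Apple and IBM from 2000 to 2013 using a lag of 365 observations. Because the dataset has one row per trading day, this corresponds to 365 trading days rather than one calendar year: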

index = pd.DatetimeIndex(stocks.AAPL['date'])
stock_df = pd.DataFrame({'IBM': stocks.IBM['close'], 'AAPL': stocks.AAPL['close']}, index=index)

hvplot.lag_plot(stock_df, lag=365, alpha=0.3)

Using this plot it becomes apparent that Apple was significantly more volatile over the analyzed time scale. In other words, its price at a particular point in time sometimes differed significantly from the price 365 observations (trading days) earlier. This also becomes visible in a simple line chart of the same data:

stock_df.hvplot.line()
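
As a rough sketch of what a lag plot draws, the snippet below pairs each value y(t) with the value y(t + lag) found a fixed number of rows later. The helper lag_pairs and the prices are made up for illustration and are not part of hvPlot. The sketch also assumes, as in pandas.plotting.lag_plot(), that the lag counts rows rather than calendar time:

```python
def lag_pairs(values, lag=1):
    """Return the (y[t], y[t + lag]) pairs that a lag plot would scatter.

    Hypothetical helper for illustration; not part of hvPlot or pandas.
    """
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    # Drop the last `lag` values for the x-axis and the first `lag`
    # values for the y-axis so that both sequences line up.
    return list(zip(values[:-lag], values[lag:]))


# Made-up closing prices, one per row.
prices = [10.0, 11.0, 12.5, 12.0, 13.0]
pairs = lag_pairs(prices, lag=2)
# pairs == [(10.0, 12.5), (11.0, 12.0), (12.5, 13.0)]
```

Points close to the diagonal indicate that a value is similar to the one lag rows earlier, while a wide spread away from the diagonal indicates large changes over that interval.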

These plot types can help you make sense of complex datasets. See holoviews.org for many other plots and tools that can be used alongside those from hvPlot for other purposes.
