Statistical Plots

In addition to the plots available via the plot interface, hvPlot makes a number of more sophisticated, statistical plots available that are modelled on pandas.plotting. To explore these, we will load the iris and stocks datasets from Bokeh:

import pandas as pd
import hvplot.pandas  # noqa

from bokeh.sampledata import iris, stocks 

iris = iris.flowers

Scatter Matrix

When working with multi-dimensional data, it is often difficult to understand the relationship between all the different variables. A scatter_matrix makes it possible to visualize all of the pairwise relationships in a compact format. hvplot.scatter_matrix is closely modelled on pandas.plotting.scatter_matrix:

hvplot.scatter_matrix(iris, c="species")

Compared to a static Seaborn/Matplotlib-based plot, here it is easy to explore the data interactively thanks to Bokeh’s linked zooming, linked panning, and linked brushing (using the box_select and lasso_select tools).
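To make the "all pairwise relationships" layout concrete, the following minimal sketch enumerates the panels of such a grid for the four numeric iris columns. It does not call hvPlot at all; the `grid` dictionary and the panel labels are purely illustrative names, and the assumption that the diagonal panels show a per-variable distribution reflects the usual scatter-matrix convention rather than anything stated above.

```python
from itertools import product

# Numeric columns of the iris dataset (the "species" column is used for color).
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Illustrative only: map each (row, column) cell of the grid to a panel kind.
grid = {}
for row_var, col_var in product(columns, repeat=2):
    # Assumption: diagonal cells show one variable's distribution,
    # off-diagonal cells show a scatter of one variable against another.
    kind = "distribution" if row_var == col_var else "scatter"
    grid[(row_var, col_var)] = kind

n_scatter = sum(1 for kind in grid.values() if kind == "scatter")
n_diagonal = sum(1 for kind in grid.values() if kind == "distribution")
print(len(grid), n_scatter, n_diagonal)  # 16 12 4
```

With four variables the grid has 16 panels, which is why linked selection is helpful: a selection made in one panel can be followed across the other panels.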

Parallel Coordinates

Parallel coordinate plots provide another way of visualizing multivariate data. hvplot.parallel_coordinates provides a simple API to create such a plot, modelled on the API of pandas.plotting.parallel_coordinates():

hvplot.parallel_coordinates(iris, "species")

The plot quickly clarifies the relationship between the different variables, showing in particular that the “setosa” species is clearly separated from the other species along the petal length and petal width dimensions.
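As a rough illustration of the idea behind a parallel coordinates plot, each observation becomes a polyline that crosses one vertical axis per variable. The sketch below builds those polylines by hand in plain Python; the two rows are illustrative iris-like values and `to_polyline` is a hypothetical helper, not part of hvPlot or pandas.

```python
# Variables are laid out as parallel vertical axes, in this order.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Two illustrative observations (values chosen to resemble iris rows).
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7,
     "petal_width": 1.4, "species": "versicolor"},
]

def to_polyline(row, columns):
    """Return (axis_index, value) points for one observation."""
    return [(i, row[name]) for i, name in enumerate(columns)]

# One polyline per observation, keyed by its class label.
polylines = {row["species"]: to_polyline(row, columns) for row in rows}

for species, points in polylines.items():
    print(species, points)
```

In this toy data the two lines diverge most at axis indices 2 and 3 (petal length and petal width), which is the kind of separation the plot above makes visible.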

Andrews Curves

A similar approach is to visualize the dimensions using Andrews curves, which are constructed by generating a Fourier series from the features of each observation, so that aggregate differences between classes show up as differences in the shape of the curves. The hvplot.andrews_curves() function provides a simple API to generate Andrews curves from a DataFrame, closely matching the API of pandas.plotting.andrews_curves():

hvplot.andrews_curves(iris, "species")
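To show what "generating a Fourier series from the features" means, here is a minimal sketch of the standard Andrews curve definition, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., evaluated for t between -pi and pi. It is written in plain Python for illustration; `andrews_curve` is a hypothetical helper rather than hvPlot's implementation, and the observation values are illustrative.

```python
import math

def andrews_curve(features, t):
    """Evaluate the Andrews curve of one observation at angle t."""
    # The first feature contributes a constant term x1 / sqrt(2).
    value = features[0] / math.sqrt(2)
    # Remaining features alternate between sin and cos terms;
    # the harmonic increases after every sin/cos pair: 1, 1, 2, 2, ...
    for i, x in enumerate(features[1:], start=1):
        harmonic = (i + 1) // 2
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value

# One illustrative iris-like observation with four features.
observation = [5.1, 3.5, 1.4, 0.2]

# Sample the curve at 9 evenly spaced points on [-pi, pi].
ts = [-math.pi + 2 * math.pi * k / 8 for k in range(9)]
curve = [andrews_curve(observation, t) for t in ts]

for t, y in zip(ts, curve):
    print(f"{t:+.3f} {y:+.3f}")
```

Each observation yields one such curve, and observations with similar feature values yield similar curves, which is why curves from the same class tend to bundle together in the plot.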