import numpy as np
import pandas as pd
from hvplot.plotting import scatter_matrix

scatter_matrix shows all the pairwise relationships between the columns of your data. Each off-diagonal entry plots one column against another, while each diagonal entry shows the distribution of the data within a single column.
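To make the grid layout concrete, here is a minimal pure-Python sketch that enumerates the cells of such a matrix for four columns. It does not call hvPlot; the names `grid`, `diagonal` and `off_diagonal` are illustrative only and are not part of any library API.

```python
from itertools import product

columns = ["A", "B", "C", "D"]  # illustrative column names

# Every cell of the grid pairs one column with another (or with itself).
grid = list(product(columns, repeat=2))
diagonal = [(r, c) for r, c in grid if r == c]      # distribution panels
off_diagonal = [(r, c) for r, c in grid if r != c]  # pairwise panels

# Print a text picture of the grid: [X] marks a diagonal (distribution) cell.
for row in columns:
    print("  ".join(f"[{row}]" if row == col else f"{row}{col}" for col in columns))

print(len(diagonal), "diagonal panels,", len(off_diagonal), "off-diagonal panels")
```

For n columns this gives n diagonal panels and n * (n - 1) off-diagonal panels.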

This function is closely modelled on pandas.plotting.scatter_matrix.

df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])

scatter_matrix(df, alpha=0.2)
df_sub = df[['A', 'B']].copy()

The chart parameter changes the type of the off-diagonal plots, here to 'bivariate' and 'hexbin'.

scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')

The diagonal parameter changes the type of the diagonal plots, here to 'kde'.

scatter_matrix(df_sub, diagonal='kde')

Setting tools to include a selection tool like box_select and an inspection tool like hover permits further analysis.

scatter_matrix(df_sub, tools=['box_select', 'hover'])
df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))

The c parameter colors the data by a given column, here 'CAT'. Note also that the diagonal_kwds parameter (equivalent to hist_kwds in this case, or to density_kwds for kde plots) customizes the diagonal plots.

scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))
df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])

Scatter matrix plots can require rendering a very large number of points, which can slow down the browser or even crash it. In that case, consider setting the rasterize (or datashade) parameter to True. Datashader then renders the off-diagonal plots on the backend and sends more efficient image-based representations to the browser.
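The idea behind an image-based representation can be sketched in a few lines of pure Python: points are aggregated into a fixed-size grid of counts, so the amount of data sent depends on the grid resolution rather than on the number of points. This is only an illustration of the principle, not Datashader's implementation; the function `rasterize_points`, the 50x50 grid and the (-4, 4) ranges are assumptions made for the example.

```python
import random

def rasterize_points(xs, ys, width=50, height=50,
                     x_range=(-4.0, 4.0), y_range=(-4.0, 4.0)):
    """Count how many points fall into each cell of a width x height grid."""
    grid = [[0] * width for _ in range(height)]
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y in zip(xs, ys):
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # skip points outside the plotted range
        # Map the coordinate to a cell index, clamped to the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid

random.seed(0)
n_points = 100_000
xs = [random.gauss(0, 1) for _ in range(n_points)]
ys = [random.gauss(0, 1) for _ in range(n_points)]

grid = rasterize_points(xs, ys)
n_cells = len(grid) * len(grid[0])
print(f"{n_points} points reduced to {n_cells} cell counts")
```

Whatever the number of input points, the result here always holds 2,500 values, which is why this kind of representation stays cheap for the browser.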

The following scatter matrix plot has 1,200,000 (12 × 100,000) points that are rendered efficiently by Datashader.

scatter_matrix(df, rasterize=True)
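The point count quoted above follows from the number of off-diagonal panels. As a quick check, assuming each of the n * (n - 1) off-diagonal panels draws every row once:

```python
n_columns = 4               # columns A, B, C, D
points_per_panel = 100_000  # rows in the DataFrame

# Each column is paired with every other column, in both orders.
off_diagonal_panels = n_columns * (n_columns - 1)
total_points = off_diagonal_panels * points_per_panel

print(off_diagonal_panels, total_points)  # 12 1200000
```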