Scattermatrix

import numpy as np
import pandas as pd
from hvplot.plotting import scatter_matrix

scatter_matrix shows all the pairwise relationships between the columns of your data. Each off-diagonal entry plots one column against another, while each diagonal entry shows the distribution of the data within a single column.

This function is closely modelled on pandas.plotting.scatter_matrix.
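The grid layout described above can be sketched in plain Python, without calling hvPlot. This is only an illustration of how the panels are arranged: the `panels` helper and the 'diagonal' / 'off-diagonal' labels are hypothetical names chosen for this example, not part of the hvPlot or pandas API.

```python
from itertools import product

def panels(columns):
    """Classify each (row, col) cell of a scatter matrix grid (illustrative helper)."""
    grid = {}
    for row, col in product(columns, repeat=2):
        # A column paired with itself sits on the diagonal and shows a
        # distribution; every other pairing is a two-column plot.
        grid[(row, col)] = 'diagonal' if row == col else 'off-diagonal'
    return grid

grid = panels(['A', 'B', 'C', 'D'])
n_diagonal = sum(kind == 'diagonal' for kind in grid.values())
n_off_diagonal = sum(kind == 'off-diagonal' for kind in grid.values())
print(n_diagonal, n_off_diagonal)  # prints: 4 12
```

For four columns this gives 4 diagonal panels and 12 off-diagonal panels, so the number of two-column plots grows roughly with the square of the number of columns.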

df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])

scatter_matrix(df, alpha=0.2)

df_sub = df[['A', 'B']].copy()

The chart parameter lets you change the type of the off-diagonal plots.

scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')

The diagonal parameter lets you change the type of the diagonal plots.

scatter_matrix(df_sub, diagonal='kde')

Setting tools to include a selection tool such as box_select and an inspection tool such as hover lets you explore the data further.

scatter_matrix(df_sub, tools=['box_select', 'hover'])

df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))

The c parameter lets you color the data by a given column, here 'CAT'. Note also that the diagonal_kwds parameter (equivalent to hist_kwds in this case, or to density_kwds for kde plots) lets you customize the diagonal plots.

scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))

df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])

Scatter matrix plots can require rendering a very large number of points, which can slow down the browser or even crash it. In that case, consider setting the rasterize (or datashade) parameter to True. Datashader then renders the off-diagonal plots on the backend and sends more efficient image-based representations to the browser.

The following scatter matrix plot has 1,200,000 (12 x 100,000) points, which Datashader renders efficiently.

scatter_matrix(df, rasterize=True)
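The point count for this plot can be checked with a short calculation. The sketch below assumes that every off-diagonal panel draws one point per row of the DataFrame; `off_diagonal_points` is a hypothetical helper written for this example, not an hvPlot function.

```python
def off_diagonal_points(n_rows, n_columns):
    """Points drawn across all off-diagonal panels (illustrative helper)."""
    # Each column is paired with every other column, in both orders.
    n_panels = n_columns * (n_columns - 1)
    return n_panels * n_rows

total = off_diagonal_points(n_rows=100_000, n_columns=4)
print(f"{total:,}")  # prints: 1,200,000
```

Because the panel count is n_columns * (n_columns - 1), adding columns increases the rendering load much faster than adding rows does, which is when rasterizing is most likely to help.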

When rasterize (or datashade) is enabled, you can make individual points more visible by setting dynspread=True or spread=True. See the Working with large data using datashader guide in the HoloViews documentation to learn more about these operations and the parameters they accept (which can be passed as kwds to scatter_matrix).

scatter_matrix(df, rasterize=True, dynspread=True)
