import numpy as np
import pandas as pd
from hvplot.plotting import scatter_matrix
scatter_matrix shows all the pairwise relationships between the columns of your data. Each off-diagonal entry plots one column against another, while each diagonal plot shows the distribution of the data within an individual column.
This function is closely modelled on pandas.plotting.scatter_matrix.
df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
scatter_matrix(df, alpha=0.2)
df_sub = df[['A', 'B']].copy()
The chart parameter changes the type of the off-diagonal plots.
scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')
The diagonal parameter changes the type of the diagonal plots.
Setting tools to include a selection tool like box_select and an inspection tool like hover permits further analysis.
scatter_matrix(df_sub, tools=['box_select', 'hover'])
df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))
The c parameter colors the data by a given column, here by 'CAT'. Note also that the diagonal_kwds parameter (equivalent to hist_kwds in this case, or density_kwds for kde plots) customizes the diagonal plots.
scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))
df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])
Scatter matrix plots can end up rendering a very large number of points, which can slow down the browser or even crash it. In that case, consider setting the rasterize (or datashade) parameter to True. It uses Datashader to render the off-diagonal plots on the backend and then sends more efficient image-based representations to the browser.
The following scatter matrix plot has 1,200,000 (12 x 100,000) points that are rendered efficiently by Datashader.