Scattermatrix
import numpy as np
import pandas as pd
from hvplot.plotting import scatter_matrix
scatter_matrix shows all the pairwise relationships between the columns of your data. Each off-diagonal entry plots one column against another, while each diagonal entry shows the distribution of the data within a single column.
This function is closely modelled on pandas.plotting.scatter_matrix.
df = pd.DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
scatter_matrix(df, alpha=0.2)
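To make the grid layout concrete, the following minimal sketch enumerates the cells of a scatter matrix for the four columns used above. It uses only the standard library and does not call hvPlot; the variable names are illustrative and not part of any API.

```python
from itertools import product

columns = ["A", "B", "C", "D"]  # same column names as the example DataFrame

# Every ordered pair of column names corresponds to one cell of the grid.
cells = list(product(columns, repeat=2))

# Diagonal cells pair a column with itself (distribution plots);
# all other cells pair two different columns (pairwise plots).
diagonal = [(y, x) for y, x in cells if x == y]
off_diagonal = [(y, x) for y, x in cells if x != y]

print(len(cells), len(diagonal), len(off_diagonal))  # 16 4 12
```

With four columns the grid therefore has 4 diagonal plots and 12 off-diagonal plots.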
df_sub = df[['A', 'B']].copy()
The chart parameter changes the type of the off-diagonal plots, for example to 'bivariate' or 'hexbin'.
scatter_matrix(df_sub, chart='bivariate') + scatter_matrix(df_sub, chart='hexbin')
The diagonal parameter changes the type of the diagonal plots, for example to 'kde'.
scatter_matrix(df_sub, diagonal='kde')
Setting tools to include a selection tool such as box_select and an inspection tool such as hover enables further interactive analysis.
scatter_matrix(df_sub, tools=['box_select', 'hover'])
df_sub['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df_sub))
The c parameter colors the data by a given column, here 'CAT'. Note also that the diagonal_kwds parameter (equivalent to hist_kwds in this case, or to density_kwds for kde plots) lets you customize the diagonal plots.
scatter_matrix(df_sub, c='CAT', diagonal_kwds=dict(alpha=0.3))
df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])
Scatter matrix plots can require rendering a large number of points, which can slow down the browser or even crash it. In that case, consider setting the rasterize (or datashade) parameter to True. This uses Datashader to render the off-diagonal plots on the backend and then sends more efficient image-based representations to the browser.
The following scatter matrix plot has 1,200,000 (12 x 100,000) points, which are rendered efficiently by Datashader.
scatter_matrix(df, rasterize=True)
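The 12 x 100,000 figure follows from the grid shape: each off-diagonal plot draws every row once. As a rough sketch, the helper below (a hypothetical name, not part of hvPlot) estimates that count for any DataFrame shape, assuming no rows are dropped for missing values.

```python
def estimate_rendered_points(n_rows: int, n_columns: int) -> int:
    """Estimate the points drawn in the off-diagonal plots of a scatter matrix.

    Assumes every off-diagonal plot draws each row exactly once.
    """
    n_off_diagonal_plots = n_columns * (n_columns - 1)
    return n_rows * n_off_diagonal_plots


# The first example on this page: 1,000 rows and 4 columns.
small = estimate_rendered_points(1_000, 4)

# The rasterized example: 100,000 rows and 4 columns.
large = estimate_rendered_points(100_000, 4)

print(small, large)  # 12000 1200000
```

An estimate like this can help decide when to switch on rasterize; the point at which a browser starts to struggle depends on the machine and the plotting backend, so no fixed threshold is suggested here.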