hvplot.plotting.andrews_curves#
- hvplot.plotting.andrews_curves(data, class_column, samples=200, alpha=0.5, width=600, height=300, cmap=None, colormap=None, **kwds)[source]#
Generate a plot of Andrews curves, for visualising clusters of multivariate data.
Andrews curves have the functional form:
\[f(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin(t) + x_3 \cos(t) + x_4 \sin(2t) + x_5 \cos(2t) + \cdots\]
where the \(x\) coefficients correspond to the values of each dimension and \(t\) is linearly spaced between \(-\pi\) and \(+\pi\). Each row of data then corresponds to a single curve.
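To make the formula concrete, here is a minimal NumPy sketch that evaluates one curve for one row of feature values. The helper name `andrews_curve` is hypothetical and is not part of hvPlot; it only illustrates the functional form above and is not necessarily how the library computes the curves internally.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate f(t) for one row of feature values x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # Constant term: x_1 / sqrt(2)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    # Remaining terms alternate sin and cos, with harmonics 1, 1, 2, 2, 3, 3, ...
    for i, coeff in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2
        if i % 2 == 1:
            result += coeff * np.sin(harmonic * t)
        else:
            result += coeff * np.cos(harmonic * t)
    return result

# 200 linearly spaced values of t between -pi and +pi, matching the default samples=200
t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([1.0, 2.0, 3.0, 4.0], t)
```

Each row of the input data would yield one such array, which is then drawn as a line and colored by its class.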
- Parameters:
- data : DataFrame
Data to be plotted, preferably normalized to (0.0, 1.0).
- class_column : str
Column name containing class names.
- samples : int, optional
Number of samples to draw. Default is 200.
- alpha : float, optional
The transparency of the lines. Default is 0.5.
- cmap/colormap : str or colormap object, optional
Colormap to use for groups. Defaults to Colorcet’s glasbey_category10.
- Returns:
- obj : HoloViews object
The HoloViews representation of the plot.
See also
pandas.plotting.andrews_curves
matplotlib version of this routine
Examples#
Basic Andrews curves plot#
This example shows how to create a simple Andrews curves plot from a dataframe with 4 features and a categorical column.
import hvplot
import numpy as np
import pandas as pd
np.random.seed(42)
n_samples = 50
df = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(5, 2, n_samples),
    'feature_3': np.random.normal(-2, 1, n_samples),
    'feature_4': np.random.normal(3, 1.5, n_samples),
    'class': np.random.choice(['A', 'B', 'C'], size=n_samples)  # target class for coloring
})
hvplot.plotting.andrews_curves(df, class_column='class')
Example with penguins#
In this example we use four features from the penguins dataset and analyze how they relate to species. We can see, for instance, that Gentoo penguins are quite clearly separated from the two other classes, and that they have consistently larger or higher values across the key features used. Adelie and Chinstrap show moderate overlap. This plot suggests that a classification model (e.g. logistic regression or a decision tree) would likely perform well overall.
Note
It is important to normalize the features before plotting them. This example leverages scikit-learn and its StandardScaler transform.
import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = hvplot.sampledata.penguins("pandas")
cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])
# Rebuild a DataFrame on the original index so the species labels stay aligned
df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)
df_scaled["species"] = df["species"]
hvplot.plotting.andrews_curves(df_scaled, class_column="species", samples=30)
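The data parameter is described as preferably normalized to (0.0, 1.0), whereas StandardScaler produces zero-mean, unit-variance features. As an alternative, the following is a minimal min-max rescaling sketch that uses only pandas. The helper name `min_max_scale` and the small example frame are hypothetical and are not part of hvPlot.

```python
import pandas as pd

def min_max_scale(df, cols):
    """Return a copy of df with each column in cols rescaled to [0, 1].

    Illustrative helper: a constant column has zero range and would
    produce NaN values, so such columns should be dropped beforehand.
    """
    scaled = df.copy()
    col_min = df[cols].min()
    col_range = df[cols].max() - col_min
    scaled[cols] = (df[cols] - col_min) / col_range
    return scaled

# Small made-up frame: two numeric features and a class column
example = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 40.0],
    "label": ["x", "y", "z"],
})
example_scaled = min_max_scale(example, ["a", "b"])
```

The rescaled frame keeps the class column untouched, so it could be passed to andrews_curves with class_column set to that column's name.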
With Matplotlib#
Andrews curves plots can quickly become large and slow to explore with the Bokeh plotting backend, since every row of the data is drawn as its own curve. This example shows how to render such a plot with Matplotlib.
import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Render with the Matplotlib backend
hvplot.extension("matplotlib")
df = hvplot.sampledata.penguins("pandas")
cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])
df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)
df_scaled["species"] = df["species"]
hvplot.plotting.andrews_curves(df_scaled, class_column="species")
# Switch back to the Bokeh backend for subsequent plots
hvplot.output(backend="bokeh")
Customize#
andrews_curves offers several options to customize the plot, including samples, alpha, and cmap.
import hvplot
import numpy as np
import pandas as pd
np.random.seed(42)
n_samples = 200
df = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(5, 2, n_samples),
    'feature_3': np.random.normal(-2, 1, n_samples),
    'feature_4': np.random.normal(3, 1.5, n_samples),
    'class': np.random.choice(['A', 'B', 'C'], size=n_samples)  # target class for coloring
})
hvplot.plotting.andrews_curves(
    df, class_column='class',
    samples=10, alpha=0.3, cmap='Set1',
)