hvplot.plotting.andrews_curves#

hvplot.plotting.andrews_curves(data, class_column, samples=200, alpha=0.5, width=600, height=300, cmap=None, colormap=None, **kwds)

Generate a plot of Andrews curves, for visualising clusters of multivariate data.

Andrews curves have the functional form:

\[f(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin(t) + x_3 \cos(t) + x_4 \sin(2t) + x_5 \cos(2t) + \cdots\]

where the coefficients \(x_1, x_2, \ldots\) are the values of each dimension (feature column) in a row and \(t\) is linearly spaced between \(-\pi\) and \(+\pi\). Each row of data then corresponds to a single curve.
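The functional form above can be sketched directly in NumPy. This is a minimal illustration of the formula as written here, not hvplot's internal implementation; the helper name `andrews_curve_values` is hypothetical.

```python
import numpy as np


def andrews_curve_values(row, samples=200):
    """Evaluate f(t) for one row of feature values (hypothetical helper)."""
    x = np.asarray(row, dtype=float)
    t = np.linspace(-np.pi, np.pi, samples)  # t linearly spaced in [-pi, pi]
    f = np.full_like(t, x[0] / np.sqrt(2))   # constant term x_1 / sqrt(2)
    for i in range(1, len(x)):
        k = (i + 1) // 2                     # harmonic: 1, 1, 2, 2, 3, 3, ...
        # x_2, x_4, ... multiply sin(k t); x_3, x_5, ... multiply cos(k t).
        basis = np.sin(k * t) if i % 2 == 1 else np.cos(k * t)
        f += x[i] * basis
    return t, f


# One row with four features, evaluated at five values of t.
t, f = andrews_curve_values([1.0, 2.0, 3.0, 4.0], samples=5)
```

At \(t = 0\) every sine term vanishes and every cosine term equals 1, so for this row the curve value reduces to \(x_1/\sqrt{2} + x_3\).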

Parameters:

data (DataFrame): Data to be plotted, preferably normalized to (0.0, 1.0).
class_column (str): Name of the column containing the class names.
samples (int, optional): Number of points at which each curve is evaluated. Default is 200.
alpha (float, optional): The transparency of the lines. Default is 0.5.
cmap/colormap (str or colormap object, optional): Colormap to use for groups. Defaults to Colorcet’s glasbey_category10.

Returns:

obj (HoloViews object): The HoloViews representation of the plot.

Examples#

Basic Andrews curves plot#

This example shows how to create a simple Andrews curves plot from a dataframe with 4 features and a categorical column.

import hvplot
import numpy as np
import pandas as pd

np.random.seed(42)
n_samples = 50
df = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(5, 2, n_samples),
    'feature_3': np.random.normal(-2, 1, n_samples),
    'feature_4': np.random.normal(3, 1.5, n_samples),
    'class': np.random.choice(['A', 'B', 'C'], size=n_samples)  # target class for coloring
})

hvplot.plotting.andrews_curves(df, class_column='class')

Example with penguins#

In this example we use four features from the penguins dataset and look at how they relate to species. We can see, for instance, that Gentoo penguins are quite clearly separated from the two other classes and tend to have larger values on several of the features used, while Adelie and Chinstrap show moderate overlap. This plot suggests that a classification model (e.g. logistic regression or a decision tree) would likely perform well overall.

Note

It is important to normalize the features before plotting them. This example leverages scikit-learn and its StandardScaler transform.

import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = hvplot.sampledata.penguins("pandas")
cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])
df_scaled = pd.DataFrame(scaled_features, columns=cols)
df_scaled["species"] = df["species"]

hvplot.plotting.andrews_curves(df_scaled, class_column="species", samples=30)
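StandardScaler produces zero-mean, unit-variance features, whereas the data parameter description suggests values normalized to (0.0, 1.0). As an alternative sketch, min-max scaling can be done with pandas alone. The small dataframe and its column names below are made up for illustration.

```python
import pandas as pd

# Toy data for illustration only; the column names are hypothetical.
df = pd.DataFrame({
    "length": [10.0, 20.0, 30.0],
    "mass": [100.0, 150.0, 300.0],
    "label": ["a", "b", "a"],
})

feature_cols = ["length", "mass"]
features = df[feature_cols]

# Min-max scaling: map each column's minimum to 0 and its maximum to 1.
scaled = (features - features.min()) / (features.max() - features.min())

# Re-attach the class column so the frame can be passed as `data`.
df_minmax = scaled.assign(label=df["label"])
```

The resulting frame could then be passed to `hvplot.plotting.andrews_curves(df_minmax, class_column="label")`. Note that a constant column would cause a division by zero here and yield NaN values, so such columns would need to be dropped or handled separately.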

With Matplotlib#

Andrews curves plots can quickly become large and slow to explore with the Bokeh plotting backend, since every row is drawn as its own curve. This example shows how to render such a plot with Matplotlib instead.

import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler
hvplot.extension("matplotlib")

df = hvplot.sampledata.penguins("pandas")
cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])
df_scaled = pd.DataFrame(scaled_features, columns=cols)
df_scaled["species"] = df["species"]

hvplot.plotting.andrews_curves(df_scaled, class_column="species")

# Switch the output backend back to Bokeh for subsequent plots
hvplot.output(backend="bokeh")

Customize#

andrews_curves offers several options to customize the plot, including samples, alpha, and cmap.

import hvplot
import numpy as np
import pandas as pd

np.random.seed(42)
n_samples = 200
df = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(5, 2, n_samples),
    'feature_3': np.random.normal(-2, 1, n_samples),
    'feature_4': np.random.normal(3, 1.5, n_samples),
    'class': np.random.choice(['A', 'B', 'C'], size=n_samples)  # target class for coloring
})

hvplot.plotting.andrews_curves(
    df, class_column='class',
    samples=10, alpha=0.3, cmap='Set1',
)
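To see why a small samples value such as 10 gives visibly coarser curves than the default, the sketch below compares the spacing of the \(t\) grid for the two settings. It assumes that samples sets the number of \(t\) values per curve on \([-\pi, \pi]\); `grid_spacing` is a hypothetical helper, not part of hvplot.

```python
import numpy as np


def grid_spacing(samples):
    """Distance between consecutive t values (hypothetical helper)."""
    t = np.linspace(-np.pi, np.pi, samples)
    return t[1] - t[0]


coarse = grid_spacing(10)   # spacing of 2*pi / 9
fine = grid_spacing(200)    # spacing of 2*pi / 199

print(f"samples=10:  step = {coarse:.4f}")
print(f"samples=200: step = {fine:.4f}")
```

With four features the highest harmonic in the formula is \(\sin(2t)\), whose period is \(\pi\). A grid of 10 points covers each such period with only about four or five points, so the curves are drawn as rough polylines.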
