Andrews Curves Plot
An Andrews Curves plot of four features from the penguins dataset, used to analyze how they relate to the species. Gentoo penguins, for instance, are clearly separated from the two other classes because their values differ consistently across the features used: they have longer flippers and greater body mass, but shallower bills. Adelie and Chinstrap curves overlap moderately. This plot suggests that a classification model (e.g. logistic regression or a decision tree) would likely perform well overall, with most of the remaining errors expected between Adelie and Chinstrap.
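For reference, the standard Andrews transform maps each observation x = (x1, x2, x3, x4, ...) to a finite Fourier series in t, so every row of the data becomes one curve. The formula below is the textbook definition; it is an assumption, not confirmed here, that hvPlot uses the same ordering of sine and cosine terms and takes the feature columns in the order they appear in the DataFrame (here bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g).

```latex
f_{\mathbf{x}}(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots,
\qquad -\pi \le t \le \pi
```

Since x1 only shifts the curve by a constant while the later features are weighted by oscillating terms, the curve shapes depend on column order, and a feature on a much larger scale than the others will dominate unless the data is standardized first.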
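import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler

hvplot.extension("bokeh")

cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Load the penguins sample data and drop rows with missing measurements,
# which would otherwise produce NaN curves
df = hvplot.sampledata.penguins("pandas").dropna(subset=cols)

# Standardize each feature to zero mean and unit variance so that
# body_mass_g (in grams) does not dominate the millimetre-scale features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])

# Reuse the original index so the species labels stay aligned with their rows
df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)
df_scaled["species"] = df["species"]

hvplot.plotting.andrews_curves(
    df_scaled,
    class_column="species",
    samples=30,
    title="Andrews Curves Plot (Bokeh)",
)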
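The StandardScaler step matters because the raw features are on very different scales: body_mass_g values are in the thousands, while the bill measurements are tens of millimetres, so unscaled body mass would dominate every curve. Below is a minimal NumPy sketch of the z-score transform that StandardScaler applies by default (subtract the column mean, divide by the population standard deviation with ddof=0), with NaNs ignored when computing the statistics. The function name standardize is hypothetical and only for illustration.

```python
import numpy as np


def standardize(values):
    """Z-score each column: (x - mean) / std.

    Statistics are computed with NaNs ignored, and NaNs stay NaN in the
    output. Uses the population standard deviation (ddof=0), which is what
    scikit-learn's StandardScaler uses by default.
    """
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values, axis=0)
    std = np.nanstd(values, axis=0)  # ddof=0
    # Constant columns would divide by zero; leave them centred but unscaled
    std = np.where(std == 0, 1.0, std)
    return (values - mean) / std
```

For example, standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]) returns the same z-scores, roughly -1.22, 0 and 1.22, for both columns, even though the second column is ten times larger. After this step every feature contributes to the Andrews curves on a comparable scale.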
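import hvplot
import pandas as pd
from sklearn.preprocessing import StandardScaler

hvplot.extension("matplotlib")

cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Load the penguins sample data and drop rows with missing measurements,
# which would otherwise produce NaN curves
df = hvplot.sampledata.penguins("pandas").dropna(subset=cols)

# Standardize each feature to zero mean and unit variance so that
# body_mass_g (in grams) does not dominate the millimetre-scale features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[cols])

# Reuse the original index so the species labels stay aligned with their rows
df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)
df_scaled["species"] = df["species"]

hvplot.plotting.andrews_curves(
    df_scaled,
    class_column="species",
    samples=30,
    title="Andrews Curves Plot (Matplotlib)",
)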
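Both calls above plot the same data; only the rendering backend differs. The samples argument sets how many t values each curve is evaluated at. Assuming these are evenly spaced over [-π, π] (as in pandas.plotting.andrews_curves), the data behind the plot can be sketched in long form as below. The helper andrews_frame and the toy DataFrame in the usage note are hypothetical: they are not part of hvPlot's API and are not penguins data.

```python
import numpy as np
import pandas as pd


def andrews_frame(df, class_column, samples=30):
    """Evaluate the Andrews curve of every row on `samples` points of t.

    Returns a long-form DataFrame with one row per (observation, t) pair.
    Assumes all columns other than `class_column` are numeric features,
    used in the order they appear in `df`.
    """
    features = df.drop(columns=[class_column]).to_numpy(dtype=float)
    n_features = features.shape[1]
    t = np.linspace(-np.pi, np.pi, samples)

    # Basis matrix: column j holds the term that multiplies feature j,
    # i.e. 1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...
    basis = np.empty((samples, n_features))
    basis[:, 0] = 1.0 / np.sqrt(2.0)
    for j in range(1, n_features):
        k = (j + 1) // 2  # frequency: 1, 1, 2, 2, 3, ...
        basis[:, j] = np.sin(k * t) if j % 2 == 1 else np.cos(k * t)

    # One matrix product evaluates every curve at every t value
    curves = features @ basis.T  # shape: (n_rows, samples)

    return pd.DataFrame({
        "obs": np.repeat(np.arange(len(df)), samples),
        "t": np.tile(t, len(df)),
        "value": curves.ravel(),  # row-major: all t for row 0, then row 1, ...
        class_column: np.repeat(df[class_column].to_numpy(), samples),
    })
```

For instance, with toy = pd.DataFrame({"a": [1.0, 0.0], "b": [0.0, 1.0], "species": ["x", "y"]}), andrews_frame(toy, "species", samples=5) returns 10 rows: the first curve is the constant 1/√2 and the second traces sin t. Building a basis matrix once and multiplying keeps the work vectorized instead of looping over rows.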