Comparison with Pandas API#

This page recreates the pandas chart visualization guide. It shows how to configure and control the output of pd.DataFrame.plot and pd.Series.plot to generate hvPlot plots. For comparison, the page includes both the Pandas & Matplotlib and Pandas & hvPlot plot versions.

Note

You will find many notes in this page that are used to highlight differences between Pandas and hvPlot.

Tip

We generally recommend using hvPlot by installing the hvplot namespace on Pandas objects with import hvplot.pandas.

import numpy as np
import pandas as pd

pd.options.plotting.backend = 'hvplot'

Basic Plotting: `plot`#

The plot method on Series and DataFrame is just a simple wrapper around hvPlot():

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()

ts.plot(backend='matplotlib');

../../../_images/5d061e85933bea1c2b9a87e2e32451448371e1fda9bd51b072ab94c170c87975.png

ts.plot()

If the index consists of dates, they will be formatted nicely along the x-axis as per above.

Note

Dates will be sorted as long as sort_date=True.

On DataFrame, plot() is a convenience to plot all of the columns with labels:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
df = df.cumsum()

df.plot(backend='matplotlib');

../../../_images/29a937c387384a7451430b0b563644ea1fe6a03f9375a0c4a9be70ca18208b3c.png

df.plot()

Note

The default colors used in hvPlot are different from pandas but you can pass your preferred colors as a list via the color keyword.

df.plot(color=['#1F77B4', '#FF7F0D', '#B5DDB5', '#D93A3B'])

You can plot one column versus another using the x and y keywords in plot():

df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
df3['A'] = pd.Series(list(range(len(df))))
df3.plot(x='A', y='B', backend='matplotlib');

../../../_images/3c02edbe852466bb530cd4be490a6e912e7962646aa82ef75792602f0a542cac.png

df3.plot(x='A', y='B')

Other Plots#

Plotting methods allow for a handful of plot styles other than the default line plot. These methods can be provided as the kind keyword argument to plot(). These include:

bar or barh for bar plots
hist for histogram
box for boxplot
kde or density for density plots
area for area plots
scatter for scatter plots
hexbin for hexagonal bin plots
[NOT IMPLEMENTED]: ‘pie’ for pie plots

For example, a bar plot can be created the following way:

df.iloc[5].plot(kind='bar', backend='matplotlib');

../../../_images/7b88600f2418b7f7819b6265ebb176441f1e683e9bb8121d9dec399eda52826c.png

df.iloc[5].plot(kind='bar')

You can also create these other plots using the methods DataFrame.plot.<kind> instead of providing the kind keyword argument. This makes it easier to discover plot methods and the specific arguments they use.

In addition to these kinds, there are the DataFrame.hist(), and DataFrame.boxplot() methods, which use a separate interface.

Finally, there are several plotting functions in hvplot.plotting that take a Series or DataFrame as an argument. These include:

Scatter Matrix
Andrews Curves
Parallel Coordinates
Lag Plot

Note

The following plots are not currently supported in hvPlot:

Autocorrelation Plot
Bootstrap Plot
RadViz

Plots may also be adorned with errorbars or tables.

Bar plots#

For labeled, non-time series data, you may wish to produce a bar plot:

df.iloc[5].plot.bar(backend='matplotlib');

df.iloc[5].plot.bar()

Calling a DataFrame’s plot.bar() method produces a multiple bar plot:

df2 = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df2.plot.bar(backend='matplotlib');

../../../_images/7e22ac7131a5a2c9a7321851c97697af12ce478ecd50c969392b7e8e4961013a.png

df2.plot.bar()

To produce a stacked bar plot, pass stacked=True:

df2.plot.bar(stacked=True, backend='matplotlib');

../../../_images/9a827e6108f4675b75ac64b8b893e844175f6359db39946d299d0f4c8c410347.png

df2.plot.bar(stacked=True)

To get horizontal bar plots, use the barh method:

df2.plot.barh(stacked=True, backend='matplotlib');

../../../_images/3f2d9b2e9cc8bf62848158069eed4547d9f0a7ebad4ae3e84c59fd8d9c48c09a.png

df2.plot.barh(stacked=True)

Histograms#

Histogram can be drawn by using the DataFrame.plot.hist() and Series.plot.hist() methods.

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                    'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

df4.plot.hist(alpha=0.5, backend='matplotlib');

../../../_images/5507755d1ac931306ff57b57e28416d2efe6a9f20ea7fd081ce76161806636fe.png

df4.plot.hist(alpha=0.5)

Note

Stacking for histograms is not yet supported in hvPlot

# df4.plot.hist(stacked=True, bins=20);

You can pass other keywords. For example, horizontal and cumulative histogram can be drawn by invert=True and cumulative=True.

df4['a'].plot.hist(orientation='horizontal', cumulative=True, backend='matplotlib');

../../../_images/9f6c7af83c901cc7e1ef92c4daeaee81550be80d674d63c3b1d2a3ad6535077b.png

Note

hvPlot uses invert=True instead of orientation='horizontal'.

df4['a'].plot.hist(invert=True, cumulative=True)

The existing interface DataFrame.hist to plot histogram still can be used.

df['A'].diff().hist(backend='matplotlib');

../../../_images/b5e8517e1fea8862e2a68b580a877e16d00f7b5cb3354b9f071577c85d05a79f.png

df['A'].diff().hist()

Note

Pandas’ DataFrame.hist() plots the histograms of the columns on multiple subplots. hvPlot creates instead an overlay of histogram plots. To reproduce Pandas’ behavior, you can set subplots=True to create a layout of plots (1 per column in this case), and additionally call .cols(2) on the object returned to lay the plots in a layout with a maximum number of 2 columns.

df.diff().hist(color='k', alpha=0.5, bins=50, backend='matplotlib');

../../../_images/47aac268726d827e210325879fd05bd80da5dc7a04d2d64766cd5dc79a82628c.png

df.diff().hist(color='k', alpha=0.5, bins=50, subplots=True, width=300).cols(2)

The by keyword can be specified to plot grouped histograms:

data = pd.Series(np.random.randn(1000))
df5 = pd.DataFrame({'data': data, 'by_column': np.random.randint(0, 4, 1000)})

data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4), backend='matplotlib');

../../../_images/c15c6710b852232f3664d82d2572be415e7cd1c219094405babe094c1822ad63.png

Note

Pandas’ hist method accepts a Numpy NdArray for by but hvPlot does not, which is why in the example above an extra column 'by_column' was created.

df5.hist(by='by_column', width=300, subplots=True).cols(2)

In Pandas, the by keyword can also be specified in DataFrame.plot.hist().

data = pd.DataFrame(
    {
        "a": np.random.choice(["x", "y", "z"], 1000),
        "b": np.random.choice(["e", "f", "g"], 1000),
        "c": np.random.randn(1000),
        "d": np.random.randn(1000) - 1,
    },
)
data.plot.hist(by=["a", "b"], figsize=(10, 5), backend='matplotlib');

../../../_images/84cbd456ef9f81075c888c00dff644edc590923d2227bbb725bc7d958f467369.png

Note

This is not yet supported in hvPlot but a similar plot can be generated using HoloViews directly.

import holoviews as hv
from holoviews.operation import histogram

ds = hv.Dataset(data, kdims=['a', 'b'], vdims=['c', 'd'])
dsg = ds.groupby(['a', 'b'])
(
    hv.NdLayout(histogram(dsg, dimension='c'))
    * hv.NdLayout(histogram(dsg, dimension='d'))
).opts(
    hv.opts.Histogram(height=100, width=800)
).cols(1)

Box Plots#

Boxplot can be drawn by calling Series.plot.box() and DataFrame.plot.box(), or DataFrame.boxplot() to visualize the distribution of values within each column.

For instance, here is a boxplot representing five trials of 10 observations of a uniform random variable on [0,1).

df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot.box(backend='matplotlib');

../../../_images/27cd49a92fe8683f2de198d1bd041957a147584773622bafcc17e71f4b004ac9.png

df.plot.box()

Using the matplotlib backend, boxplot can be colorized by passing color keyword. You can pass a dict whose keys are boxes, whiskers, medians and caps. If some keys are missing in the dict, default colors are used for the corresponding artists. Also, boxplot has sym keyword to specify fliers style.

color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}

df.plot.box(color=color, sym='r+', backend='matplotlib');

../../../_images/9476baea93d6f60bedcad38f58b3a28535359dd01fbadb4350aedd18d5230d34.png

Note

Setting color this way or sym isn’t supported in hvPlot. However hvPlot supports passing down backend-specific style options that allow some level of customization.

df.plot.box(
    box_fill_color='white',
    box_line_color='DarkGreen',
    whisker_line_color='DarkOrange',
    outlier_color='red',
)

In Pandas, you can pass other keywords supported by matplotlib boxplot. For example, horizontal and custom-positioned boxplot can be drawn by vert=False and positions keywords.

df.plot.box(vert=False, positions=[1, 4, 5, 6, 8], backend='matplotlib');

../../../_images/b5ae23087c40f32de43efe87ae6c8a3f72ae3782985ecc75eaa8d24766798167.png

Note

vert and positions are not supported by hvPlot and the Bokeh extension. Instead, and for example, horizontal boxplots can be drawn by invert=True.

df.plot.box(invert=True)

The existing interface DataFrame.boxplot to plot boxplot still can be used.

df = pd.DataFrame(np.random.rand(10, 5))
df.boxplot(backend='matplotlib')

<Axes: >

../../../_images/b97d51b47cf5bed207de65b99376349ad789752a180e1ece67e73b5e31c9eb08.png

df.boxplot()

In Pandas, you can create a stratified boxplot using the by keyword argument to create groupings.

df = pd.DataFrame(np.random.rand(10,2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
df.boxplot(by='X', backend='matplotlib');

../../../_images/35d169747b66aeaa4f6cbf4125e346b051582f660f742ac17098580b213e7195.png

Note

In hvPlot this plot can be generated differently by transforming first the data to long format with Pandas and then using by and col to facet the plot.

df_melted = df.melt(id_vars=['X'], value_vars=['Col1', 'Col2'], var_name='Col')
df_melted.boxplot(col='Col', by='X')

You can also pass a subset of columns to plot, as well as group by multiple columns:

df = pd.DataFrame(np.random.rand(10,3), columns=['Col1', 'Col2', 'Col3'])

df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
df['Y'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])

bp = df.boxplot(column=["Col1", "Col2"], by=["X", "Y"], backend='matplotlib')

../../../_images/4b52b3f0978ec77b24dd8911fd7fe46ee5caf34fe407adc788e6a29cb46e41fe.png

Note

In hvPlot this plot can be generated differently by transforming first the data to long format with Pandas and then using by and col to facet the plot.

df_melted = df.melt(id_vars=['X', 'Y'], value_vars=['Col1', 'Col2'], var_name='Col')
df_melted.boxplot(col='Col', by=['X', 'Y'])

df_box = pd.DataFrame(np.random.randn(50, 2))
df_box['g'] = np.random.choice(['A', 'B'], size=50)
df_box.loc[df_box['g'] == 'B', 1] += 3
df_box.boxplot(row='g')

For more control over the ordering of the levels, we can perform a groupby on the data before plotting.

df_box.groupby('g').boxplot()

Area Plot#

You can create area plots with Series.plot.area() and DataFrame.plot.area(). Area plots are stacked by default. To produce stacked area plot, each column must be either all positive or all negative values.

When input data contains NaN, it will be automatically filled by 0. If you want to drop or fill by different values, use dataframe.dropna() or dataframe.fillna() before calling plot.

df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area(backend='matplotlib');

../../../_images/3f2f0a2422281befff25b2e74ff179bcd68087e59a91fe2af366699d429e05b7.png

df.plot.area()

To produce an unstacked plot, pass stacked=False. In Pandas, alpha value is set to 0.5 unless otherwise specified:

df.plot.area(stacked=False, backend='matplotlib');

../../../_images/1d3ec1565ef8ea4be85f1ff1ace082ee61d8d15e9ec6f6adf496c6a2f7b5a615.png

df.plot.area(stacked=False, alpha=0.5)

Scatter Plot#

Scatter plots can be drawn by using the DataFrame.plot.scatter() method. Scatter plots require that x and y be specified using the x and y keywords.

df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df["species"] = pd.Categorical(

    ["setosa"] * 20 + ["versicolor"] * 20 + ["virginica"] * 10

)
df.plot.scatter(x='a', y='b', backend='matplotlib');

../../../_images/f74a17d6fad82e847338a4ce784df02642b63790b741f6a10b33f25eebaac2e0.png

df.plot.scatter(x='a', y='b')

In Pandas, to plot multiple column groups in a single axes, repeat plot method specifying target ax. It is recommended to specify color and label keywords to distinguish each groups.

Note

In hvPlot, to plot multiple column groups in a single axes, repeat plot method and then use the overlay operator (*) to combine the plots.

ax = df.plot.scatter(x="a", y="b", color="DarkBlue", label="Group 1", backend='matplotlib')
df.plot.scatter(x="c", y="d", color="DarkGreen", label="Group 2", ax=ax, backend='matplotlib');

../../../_images/ec7ec6b1c85f86f76fea906177ece04a931854eb602f363269c95daf3549d0cc.png

plot_1 = df.plot.scatter(x='a', y='b', color='DarkBlue', label='Group 1')
plot_2 = df.plot.scatter(x='c', y='d', color='DarkGreen', label='Group 2')
plot_1 * plot_2

The keyword c may be given as the name of a column to provide colors for each point:

df.plot.scatter(x='a', y='b', c='c', s=50, backend='matplotlib');

../../../_images/7adc1274797c93f9c8f3f4865647415504e3dff99734d46c34659dc76a84ecbf.png

df.plot.scatter(x='a', y='b', c='c', s=50)

Note

The default colors for continuous labelling in hvPlot is the kbc_r cmap. You can change the colormap to the pandas default by using the cmap keyword.

df.plot.scatter(x='a', y='b', c='c', s=50, cmap='viridis')

If a categorical column is passed to c, then, in Pandas, a discrete colorbar will be produced:

df.plot.scatter(x="a", y="b", c="species", s=50, cmap="viridis", backend='matplotlib');

../../../_images/7c64201e53962bf144439d87413c3fc5de243b5f5d547fd6ff538b2dfb8ccb20.png

Note

In hvPlot, setting c does not display a colorbar but a legend. The default categorical colormap used is glasbey_category10.

df.plot.scatter(x="a", y="b", c="species", s=50)

You can pass other keywords supported by matplotlib scatter. The example below shows a bubble chart using a column of the DataFrame as the bubble size.

df.plot.scatter(x="a", y="b", s=df["c"] * 200, backend='matplotlib');

../../../_images/9753bdabd4ed6dcf5c6bd507fa7a6a37a1155655d2c51ede739a480366151245.png

Note

In hvPlot you may need to increase s to obtain markers with a size similar to the ones generated with Pandas and Matplotlib.

df.plot.scatter(x='a', y='b', s=df['c'] * 500)

Note

In hvPlot, the same effect can be accomplished using the scale option.

df.plot.scatter(x='a', y='b', s='c', scale=25)

Hexagonal Bin Plot#

You can create hexagonal bin plots with DataFrame.plot.hexbin(). Hexbin plots can be a useful alternative to scatter plots if your data are too dense to plot each point individually.

df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df['b'] = df['b'] + np.arange(1000)
df.plot.hexbin(x="a", y="b", gridsize=25, backend='matplotlib');

../../../_images/832a45c155af8913f3dced526b07281834f03ca0090d62cacde007305b0522e0.png

Note

We pass aspect=1 to hvPlot to generate a plot with dimensions similar to the one generated with Pandas and Matplotlib

df.plot.hexbin(x='a', y='b', gridsize=25, aspect=1)

A useful keyword argument is gridsize; it controls the number of hexagons in the x-direction, and defaults to 100. A larger gridsize means more, smaller bins.

By default, a histogram of the counts around each (x, y) point is computed. You can specify alternative aggregations by passing values to the C and reduce_C_function arguments. C specifies the value at each (x, y) point and reduce_C_function is a function of one argument that reduces all the values in a bin to a single number (e.g. mean, max, sum, std). In this example the positions are given by columns a and b, while the value is given by column z. The bins are aggregated with Numpy’s max function.

df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df['b'] = df['b'] + np.arange(1000)
df['z'] = np.random.uniform(0, 3, 1000)
df.plot.hexbin(x='a', y='b', C='z', reduce_C_function=np.max, gridsize=25, backend='matplotlib');

../../../_images/2d3cdd84ed4119468d700839a7a9fbbf050f43feae386c02013cc5bc3727a948.png

Note

In hvPlot, reduce_function is used instead of reduce_C_function.

df.plot.hexbin(x='a', y='b', C='z', reduce_function=np.max, gridsize=25, aspect=1)

Pie plot#

Note

Not yet implemented in hvPlot/HoloViews.

Density Plot#

You can create density plots using the Series.plot.kde() and DataFrame.plot.kde() methods.

ser = pd.Series(np.random.randn(1000))
ser.plot.kde(backend='matplotlib');

../../../_images/97ea1e30526e341c5c81f239f4495b6984a8e3bba56ad4d672d2f5e01e84a2e4.png

ser.plot.kde()

Plotting with missing data#

hvPlot tries to be pragmatic about plotting DataFrames or Series that contain missing data. Missing values are dropped, left out, or filled depending on the plot type.

(To be confirmed)

Plot Type	Nan Handling
Line	Leave gaps at NaNs
Line (stacked)	Fill 0’s
Bar	Fill 0’s
Scatter	Drop NaNs
Histogram	Drop NaNs (column-wise)
Box	Drop NaNs (column-wise)
Area	Fill 0’s
KDE	Drop NaNs (column-wise)
Hexbin	Fill 0’s
Pie	Not supported

Plotting Tools#

These functions can be imported from hvplot.plotting and take a Series or DataFrame as an argument.

Scatter Matrix Plot#

You can create a scatter plot matrix using the scatter_matrix function:

from pandas.plotting import scatter_matrix as pd_scatter_matrix

df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
pd_scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde');

../../../_images/bc4a0cee6a8bc98ee4425221e6ee3d0f414b4605e4a11d369d63438f809e6c04.png

from hvplot import scatter_matrix

scatter_matrix(df, alpha=0.2, diagonal='kde')

Note

In hvPlot, since scatter matrix plots may generate a large number of points (the one above renders 120,000 points!), you may want to take advantage of the power provided by Datashader to rasterize the off-diagonal plots into a fixed-resolution representation.

df = pd.DataFrame(np.random.randn(10000, 4), columns=['a', 'b', 'c', 'd'])

scatter_matrix(df, rasterize=True, dynspread=True)

Andrews Curves#

Andrews curves allow one to plot multivariate data as a large number of curves that are created using the attributes of samples as coefficients for Fourier series, see the Wikipedia entry for more information. By coloring these curves differently for each class it is possible to visualize data clustering. Curves belonging to samples of the same class will usually be closer together and form larger structures.

import hvsampledata

dfp = hvsampledata.penguins("pandas")
df_data = dfp[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']].sample(frac=0.3)
df_data = (df_data - df_data.min()) / (df_data.max() - df_data.min())
df_data['species'] = dfp['species']

from pandas.plotting import andrews_curves

andrews_curves(df_data, 'species');

../../../_images/2847533e6704d786893d7ba0e780b4ff7eed95f289d04b01853c7f185e83c1dc.png

from hvplot.plotting import andrews_curves

andrews_curves(df_data, 'species')

Parallel Coordinates#

Parallel coordinates is a plotting technique for plotting multivariate data, see the Wikipedia entry for an introduction. Parallel coordinates allows one to see clusters in data and to estimate other statistics visually. Using parallel coordinates points are represented as connected line segments. Each vertical line represents one attribute. One set of connected line segments represents one data point. Points that tend to cluster will appear closer together.

from pandas.plotting import parallel_coordinates

parallel_coordinates(df_data, 'species');

../../../_images/f14360615bb7016e15099c645deda504e83b7ee17287f656423974dae441f6a3.png

from hvplot.plotting import parallel_coordinates

parallel_coordinates(df_data, 'species')

Lag Plot#

Lag plots are used to check if a data set or time series is random. Random data should not exhibit any structure in the lag plot. Non-random structure implies that the underlying data are not random. The lag argument may be passed, and when lag=1 the plot is essentially data[:-1] vs. data[1:].

from pandas.plotting import lag_plot

spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))

lag_plot(data);

../../../_images/fb2611c3bd4b354e5ef7266ed6cb3c2fdbc2fb3c47c2c3e9a21abb2f5f9ba598.png

from hvplot.plotting import lag_plot

lag_plot(data)

Autocorrelation plot#

Note

Not yet implemented in hvPlot.

Bootstrap plot#

Note

Not yet implemented in hvPlot.

RadViz#

Note

Not yet implemented in hvPlot.

Plot Formatting#

Setting the plot style#

TBD

General plot style arguments#

Most plotting methods have a set of keyword arguments that control the layout and formatting of the returned plot:

ts.plot(style='k--', label='Series', backend='matplotlib');

../../../_images/e8fbd8c771c7d02ecbd018e2cac10bab5a11def569e5bfa3755910817d959576.png

Note

The style keyword to customize line plots isn’t supported in hvPlot. Instead, hvPlot offers some generic styling options like c (alias for color), and backend-specific styling options like line_dash.

ts.plot(c='k', line_dash='dashed', label='Series')

For each kind of plot (e.g. line, bar, scatter) any additional arguments keywords are passed along to the corresponding HoloViews object (hv.Curve, hv.Bar, hv.Scatter). These can be used to control additional styling, beyond what hvPlot provides.

Controlling the legend#

You may set the legend argument to False to hide the legend, which is shown by default.

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()

df.plot(legend=False, backend='matplotlib');

../../../_images/88d248d8d295e459b2c6663c068301b81985194f501e300ab50d742d0ba809d3.png

df.plot(legend=False)

Note

In hvPlot, you can also control the placement of the legend using the same legend argument set to one of: 'top_right', 'top_left', 'bottom_left', 'bottom_right', 'right', 'left', 'top', or 'bottom'.

df.plot(legend='top_left')

Controlling the labels#

You may set the xlabel and ylabel arguments to give the plot custom labels for x and y axis. By default, pandas will pick up index name as xlabel, while leaving it empty for ylabel.

df.plot(xlabel="new x", ylabel="new y", backend='matplotlib');

../../../_images/090812593ad656b12c6185a5c0b5d3922049e63f36491f90cb711787f433c7f5.png

df.plot(xlabel="new x", ylabel="new y")

Scales#

You may pass logy to get a log-scale Y axis.

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = np.exp(ts.cumsum())
ts.plot(logy=True, backend='matplotlib');

../../../_images/b12fcd5f69a30d5efd56a52509c61709fd0f967169953e7e146e209a789515e6.png

ts.plot(logy=True)

See also the logx and loglog keyword arguments.

Note

In hvPlot, logz=True allows to get a log-scale on the color axis.

dfh = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
dfh['b'] = dfh['b'] + np.arange(1000)
dfh.plot.hexbin(x='a', y='b', gridsize=25, width=500, height=400, logz=True)

Plotting on a Secondary Y-axis#

In Pandas, to plot data on a secondary y-axis, use the secondary_y keyword:

df["A"].plot(backend='matplotlib')
df["B"].plot(secondary_y=True, style="g", backend='matplotlib');

../../../_images/fe5f0d8f34c3e73116f2393601dc33ac5f5d3d93015e327e4d17c0cfa5eef036.png

Note

hvPlot does not currently have built-in support for multiple y-axes, but you can achieve this effect by overlaying multiple plots and setting the multi_y option in .opts(). This allows each y-variable to be displayed on a separate axis.

(df.plot(y='A') * df.plot(y='B') * df.plot(y='C')).opts(multi_y=True)

Suppressing tick resolution adjustment#

pandas includes automatic tick resolution adjustment for regular frequency time-series data. For limited cases where pandas cannot infer the frequency information (e.g., in an externally created twinx), you can choose to suppress this behavior for alignment purposes.

Here is the default behavior, notice how the x-axis tick labeling is performed:

df['A'].plot(backend='matplotlib');

../../../_images/ce9d50597aadbeb867218dab1a8e975d738a8cf4fe2ed66946a903c20a42a385.png

df['A'].plot()

Using the x_compat parameter, you can suppress this behavior:

df["A"].plot(x_compat=True, backend='matplotlib');

../../../_images/394a4301b2297aa63a8392c3e8231032f466f28dc328fba074f7288ab147a32c.png

Note

In hvPlot, formatters can be set using format strings, by declaring bokeh TickFormatters, or using custom functions. See HoloViews Tick Docs for more information.

from bokeh.models.formatters import DatetimeTickFormatter

formatter = DatetimeTickFormatter(months='%b %Y')
df['A'].plot(yformatter='$%.2f', xformatter=formatter)

Subplots#

Each Series in a DataFrame can be plotted on a different axis with the subplots keyword.

df.plot(subplots=True, figsize=(6, 6), backend='matplotlib');

../../../_images/e0499e3ad9391b8aa01570537e93d51d5bda5a2b63eb49c33b83c9b677510b17.png

Note

In hvPlot, you can control the maximum number N of plots displayed on a row by calling .cols(N) on the HoloViews layout returned by the .plot() call.

df.plot(subplots=True, height=120).cols(1)

Controlling layout and targeting multiple axes#

In Pandas, the layout of subplots can be specified by the layout keyword. It can accept (rows, columns). The layout keyword can be used in hist and boxplot also. If the input is invalid, a ValueError will be raised.

df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False, backend='matplotlib');

../../../_images/3f2c1e64793ae0afc1959eae7511247a6cf5b63a1f72033b45654729317fbb3c.png

Note

In hvPlot, layout is not supported, use .cols(N) instead.

df.plot(subplots=True, sharex=False, width=160).cols(3)

Note

In hvPlot, setting sharex and/or sharey to True is equivalent to setting shared_axes=True.

Plotting with error bars#

In Pandas, plotting with error bars is supported in DataFrame.plot() and Series.plot(). Horizontal and vertical error bars can be supplied to the xerr and yerr keyword arguments to plot().

df3 = pd.DataFrame({'data1': [3, 2, 4, 3, 2, 4, 3, 2],
                    'data2': [6, 5, 7, 5, 4, 5, 6, 5]})
mean_std = pd.DataFrame({'mean': df3.mean(), 'std': df3.std()})
mean_std.plot.bar(y='mean', yerr='std', capsize=4, rot=0, backend='matplotlib');

../../../_images/42be23abf90c35f41392a78dbff0a9a33e908adb79c28b3e2c17cf004f5120e6.png

Note

In hvPlot, plotting with error bars is not directly supported in DataFrame.plot() and Series.plot(). Instead, users should fall back to using the hvplot namespace directly and the errorbars() plotting method it offers.

import hvplot.pandas  # noqa

(
    mean_std.plot.bar(y='mean', alpha=0.7)
    * mean_std.hvplot.errorbars(x='index', y='mean', yerr1='std')
)

Plotting tables#

In Pandas, plotting with matplotlib table is now supported in DataFrame.plot() and Series.plot() with a table keyword. The table keyword can accept bool, DataFrame or Series. The simple way to draw a table is to specify table=True. Data will be transposed to meet matplotlib’s default layout.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
ax.xaxis.tick_top()  # Display x-axis ticks on top.
df.plot(table=True, ax=ax, backend='matplotlib');

../../../_images/8edbf3924263a6850e4cbac11bc866979b2330ae00301d5521d95a3c2f67bcff.png

Note

In hvPlot, this same effect can be accomplished by using hvplot.table() and creating a layout of the resulting plots with the + operator, arranged with .cols(1).

import hvplot.pandas  # noqa

(
    df.plot(xaxis=False, legend='top_right')
    + df.T.hvplot.table()
).cols(1)

Colormaps#

A potential issue when plotting a large number of columns is that it can be difficult to distinguish some series due to repetition in the default colors. To remedy this, DataFrame plotting supports the use of the colormap argument, which accepts either a colormap or a string that is a name of a colormap.

To use the cubehelix colormap, we can pass colormap='cubehelix'.

df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)
df = df.cumsum()
df.plot(colormap='cubehelix', backend='matplotlib');

../../../_images/f78a7d3924cef45ec1046890703bc7cb41fa8010b52b87c5c2d0d8c22049bf7f.png

Note

colormap is not yet supported in hvPlot for line plots. However, in this case color can be used instead when set to a HoloViews Palette object, that can sample a given colormap at a regularly spaced interval (in this case, the number of overlaid lines).

import matplotlib
import holoviews as hv

# cubehelix palette only added when the matplotlib extension is loaded
hv.Palette.colormaps['cubehelix'] = matplotlib.colormaps['cubehelix']

df.plot(color=hv.Palette("cubehelix"))

Colormaps can also be used with other plot types, like bar charts:

dd = pd.DataFrame(np.random.randn(10, 10)).map(abs)
dd = dd.cumsum()
dd.plot.bar(colormap='Greens', backend='matplotlib');

../../../_images/c5e0675094599b34d0cbce6b74b032c7bbafc906cf47730e677c699f3bb45633.png

dd.plot.bar(colormap='Greens')

Parallel coordinates charts:

import hvsampledata
from pandas.plotting import parallel_coordinates

dfp = hvsampledata.penguins('pandas')[['species', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']].sample(n=50)

parallel_coordinates(dfp, 'species', colormap='gist_rainbow');

../../../_images/4a7b0034fbbdce449770af51808a99bb7de67f7c7aae61c79898e2e0b6bc977d.png

from hvplot import parallel_coordinates

parallel_coordinates(dfp, 'species', colormap='gist_rainbow')

Andrews curves charts:

from pandas.plotting import andrews_curves

andrews_curves(dfp, 'species', colormap='winter')

<Axes: >

../../../_images/d5407b65cc1e3e8b38a1084d54a682cb14cbca2150bd03799c3cadf217194280.png

from hvplot.plotting import andrews_curves

andrews_curves(dfp, 'species', colormap='winter')

Plotting directly with HoloViews#

In some situations it may still be preferable or necessary to prepare plots directly with hvPlot or HoloViews, for instance when a certain type of plot or customization is not (yet) supported by pandas. Series and DataFrame objects behave like arrays and can therefore be passed directly to HoloViews functions without explicit casts.

import holoviews as hv

price = pd.Series(np.random.randn(150).cumsum(),
                  index=pd.date_range('2000-1-1', periods=150, freq='B'), name='price')
ma = price.rolling(20).mean()
mstd = price.rolling(20).std()

price.plot(c='k') * ma.plot(c='b', label='mean') * \
hv.Area((mstd.index, ma - 2 * mstd, ma + 2 * mstd),
        vdims=['y', 'y2']).opts(color='b', alpha=0.2)

This web page was generated from a Jupyter notebook and not all interactivity will work on this website.

Comparison with Pandas API#

Basic Plotting: plot#

Other Plots#

Bar plots#

Histograms#

Box Plots#

Area Plot#

Scatter Plot#

Hexagonal Bin Plot#

Pie plot#

Density Plot#

Plotting with missing data#

Plotting Tools#

Scatter Matrix Plot#

Andrews Curves#

Parallel Coordinates#

Lag Plot#

Autocorrelation plot#

Bootstrap plot#

RadViz#

Plot Formatting#

Setting the plot style#

General plot style arguments#

Controlling the legend#

Controlling the labels#

Scales#

Plotting on a Secondary Y-axis#

Suppressing tick resolution adjustment#

Subplots#

Controlling layout and targeting multiple axes#

Plotting with error bars#

Plotting tables#

Colormaps#

Plotting directly with HoloViews#

Basic Plotting: `plot`#