# Pandas

hvPlot is designed as a simple plotting interface (a one-line call is enough in most cases) to [many data libraries](../../data_libraries.ipynb). It was greatly inspired by Pandas' original plotting interface, which is mostly a convenient interface to Matplotlib's plotting API and in fact allows many arguments to be passed directly to Matplotlib. hvPlot's plotting interface, on the other hand, is a convenient interface to [HoloViews](https://holoviews.org/) and similarly allows many arguments to be passed directly to HoloViews. Matplotlib and HoloViews are different kinds of visualization libraries: the former is a pure plotting tool (i.e. it knows how to draw pixels on your screen), while the latter is more of a data exploration tool. These differences explain some of the differences you will observe between Pandas' and hvPlot's plotting APIs.

Pandas offers a mechanism to [register a third-party plotting backend](https://pandas.pydata.org/docs/user_guide/visualization.html#plotting-backends). When a backend is registered, `<DataFrame|Series>.plot()` calls are delegated to the third-party tool. hvPlot implements the interface required to be registered as Pandas' plotting backend. As a consequence, there are two main ways to generate hvPlot plots from Pandas objects:
- By importing `hvplot.pandas` and using the `.hvplot()` namespace (recommended).
- By registering hvPlot as Pandas' plotting backend and using Pandas' `.plot()` namespace.

:::{note}
Pandas does not force third-party plotting tools like hvPlot to implement all of its plotting methods. It also does not require each method to implement specific arguments.
:::

To summarize hvPlot's compatibility with Pandas' plotting API:
- hvPlot can be registered as Pandas' plotting backend.
- As a convenient interface to HoloViews and not to Matplotlib, hvPlot does not aim to be 100% compatible with Pandas' API. However, Pandas users will find the plotting methods they are used to, and most of the generic arguments they accept. In a sense, hvPlot aims more for familiarity than compatibility.

For a more in-depth comparison between Pandas and hvPlot APIs, visit the [Pandas API](./Pandas_API.ipynb) reference that recreates the [Pandas chart visualization guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) using both APIs.

In [None]:
%matplotlib inline

In [None]:
import hvplot.pandas  # noqa
import hvsampledata
import numpy as np
import pandas as pd

df = hvsampledata.penguins('pandas')

## Switch Pandas backend to hvPlot

:::{hint}
This approach is an easy way (one-line change) to convert some code from generating plots with Pandas & Matplotlib to Pandas & hvPlot and see whether you like the output or not. Generally, we recommend installing the `hvplot` namespace on Pandas objects by importing `hvplot.pandas`, and invoking hvPlot via this namespace, e.g. `df.hvplot.line()`, as it can be adapted to other data libraries (e.g. if you use Dask, you can install the `hvplot` namespace on Dask objects with `import hvplot.dask`).
:::

:::{note}
This requires `pandas >= 0.25.0`.
:::

hvPlot can be registered as Pandas' plotting backend instead of Matplotlib with:

In [None]:
pd.options.plotting.backend = 'hvplot'

Once registered, hvPlot plots are generated when calling Pandas `.plot()`:

In [None]:
df.plot.scatter('bill_length_mm', 'bill_depth_mm')

:::{note}
To function correctly, hvPlot needs to load some front-end content (JavaScript, CSS, etc.) in a notebook. This is usually achieved as a side effect of an import such as `import hvplot.pandas`. In the example above, this step is done in the first cell that calls `.plot()`. Do not delete this cell, or you may run into hard-to-debug interactivity issues.
:::


### API comparison

| Kind | In Pandas | In hvPlot | Comment |
| - | - | - | - |
| {meth}`hvplot.hvPlot.area` | ✅| ✅| Alpha set to 0.5 automatically in Pandas when `stacked=False`, not in hvPlot. |
| {meth}`hvplot.hvPlot.bar`| ✅| ✅| |
| {meth}`hvplot.hvPlot.barh` | ✅| ✅| |
| {meth}`hvplot.hvPlot.bivariate`| ❌| ✅| |
| `DataFrame.boxplot`| ✅| ✅|  |
| {meth}`hvplot.hvPlot.box`| ✅| ✅| In Pandas `colors` can be used to specify the color of the components of the box plot, in hvPlot this can roughly be done via backend-specific style options. `sym` and `positions` are not supported in hvPlot. `vert` in Pandas can be replaced by `invert` in hvPlot. |
| {meth}`hvplot.hvPlot.density`| ✅| ✅| |
| {meth}`hvplot.hvPlot.errorbars`| ❌| ✅| Error bars can be set with `xerr` and `yerr` [in Pandas](https://pandas.pydata.org/docs/user_guide/visualization.html#plotting-with-error-bars). |
| {meth}`hvplot.hvPlot.heatmap`| ❌| ✅| |
| {meth}`hvplot.hvPlot.hexbin` | ✅| ✅| `reduce_C_function` in Pandas is named `reduce_function` in hvPlot.  |
| {meth}`hvplot.hvPlot.hist` | ✅| ✅| Stacking not supported in hvPlot. hvPlot uses `invert=True` instead of `orientation='horizontal'`. Pandas' `hist` method accepts a NumPy `ndarray` for `by` but hvPlot does not. |
| `DataFrame.hist` | ✅| ✅| Pandas' `DataFrame.hist()` plots the histograms of the columns on multiple subplots. hvPlot instead creates an overlay of histogram plots. To reproduce Pandas' behavior, you can set `subplots=True` to create a layout of plots (one per column in this case), and additionally call `.cols(2)` on the returned object to arrange the plots in a layout with at most 2 columns. |
| {meth}`hvplot.hvPlot.kde`| ✅| ✅| |
| {meth}`hvplot.hvPlot.labels` | ❌| ✅| |
| {meth}`hvplot.hvPlot.line` | ✅| ✅| `colormap` not yet supported in hvPlot, use `color` instead. |
| {meth}`hvplot.hvPlot.ohlc` | ❌| ✅| |
| {meth}`hvplot.hvPlot.scatter`| ✅| ✅| |
| {meth}`hvplot.hvPlot.step` | ❌| ✅| |
| {meth}`hvplot.hvPlot.table`| ✅| ✅| Pandas has a [whole API](https://pandas.pydata.org/docs/user_guide/style.html) dedicated to displaying and styling tables. It also offers {func}`pandas:pandas.plotting.table` to convert a DataFrame to a Matplotlib table. |
| `pie`| ✅| ❌| Not yet implemented in HoloViews, see [this issue](https://github.com/holoviz/holoviews/issues/4800). |
| {meth}`hvplot.hvPlot.points` | ❌| ✅| For two independent variables, useful for geographic data for example. |
| {meth}`hvplot.hvPlot.violin` | ❌| ✅| |
| {func}`hvplot.plotting.andrews_curves` | ✅| ✅| |
| `autocorrelation_plot` | ✅| ❌| |
| `bootstrap_plot` | ✅| ❌| |
| {func}`hvplot.plotting.lag_plot` | ✅| ✅| |
| {func}`hvplot.plotting.parallel_coordinates` | ✅| ✅| |
| `radviz` | ✅| ❌| |
| {func}`hvplot.plotting.scatter_matrix` | ✅| ✅| |


## Notable differences

In [None]:
pd.options.plotting.backend = 'matplotlib'

This section describes a few notable differences between the Pandas and hvPlot plotting APIs. More specific differences can be found in the [Pandas API](./Pandas_API.ipynb) page that recreates the [Pandas chart visualization guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html).

### Figure handling

A `plot` call in Pandas returns a Matplotlib `Axes` object. This object can be passed to Pandas' `plot` API via the `ax` argument, for example to overlay two different plots. The `ax` argument is not supported in hvPlot.

In [None]:
plot = df.plot.scatter('bill_length_mm', 'bill_depth_mm', figsize=(4, 3))

In [None]:
print(plot)

hvPlot's plotting API returns HoloViews objects. These objects are wrappers around the original dataset, whose rich representation is a plot.

In [None]:
plot = df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', hover_cols=['species'])
print(plot)

In [None]:
plot

In [None]:
plot.data.head()

Using HoloViews' API, this object can be further customized.

In [None]:
import holoviews as hv

plot.opts(
    height=300, width=300, color=hv.dim('species'),
    cmap='Category10', show_legend=False,
).hist(['bill_length_mm','bill_depth_mm'])

### Overlays and layouts

In Pandas, overlays are usually created by passing down an `Axes` object to another `plot` call via the `ax` argument. Layouts are created by setting `subplots=True`, and can be customized further with the `layout` argument, or with Matplotlib's API.

The approach is quite different in hvPlot, as HoloViews offers a very convenient API with `*` for overlaying plots and `+` for laying out plots. Together with the `subplots` argument and HoloViews' `.cols(N)` method to limit the number `N` of plots per row, this forms an API flexible enough to handle most situations.

In [None]:
df1 = df.query('species == "Adelie"')
df2 = df.query('species == "Gentoo"')
ax = df1.plot.scatter('bill_length_mm', 'bill_depth_mm', color="blue", label="Adelie")
df2.plot.scatter('bill_length_mm', 'bill_depth_mm', color="green", label="Gentoo", ax=ax);

In [None]:
(
    df1.hvplot.scatter('bill_length_mm', 'bill_depth_mm', color="blue", label="Adelie")
    * df2.hvplot.scatter('bill_length_mm', 'bill_depth_mm', color="green", label="Gentoo")
)

In [None]:
dft = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD")).cumsum()
dft.plot.line(subplots=True, layout=(2, 3), figsize=(8, 6));

In [None]:
dft.hvplot.line(subplots=True, width=220).cols(3)

In [None]:
dft['A'].hvplot.line(width=220) + dft['B'].hvplot.line(width=220)

### Plot dimensions

Plot dimensions in Pandas are set with the `figsize` argument, which accepts a tuple *(width, height)* in *inches*. `figsize` is not supported in hvPlot. Instead, plot dimensions are set with the `width` (default is `700`) and `height` (default is `300`) arguments, which accept integer values in pixels.
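
When porting code, a rough conversion from inches to pixels can help pick starting values. The helper below is a hypothetical sketch (`figsize_to_pixels` is not part of hvPlot or Pandas) and assumes a resolution of 100 dots per inch, Matplotlib's default `figure.dpi`; your notebook front end may use a different value.

```python
def figsize_to_pixels(figsize, dpi=100):
    """Convert a Matplotlib-style (width, height) in inches to pixel sizes.

    `dpi=100` is an assumption based on Matplotlib's default `figure.dpi`.
    """
    width_in, height_in = figsize
    return {"width": round(width_in * dpi), "height": round(height_in * dpi)}


size = figsize_to_pixels((4, 3))
print(size)  # {'width': 400, 'height': 300}
```

The returned dictionary can be unpacked into an hvPlot call, e.g. `df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', **size)`. The result is only an approximation, since the two libraries allocate space for axes, labels and legends differently.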

In [None]:
df.plot.scatter('bill_length_mm', 'bill_depth_mm', figsize=(4, 3));

In [None]:
df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', width=350, height=250)

### Default color cycle and colormap

Pandas and hvPlot have different default color cycles and colormaps.

The default color cycle in Pandas is Matplotlib's `tab10` (or ["Tableau 10"](https://www.tableau.com/blog/colors-upgrade-tableau-10-56782)) 10-color sequence. hvPlot's default color cycle is inherited from HoloViews and is a [custom 12-color sequence](https://github.com/holoviz/holoviews/issues/1591).

In [None]:
dfl = pd.DataFrame({col: [0, i+1] for i, col in enumerate('ABCDEFGHIJLKMN')})
dfl.plot();

In [None]:
dfl.hvplot().opts(legend_cols=2)

:::{note}

hvPlot's default color cycle can be set via HoloViews' API. Make sure to run this before loading the plotting extension (e.g. `hv.extension('bokeh')`, done implicitly when running `import hvplot.pandas`).

```python
import holoviews as hv
import matplotlib

hv.Cycle.default_cycles['default_colors'] = list(map(matplotlib.colors.rgb2hex, matplotlib.colormaps['tab10'].colors))

import hvplot.pandas

...
```
:::

The default categorical colormap in Pandas is a gray scale. In hvPlot, it is [`glasbey_category10`](https://colorcet.holoviz.org/user_guide/Categorical.html#starting-colors), a colormap with 256 colors that extends Bokeh's `Category10` colormap (originally from D3).

In [None]:
categories = list('ABCDEFGHIJLKMNOPQRST')
dfc = pd.DataFrame({
    'x': np.random.rand(len(categories)),
    'y': np.random.rand(len(categories)),
    'category': categories,
})
dfc['category'] = dfc['category'].astype('category')
dfc.plot.scatter('x', 'y', c='category');

In [None]:
dfc.hvplot.scatter(
    'x', 'y', c='category', legend='top_right'
).opts(legend_cols=3)

The default colormap for numerical values is `viridis` in Pandas and `kbc_r` (cyan to very dark blue) in hvPlot (see more info in [this issue](https://github.com/holoviz/holoviews/issues/3500)).

In [None]:
df.plot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g']);

In [None]:
df.hvplot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g'])

:::{note}
hvPlot does not yet allow the default colormap to be configured globally. The `colormap` (or `cmap`) argument can be used locally instead.
:::

In [None]:
df.hvplot.scatter('bill_length_mm', 'flipper_length_mm', c=df['body_mass_g'], cmap='viridis')

### Marker size

The marker size in {meth}`hvplot.hvPlot.scatter` and {meth}`hvplot.hvPlot.points` plots can be controlled with the `s` argument. When converting a plot from Pandas to hvPlot, the size has to be increased to obtain a visually similar output.


In [None]:
df.plot.scatter('bill_length_mm', 'bill_depth_mm', s=50, figsize=(4, 4));

In [None]:
df.hvplot.scatter('bill_length_mm', 'bill_depth_mm', s=110, aspect=1)

```{toctree}
:hidden: true
:titlesonly: true

Comparison with Pandas API <Pandas_API>
```