Resampling Options#

Notes

  1. Most of the options below require the Datashader library to be installed. The downsample option works best with the tsdownsample library installed.

  2. To dive deeper into the Datashader-related options, users are encouraged to read HoloViews’ Large Data user guide and the Datashader website, in particular the Plotting pitfalls user guide.

  3. Most of the examples below need to be run in a Jupyter notebook to experience their full interactivity (dynamic resampling after zooming/panning).

Performance related options for handling large datasets, including downsampling, rasterization, and dynamic plot updates:

aggregator (str, datashader.Reduction, or None, default=None)

Aggregator to use when applying rasterize or datashade operation (valid options include 'mean', 'count', 'min', 'max' and more, and datashader reduction objects)

datashade (bool, default=False)

Whether to apply rasterization and shading (colormapping) using the Datashader library, returning an RGB object instead of individual points

downsample (bool or str or None, default=None)

Controls the application of downsampling to the plotted data, which is particularly useful for large timeseries datasets to reduce the amount of data sent to the browser and improve visualization performance.

Acceptable values:

  • False: No downsampling is applied.

  • True: Applies downsampling using HoloViews’ default algorithm (LTTB - Largest Triangle Three Buckets).

  • 'lttb': Explicitly applies the Largest Triangle Three Buckets algorithm. Uses tsdownsample if installed.

  • 'minmax': Applies the MinMax algorithm, selecting the minimum and maximum values in each bin. Requires tsdownsample.

  • 'm4': Applies the M4 algorithm, selecting the minimum, maximum, first, and last values in each bin. Requires tsdownsample.

  • 'minmax-lttb': Combines MinMax and LTTB algorithms for downsampling, first applying MinMax to reduce to a preliminary set of points, then LTTB for further reduction. Requires tsdownsample.

Other string values corresponding to supported algorithms in HoloViews may also be used.

dynspread (bool, default=False)

For plots generated with datashade=True or rasterize=True, automatically increase the point size when the data is sparse so that individual points become more visible.

max_px (int, default=3)

The maximum size in pixels for dynamically spreading elements in sparse data using dynspread. This helps to increase the visibility of sparse data points.

pixel_ratio (number or None, default=None)

Pixel ratio applied to the height and width, used when rasterizing or datashading. When not set explicitly, the ratio is obtained automatically from the browser's device pixel ratio, defaulting to 1 when that information is not available. Setting it manually can be useful when the browser information is unavailable (pixel_ratio=2 can give better results on Retina displays) or to trade resolution for speed.

precompute (bool, default=False)

Whether to precompute aggregations when using rasterize or datashade.

rasterize (bool, default=False)

Whether to apply rasterization using the Datashader library, returning an aggregated Image (to be colormapped by the plotting backend) instead of individual points.

resample_when (int or None, default=None)

Applies a resampling operation (datashade, rasterize or downsample) if the number of individual data points present in the current viewport is above this threshold. The raw plot is displayed otherwise.

selector (datashader.Reduction | str | tuple | None, default=None)

Datashader reduction to apply during a rasterize or datashade operation, used to select additional information for inclusion in the hover tooltip. Supported options include:

  • string: only 'first' and 'last'

  • tuple of two strings: (<reduction>, <column>), e.g. ('min', 'value').

  • Datashader object: ds.first, ds.last, ds.min, and ds.max.

Added in version 0.12.0: Requires holoviews>=1.21. Requires bokeh>=3.7.

threshold (float, default=0.5)

When using dynspread, this value defines the minimum density of overlapping points required before the spreading operation is applied. Values between 0 and 1, where 1 means always spread and 0 means never spread.

x_sampling/y_sampling (number or None, default=None)

Specifies the smallest allowed sampling interval along the x/y axis. Used when rasterizing or datashading.

The hvplot.sampledata.synthetic_clusters dataset is used in many examples below. This dataset, returned as a DataFrame object, combines five sub-datasets. Each sub-dataset has random x, y coordinates drawn from a normal distribution centered at a specific (x, y) location, with standard deviations derived from a power law, resulting in clusters ranging from very dense to very scattered. Each point also carries val (0 to 4) and cat (d1 to d5) columns identifying its sub-dataset and category. The total dataset contains 1,000,000 points, evenly split across the five distributions.

import hvplot

df = hvplot.sampledata.synthetic_clusters("pandas")
print(f"This dataset contains {len(df)} rows.")
print("Sample from each of the 5 clusters")
df.iloc[[int(i * len(df) / 5) for i in range(5)]]
This dataset contains 1000000 rows.
Sample from each of the 5 clusters
x y s val cat
0 2.010368 2.039335 0.03 0 d1
200000 2.034558 -1.868884 0.10 1 d2
400000 -1.827208 -1.344420 0.50 2 d3
600000 -1.654416 3.311160 1.00 3 d4
800000 1.036753 3.933480 3.00 4 d5

aggregator#

The aggregator option determines how data is reduced when applying datashade=True or rasterize=True. Precisely, aggregators dynamically compute a single value per pixel when converting many raw data points into a rasterized image. By default, a simple count aggregator is applied, which creates a plot that displays how many data points are contained within each pixel.

This can be particularly useful for large datasets where plotting every point is impractical. Choosing the right aggregator helps highlight the relevant aspects of the data — whether you want to count point density, show average values, or observe minimums and maximums across regions.

Two main types of aggregators are available:

  • Mathematical combination of data such as the count of data points per pixel or the mean of a dimension of the supplied dataset, including: 'any', 'count', 'mode', 'mean', 'sum', 'var', 'std'.

  • Selection of data from a dimension of the supplied dataset, or the index of the corresponding row in the dataset, including: 'first', 'last', 'min', 'max'.

aggregator accepts either:

  • A Datashader reduction instance, such as ds.count() or ds.mean('val').

  • A string (e.g. 'mean', 'count', 'min', 'max', etc.), in which case the aggregated dimension can be defined by setting the color option (if not, the first non-coordinate variable found is used).

The 'count_cat' or 'by' aggregators can be used for categorical data. ds.by(<column>, <reduction>) allows defining the per-category reduction function (default is count). Alternatively, setting the by option to a categorical column is equivalent to setting aggregator=ds.by(<cat_column>).

Two additional aggregators are available:

  • ds.summary(...) can be used to compute multiple aggregates simultaneously, by defining a sequence of key/value pairs representing the aggregate labels and reductions. For example, setting aggregator=ds.summary(min_s=ds.min('s'), max_s=ds.max('s')) will make both the min_s and max_s aggregated values available in the hover tooltip.

  • ds.where(<selector_reduction>, lookup_column) can be used to extract the values of another column based on a selector reduction (e.g. 'first', 'min'). For example, setting aggregator=ds.where(ds.min('s'), 'val') will display in each pixel the value of the variable 'val' where the variable 's' is minimum.

Let’s start with the simple case, setting aggregator with a plain string and with a datashader reduction object.

import hvplot.pandas  # noqa
import datashader as ds

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
    rasterize=True, aggregator='var', color='s',
    clabel='var(s)', **plot_opts
) +\
df.hvplot.points(
    rasterize=True, aggregator=ds.min('s'), clabel='min(s)',
    **plot_opts
)

The example below shows how aggregator can be used to handle categorical data.

import hvplot.pandas  # noqa
import datashader as ds

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
    datashade=True, aggregator=ds.by('cat'),
    title="Categorical datashading with\n'count' aggregator", **plot_opts
) +\
df.hvplot.points(
    datashade=True, aggregator=ds.by('cat', ds.min('s')), hover_cols=['s'],
    title="Categorical datashading with\n'min(s)' aggregator", **plot_opts
)

The next examples show how to leverage ds.summary() and ds.where(). Hover over the plots to see what information is made available in the tooltip.

import hvplot.pandas  # noqa
import datashader as ds

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
    rasterize=True, title="summary(\n  min_s=min('s'), min_val=min('val')\n)",
    aggregator=ds.summary(min_s=ds.min('s'), min_val=ds.min('val')),
    **plot_opts
) +\
df.hvplot.points(
    rasterize=True, title="where(min('s'), 'val')",
    aggregator=ds.where(ds.min('s'), 'val'),
    **plot_opts
)

datashade#

The datashade option can be used to apply rasterization (aggregation into a grid of pixels) and colormapping operations using the Datashader library. Enabling this option allows:

  • Rendering large datasets in the browser that would otherwise crash it as it can easily handle billions of data points.

  • Dynamically exploring large datasets and discovering patterns, which would otherwise be difficult to find as plotting large data comes with many pitfalls such as overplotting.

This approach can turn even the largest datasets into an image that captures patterns such as density or value distribution, making it ideal for high-volume scatter plots. When datashade=True, hvPlot returns a DynamicMap containing an RGB instead of individual glyphs.

Tip

Since datashade=True produces an RGB image, the underlying data (e.g. the aggregated values per pixel) is not directly available to the plot. Enabling the 'hover' tool (disabled by default when datashade=True unless selector is set) would only show the RGB value per pixel, and no meaningful colorbar can be attached to the plot. To let the frontend apply colormapping instead of the backend, and as a consequence expose the underlying data, we recommend setting rasterize=True instead of datashade=True.

The cnorm option defaults to 'eq_hist' when datashade=True.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', datashade=True, data_aspect=1, frame_height=250,
    title='Datashaded points plot with\n"count" aggregator and\n"eq_hist" cnorm'
)

In this example, the entire dataset is rasterized into a colormapped image, with denser areas appearing darker.

downsample#

The downsample option can be used to dynamically reduce the number of plotted points by summarizing data before rendering, making it ideal for large timeseries datasets. This results in lighter plots and faster rendering.

Valid values include (most require the tsdownsample library to be installed):

  • False: No downsampling is applied.

  • True: Applies downsampling using HoloViews’ default algorithm LTTB.

  • 'lttb': Explicitly applies the Largest Triangle Three Buckets algorithm (LTTB). Uses tsdownsample if installed; otherwise it falls back to HoloViews’ slower LTTB implementation.

  • 'minmax': Applies the MinMax algorithm, selecting the minimum and maximum values in each bin. Requires tsdownsample.

  • 'm4': Applies the M4 algorithm, selecting the minimum, maximum, first, and last values in each bin. Requires tsdownsample.

  • 'minmax-lttb': Combines MinMax and LTTB algorithms for downsampling, first applying MinMax to reduce to a preliminary set of points, then LTTB for further reduction. Requires tsdownsample.

import hvplot.pandas  # noqa

df = (
    hvplot.sampledata.stocks("pandas")
    .set_index("date")[["Apple"]]
    .resample("2Min")
    .interpolate(method="polynomial", order=5)
)
print(f"This dataset contains {len(df)} rows.")

df.hvplot.line(downsample="lttb", width=500, height=300, title="downsampled with lttb")
This dataset contains 1310401 rows.

dynspread#

When rendering with datashade=True or rasterize=True, individual points can become hard to see, especially when sparse (an isolated data point colors only a single pixel, which can be hard to see on screen). Enabling dynspread=True dynamically increases the size of points in less dense areas, making them more visible.

In more detail, spreading expands each pixel by a certain number of pixels on all sides in a circular shape, merging pixels using a compositing operator. Dynamic spreading determines how many pixels to spread based on a density heuristic: spreading starts at 1 pixel and stops when the fraction of adjacent non-empty pixels reaches the specified threshold, or when max_px is reached, whichever comes first.

In the example below, we zoom in over an area with sparse data. The colored pixels are difficult to distinguish on the left plot, while they are clearly visible on the right plot with dynspread=True.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(
    x='x', y='y', frame_height=250, data_aspect=1,
    xlim=(-5.5, -5), ylim=(2.5, 3),
)
df.hvplot.points(
    rasterize=True, dynspread=False,
    title="Rasterized without dynspread", **plot_opts,
) +\
df.hvplot.points(
    rasterize=True, dynspread=True,
    title="Rasterized with dynspread", **plot_opts,
)

max_px#

The max_px option sets the upper limit on how much dynspread can increase point size (in pixels, default is 3). It only applies when dynspread=True.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(
    x='x', y='y', frame_height=250, data_aspect=1,
    xlim=(-5.5, -5), ylim=(2.5, 3),
)
df.hvplot.points(
    rasterize=True, dynspread=True,
    title="Dynspread with max_px=3 (default)", **plot_opts,
) +\
df.hvplot.points(
    rasterize=True, dynspread=True, max_px=8,
    title="Dynspread with max_px=8", **plot_opts
)

Note

Larger values make sparse data more prominent but can also distort the visual scale if overused.

pixel_ratio#

This option adjusts the internal pixel resolution used by datashade or rasterize, relative to the actual display size.

Note

By default, and if possible, pixel_ratio is inferred automatically from the browser and will be internally set to 2 on high-DPI screens to improve sharpness. You can override it manually for consistent rendering across devices.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', datashade=True, pixel_ratio=0.1, frame_height=250,
    data_aspect=1, title="Datashade with low pixel ratio"
)

precompute#

Operations that involve rasterization (rasterize and datashade) can be computationally expensive as they operate on the full dataset (unlike e.g. dynspread). precompute can be set to True to get faster performance in interactive usage by caching the last set of data used in plotting (after any transformations needed) and reusing it when it is requested again. This is particularly useful when your data is not in one of the supported data formats already and needs to be converted. precompute is False by default, because it requires using memory to store the cached data, but if you have enough memory, you can enable it so that repeated interactions (such as zooming and panning) will be much faster than the first one. Learn more about this option in the HoloViews large data user guide.

Tip

In practice, most Datashader-based plots don’t need extensive precomputing, but enabling it for hvplot.hvPlot.polygons() and hvplot.hvPlot.quadmesh() plots can greatly speed up interactive usage.

rasterize#

The rasterize option can be used to apply a rasterization (aggregation into a grid of pixels) operation using the Datashader library. Enabling this option allows:

  • Rendering large datasets in the browser that would otherwise crash it as it can easily handle billions of data points.

  • Dynamically exploring large datasets and discovering patterns, which would otherwise be difficult to find as plotting large data comes with many pitfalls such as overplotting.

This approach can turn even the largest datasets into an image that captures patterns such as density or value distribution, making it ideal, for example, for high-volume scatter plots. This option applies only rasterization, leaving colormapping to the plotting backend. Unlike datashade, the returned DynamicMap does not contain an RGB element but a data grid (Image or ImageStack). This allows exposing the underlying data to the user interface, for example via a colorbar and additional values displayed in the hover tooltip.

The cnorm option defaults to 'linear' when rasterize=True.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log',
    title='Rasterized points with count aggregator\nand log cnorm'
)

In this example, the entire dataset is rasterized into an image grid that is colormapped in the front-end, with denser areas appearing darker. Hover over the plot to see the count computed in each bin/pixel.

resample_when#

Operations like rasterize, datashade, and downsample are very effective at displaying large datasets. When interacting with a plot, for example zooming in over a region with few points, these operations are in practice no longer needed. resample_when sets a data-point threshold: when fewer points than this are present in the viewport, resampling is toggled off and the raw data is displayed.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', rasterize=True, resample_when=1_000,
    data_aspect=1, frame_height=250, cnorm='log',
    title="Rasterize only when >1000 points in view"
)

When running the code above, you will notice that after zooming in enough, the original data points appear. This gives a hybrid experience: raw points at low density, rasterized aggregates when zoomed out.

selector#

Added in version 0.12.0: Requires holoviews>=1.21. Requires bokeh>=3.7.

When a Datashader operation is applied, with datashade=True or rasterize=True, the selector option allows augmenting the tooltip with information computed (selected) from variables other than the aggregated one, effectively showing a sample of the dataset in the tooltip.

Datashader operations make it easy to identify macro-level patterns in large datasets by aggregating the data appropriately. However, they do not by default expose information about individual data points. Take for example a simple scatter plot set with rasterize=True; hovering over the image will only display the aggregated value per pixel ('count' by default), with no way to know more about each point (unless resample_when is enabled and the user zooms in enough). Setting selector in this case augments the tooltip with sample information from other variables, selected from one unique row of the dataset. Find out more about selector in HoloViews’ Interactive Hover for Big Data guide.

Like the aggregator option, a selector refers to a Datashader Reduction object. However, unlike aggregator that accepts reductions that can combine data in a pixel (e.g. 'mean' or 'count'), selector only accepts reductions that select values, including: 'first', 'last', 'min', and 'max'. Valid options include:

  • A string object for reductions that do not require a variable name, including 'first' and 'last'.

  • A 2-tuple with a reduction name and a variable name, for reductions that require a variable name, including 'min' and 'max' (e.g. ('min', 'column')).

  • A reduction instance, including ds.first(), ds.last(), ds.min(), and ds.max().

Note

The hover tooltip always requires a live kernel when selector is set as the values displayed need to be sent by the Python server. Without a live kernel, like on this webpage, all the values are displayed as 'undefined'.

When you hover over the first plot below, you will see a value for s, val, and cat in the bottom part of the tooltip. All these values originate from the same row in the DataFrame, that row being the first one found in the subdataset contained within this pixel. In the second plot, the values displayed are derived from the row where val is minimum within the hovered pixel.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log')
(
    df.hvplot.points(selector='first', title='selector="first"', **plot_opts) +
    df.hvplot.points(selector=('min', 'val'), title='selector=("min", "val")', **plot_opts)
)

datashade=True plots get their hover tool enabled by default when selector is set.

import datashader as ds
import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', data_aspect=1, frame_height=250, cnorm='log',
    datashade=True, selector=ds.min('val'), title='datashade=True',
)

selector can also be set when datashading categorical data.

import datashader as ds
import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', data_aspect=1, frame_height=250, colorbar=False,
    rasterize=True, aggregator=ds.by('cat'), selector='first',
    title="Categorical rasterizing with\n'count' aggregator",
)

threshold#

Controls sensitivity for dynspread. A value of 1.0 always spreads sparse points, while 0.0 never does. Intermediate values let you tune spreading behavior. It only applies when dynspread=True.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

plot_opts = dict(
    x='x', y='y', datashade=True, dynspread=True,
    data_aspect=1, frame_width=200, xlim=(-2, 0), ylim=(7, 9),
)
df.hvplot.points(threshold=0.0, title="Dynspread threshold=0.0", **plot_opts) +\
df.hvplot.points(threshold=0.5, title="Dynspread threshold=0.5", **plot_opts) +\
df.hvplot.points(threshold=1.0, title="Dynspread threshold=1.0", **plot_opts)

x_sampling / y_sampling#

Set the minimum data resolution in the x and/or y direction when using datashade or rasterize. This is useful to set the granularity of the pixel grid when zoomed in, or when visualizing images or gridded data that require consistent resolution control.

import hvplot.pandas  # noqa

df = hvplot.sampledata.synthetic_clusters("pandas")

df.hvplot.points(
    x='x', y='y', rasterize=True, x_sampling=0.1, y_sampling=0.1,
    data_aspect=1, cnorm='log', xlim=(0, 1), ylim=(0, 1), frame_height=250,
    title='Zoomed in rasterized plot\nwith custom x/y-sampling'
)

This web page was generated from a Jupyter notebook and not all interactivity will work on this website.