Resampling Options#
Notes

Most of the options below require the Datashader library to be installed. The `downsample` option should preferably be used with the tsdownsample library installed. To dive deeper into the Datashader-related options, users are encouraged to read HoloViews’ Large Data user guide and the Datashader website, in particular the Plotting pitfalls user guide.

Most of the examples below need to be run in a Jupyter notebook to experience their full interactivity (dynamic resampling after zooming/panning).
Performance-related options for handling large datasets, including downsampling, rasterization, and dynamic plot updates:

Parameter | Description
---|---
`aggregator` (str, datashader.Reduction, or None, default=None) | Aggregator to use when applying the rasterize or datashade operation (valid options include `'any'`, `'count'`, `'mode'`, `'mean'`, `'sum'`, `'var'`, `'std'`, `'first'`, `'last'`, `'min'`, and `'max'`).
`datashade` (bool, default=False) | Whether to apply rasterization and shading (colormapping) using the Datashader library, returning an RGB object instead of individual points.
`downsample` (bool, str, or None, default=None) | Controls the application of downsampling to the plotted data, which is particularly useful for large timeseries datasets to reduce the amount of data sent to the browser and improve visualization performance. Acceptable values include `False`, `True`, `'lttb'`, `'minmax'`, `'m4'`, and `'minmax-lttb'`; other string values corresponding to algorithms supported by HoloViews may also be used.
`dynspread` (bool, default=False) | For plots generated with `datashade=True` or `rasterize=True`, automatically increase the point size when the data is sparse so that individual points become more visible.
`max_px` (int, default=3) | The maximum size in pixels for dynamically spreading elements in sparse data with `dynspread=True`.
`pixel_ratio` (number or None, default=None) | Pixel ratio applied to the height and width, used when rasterizing or datashading. When not set explicitly, the ratio is obtained automatically from the browser device pixel ratio, and defaults to 1 when that information is not available. Setting it explicitly is useful when the browser information is unavailable (`pixel_ratio=2` can give better results on Retina displays) or to use a lower resolution for speed.
`precompute` (bool, default=False) | Whether to precompute aggregations when using `rasterize` or `datashade`.
`rasterize` (bool, default=False) | Whether to apply rasterization using the Datashader library, returning an aggregated Image (to be colormapped by the plotting backend) instead of individual points.
`resample_when` (int or None, default=None) | Applies a resampling operation (datashade, rasterize, or downsample) if the number of individual data points present in the current viewport is above this threshold; the raw plot is displayed otherwise.
`selector` (datashader.Reduction, str, tuple, or None, default=None) | Datashader reduction to apply during a rasterize or datashade operation, used to augment the hover tooltip with values selected from other variables. Added in version 0.12.0; requires `holoviews>=1.21` and `bokeh>=3.7`.
`threshold` (float, default=0.5) | When using `dynspread`, the fraction of adjacent non-empty pixels at which spreading stops.
`x_sampling`/`y_sampling` (number or None, default=None) | Specifies the smallest allowed sampling interval along the x/y axis, used when rasterizing or datashading.
The `hvplot.sampledata.synthetic_clusters` dataset is used in many examples below. This dataset, returned as a DataFrame object, consists of five sub-datasets combined. Each sub-dataset has random x, y-coordinates based on a normal distribution centered at a specific (x, y) location, with standard deviations derived from a power law, resulting in clusters ranging from very dense to very scattered. Each point also carries a `val` (0 to 4) and a `cat` (`d1` to `d5`) column to identify its dataset and category. The total dataset contains 1,000,000 points, evenly split across the five distributions.
import hvplot
df = hvplot.sampledata.synthetic_clusters("pandas")
print(f"This dataset contains {len(df)} rows.")
print("Sample from each of the 5 clusters")
df.iloc[[int(i * len(df) / 5) for i in range(5)]]
This dataset contains 1000000 rows.
Sample from each of the 5 clusters
| | x | y | s | val | cat |
|---|---|---|---|---|---|
| 0 | 2.010368 | 2.039335 | 0.03 | 0 | d1 |
| 200000 | 2.034558 | -1.868884 | 0.10 | 1 | d2 |
| 400000 | -1.827208 | -1.344420 | 0.50 | 2 | d3 |
| 600000 | -1.654416 | 3.311160 | 1.00 | 3 | d4 |
| 800000 | 1.036753 | 3.933480 | 3.00 | 4 | d5 |
aggregator#
The `aggregator` option determines how data is reduced when applying `datashade=True` or `rasterize=True`. Precisely, aggregators dynamically compute a single value per pixel when converting many raw data points into a rasterized image. By default, a simple `count` aggregator is applied, which creates a plot that displays how many data points are contained within each pixel.

This can be particularly useful for large datasets where plotting every point is impractical. Choosing the right aggregator helps highlight the relevant aspects of the data, whether you want to count point density, show average values, or observe minimums and maximums across regions.
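To make the idea concrete, here is a minimal, pure-Python sketch of what a `count` aggregator does conceptually: bin each point into a small pixel grid and count how many points land in each pixel. This is an illustration only, not Datashader's actual (highly optimized) implementation, and the function name and grid layout are made up.

```python
def count_aggregate(points, x_range, y_range, width, height):
    """Bin (x, y) points into a height x width grid of per-pixel counts."""
    grid = [[0] * width for _ in range(height)]
    x0, x1 = x_range
    y0, y1 = y_range
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # point falls outside the viewport
        col = int((x - x0) / (x1 - x0) * width)
        row = int((y - y0) / (y1 - y0) * height)
        grid[row][col] += 1
    return grid

# Three points, two of them landing in the same pixel of a 4x4 grid.
grid = count_aggregate(
    [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)],
    x_range=(0, 1), y_range=(0, 1), width=4, height=4,
)
```

Datashader performs this same kind of binning on millions of points with optimized, parallelized code, and supports many reductions beyond a simple count.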
Two main types of aggregators are available:

- Mathematical combination of data, such as the `count` of data points per pixel or the `mean` of a dimension of the supplied dataset, including: `'any'`, `'count'`, `'mode'`, `'mean'`, `'sum'`, `'var'`, `'std'`.
- Selection of data from a dimension of the supplied dataset, or of the index of the corresponding row in the dataset, including: `'first'`, `'last'`, `'min'`, `'max'`.
`aggregator` accepts either:

- A Datashader reduction instance, such as `ds.count()` or `ds.mean('val')`.
- A string (e.g. `'mean'`, `'count'`, `'min'`, `'max'`, etc.), in which case the aggregated dimension can be defined by setting the `color` option (if not, the first non-coordinate variable found is used).
The `'count_cat'` or `'by'` aggregators can be used for categorical data. `ds.by(<column>, <reduction>)` allows defining the per-category reduction function (the default is `count`). Alternatively, setting the `by` option to a categorical column is equivalent to setting `aggregator=ds.by(<cat_column>)`.
Two additional aggregators are available:

- `ds.summary(...)` can be used to compute multiple aggregates simultaneously, by defining a sequence of key/value pairs representing the aggregate labels and reductions. For example, setting `aggregator=ds.summary(min_s=ds.min('s'), max_s=ds.max('s'))` will make both the `min_s` and `max_s` aggregated values available in the hover tooltip.
- `ds.where(<selector_reduction>, lookup_column)` can be used to extract the values of another column based on a selector reduction (e.g. `'first'`, `'min'`). For example, setting `aggregator=ds.where(ds.min('s'), 'val')` will display in each pixel the value of the variable `'val'` where the variable `'s'` is minimum.
Let’s start with the simple case, setting `aggregator` with a plain string and with a Datashader reduction object.
import hvplot.pandas # noqa
import datashader as ds
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
rasterize=True, aggregator='var', color='s',
clabel='var(s)', **plot_opts
) +\
df.hvplot.points(
rasterize=True, aggregator=ds.min('s'), clabel='min(s)',
**plot_opts
)
The example below shows how `aggregator` can be used to handle categorical data.
import hvplot.pandas # noqa
import datashader as ds
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
datashade=True, aggregator=ds.by('cat'),
title="Categorical datashading with\n'count' aggregator", **plot_opts
) +\
df.hvplot.points(
datashade=True, aggregator=ds.by('cat', ds.min('s')), hover_cols=['s'],
title="Categorical datashading with\n'min(s)' aggregator", **plot_opts
)
The next examples show how to leverage `ds.summary()` and `ds.where()`. Hover over the plots to see what information is made available in the tooltip.
import hvplot.pandas # noqa
import datashader as ds
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(x='x', y='y', data_aspect=1, frame_height=250)
df.hvplot.points(
rasterize=True, title="summary(\n min_s=min('s'), min_val=min('val')\n)",
aggregator=ds.summary(min_s=ds.min('s'), min_val=ds.min('val')),
**plot_opts
) +\
df.hvplot.points(
rasterize=True, title="where(min('s'), 'val')",
aggregator=ds.where(ds.min('s'), 'val'),
**plot_opts
)
datashade#
The `datashade` option can be used to apply rasterization (aggregation into a grid of pixels) and colormapping operations using the Datashader library. Enabling this option allows:

- Rendering large datasets in the browser that would otherwise crash it, as Datashader can easily handle billions of data points.
- Dynamically exploring large datasets and discovering patterns, which would otherwise be difficult to find as plotting large data comes with many pitfalls such as overplotting.

This approach can turn even the largest datasets into an image that captures patterns such as density or value distribution, making it ideal for high-volume scatter plots. When `datashade=True`, hvPlot returns a `DynamicMap` containing an `RGB` element instead of individual glyphs.
Tip

Since `datashade=True` produces an RGB image, the underlying data (e.g. the aggregated values per pixel) is not directly available to the plot. Enabling the `'hover'` tool (disabled by default when `datashade=True` unless `selector` is set) would only show the RGB value per pixel, and no meaningful colorbar can be attached to the plot. To let the frontend apply colormapping instead of the backend, and as a consequence expose the underlying data, we recommend setting `rasterize=True` instead of `datashade=True`.
The `cnorm` option defaults to `'eq_hist'` when `datashade=True`.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', datashade=True, data_aspect=1, frame_height=250,
title='Datashaded points plot with\n"count" aggregator and\n"eq_hist" cnorm'
)
In this example, the entire dataset is rasterized into a colormapped image, with denser areas appearing darker.
downsample#
The `downsample` option can be used to dynamically reduce the number of plotted points by summarizing data before rendering, making it ideal for large timeseries datasets. This results in lighter plots and faster rendering.

Valid values include (most require the `tsdownsample` library to be installed):
- `False`: No downsampling is applied.
- `True`: Applies downsampling using HoloViews’ default algorithm, LTTB.
- `'lttb'`: Explicitly applies the Largest Triangle Three Buckets (LTTB) algorithm. Uses `tsdownsample` if installed; if not, defers to HoloViews’ slower LTTB implementation.
- `'minmax'`: Applies the MinMax algorithm, selecting the minimum and maximum values in each bin. Requires `tsdownsample`.
- `'m4'`: Applies the M4 algorithm, selecting the minimum, maximum, first, and last values in each bin. Requires `tsdownsample`.
- `'minmax-lttb'`: Combines the MinMax and LTTB algorithms, first applying MinMax to reduce to a preliminary set of points, then LTTB for further reduction. Requires `tsdownsample`.
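As an illustration of the bin-based algorithms above, here is a small pure-Python sketch of the `'minmax'` idea: split the series into bins and keep only the indices of the minimum and maximum value in each bin. The real `tsdownsample` implementation is vectorized and far faster; the function name here is invented for the sketch.

```python
def minmax_downsample(values, n_bins):
    """Return indices of the min and max value within each of n_bins bins."""
    kept = []
    bin_size = len(values) / n_bins
    for b in range(n_bins):
        start, stop = int(b * bin_size), int((b + 1) * bin_size)
        chunk = range(start, stop)
        if not chunk:
            continue  # empty bin (more bins than points)
        lo = min(chunk, key=lambda i: values[i])
        hi = max(chunk, key=lambda i: values[i])
        kept.extend(sorted({lo, hi}))  # keep x-order, drop duplicates
    return kept

# 8 samples reduced to 4: the extremes of each of the 2 bins survive.
values = [0, 5, 1, 9, 2, 8, 3, 7]
idx = minmax_downsample(values, n_bins=2)
```

Keeping per-bin extremes preserves the visual envelope of the series, which is why MinMax-style algorithms work well for plotting spiky timeseries.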
import hvplot.pandas # noqa
df = (
hvplot.sampledata.stocks("pandas")
.set_index("date")[["Apple"]]
.resample("2Min")
.interpolate(method="polynomial", order=5)
)
print(f"This dataset contains {len(df)} rows.")
df.hvplot.line(downsample="lttb", width=500, height=300, title="downsampled with lttb")
This dataset contains 1310401 rows.
dynspread#
When rendering with `datashade=True` or `rasterize=True`, individual points can become hard to see, especially when sparse (an isolated data point colors only one pixel, which can be hard to see on a screen). Enabling `dynspread=True` dynamically increases the size of points in less dense areas, making them more visible.

In more detail, spreading expands each pixel by a certain number of pixels on all sides according to a circular shape, merging pixels using a compositing operator. Dynamic spreading determines how many pixels to spread based on a density heuristic. Spreading starts at 1 pixel, and stops when the fraction of adjacent non-empty pixels reaches the specified `threshold`, or `max_px` is reached, whichever comes first.
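The stopping rule just described can be sketched as follows. This is a simplified illustration, not Datashader's actual heuristic: the `density` callable stands in for the measured fraction of adjacent non-empty pixels after spreading by a given number of pixels.

```python
def spread_amount(density, threshold=0.5, max_px=3):
    """Pick a spread size: keep growing while the image remains sparse.

    `density(px)` is assumed to return, after spreading by `px` pixels,
    the fraction of non-empty pixels with non-empty neighbors.
    """
    for px in range(1, max_px + 1):
        if density(px) >= threshold:
            return px  # dense enough: stop spreading here
    return max_px  # cap reached before the threshold was met

# Toy density curve: spreading more fills in more neighbors.
amount = spread_amount(lambda px: 0.2 * px, threshold=0.5, max_px=3)
```

With this toy curve, the density only reaches the 0.5 threshold at the `max_px` cap, so the full spread of 3 pixels is used; a denser image would stop at 1 pixel.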
In the example below, we zoom in over an area with sparse data. The colored pixels are difficult to distinguish on the left plot, while they are clearly visible on the right plot with `dynspread=True`.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(
x='x', y='y', frame_height=250, data_aspect=1,
xlim=(-5.5, -5), ylim=(2.5, 3),
)
df.hvplot.points(
rasterize=True, dynspread=False,
title="Rasterize without dynspread", **plot_opts,
) +\
df.hvplot.points(
rasterize=True, dynspread=True,
title="Rasterize with dynspread", **plot_opts,
)
max_px#
The `max_px` option sets the upper limit on how much `dynspread` can increase point size (in pixels; the default is `3`). It only applies when `dynspread=True`.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(
x='x', y='y', frame_height=250, data_aspect=1,
xlim=(-5.5, -5), ylim=(2.5, 3),
)
df.hvplot.points(
rasterize=True, dynspread=True,
title="Dynspread with max_px=3 (default)", **plot_opts,
) +\
df.hvplot.points(
rasterize=True, dynspread=True, max_px=8,
title="Dynspread with max_px=8", **plot_opts
)
Note
Larger values make sparse data more prominent but can also distort the visual scale if overused.
pixel_ratio#
This option adjusts the internal pixel resolution used by `datashade` or `rasterize`, relative to the actual display size.
Note

By default, and if possible, `pixel_ratio` is inferred automatically from the browser and will be internally set to `2` on high-DPI screens to improve sharpness. You can override it manually for consistent rendering across devices.
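The effect of `pixel_ratio` on the internal aggregation grid is simple arithmetic: it scales the raster width and height relative to the displayed frame. The helper below is illustrative only (hvPlot and Datashader compute this internally).

```python
def raster_shape(frame_width, frame_height, pixel_ratio=1):
    """Approximate internal raster grid size for a given display frame."""
    return int(frame_width * pixel_ratio), int(frame_height * pixel_ratio)

# A frame_height=250 plot rasterized with pixel_ratio=0.1 aggregates into
# a much coarser grid (faster but blocky); pixel_ratio=2 doubles the
# resolution for sharper output on high-DPI screens.
coarse = raster_shape(250, 250, pixel_ratio=0.1)
sharp = raster_shape(250, 250, pixel_ratio=2)
```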
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', datashade=True, pixel_ratio=0.1, frame_height=250,
data_aspect=1, title="Datashade with low pixel ratio"
)
precompute#
Operations that involve rasterization (`rasterize` and `datashade`) can be computationally expensive, as they operate on the full dataset (unlike e.g. `dynspread`). `precompute` can be set to `True` to get faster performance in interactive usage, by caching the last set of data used in plotting (after any transformations needed) and reusing it when it is requested again. This is particularly useful when your data is not already in one of the supported data formats and needs to be converted. `precompute` is `False` by default, because it requires memory to store the cached data, but if you have enough memory, you can enable it so that repeated interactions (such as zooming and panning) are much faster than the first one. Learn more about this option in the HoloViews large data user guide.
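The caching idea behind `precompute` can be sketched in a few lines of plain Python. This is a conceptual illustration only, not HoloViews' actual implementation; the class name is made up.

```python
class PrecomputeCache:
    """Run an expensive transform once, then reuse the result."""

    def __init__(self, transform):
        self.transform = transform  # the expensive conversion step
        self._cache = None

    def get(self, data):
        if self._cache is None or self._cache[0] is not data:
            # First request (or new data): compute and remember.
            self._cache = (data, self.transform(data))
        return self._cache[1]  # subsequent requests reuse the result

calls = []
cache = PrecomputeCache(lambda d: calls.append(1) or [v * 2 for v in d])
data = [1, 2, 3]
first = cache.get(data)
second = cache.get(data)  # served from the cache; transform not re-run
```

The trade-off is exactly the one described above: the cached result occupies memory, but repeated zoom/pan interactions skip the conversion step entirely.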
Tip

In practice, most Datashader plots don’t need extensive precomputing, but enabling it for `hvplot.hvPlot.polygons()` and `hvplot.hvPlot.quadmesh()` plots can greatly speed up interactive usage.
rasterize#
The `rasterize` option can be used to apply a rasterization (aggregation into a grid of pixels) operation using the Datashader library. Enabling this option allows:

- Rendering large datasets in the browser that would otherwise crash it, as Datashader can easily handle billions of data points.
- Dynamically exploring large datasets and discovering patterns, which would otherwise be difficult to find as plotting large data comes with many pitfalls such as overplotting.

This approach can turn even the largest datasets into an image that captures patterns such as density or value distribution, making it for example ideal for high-volume scatter plots. This option applies only rasterization, leaving colormapping to the plotting backend. Unlike `datashade`, the returned `DynamicMap` does not contain an `RGB` element but a data grid (`Image` or `ImageStack`). This allows exposing the underlying data to the user interface, for example via a colorbar and additional values displayed in the hover tooltip.
The `cnorm` option defaults to `'linear'` when `rasterize=True`.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log',
title='Rasterized points with count aggregator\nand log cnorm'
)
In this example, the entire dataset is rasterized into an image grid that is colormapped in the front-end, with denser areas appearing darker. Hover over the plot to see the count computed in each bin/pixel.
resample_when#
Operations like `rasterize`, `datashade`, and `downsample` are very effective at displaying large datasets. When interacting with a plot, for example zooming in over a region with few points, these operations are in practice no longer needed. `resample_when` can be set to a number of data points in the viewport below which resampling is toggled off.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', rasterize=True, resample_when=1_000,
data_aspect=1, frame_height=250, cnorm='log',
title="Rasterize only when >1000 points in view"
)
When running the code above, you will notice that after zooming in enough, the original data points appear. This gives a hybrid experience: raw points at low density, rasterized aggregates when zoomed out.
selector#
Added in version 0.12.0: Requires `holoviews>=1.21` and `bokeh>=3.7`.
When a Datashader operation is applied, with `datashade=True` or `rasterize=True`, the `selector` option allows augmenting the tooltip with information computed (selected) from variables other than the aggregated one, effectively showing a sample of the dataset in the tooltip.

Datashader operations make it easy to identify macro-level patterns in large datasets by aggregating the data appropriately. However, they do not by default expose information about individual data points. Take for example a simple scatter plot set with `rasterize=True`; hovering over the image will only display the aggregated value per pixel (`'count'` by default), with no way to know more about each point (unless `resample_when` is enabled and the user zooms in enough). Setting `selector` in this case augments the tooltip with sample information from other variables, selected from one unique row of the dataset. Find out more about `selector` in HoloViews’ Interactive Hover for Big Data guide.
Like the `aggregator` option, a `selector` refers to a Datashader `Reduction` object. However, unlike `aggregator`, which accepts reductions that combine data in a pixel (e.g. `'mean'` or `'count'`), `selector` only accepts reductions that select values, including `'first'`, `'last'`, `'min'`, and `'max'`. Valid options include:
- A string, for reductions that do not require a variable name, including `'first'` and `'last'`.
- A 2-tuple of a reduction name and a variable name, for reductions that require a variable name, including `'min'` and `'max'` (e.g. `('min', 'column')`).
- A reduction instance, including `ds.first()`, `ds.last()`, `ds.min()`, and `ds.max()`.
Note

The hover tooltip always requires a live kernel when `selector` is set, as the values displayed need to be sent by the Python server. Without a live kernel, like on this webpage, all the values are displayed as `'undefined'`.
When you hover over the first plot below, you will see a value for `s`, `val`, and `cat` in the bottom part of the tooltip. All these values originate from the same row in the DataFrame, that row being the first one found in the sub-dataset contained within this pixel. In the second plot, the values displayed are derived from the row where `val` is minimum within the hovered pixel.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log')
(
df.hvplot.points(selector='first', title='selector="first"', **plot_opts) +
df.hvplot.points(selector=('min', 'val'), title='selector=("min", "val")', **plot_opts)
)
`datashade=True` plots get their hover tool enabled by default when `selector` is set.
import datashader as ds
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', data_aspect=1, frame_height=250, cnorm='log',
datashade=True, selector=ds.min('val'), title='datashade=True',
)
`selector` can also be set when datashading categorical data.
import datashader as ds
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', data_aspect=1, frame_height=250, colorbar=False,
rasterize=True, aggregator=ds.by('cat'), selector='first',
title="Categorical rasterizing with\n'count' aggregator",
)
threshold#
The `threshold` option controls the sensitivity of `dynspread`. A value of `1.0` always spreads sparse points, while `0.0` never does; intermediate values let you tune the spreading behavior. It only applies when `dynspread=True`.
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
plot_opts = dict(
x='x', y='y', datashade=True, dynspread=True,
data_aspect=1, frame_width=200, xlim=(-2, 0), ylim=(7, 9),
)
df.hvplot.points(threshold=0.0, title="Dynspread threshold=0.0", **plot_opts) +\
df.hvplot.points(threshold=0.5, title="Dynspread threshold=0.5", **plot_opts) +\
df.hvplot.points(threshold=1.0, title="Dynspread threshold=1.0", **plot_opts)
x_sampling / y_sampling#
Sets the minimum data resolution in the x and/or y direction when setting `datashade` or `rasterize`. This is useful to set the granularity of the pixel grid when zoomed in, or when visualizing images or gridded data that requires consistent resolution control.
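The effect of a sampling interval amounts to capping the number of aggregation bins along an axis, regardless of the frame size in pixels. The helper below is an illustrative sketch only; the actual computation happens inside Datashader.

```python
def n_bins(span, frame_px, sampling=None):
    """Number of aggregation bins along one axis.

    Without a sampling interval, there is one bin per frame pixel; with
    one, the bin count is capped at span / sampling.
    """
    bins = frame_px
    if sampling is not None:
        bins = min(bins, max(1, int(span / sampling)))
    return bins

# Zoomed to x in (0, 1) with x_sampling=0.1: at most 10 bins across,
# even though the frame is 250 pixels wide.
capped = n_bins(span=1.0, frame_px=250, sampling=0.1)
full = n_bins(span=1.0, frame_px=250)
```

This is why, in the example below, the zoomed-in plot shows large blocky pixels: the sampling interval prevents the grid from refining past 0.1 data units per bin.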
import hvplot.pandas # noqa
df = hvplot.sampledata.synthetic_clusters("pandas")
df.hvplot.points(
x='x', y='y', rasterize=True, x_sampling=0.1, y_sampling=0.1,
data_aspect=1, cnorm='log', xlim=(0, 1), ylim=(0, 1), frame_height=250,
title='Zoomed in rasterized plot\nwith custom x/y-sampling'
)