Large Timeseries Data#

Effectively representing temporal dynamics in large datasets requires selecting appropriate visualization techniques that ensure responsiveness while providing both a macroscopic view of overall trends and a microscopic view of fine details. This guide will explore various methods, such as WebGL Rendering, LTTB Downsampling, Datashader Rasterizing, and Minimap Contextualizing, each suited for different aspects of large timeseries data visualization. We predominantly demonstrate the use of hvPlot syntax, leveraging HoloViews for more complex requirements. Although hvPlot supports multiple backends, including Matplotlib and Plotly, our focus will be on Bokeh due to its advanced capabilities in handling large timeseries data.

Getting the data#

Here we have a DataFrame with 1.2 million rows containing standardized data from 5 different sensors.

import pandas as pd

df = pd.read_parquet("https://datasets.holoviz.org/sensor/v1/data.parq")
df.sample(5)
         sensor     value                time
163934        0  0.337184 2023-05-10 06:05:00
306738        1  0.111234 2023-02-22 23:39:00
1146911       4  0.303527 2023-05-26 13:11:00
419289        1 -0.480056 2023-05-25 20:07:00
803486        3  0.208775 2023-02-25 18:20:00
df0 = df[df.sensor=='0']

Let’s go ahead and plot this data using various approaches.

WebGL Rendering#

WebGL is a JavaScript API that allows rendering content in the browser using hardware acceleration from a Graphics Processing Unit (GPU). WebGL is standardized and available in all modern browsers.

Canvas Rendering - Prior Default#

Rendering Bokeh plots in hvPlot or HoloViews has evolved significantly. Prior to 2023, Bokeh’s custom HTML Canvas rendering was the default. This approach works well for datasets up to a few tens of thousands of points but struggles above 100K points, particularly in terms of zooming and panning speed. These days, if you want to utilize Bokeh’s Canvas rendering, use import holoviews as hv; hv.renderer("bokeh").webgl = False prior to creating your hvPlot or HoloViews object.

WebGL Rendering - Current Default#

Around mid-2023, an improved WebGL implementation became the default for hvPlot and HoloViews, enabling smoother interaction with larger datasets through GPU acceleration. Note that WebGL performance varies with your machine’s specifications; for example, some Apple Mac models may show little improvement over Canvas due to their GPU hardware configuration.

import holoviews as hv
import hvplot.pandas  # noqa

# Set notebook hvPlot/HoloViews default options
hv.opts.defaults(hv.opts.Curve(responsive=True))

df0.hvplot(x="time", y="value", autorange='y', title="WebGL", min_height=300)

Note: autorange='y' is demonstrated here for automatic y-axis scaling, a feature from HoloViews 1.17 and hvPlot 0.9.0. You can omit that option if you prefer to set the y scaling manually using the zoom tool.

On their own, both Canvas and WebGL rendering share a key limitation: they transfer the entire dataset from the server to the browser. This can be a significant bottleneck, especially for remote server setups or datasets larger than a million points. To address this, we’ll explore techniques like LTTB Downsampling, which deliver only the data necessary for the current view. These methods offer more scalable solutions for interacting with large timeseries data, as we’ll see in the following sections.

LTTB Downsampling#

The Challenge with Simple Downsampling#

A straightforward approach to handling large datasets might involve decimation, i.e. plotting only a small subset of the points. For example, df.sample draws a random subset of rows (selecting every nth datapoint would instead be df.iloc[::n]):

df0.hvplot(x="time", y="value", color='#003366', label="All the data") * \
df0.sample(500).hvplot(x="time", y="value", alpha=0.8, color='#FF6600', min_height=300,
                       label="Decimation", title="Decimation: Don't do this!")