interactive
hvPlot isn’t only a plotting library; it is dedicated to making data exploration easier. In this guide you will see how it can help you get better control over your data pipelines. We define a data pipeline as a series of commands that transform some data, such as aggregating, filtering, reshaping, or renaming. A data pipeline may include a load step that provides the input data to the pipeline, e.g. reading the data from a database.
When you analyze data in a notebook, for instance data held in a Pandas DataFrame, you may find yourself re-running many cells after changing the parameters you pass to Pandas’ methods, either to gain more insight into the data or to fine-tune an algorithm. .interactive() is a solution to this rather cumbersome workflow: you replace the constant parameters in the pipeline with widgets (e.g. a number slider), which are automatically displayed next to the pipeline output and trigger an update of that output when they change. With this approach, all your pipeline parameters are available in one place and you get full interactive control over the pipeline.
.interactive() works not only with DataFrames but also with Xarray data structures, which is what we are going to show in this guide. First we import hvplot.xarray, which makes the .interactive() accessor available on Xarray objects.
import hvplot.xarray # noqa
import xarray as xr
We load a dataset and get a handle on its only data variable, air.
ds = xr.tutorial.load_dataset('air_temperature')
air = ds.air
ds
<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
We want to better understand the temporal evolution of the air temperature over different latitudes compared to a baseline. The data pipeline we build includes:
filtering the data at one latitude
cleaning up the data
aggregating the temperatures by time
computing a rolling mean
subtracting a baseline from the above
For now, we use the .describe() method of Pandas as the output. This pipeline has two parameters: the latitude and the temporal window of the rolling operation.
LATITUDE = 30.
ROLLING_WINDOW = '1D'
baseline = air.sel(lat=LATITUDE).mean().item()
pipeline = (
    air
    .sel(lat=LATITUDE)
    .to_dataframe()
    .drop(columns='lat')
    .groupby('time').mean()
    .rolling(ROLLING_WINDOW).mean()
    - baseline
)
pipeline.describe()
statistic | air |
---|---|
count | 2920.000000 |
mean | 0.000418 |
std | 3.185757 |
min | -6.247181 |
25% | -2.930566 |
50% | -0.344752 |
75% | 3.222418 |
max | 5.372583 |
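To make the last two steps of the pipeline concrete, here is a minimal sketch of a time-based rolling mean followed by a baseline subtraction, written with the standard library only. The `rolling_mean` helper and the toy series are hypothetical and are not part of hvPlot, Xarray, or Pandas. The sketch assumes a window that includes the current timestamp and excludes the timestamp exactly one window earlier.

```python
from datetime import datetime, timedelta


def rolling_mean(times, values, window):
    """For each timestamp t, average the values recorded in (t - window, t]."""
    result = []
    for t in times:
        in_window = [v for s, v in zip(times, values) if t - window < s <= t]
        result.append(sum(in_window) / len(in_window))
    return result


# Hypothetical toy series: six readings taken every 6 hours.
start = datetime(2013, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(6)]
values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# The baseline is the mean over the whole series.
baseline = sum(values) / len(values)

# Smooth with a one-day window, then express each point relative to the baseline.
smoothed = rolling_mean(times, values, timedelta(days=1))
anomaly = [m - baseline for m in smoothed]

print(smoothed)  # [10.0, 11.0, 12.0, 13.0, 15.0, 17.0]
print(anomaly)   # [-5.0, -4.0, -3.0, -2.0, 0.0, 2.0]
```

The first values are averaged over fewer points because the window is not yet full, which is why the early part of a rolling output tends to be noisier than the rest.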
Without .interactive() we would manually change the values of LATITUDE and ROLLING_WINDOW to see how they affect the pipeline output. Instead, we create two widgets with the values we expect the parameters to take; in effect, we declare our parameter space beforehand. To create the widgets, we import Panel and pick two appropriate widgets from its Reference Gallery.
import panel as pn
w_latitude = pn.widgets.DiscreteSlider(name='Latitude', options=list(air.lat.values))
w_rolling_window = pn.widgets.RadioButtonGroup(name='Rolling window', options=['1D', '7D', '30D'])
Now we instantiate an Interactive object by calling .interactive() on our data. This object mirrors the API of the underlying object and accepts all of its natural operations. In the pipeline, we replace the data with the interactive object and the constant parameters with the widgets we have just created.
airi = air.interactive()
baseline = airi.sel(lat=w_latitude).mean().item()
pipeline = (
    airi
    .sel(lat=w_latitude)
    .to_dataframe()
    .drop(columns='lat')
    .groupby('time').mean()
    .rolling(w_rolling_window).mean()
    - baseline
)
pipeline.describe()
You can see that the rendered pipeline now consists not only of its output but also of the widgets that control it. Change the widgets’ values and observe how the output updates dynamically.
Notice that .interactive() handles the change of data type within the pipeline (see the call to .to_dataframe()) and that it also supports math operators (- baseline).
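One way to picture this behavior is an object that records each method call and operator instead of executing it, and then replays the recorded steps with the widgets' current values whenever an output is needed. The sketch below is a toy illustration of that idea using only the standard library. It is not how hvPlot implements .interactive(), and the names `Widget`, `Lazy`, `resolve`, and `evaluate` are hypothetical.

```python
class Widget:
    """Hypothetical stand-in for a widget: it only holds a current value."""

    def __init__(self, value):
        self.value = value


def resolve(arg):
    """Replace a widget with its current value; pass anything else through."""
    return arg.value if isinstance(arg, Widget) else arg


class Lazy:
    """Toy deferred pipeline that records operations instead of running them."""

    def __init__(self, obj, steps=()):
        self._obj = obj
        self._steps = steps

    def __getattr__(self, name):
        # Reached for any attribute Lazy does not define: record a method call.
        def record(*args, **kwargs):
            def step(current):
                return getattr(current, name)(
                    *[resolve(a) for a in args],
                    **{k: resolve(v) for k, v in kwargs.items()},
                )
            return Lazy(self._obj, self._steps + (step,))
        return record

    def __sub__(self, other):
        # Math operators are recorded in the same way as method calls.
        def step(current):
            return current - resolve(other)
        return Lazy(self._obj, self._steps + (step,))

    def evaluate(self):
        """Replay every recorded step using the widgets' current values."""
        current = self._obj
        for step in self._steps:
            current = step(current)
        return current


w_sep = Widget(",")
w_offset = Widget(1)

# The intermediate type changes along the way: str -> list -> int.
pipeline = Lazy("a,a,b;a").split(w_sep).count("a") - w_offset

first = pipeline.evaluate()   # 1: the split gives ['a', 'a', 'b;a'], then 2 - 1
w_sep.value = ";"
second = pipeline.evaluate()  # 0: the split gives ['a,a,b', 'a'], then 1 - 1
```

Because each step looks up the method on whatever the previous step returned, the toy pipeline does not care that the intermediate type changes. A real implementation would also need to redraw the output when a widget changes, which this sketch leaves out.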
A plot would be a better output for this pipeline. We will use .hvplot() to create an interactive Bokeh line plot; note that using Pandas’ .plot() is also possible.
import hvplot.pandas # noqa
pipeline.hvplot()
For more information on using .interactive(), take a look at the User Guide.