Gridded Data#

hvPlot provides one API to explore data of many different types. Previous sections have worked exclusively with tabular data stored in pandas (or pandas-like) DataFrames. The other most common type of data is the n-dimensional array. hvPlot aims to eventually support different array libraries but for now focuses on xarray, which provides a convenient and powerful wrapper for labeling the axes and coordinates of multi-dimensional (n-D) arrays. This user guide covers how to use xarray and hvPlot to visualize and explore data of different dimensionality, ranging from simple 1D data, to 2D image-like data, to multi-dimensional cubes of data.

For these examples we’ll use the North American air temperature dataset:

import xarray as xr
import hvplot.xarray  # noqa

air_ds = xr.tutorial.open_dataset('air_temperature').load()
air = air_ds.air
air_ds
<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

1D Plots#

Selecting the data at a particular lat/lon coordinate we get a 1D dataset of air temperatures over time:

air1d = air.sel(lat=40, lon=285)
air1d.hvplot()
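
Note that sel matches coordinate values exactly by default, so lat=40, lon=285 works only because both values lie on the grid. As a minimal sketch, using a small synthetic stand-in for the tutorial dataset (random values on the same lat/lon grid, an assumption made here to avoid the download), xarray's method='nearest' option picks the closest grid point for an off-grid location:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data: random values on the same lat/lon grid.
time = pd.date_range("2013-01-01", periods=8, freq="6h")
lat = np.linspace(75, 15, 25)    # 75.0, 72.5, ..., 15.0
lon = np.linspace(200, 330, 53)  # 200.0, 202.5, ..., 330.0
rng = np.random.default_rng(0)
air = xr.DataArray(
    280 + 10 * rng.standard_normal((len(time), len(lat), len(lon))),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# Exact match: both values are grid points, leaving only the time dimension.
on_grid = air.sel(lat=40, lon=285)

# Off-grid location: snap to the closest grid point instead of raising KeyError.
nearest = air.sel(lat=40.7, lon=284.1, method="nearest")

print(on_grid.dims)
print(float(nearest.lat), float(nearest.lon))
```

Either result is one-dimensional along time, so calling .hvplot() on it gives the same kind of line plot as above.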

Notice how the axes are already appropriately labeled, because xarray stores the metadata required. We can also further subselect the data and use * to overlay plots:

air1d_sel = air1d.sel(time='2013-01')
air1d_sel.hvplot(color='purple') * air1d_sel.hvplot.scatter(marker='o', color='blue', size=15)
air.lat
<xarray.DataArray 'lat' (lat: 25)> Size: 100B
array([75. , 72.5, 70. , 67.5, 65. , 62.5, 60. , 57.5, 55. , 52.5, 50. , 47.5,
       45. , 42.5, 40. , 37.5, 35. , 32.5, 30. , 27.5, 25. , 22.5, 20. , 17.5,
       15. ], dtype=float32)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y

Selecting multiple#

If we select multiple coordinates along one axis and use a chart-type plot such as a line, the data will automatically be split by that coordinate:

air.sel(lat=[20, 40, 60], lon=285).hvplot.line()

To plot a different relationship we can explicitly request to display the latitude along the y-axis and use the by keyword to color each longitude (or ‘lon’) differently (note that this differs from the hue keyword xarray uses):

air.sel(time='2013-02-01 00:00', lon=[280, 285]).hvplot.line(y='lat', by='lon', legend='top_right')

2D Plots#

By default the DataArray.hvplot() method generates an image if the data is two-dimensional.

air2d = air.sel(time='2013-06-01 12:00')
air2d.hvplot(width=400)

Alternatively, we can plot the same data using the contour and contourf methods, which accept a levels argument to control the number of iso-contours drawn:

air2d.hvplot.contour(width=400, levels=20) + air2d.hvplot.contourf(width=400, levels=8)

n-D Plots#

If the data has more than two dimensions and no further hints are provided, hvPlot defaults to a histogram:

air.hvplot()
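
The defaults described so far depend only on the number of dimensions left after selection. The helper below is hypothetical and not part of hvPlot. It simply restates the rule from this guide (1D gives a line, 2D an image, anything higher a histogram) so the dimensionality of each selection can be checked without plotting. The data is a synthetic stand-in, which is also an assumption:

```python
import numpy as np
import pandas as pd
import xarray as xr


def default_kind(da: xr.DataArray) -> str:
    """Hypothetical helper: restates the defaults described in this guide."""
    if da.ndim == 0:
        raise ValueError("a scalar has nothing to plot")
    if da.ndim == 1:
        return "line"
    if da.ndim == 2:
        return "image"
    return "hist"


# Synthetic stand-in for the tutorial data: random values on the same lat/lon grid.
time = pd.date_range("2013-01-01", periods=8, freq="6h")
lat = np.linspace(75, 15, 25)
lon = np.linspace(200, 330, 53)
rng = np.random.default_rng(0)
air = xr.DataArray(
    280 + 10 * rng.standard_normal((len(time), len(lat), len(lon))),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

selections = {
    "single point over time": air.sel(lat=40, lon=285),
    "single time step": air.isel(time=0),
    "full cube": air,
}
for label, da in selections.items():
    print(f"{label}: dims={da.dims} -> {default_kind(da)}")
```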

However, we can tell hvPlot to apply a groupby along a particular dimension, allowing us to explore the data as images along that dimension with a slider:

air.hvplot(groupby='time', width=500)

By default, for numeric types you’ll get a slider and for non-numeric types you’ll get a selector. Use widget_type and widget_location to control the look of the widget. To learn more about customizing widget behavior see Widgets.

air.hvplot(groupby='time', width=600, widget_type='scrubber', widget_location='bottom')

If we pick a different, lower-dimensional plot type (such as a ‘line’), hvPlot will automatically apply a groupby over the remaining dimensions:

air.hvplot.line(width=600)

Statistical plots#

Statistical plots such as histograms, kernel-density estimates, or violin and box-whisker plots aggregate the data across one or more of the coordinate dimensions. For instance, plotting a KDE provides a summary of all the air temperature values but we can, once again, use the by keyword to view each selected latitude (or ‘lat’) separately:

air.sel(lat=[25, 50, 75]).hvplot.kde('air', by='lat', alpha=0.5)

Using the by keyword we can break down the distribution of the air temperature across one or more variables:

air.hvplot.violin('air', by='lat', color='lat', cmap='Category20')
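
To make the aggregation concrete: for each selected latitude, the distribution is built from every value along the remaining dimensions (time and lon). A minimal xarray-only sketch on synthetic stand-in data (an assumption, with random values) flattens those dimensions into a single sample dimension and summarizes each latitude numerically:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data: random values on the same lat/lon grid.
time = pd.date_range("2013-01-01", periods=8, freq="6h")
lat = np.linspace(75, 15, 25)
lon = np.linspace(200, 330, 53)
rng = np.random.default_rng(0)
air = xr.DataArray(
    280 + 10 * rng.standard_normal((len(time), len(lat), len(lon))),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# Keep lat as the grouping dimension and flatten everything else into samples.
subset = air.sel(lat=[25, 50, 75])
samples = subset.stack(sample=("time", "lon"))
print(samples.dims, samples.sizes["sample"])

# One summary per latitude, computed over all time/lon samples.
means = samples.mean("sample")
stds = samples.std("sample")
for lat_value, mean, std in zip(samples.lat.values, means.values, stds.values):
    print(f"lat={lat_value}: mean={mean:.1f} K, std={std:.1f} K")
```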

Rasterizing#

If you are plotting a large amount of data at once, you can consider using the hvPlot interface to Datashader, which can be enabled simply by setting rasterize=True.

Note that by declaring that the data should not be grouped by another coordinate variable, i.e. by setting groupby=[], we can plot all the datapoints, showing us the spread of air temperatures in the dataset:

air.hvplot.scatter('time', groupby=[], rasterize=True) *\
air.mean(['lat', 'lon']).hvplot.line('time', color='indianred')
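
Some quick arithmetic shows why rasterizing helps here. Using an array of zeros with the same shape as the dataset shown at the top of this guide (a stand-in, so no download is needed), the scatter plot has to represent every value, while the overlaid mean line has only one value per time step:

```python
import numpy as np
import xarray as xr

# Stand-in with the tutorial dataset's shape: (time: 2920, lat: 25, lon: 53).
air = xr.DataArray(
    np.zeros((2920, 25, 53), dtype="float32"),
    dims=("time", "lat", "lon"),
    name="air",
)

# Every value becomes one point in the ungrouped scatter plot.
n_points = air.size

# Averaging over lat and lon leaves a single value per time step.
mean_line = air.mean(["lat", "lon"])

print(f"scatter points: {n_points:,}")
print(f"mean line points: {mean_line.size:,}")
```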

Here we also overlaid a non-datashaded line plot of the average temperature at each time. If you enable the appropriate hover tool, the overlaid line supports hovering and zooming even in a static export, such as on a web server or in an email. The raw-data scatter plot, in contrast, is aggregated spatially before it is sent to the browser, so a static export has only the fixed spatial binning computed at that time. If you have a live Python process, the raw data is re-aggregated each time you pan or zoom, letting you see the entire dataset regardless of size.

This web page was generated from a Jupyter notebook and not all interactivity will work on this website.