Supported data libraries#
# hidden on the website
import numpy as np
np.random.seed(1)
The .hvplot() plotting API supports a wide range of data sources. For most of them, a dedicated import registers the .hvplot accessor on specific objects. For instance, importing hvplot.pandas registers the .hvplot accessor on Pandas DataFrame and Series objects, which makes calls such as df.hvplot.line() possible.
Among the data sources introduced below, Pandas is the only library that doesn’t need to be installed separately as it is a direct dependency of hvPlot.
The following table provides a summary of the data sources supported by hvPlot. The HoloViews interface column indicates whether HoloViews offers an interface for that data source, meaning the data would in most cases not be converted to another type until the very last steps of the rendering process. When no interface is available for a data type, hvPlot internally casts the data to a supported type; for instance, polars objects are cast upfront to pandas objects.
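Source | Module | Type | HoloViews interface | Comment |
---|---|---|---|---|
Pandas | hvplot.pandas | Tabular | ✅ | |
Dask | hvplot.dask | Tabular | ✅ | |
GeoPandas | hvplot.pandas | Tabular | ✅ | |
Ibis | hvplot.ibis | Tabular | ✅ | |
Polars | hvplot.polars | Tabular | ❌ | To Pandas |
DuckDB | hvplot.duckdb | Tabular | ❌ | To Pandas |
Rapids cuDF | hvplot.cudf | Tabular | ✅ | GPU |
Fugue | hvplot.fugue | Tabular | ❌ | Experimental |
Xarray | hvplot.xarray | Multidimensional | ✅ | |
Intake | hvplot.intake | Catalog | ❌ | |
Streamz | hvplot.streamz | Streaming | ✅ | |
NetworkX | - | Graph | - | |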
Note
Supporting so many data sources is hard work! We are aware that the support for some of them isn’t as good as we would like. If you encounter any issue, please report it on GitHub. We always welcome Pull Requests too!
Columnar/tabular#
Pandas#
.hvplot() supports Pandas DataFrame and Series objects. hvPlot can also be registered as Pandas’ default plotting backend, so that Pandas .plot() calls are delegated to hvPlot instead of Pandas’ Matplotlib backend. Find out more about hvPlot’s compatibility with Pandas’ plotting interface.
import hvplot.pandas # noqa
import pandas as pd
df_pandas = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD')).cumsum()
df_pandas.head(2)
A | B | C | D | |
---|---|---|---|---|
0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 |
1 | 2.489753 | -2.913295 | 1.216640 | -1.834176 |
# Pandas DataFrame
df_pandas.hvplot.line(height=150)
# Pandas Series
s_pandas = df_pandas['A']
s_pandas.hvplot.line(height=150)
Dask#
.hvplot() supports Dask DataFrame and Series objects.
import hvplot.dask # noqa
import dask.dataframe as dd  # import the submodule explicitly
df_dask = dd.from_pandas(df_pandas, npartitions=2)
df_dask
A | B | C | D | |
---|---|---|---|---|
npartitions=2 | ||||
0 | float64 | float64 | float64 | float64 |
500 | ... | ... | ... | ... |
999 | ... | ... | ... | ... |
# Dask DataFrame
df_dask.hvplot.line(height=150)
# Dask Series
s_dask = df_dask['A']
s_dask.hvplot.line(height=150)
GeoPandas#
.hvplot() supports GeoPandas GeoDataFrame objects.
import hvplot.pandas # noqa
import geopandas as gpd
p_geometry = gpd.points_from_xy(
x=[12.45339, 12.44177, 9.51667, 6.13000],
y=[41.90328, 43.93610, 47.13372, 49.61166],
crs='EPSG:4326'
)
p_names = ['Vatican City', 'San Marino', 'Vaduz', 'Luxembourg']
gdf = gpd.GeoDataFrame(dict(name=p_names), geometry=p_geometry)
gdf.head(2)
name | geometry | |
---|---|---|
0 | Vatican City | POINT (12.45339 41.90328) |
1 | San Marino | POINT (12.44177 43.9361) |
# GeoPandas GeoDataFrame
gdf.hvplot.points(geo=True, tiles='CartoLight', frame_height=150, data_aspect=0.5)
Ibis#
Ibis is the “portable Python dataframe library”. It provides a unified interface to many data backends (e.g. DuckDB, SQLite, Snowflake, Google BigQuery). .hvplot() supports Ibis Expr objects.
import hvplot.ibis # noqa
import ibis
table = ibis.memtable(df_pandas.reset_index())
table
InMemoryTable data: PandasDataFrameProxy: index A B C D 0 0 1.624345 -0.611756 -0.528172 -1.072969 1 1 2.489753 -2.913295 1.216640 -1.834176 2 2 2.808792 -3.162665 2.678748 -3.894316 3 3 2.486375 -3.546720 3.812517 -4.994207 4 4 2.313947 -4.424578 3.854731 -4.411392 .. ... ... ... ... ... 995 995 14.422083 -48.258487 19.318585 63.861029 996 996 13.814368 -47.528673 18.431398 63.938357 997 997 13.887784 -47.112647 16.552198 64.513816 998 998 13.989847 -45.928343 15.757355 64.387913 999 999 13.029500 -46.772256 16.385696 64.925127 [1000 rows x 5 columns]
# Ibis Expr
table.hvplot.line(x='index', height=150)
Polars#
Note
Added in version 0.9.0.
Important
While other data sources like Pandas or Dask have built-in support in HoloViews, as of version 1.17.1 this is not yet the case for Polars. You can track this issue to follow the evolution of this feature in HoloViews. Internally, hvPlot simply selects the columns that contribute to the plot and casts them to a Pandas object using Polars’ .to_pandas() method.
import hvplot.polars # noqa
import polars
df_polars = polars.from_pandas(df_pandas)
df_polars.head(2)
A | B | C | D |
---|---|---|---|
f64 | f64 | f64 | f64 |
1.624345 | -0.611756 | -0.528172 | -1.072969 |
2.489753 | -2.913295 | 1.21664 | -1.834176 |
.hvplot() supports Polars DataFrame, LazyFrame and Series objects.
# Polars DataFrame
df_polars.hvplot.line(y=['A', 'B', 'C', 'D'], height=150)
# Polars LazyFrame
df_polars.lazy().hvplot.line(y=['A', 'B', 'C', 'D'], height=150)
# Polars Series
df_polars['A'].hvplot.line(height=150)
DuckDB#
Note
Added in version 0.11.0.
import numpy as np
import pandas as pd
df_pandas = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD')).cumsum()
df_pandas.head(2)
A | B | C | D | |
---|---|---|---|---|
0 | -0.140371 | 0.141642 | 0.311969 | 0.769085 |
1 | 0.443915 | 1.930234 | 0.287338 | 2.259375 |
import hvplot.duckdb # noqa
import duckdb
connection = duckdb.connect(':memory:')
relation = duckdb.from_df(df_pandas, connection=connection)
relation.to_view("example_view");
.hvplot() supports DuckDB DuckDBPyRelation and DuckDBPyConnection objects.
relation.hvplot.line(y=['A', 'B', 'C', 'D'], height=150)
DuckDBPyRelation is a bit more optimized because it handles column subsetting directly within DuckDB before the data is converted to a pd.DataFrame.
So, it’s a good idea to use the connection.sql() method when possible, which gives you a DuckDBPyRelation, instead of connection.execute(), which returns a DuckDBPyConnection.
sql_expr = "SELECT * FROM example_view WHERE A > 0 AND B > 0"
connection.sql(sql_expr).hvplot.line(y=['A', 'B'], hover_cols=["C"], height=150) # subsets A, B, C
Alternatively, you can directly subset the desired columns in the SQL expression.
sql_expr = "SELECT A, B, C FROM example_view WHERE A > 0 AND B > 0"
connection.execute(sql_expr).hvplot.line(y=['A', 'B'], hover_cols=["C"], height=150)
Rapids cuDF#
Important
Rapids cuDF is a Python GPU DataFrame library. As of versions 0.9.0 and 1.17.1, respectively, neither hvPlot’s nor HoloViews’ test suites run on a GPU as part of their CI, because the free CI system we rely on (GitHub Actions) does not offer machines equipped with a GPU. It is therefore possible that support for cuDF degrades in hvPlot without us noticing it immediately. Please report any issue you might encounter.
.hvplot() supports cuDF DataFrame and Series objects.
Fugue#
Experimental
Fugue support, added in version 0.9.0, is experimental and may change in future versions.
hvPlot adds the hvplot plotting extension to FugueSQL.
import hvplot.fugue # noqa
import fugue
fugue.api.fugue_sql(
"""
OUTPUT df_pandas USING hvplot:line(
height=150,
)
"""
)
A | B | C | D | |
---|---|---|---|---|
0 | -0.140371 | 0.141642 | 0.311969 | 0.769085 |
1 | 0.443915 | 1.930234 | 0.287338 | 2.259375 |
2 | 0.122838 | 2.491040 | 0.558123 | 3.026044 |
3 | 0.156186 | 4.251613 | -1.475462 | 3.886280 |
4 | 0.854354 | 5.019826 | 0.276333 | 5.212350 |
... | ... | ... | ... | ... |
995 | 33.079834 | -30.932895 | 0.340049 | -11.429079 |
996 | 33.609601 | -30.549642 | 0.540773 | -11.335922 |
997 | 33.697890 | -29.223362 | 0.920503 | -9.163789 |
998 | 33.376851 | -27.716442 | 1.177258 | -9.545096 |
999 | 34.413737 | -27.044706 | 1.687819 | -8.538965 |
1000 rows × 4 columns
Multidimensional#
Xarray#
.hvplot() supports Xarray Dataset and DataArray labelled multidimensional objects.
import hvplot.xarray # noqa
import xarray as xr
ds = xr.Dataset({
'A': (['x', 'y'], np.random.randn(100, 100)),
'B': (['x', 'y'], np.random.randn(100, 100))},
coords={'x': np.arange(100), 'y': np.arange(100)}
)
ds
<xarray.Dataset> Size: 162kB Dimensions: (x: 100, y: 100) Coordinates: * x (x) int64 800B 0 1 2 3 4 5 6 7 8 9 ... 91 92 93 94 95 96 97 98 99 * y (y) int64 800B 0 1 2 3 4 5 6 7 8 9 ... 91 92 93 94 95 96 97 98 99 Data variables: A (x, y) float64 80kB -2.219 -1.234 1.828 ... 0.1273 0.3056 0.1591 B (x, y) float64 80kB -3.204 -0.4675 -0.2595 ... -0.2403 -0.7512
# Xarray Dataset
ds.hvplot.hist(height=150)
# Xarray DataArray
ds['A'].hvplot.image(height=150)
Catalog#
Intake#
.hvplot() supports Intake DataSource objects.
Streaming#
Streamz#
.hvplot() supports Streamz DataFrame, DataFrames, Series and Seriess objects.
Graph#
NetworkX#
The hvPlot NetworkX plotting API is meant as a drop-in replacement for the networkx.draw methods. The draw and other draw_<> methods are available in the hvplot.networkx module.
import hvplot.networkx as hvnx
import networkx as nx
G = nx.petersen_graph()
hvnx.draw(G, with_labels=True, height=150)