Getting started¶
The xdatasets library enables users to effortlessly access a vast collection of earth observation datasets that are compatible with xarray formats.
The library adopts an opinionated approach to data querying and caters to the specific needs of certain user groups, such as hydrologists, climate scientists, and engineers. One of the functionalities of xdatasets is the ability to extract data at a specific location or within a designated region, such as a watershed or municipality, while also enabling spatial and temporal operations.
To use xdatasets, users must employ a query. For instance, a straightforward query to extract the variables t2m (2m temperature) and tp (Total precipitation) from the era5_reanalysis_single_levels dataset at two geographical positions (Montreal and Toronto) could be as follows:
query = {
"datasets": {"era5_reanalysis_single_levels": {'variables': ["t2m", "tp"]}},
"space": {
"clip": "point", # bbox, point or polygon
"geometry": {'Montreal' : (45.508888, -73.561668),
'Toronto' : (43.651070, -79.347015)
}
}
}
An example of a more complex query would look like the one below.
Note Don’t worry! Below, you’ll find additional examples that will assist in understanding each parameter in the query, as well as the possible combinations.
This query calls the same variables as above. However, instead of specifying geographical positions, a GeoPandas.DataFrame is used to provide features (such as shapefiles or geojson) for extracting data within each of them. Each polygon is identified using the unique identifier Station, and a spatial average is computed within each one (aggregation: True). The dataset, initially at an hourly time step, is converted into a daily time step while applying one or more temporal aggregations
for each variable as prescribed in the query. xdatasets ultimately returns the dataset for the specified date range and time zone.
query = {
"datasets": {"era5_reanalysis_single_levels": {'variables': ["t2m", "tp"]}},
"space": {
"clip": "polygon", # bbox, point or polygon
"averaging": True, # spatial average of the variables within each polygon
"geometry": gdf,
"unique_id": "Station" # unique column name in geodataframe
},
"time": {
"timestep": "D",
"aggregation": {"tp": np.nansum,
"t2m": [np.nanmax, np.nanmin]},
"start": '2000-01-01',
"end": '2020-05-31',
"timezone": 'America/Montreal',
},
}
Query climate datasets¶
In order to use xdatasets, you must import at least xdatasets, pandas, geopandas, and numpy. Additionally, we import pathlib to interact with files.
[1]:
import os
import warnings
from pathlib import Path
warnings.simplefilter("ignore")
os.environ["USE_PYGEOS"] = "0"
import geopandas as gpd
# Visualization
import hvplot.pandas # noqa
import hvplot.xarray # noqa-
import numpy as np
import pandas as pd
import panel as pn # noqa
import xdatasets as xd
Clip by points (sites)¶
To begin with, we need to create a dictionary of sites and their corresponding geographical coordinates.
[2]:
sites = {
"Montreal": (45.508888, -73.561668),
"New York": (40.730610, -73.935242),
"Miami": (25.761681, -80.191788),
}
We will then extract the tp (total precipitation) and t2m (2m temperature) from the era5_reanalysis_single_levels dataset for the designated sites. Afterward, we will convert the time step to daily and adjust the timezone to Eastern Time. Finally, we will limit the temporal interval.
Before proceeding with this first query, let’s quickly outline the role of each parameter:
datasets: A dictionary where datasets serve as keys and desired variables as values.
space: A dictionary that defines the necessary spatial operations to apply on user-supplied geographic features.
time: A dictionary that defines the necessary temporal operations to apply on the datasets
For more information on each parameter, consult the API documentation.
This is what the requested query looks like :
[3]:
query = {
"datasets": "era5_reanalysis_single_levels",
"space": {"clip": "point", "geometry": sites}, # bbox, point or polygon
"time": {
"timestep": "D", # http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
"aggregation": {"tp": np.nansum, "t2m": np.nanmean},
"start": "1995-01-01",
"end": "2000-12-31",
"timezone": "America/Montreal",
},
}
xds = xd.Query(**query)
Temporal operations: processing tp with era5_reanalysis_single_levels: 100%|██████████| 2/2 [00:00<00:00, 5.95it/s]
By accessing the data attribute, you can view the data obtained from the query. It’s worth noting that the variable name tp has been updated to tp_nansum to reflect the reduction operation (np.nansum) that was utilized to convert the time step from hourly to daily. Likewise, t2m was updated to t2m_nanmean.
[4]:
xds.data
[4]:
<xarray.Dataset> Size: 70kB
Dimensions: (spatial_agg: 1, timestep: 1, site: 3, time: 2192, source: 1)
Coordinates:
* spatial_agg (spatial_agg) object 8B 'point'
* timestep (timestep) object 8B 'D'
* site (site) <U8 96B 'Montreal' 'New York' 'Miami'
* time (time) datetime64[ns] 18kB 1995-01-01 1995-01-02 ... 2000-12-31
* source (source) <U29 116B 'era5_reanalysis_single_levels'
latitude (site) float64 24B 45.5 40.75 25.75
longitude (site) float64 24B -73.5 -74.0 -80.25
Data variables:
t2m_nanmean (spatial_agg, timestep, time, site) float32 26kB 266.9 ... 2...
tp_nansum (spatial_agg, timestep, time, site) float32 26kB 0.007034 .....
Attributes: (12/30)
GRIB_NV: 0
GRIB_Nx: 1440
GRIB_Ny: 721
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: an
... ...
GRIB_totalNumber: 0
GRIB_typeOfLevel: surface
GRIB_units: K
long_name: 2 metre temperature
standard_name: unknown
units: K[5]:
title = f"Comparison of total precipitation across three cities in North America from \
{xds.data.time.dt.year.min().values} to {xds.data.time.dt.year.max().values}"
xds.data.sel(
timestep="D",
source="era5_reanalysis_single_levels",
).hvplot(
title=title,
x="time",
y="tp_nansum",
grid=True,
width=750,
height=450,
by="site",
legend="top",
widget_location="bottom",
)
[5]:
[6]:
title = f"Comparison of 2m temperature across three cities in North America from \
{xds.data.time.dt.year.min().values} to {xds.data.time.dt.year.max().values}"
xds.data.sel(
timestep="D",
source="era5_reanalysis_single_levels",
).hvplot(
title=title,
x="time",
y="t2m_nanmean",
grid=True,
width=750,
height=450,
by="site",
legend="top",
widget_location="bottom",
)
[6]:
Clip on polygons with no averaging in space¶
First, let’s explore specific polygon features. With xdatasets, you can access geographical datasets, such as watershed boundaries linked to streamflow stations. These datasets follow a nomenclature where they are named after the hydrological dataset, with "_polygons" appended. For example, if the hydrological dataset is named deh, its corresponding watershed boundaries dataset will be labeled deh_polygons. The query below retrieves all polygons for the deh_polygons dataset.
gdf = xd.Query(
**{
"datasets": "deh_polygons"
}).data
gdf
As the data is loaded into memory, the process of loading all polygons may take some time. To expedite this, we recommend employing filters, as illustrated below. It’s important to note that the filters are consistent for both hydrological and corresponding geographical datasets. Consequently, only watershed boundaries associated with existing hydrological data will be returned.
[7]:
[7]:
| Station | Superficie | geometry | |
|---|---|---|---|
| 0 | 042102 | 623.479187 | POLYGON ((-78.5712 46.70742, -78.57112 46.7073... |
| 1 | 042103 | 579.479614 | POLYGON ((-78.49014 46.64514, -78.4901 46.6450... |
Let’s examine the geographic locations of the polygon features.
[8]:
gdf.hvplot(
geo=True,
tiles="ESRI",
color="Station",
alpha=0.8,
width=750,
height=450,
legend="top",
hover_cols=["Station", "Superficie"],
)
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
Cell In[8], line 1
----> 1 gdf.hvplot(
2 geo=True,
3 tiles="ESRI",
4 color="Station",
5 alpha=0.8,
6 width=750,
7 height=450,
8 legend="top",
9 hover_cols=["Station", "Superficie"],
10 )
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/plotting/core.py:95, in hvPlotBase.__call__(self, x, y, kind, **kwds)
92 plot = self._get_converter(x, y, kind, **kwds)(kind, x, y)
93 return pn.panel(plot, **panel_dict)
---> 95 return self._get_converter(x, y, kind, **kwds)(kind, x, y)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1645, in HoloViewsConverter.__call__(self, kind, x, y)
1643 columns = [c for c in data.columns if c != 'geometry']
1644 shape_dims = ['Longitude', 'Latitude'] if self.geo else ['x', 'y']
-> 1645 dataset = Dataset(data, kdims=shape_dims + columns)
1646 elif self.datatype == 'xarray':
1647 import xarray as xr
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/holoviews/core/data/__init__.py:333, in Dataset.__init__(self, data, kdims, vdims, **kwargs)
331 (data, self.interface, dims, extra_kws) = initialized
332 super().__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))
--> 333 self.interface.validate(self, validate_vdims)
335 # Handle _pipeline property
336 if input_pipeline is None:
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/holoviews/core/data/pandas.py:166, in PandasInterface.validate(cls, dataset, vdims)
164 not_found = [d for d in dimensions if d not in cols]
165 if not_found:
--> 166 raise DataError("Supplied data does not contain specified "
167 "dimensions, the following dimensions were "
168 "not found: %s" % repr(not_found), cls)
DataError: Supplied data does not contain specified dimensions, the following dimensions were not found: ['Longitude', 'Latitude']
PandasInterface expects tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html
The following query seeks the variables t2m and tp from the era5_reanalysis_single_levels dataset, covering the period between January 1, 1959, and September 30, 1961, for the three polygons mentioned earlier. It is important to note that as aggregation is set to False, no spatial averaging will be conducted, and a mask (raster) will be returned for each polygon.
[9]:
query = {
"datasets": {"era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]}},
"space": {
"clip": "polygon", # bbox, point or polygon
"averaging": False, # spatial average of the variables within each polygon
"geometry": gdf,
"unique_id": "Station", # unique column name in geodataframe
},
"time": {
"start": "1959-01-01",
"end": "1961-08-31",
},
}
xds = xd.Query(**query)
By accessing the data attribute, you can view the data obtained from the query. For each variable, the dimensions of time, latitude, longitude, and Station (the unique ID) are included. In addition, there is another variable called weights that is returned. This variable specifies the weight that should be assigned to each pixel if spatial averaging is conducted over a mask (polygon).
[10]:
xds.data
[10]:
<xarray.Dataset> Size: 2MB
Dimensions: (Station: 2, time: 23376, latitude: 3, longitude: 2, source: 1)
Coordinates:
* Station (Station) object 16B '042102' '042103'
* time (time) datetime64[ns] 187kB 1959-01-01 ... 1961-08-31T23:00:00
* latitude (latitude) float64 24B 46.25 46.5 46.75
* longitude (longitude) float64 16B -78.5 -78.25
* source (source) <U29 116B 'era5_reanalysis_single_levels'
Data variables:
t2m (Station, time, latitude, longitude) float32 1MB 260.3 ... nan
tp (Station, time, latitude, longitude) float32 1MB nan nan ... nan
weights (Station, latitude, longitude) float64 96B 0.01953 0.1244 ... nan
Attributes:
pangeo-forge:inputs_hash: 1f36b436501a00782beae12abecf39fe812c7ce36b6b23...
pangeo-forge:recipe_hash: f61831cdda3c85ffc5f263f50f939a5a44733f96f67bb1...
pangeo-forge:version: 0.9.4Weights are much easier to comprehend visually, so let’s examine the weights returned for the station 042102. Notice that when selecting a single feature (Station 042102 in this case), the shape of our spatial dimensions is reduced to a 3x2 pixel area (longitude x latitude) that encompasses the entire feature.
[11]:
station = "042102"
ds_station = xds.data.sel(Station=station)
ds_clipped = xds.bbox_clip(ds_station).squeeze()
ds_clipped
[11]:
<xarray.Dataset> Size: 1MB
Dimensions: (time: 23376, latitude: 3, longitude: 2)
Coordinates:
* time (time) datetime64[ns] 187kB 1959-01-01 ... 1961-08-31T23:00:00
* latitude (latitude) float64 24B 46.25 46.5 46.75
* longitude (longitude) float64 16B -78.5 -78.25
Station <U6 24B '042102'
source <U29 116B 'era5_reanalysis_single_levels'
Data variables:
t2m (time, latitude, longitude) float32 561kB 260.3 259.9 ... nan
tp (time, latitude, longitude) float32 561kB nan nan nan ... 0.0 nan
weights (latitude, longitude) float64 48B 0.01953 0.1244 ... 0.0481 nan
Attributes:
pangeo-forge:inputs_hash: 1f36b436501a00782beae12abecf39fe812c7ce36b6b23...
pangeo-forge:recipe_hash: f61831cdda3c85ffc5f263f50f939a5a44733f96f67bb1...
pangeo-forge:version: 0.9.4[12]:
(
(
ds_clipped.t2m.isel(time=0).hvplot(
title="The 2m temperature for pixels that intersect with the polygon on January 1, 1959",
tiles="ESRI",
geo=True,
alpha=0.6,
colormap="isolum",
width=750,
height=450,
)
* gdf[gdf.Station == station].hvplot(
geo=True,
width=750,
height=450,
legend="top",
hover_cols=["Station", "Superficie"],
)
)
+ ds_clipped.weights.hvplot(
title="The weights that should be assigned to each pixel when performing spatial averaging",
tiles="ESRI",
alpha=0.6,
colormap="isolum",
geo=True,
width=750,
height=450,
)
* gdf[gdf.Station == station].hvplot(
geo=True,
width=750,
height=450,
legend="top",
hover_cols=["Station", "Superficie"],
)
).cols(1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[12], line 3
1 (
2 (
----> 3 ds_clipped.t2m.isel(time=0).hvplot(
4 title="The 2m temperature for pixels that intersect with the polygon on January 1, 1959",
5 tiles="ESRI",
6 geo=True,
7 alpha=0.6,
8 colormap="isolum",
9 width=750,
10 height=450,
11 )
12 * gdf[gdf.Station == station].hvplot(
13 geo=True,
14 width=750,
15 height=450,
16 legend="top",
17 hover_cols=["Station", "Superficie"],
18 )
19 )
20 + ds_clipped.weights.hvplot(
21 title="The weights that should be assigned to each pixel when performing spatial averaging",
22 tiles="ESRI",
23 alpha=0.6,
24 colormap="isolum",
25 geo=True,
26 width=750,
27 height=450,
28 )
29 * gdf[gdf.Station == station].hvplot(
30 geo=True,
31 width=750,
32 height=450,
33 legend="top",
34 hover_cols=["Station", "Superficie"],
35 )
36 ).cols(1)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/plotting/core.py:95, in hvPlotBase.__call__(self, x, y, kind, **kwds)
92 plot = self._get_converter(x, y, kind, **kwds)(kind, x, y)
93 return pn.panel(plot, **panel_dict)
---> 95 return self._get_converter(x, y, kind, **kwds)(kind, x, y)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1671, in HoloViewsConverter.__call__(self, kind, x, y)
1668 obj = gv.project(obj, projection=self.output_projection)
1670 if not (self.datashade or self.rasterize or self.downsample):
-> 1671 layers = self._apply_layers(obj)
1672 layers = _transfer_opts_cur_backend(layers)
1673 return layers
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1826, in HoloViewsConverter._apply_layers(self, obj)
1824 tiles = self._get_tiles(self.tiles)
1825 else:
-> 1826 tiles = self._get_tiles(self.tiles, lib='geoviews')
1827 tiles = tiles.opts(clone=True, **self.tiles_opts)
1828 obj = tiles * obj
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1868, in HoloViewsConverter._get_tiles(self, source, lib)
1858 if tiles is None:
1859 msg = (
1860 f'{tile_source} tiles not recognized. tiles must be either True, a '
1861 'xyzservices.TileProvider instance, a HoloViews'
(...) 1866 + '.'
1867 )
-> 1868 raise ValueError(msg)
1869 return tiles
ValueError: EsriImagery tiles not recognized. tiles must be either True, a xyzservices.TileProvider instance, a HoloViews or Geoviews basemap string (one of CartoDark, CartoEco, CartoLight, CartoMidnight, ESRI, OSM, StamenLabels, StamenTerrain, StamenTerrainRetina, StamenToner, StamenTonerBackground, StamenWatercolor, Wikipedia), a HoloViews Tiles instance, a Geoviews WMTS instance.
The two plots depicted above show the 2m temperature for each pixel that intersects with the polygon from Station 042102 and the corresponding weights to be applied to each pixel. In the lower plot, it is apparent that the majority of the polygon is situated in the central pixels, which results in those pixels having a weight of approximately 80%. It is evident that the two lower and the upper pixels have much less intersection with the polygon, which results in their respective weights
being smaller (hover on the plot to verify the weights).
In various libraries, either all pixels that intersect with the geometries are kept, or only pixels with centers within the polygon are retained. However, as shown in the previous example, utilizing such methods can introduce significant biases in the final calculations.
Clip on polygons with averaging in space¶
The following query seeks the variables t2m and tp from the era5_reanalysis_single_levels and era5_land_reanalysis datasets, covering the period between January 1, 2014, to December 31, 2023, for the three polygons mentioned earlier. Note that when the aggregation parameter is set to True, spatial averaging takes place. In addition, the weighted mask (raster) described earlier will be applied to generate a time series for each polygon.
Additional steps are carried out in the process, including converting the original hourly time step to a daily time step. During this conversion, various temporal aggregations will be applied to each variable and a conversion to the local time zone will take place.
Note If users prefer to pass multiple dictionaries instead of a single large one, the following format is also considered acceptable.
[13]:
datasets = {
"era5_reanalysis_single_levels": {"variables": ["t2m", "tp"]},
"era5_land_reanalysis": {"variables": ["t2m", "tp"]},
}
space = {
"clip": "polygon", # bbox, point or polygon
"averaging": True,
"geometry": gdf, # 3 polygons
"unique_id": "Station",
}
time = {
"timestep": "D",
"aggregation": {"tp": [np.nansum], "t2m": [np.nanmax, np.nanmin]},
"start": "2014-01-01",
"end": "2023-12-31",
"timezone": "America/Montreal",
}
xds = xd.Query(datasets=datasets, space=space, time=time)
Temporal operations: processing tp with era5_reanalysis_single_levels: 100%|██████████| 2/2 [00:00<00:00, 2.87it/s]
Temporal operations: processing tp with era5_land_reanalysis: 100%|██████████| 2/2 [00:00<00:00, 2.90it/s]
[14]:
xds.data
[14]:
<xarray.Dataset> Size: 380kB
Dimensions: (spatial_agg: 1, timestep: 1, Station: 2, time: 3652, source: 2)
Coordinates:
* spatial_agg (spatial_agg) object 8B 'polygon'
* timestep (timestep) object 8B 'D'
* Station (Station) object 16B '042102' '042103'
* time (time) datetime64[ns] 29kB 2014-01-01 2014-01-02 ... 2023-12-31
* source (source) <U29 232B 'era5_land_reanalysis' 'era5_reanalysis_s...
Data variables:
t2m_nanmax (spatial_agg, timestep, Station, time, source) float64 117kB ...
t2m_nanmin (spatial_agg, timestep, Station, time, source) float64 117kB ...
tp_nansum (spatial_agg, timestep, Station, time, source) float64 117kB ...
Attributes: (12/30)
GRIB_NV: 0
GRIB_Nx: 1440
GRIB_Ny: 721
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: an
... ...
GRIB_totalNumber: 0
GRIB_typeOfLevel: surface
GRIB_units: K
long_name: 2 metre temperature
standard_name: unknown
units: K[15]:
(
xds.data[["t2m_nanmax", "t2m_nanmin"]]
.squeeze()
.hvplot(
x="time",
groupby=["Station", "source"],
width=750,
height=400,
grid=True,
widget_location="bottom",
)
)
[15]:
The resulting dataset can be explored for the total_precipitation (tp) data attribute :
[16]:
(
xds.data[["tp_nansum"]]
.squeeze()
.hvplot(
x="time",
groupby=["Station", "source"],
width=750,
height=400,
grid=True,
widget_location="bottom",
color="blue",
)
)
[16]:
Bounding box (bbox) around polygons¶
The following query seeks the variable tp from the era5_land_reanalysis_dev dataset, covering the period between January 1, 1959, and December 31, 1970, for the bounding box that delimits the three polygons mentioned earlier.
Additional steps are carried out in the process, including converting to the local time zone.
[17]:
[18]:
xds.data
[18]:
<xarray.Dataset> Size: 42MB
Dimensions: (time: 105192, latitude: 9, longitude: 11, source: 1)
Coordinates:
* time (time) datetime64[ns] 842kB 1969-01-01 ... 1980-12-31T23:00:00
* latitude (latitude) float64 72B 46.9 46.8 46.7 46.6 ... 46.3 46.2 46.1
* longitude (longitude) float64 88B -78.9 -78.8 -78.7 ... -78.1 -78.0 -77.9
* source (source) <U20 80B 'era5_land_reanalysis'
Data variables:
tp (time, latitude, longitude) float32 42MB 5.794e-05 ... 0.0
Attributes:
pangeo-forge:inputs_hash: 6da291a73c895c6436c167c57fec8ad71779e56907f4e3...
pangeo-forge:recipe_hash: ab855142f36b09416158e383739fb6421346103d72aeb6...
pangeo-forge:version: 0.9.4Let’s find out which day (24-hour period) was the rainiest in the entire region for the data retrieved in previous cell.
[19]:
indexer = (
xds.data.sel(source="era5_land_reanalysis")
.tp.sum(["latitude", "longitude"])
.rolling(time=24)
.sum()
.argmax("time")
.values
)
xds.data.isel(time=indexer).time.dt.date.values.tolist()
[19]:
datetime.date(1980, 6, 20)
Let’s visualise the evolution of the hourly precipitation during that day. Note that each image (raster) delimits exactly the bounding box required to cover all polygons in the query. Please note that for full interactivity, running the code in a Jupyter Notebook is necessary.
[20]:
da = xds.data.tp.isel(time=slice(indexer - 24, indexer))
da = da.where(da > 0.0001, drop=True)
(da * 1000).squeeze().hvplot.quadmesh(
width=750,
height=450,
geo=True,
tiles="ESRI",
groupby=["time"],
legend="top",
cmap="gist_ncar",
widget_location="bottom",
widget_type="scrubber",
dynamic=False,
clim=(0.01, 10),
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[20], line 4
1 da = xds.data.tp.isel(time=slice(indexer - 24, indexer))
2 da = da.where(da > 0.0001, drop=True)
----> 4 (da * 1000).squeeze().hvplot.quadmesh(
5 width=750,
6 height=450,
7 geo=True,
8 tiles="ESRI",
9 groupby=["time"],
10 legend="top",
11 cmap="gist_ncar",
12 widget_location="bottom",
13 widget_type="scrubber",
14 dynamic=False,
15 clim=(0.01, 10),
16 )
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/plotting/core.py:2163, in hvPlot.quadmesh(self, x, y, z, colorbar, **kwds)
2114 def quadmesh(self, x=None, y=None, z=None, colorbar=True, **kwds):
2115 """
2116 QuadMesh plot
2117
(...) 2161 - HoloViews: https://holoviews.org/reference/elements/bokeh/QuadMesh.html
2162 """
-> 2163 return self(x, y, z=z, kind='quadmesh', colorbar=colorbar, **kwds)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/plotting/core.py:92, in hvPlotBase.__call__(self, x, y, kind, **kwds)
90 return pn.panel(callback)
91 if panel_dict:
---> 92 plot = self._get_converter(x, y, kind, **kwds)(kind, x, y)
93 return pn.panel(plot, **panel_dict)
95 return self._get_converter(x, y, kind, **kwds)(kind, x, y)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1671, in HoloViewsConverter.__call__(self, kind, x, y)
1668 obj = gv.project(obj, projection=self.output_projection)
1670 if not (self.datashade or self.rasterize or self.downsample):
-> 1671 layers = self._apply_layers(obj)
1672 layers = _transfer_opts_cur_backend(layers)
1673 return layers
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1826, in HoloViewsConverter._apply_layers(self, obj)
1824 tiles = self._get_tiles(self.tiles)
1825 else:
-> 1826 tiles = self._get_tiles(self.tiles, lib='geoviews')
1827 tiles = tiles.opts(clone=True, **self.tiles_opts)
1828 obj = tiles * obj
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/hvplot/converter.py:1868, in HoloViewsConverter._get_tiles(self, source, lib)
1858 if tiles is None:
1859 msg = (
1860 f'{tile_source} tiles not recognized. tiles must be either True, a '
1861 'xyzservices.TileProvider instance, a HoloViews'
(...) 1866 + '.'
1867 )
-> 1868 raise ValueError(msg)
1869 return tiles
ValueError: EsriImagery tiles not recognized. tiles must be either True, a xyzservices.TileProvider instance, a HoloViews or Geoviews basemap string (one of CartoDark, CartoEco, CartoLight, CartoMidnight, ESRI, OSM, StamenLabels, StamenTerrain, StamenTerrainRetina, StamenToner, StamenTonerBackground, StamenWatercolor, Wikipedia), a HoloViews Tiles instance, a Geoviews WMTS instance.
Query hydrological datasets¶
Hydrological queries are still being tested and output format is likely to change. Stay tuned!
[21]:
<xarray.Dataset> Size: 1GB
Dimensions: (id: 745, time: 60631, variable: 2, spatial_agg: 2,
timestep: 1, time_agg: 1, source: 1)
Coordinates: (12/15)
* id (id) object 6kB '010101' '010801' ... '104804' '120201'
* time (time) datetime64[ns] 485kB 1860-01-01 ... 2025-12-31
* variable (variable) object 16B 'level' 'streamflow'
* spatial_agg (spatial_agg) object 16B 'point' 'watershed'
* timestep (timestep) object 8B 'D'
* time_agg (time_agg) object 8B 'mean'
... ...
latitude (id) float32 3kB 48.48 48.1 48.19 48.19 ... 56.78 55.86 46.97
longitude (id) float32 3kB -64.53 -65.46 -65.56 ... -64.88 -70.86
name (id) object 6kB 'La Grande Rivière' ... 'Dauphine'
province (id) object 6kB 'QC' 'QC' 'QC' 'QC' ... 'QC' 'QC' 'QC' 'QC'
regulated (id) object 6kB 'Natural' 'Natural' ... 'Natural' 'Natural'
start_date (variable, id, spatial_agg, timestep, time_agg, source) datetime64[ns] 24kB ...
Data variables:
level (id, time, variable, spatial_agg, timestep, time_agg, source) float32 723MB dask.array<chunksize=(1, 60631, 1, 1, 1, 1, 1), meta=np.ndarray>
streamflow (id, time, variable, spatial_agg, timestep, time_agg, source) float32 723MB dask.array<chunksize=(1, 60631, 1, 1, 1, 1, 1), meta=np.ndarray>[22]:
ds = (
xd.Query(
**{
"datasets": {
"deh": {
"id": ["020*"],
"regulated": ["Natural"],
"variables": ["streamflow"],
}
},
"time": {"start": "1970-01-01", "minimum_duration": (10 * 365, "d")},
}
)
.data.squeeze()
.load()
)
ds
[22]:
<xarray.Dataset> Size: 737kB
Dimensions: (id: 7, time: 20454)
Coordinates: (12/15)
* id (id) object 56B '020301' '020302' ... '020602' '020802'
* time (time) datetime64[ns] 164kB 1970-01-01 ... 2025-12-31
drainage_area (id) float32 28B 1.09e+03 1.09e+03 1.01e+03 ... 626.0 1.2e+03
end_date (id) datetime64[ns] 56B 1989-05-22 2006-10-13 ... 1996-08-13
latitude (id) float32 28B 48.77 48.77 48.83 48.81 48.98 48.98 49.2
longitude (id) float32 28B -64.52 -64.52 -64.63 ... -64.43 -64.7 -65.29
... ...
source <U102 408B 'Ministère de l’Environnement, de la Lutte cont...
spatial_agg <U9 36B 'watershed'
start_date (id) datetime64[ns] 56B 1979-05-15 1989-08-12 ... 1970-01-01
time_agg <U4 16B 'mean'
timestep <U1 4B 'D'
variable <U10 40B 'streamflow'
Data variables:
streamflow (id, time) float32 573kB nan nan nan nan ... nan nan nan nan
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[23], line 2
1 query = {"datasets": "hydat"}
----> 2 xds = xd.Query(**query)
3 xds.data
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/xdatasets/core.py:130, in Query.__init__(self, datasets, space, time, catalog_path)
127 self.space = self._resolve_space_params(**space)
128 self.time = self._resolve_time_params(**time)
--> 130 self.load_query(datasets=self.datasets, space=self.space, time=self.time)
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/xdatasets/core.py:258, in Query.load_query(self, datasets, space, time)
255 except: # noqa: S110
256 pass
--> 258 ds_one = self._process_one_dataset(
259 dataset_name=dataset_name,
260 variables=variables_name,
261 space=space,
262 time=time,
263 **kwargs,
264 )
265 dsets.append(ds_one)
267 try:
268 # Try naively merging datasets into single dataset
File ~/checkouts/readthedocs.org/user_builds/xdatasets/conda/stable/lib/python3.12/site-packages/xdatasets/core.py:301, in Query._process_one_dataset(self, dataset_name, variables, space, time, **kwargs)
298 dataset_category = "user-provided"
300 elif isinstance(dataset_name, str):
--> 301 dataset_category = [
302 category for category in self.catalog._entries.keys() for name in self.catalog[category]._entries.keys() if name == dataset_name
303 ][0]
305 if dataset_category in ["atmosphere"]:
306 with warnings.catch_warnings():
IndexError: list index out of range