xdatasets package¶

Easy access to Earth observation datasets with xarray.

Submodules¶

xdatasets.core module¶

class xdatasets.core.Query(datasets: str | List[str] | Dict[str, str | List[str]], space: Dict[str, str | List[str]] = {}, time: Dict[str, str | List[str]] = {}, catalog_path: str = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml')[source]¶

Bases: object

The Query class.

The Query interface provides access to analysis-ready Earth observation datasets and performs spatiotemporal operations on them based on user queries.

Attributes¶

datasets (str, list, or dict-like): If a str, a dataset name, e.g. "era5_land_reanalysis". If a list, a list of dataset names, e.g. ["era5_single_levels_reanalysis", "era5_land_reanalysis"]. If a dictionary, it should map dataset names to their requested content, such as the desired variables. See the notes below for more details. The list of datasets available in this library is coming soon.
space (dict-like): A dictionary that maps spatial parameters to their corresponding values. For the accepted key/value pairs, see _resolve_space_params().
time (dict-like): A dictionary that maps temporal parameters to their corresponding values. For the accepted key/value pairs, see _resolve_time_params().
catalog_path (str): URL for the intake catalog which provides access to the datasets. While this library provides its own intake catalog, users can supply their own, which is particularly useful for private datasets or when a different configuration is needed.

Notes¶

The dictionary form allows more flexibility in the request, e.g.:

>>> query = {
...     "era5_land_reanalysis": {"variables": ["t2m", "tp"]},
...     "era5_single_levels_reanalysis": {"variables": "t2m"},
... }

Currently, the accepted key/value pairs for a mapping argument are:

>>> {"variables": Union[str, List[str]]}
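The three accepted forms of the datasets argument (str, list, dict) can be reduced to one uniform mapping before any data is requested. The sketch below illustrates that idea only; normalize_datasets is a hypothetical helper, not part of xdatasets, and using None to mean "no variable filter" is an assumption made for this example.

```python
from typing import Dict, List, Optional, Union

Variables = Union[str, List[str]]
DatasetSpec = Union[str, List[str], Dict[str, Dict[str, Variables]]]


def normalize_datasets(datasets: DatasetSpec) -> Dict[str, Dict[str, Optional[List[str]]]]:
    """Return {dataset_name: {"variables": list or None}} for any accepted form.

    Hypothetical helper for illustration; None stands for "no variable filter".
    """
    # A single name is treated as a one-element list.
    if isinstance(datasets, str):
        datasets = [datasets]
    # A list of names carries no per-dataset content.
    if isinstance(datasets, list):
        return {name: {"variables": None} for name in datasets}
    # A mapping may give "variables" as a single str or a list of str.
    normalized = {}
    for name, content in datasets.items():
        variables = content.get("variables")
        if isinstance(variables, str):
            variables = [variables]
        normalized[name] = {"variables": variables}
    return normalized


request = normalize_datasets(
    {
        "era5_land_reanalysis": {"variables": ["t2m", "tp"]},
        "era5_single_levels_reanalysis": {"variables": "t2m"},
    }
)
```

With this shape, downstream code can loop over request.items() without checking which form the caller used.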

Examples¶

Define the sites, then build and run the query:

>>> import numpy as np
>>> import xdatasets as xd

>>> sites = {
...     "Montreal": (45.508888, -73.561668),
...     "New York": (40.730610, -73.935242),
...     "Miami": (25.761681, -80.191788),
... }

>>> query = {
...     "datasets": "era5_land_reanalysis_dev",
...     "space": {"clip": "point", "geometry": sites},
...     "time": {
...         "timestep": "D",
...         "averaging": {"tp": np.nansum, "t2m": np.nanmean},
...         "start": "1950-01-01",
...         "end": "1955-12-31",
...         "timezone": "America/Montreal",
...     },
... }
>>> xds = xd.Query(**query)
>>> xds.data
<xarray.Dataset>
Dimensions:      (site: 3, time: 2191, source: 1)
Coordinates:
    latitude     (site) float64 45.5 40.7 25.8
    longitude    (site) float64 -73.6 -73.9 -80.2
  * site         (site) <U8 'Montreal' 'New York' 'Miami'
  * time         (time) datetime64[ns] 1950-01-01 1950-01-02 ... 1955-12-31
  * source       (source) <U24 'era5_land_reanalysis_dev'
Data variables:
    t2m_nanmean  (time, site, source) float32 269.6 273.8 294.3 ... 268.1 292.1
    tp_nansum    (time, site, source) float32 0.0004192 2.792e-06 ... 0.0001207
Attributes:
    pangeo-forge:inputs_hash:  1622c0abe9326bfa4d6ee6cdf817fccb1ef1661046f30f...
    pangeo-forge:recipe_hash:  f2b6c75f28693bbae820161d5b71ebdb9d740dcdde0666...
    pangeo-forge:version:      0.9.4
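The "time" mapping in the example asks for a daily timestep, a different reducer per variable, and a target timezone, and the output variables are named after the variable and its reducer (t2m_nanmean, tp_nansum). The following standard-library sketch shows what such an aggregation amounts to on a few hand-written records. It is not the xdatasets implementation: daily_aggregate, nansum and nanmean are stand-ins written for this example, and a fixed UTC-5 offset replaces a named zone such as "America/Montreal".

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def nansum(values):
    """Sum that skips NaN entries (stand-in for np.nansum)."""
    return sum(v for v in values if not math.isnan(v))


def nanmean(values):
    """Mean that skips NaN entries (stand-in for np.nanmean)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def daily_aggregate(records, reducers, tz):
    """Group UTC records by local calendar day and reduce each variable.

    records:  list of (utc_datetime, {variable: value}) pairs
    reducers: {variable: callable}, mirroring the "averaging" mapping
    tz:       timezone that decides which day a record belongs to
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for stamp, values in records:
        local_day = stamp.astimezone(tz).date()
        for name, value in values.items():
            buckets[local_day][name].append(value)
    # Output keys follow the "<variable>_<reducer name>" pattern.
    return {
        day: {f"{name}_{func.__name__}": func(series[name]) for name, func in reducers.items()}
        for day, series in sorted(buckets.items())
    }


utc = timezone.utc
local = timezone(timedelta(hours=-5))  # assumption: fixed offset instead of a named zone
records = [
    (datetime(1950, 1, 1, 3, tzinfo=utc), {"tp": 1.0, "t2m": 270.0}),  # 1949-12-31 22:00 local
    (datetime(1950, 1, 1, 6, tzinfo=utc), {"tp": 2.0, "t2m": 272.0}),  # 1950-01-01 01:00 local
    (datetime(1950, 1, 1, 12, tzinfo=utc), {"tp": math.nan, "t2m": 274.0}),  # 07:00 local
]
daily = daily_aggregate(records, {"tp": nansum, "t2m": nanmean}, local)
```

The first record falls on the previous local day, which is why the timezone has to be applied before grouping by day.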

bbox_clip(ds, variable='weights')[source]¶

load_query(datasets: str | Dict[str, str | List[str]], space: Dict[str, str | List[str]], time)[source]¶

xdatasets.scripting module¶

xdatasets.spatial module¶

xdatasets.spatial.aggregate(ds_in, ds_weights)[source]¶

xdatasets.spatial.bbox_ds(ds_copy, geom)[source]¶

xdatasets.spatial.clip_by_bbox(ds, space, dataset_name)[source]¶

xdatasets.spatial.clip_by_point(ds, space, dataset_name)[source]¶

xdatasets.spatial.clip_by_polygon(ds, space, dataset_name)[source]¶

xdatasets.spatial.create_weights_mask(da, poly)[source]¶

xdatasets.temporal module¶

xdatasets.temporal.ajust_dates(ds, time)[source]¶

xdatasets.temporal.change_timezone(ds, input_timezone, output_timezone=None)[source]¶

xdatasets.temporal.minimum_duration(ds, time)[source]¶

xdatasets.temporal.temporal_aggregation(ds, time, dataset_name, spatial_agg)[source]¶

xdatasets.tutorial module¶

xdatasets.tutorial.list_available_datasets()[source]¶

Open, load lazily, and close a dataset from the public online repository (requires internet).

See Also¶

open_dataset

xdatasets.tutorial.load_dataset(*args, **kwargs)[source]¶

Open, load lazily, and close a dataset from the online repository (requires internet).

See Also¶

open_dataset

xdatasets.tutorial.open_dataset(name: str, **kws)[source]¶

Open a dataset from the online public repository (requires internet).

Available datasets:

* "era5_reanalysis_single_levels": ERA5 reanalysis subset (t2m and tp)
* "cehq": CEHQ flow and water level observations

Parameters¶

name (str): Name of the file containing the dataset, e.g. "era5_reanalysis_single_levels".
**kws (dict, optional): Passed to xarray.open_dataset.

xdatasets.utils module¶

class xdatasets.utils.HiddenPrints[source]¶

Bases: object

xdatasets.utils.cache_catalog(url)[source]¶

Cache the catalog in the system’s temporary folder for easier access.

This is especially useful when working behind a firewall or when the remote server hosting the YAML files is down. If the request goes through a proxy, the http_proxy/https_proxy environment variables are used.

Parameters¶

url (str): URL for the intake catalog which provides access to the datasets. While this library provides its own intake catalog, users can supply their own, which is particularly useful for private datasets or when a different configuration is needed.
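The caching behaviour described above (keep a copy in the system's temporary folder, reuse it when the server is unreachable) can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the code behind cache_catalog: cache_file, its cache_dir parameter, the "catalog_cache" folder name and the hashed file name are all hypothetical. urllib.request.urlopen reads the http_proxy/https_proxy environment variables by default, so no explicit proxy setup appears here.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path


def cache_file(url, cache_dir=None):
    """Download `url` into a temporary-folder cache and return the local path.

    Hypothetical helper for illustration, not the xdatasets implementation.
    """
    cache_dir = Path(cache_dir or tempfile.gettempdir()) / "catalog_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # A hash of the URL gives a stable, filesystem-safe file name.
    target = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".yaml")
    try:
        # urlopen honours http_proxy/https_proxy from the environment by default.
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
    except OSError:
        # Source unreachable: fall back to a previously cached copy, if any.
        if not target.exists():
            raise
    return target
```

Calling cache_file with a catalog URL refreshes the local copy whenever the source is reachable and otherwise returns the last copy that was saved; if no copy exists yet, the original error propagates.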

xdatasets.utils.open_dataset(name: str, catalog, **kws)[source]¶

Open a dataset from the online public repository (requires internet).

Notes¶

Available datasets:

* "era5_reanalysis_single_levels": ERA5 reanalysis subset (t2m and tp)
* "cehq": CEHQ flow and water level observations

Parameters¶

name (str): Name of the file containing the dataset, e.g. "era5_reanalysis_single_levels".
**kws (dict, optional): Passed to xarray.open_dataset.

xdatasets.validations module¶

xdatasets.workflows module¶

xdatasets.workflows.climate_request(dataset_name, variables, space, time, catalog)[source]¶

xdatasets.workflows.gis_request(dataset_name, variables, space, time, catalog, **kwargs)[source]¶

xdatasets.workflows.hydrometric_request(dataset_name, variables, space, time, catalog, **kwargs)[source]¶

xdatasets.workflows.user_provided_dataset(dataset_name, variables, space, time, ds)[source]¶
