API Reference¶
xarray Accessors¶
The primary interface. Registered automatically when you import extractrs.
Use the accessor for quick, one-off computations. For multi-variable or multi-file workflows where the grid and polygons stay the same, use the Low-Level API to build the cache once and reuse it.
Dataset Accessor¶
extractrs.ExtractrsDatasetAccessor
¶
xarray Dataset accessor for extractrs zonal statistics.
zonal_stats(gdf, stat='mean', id_col=None, var=None, vars=None, method='exact')
¶
Compute zonal statistics for each geometry in a GeoDataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gdf
|
GeoDataFrame
|
Polygons to compute statistics for. Must be in the same CRS as the raster. |
required |
stat
|
str
|
Statistic to compute: "mean", "sum", "count", "min", "max", "variance", "stdev", "weighted_mean", "weighted_sum". |
'mean'
|
id_col
|
str
|
Column in gdf to use as basin IDs. If None, uses integer index. |
None
|
var
|
str
|
Single data variable to process. |
None
|
vars
|
list of str
|
Subset of data variables to process. If neither var nor vars is given, all data variables are processed. |
None
|
method
|
str
|
Coverage method: "exact" for fractional pixel coverage, "center" for cell-center point-in-polygon (binary 0/1). |
"exact"
|
Returns:
| Type | Description |
|---|---|
Dataset
|
Dataset with spatial dims replaced by basin/id dimension. |
DataArray Accessor¶
extractrs.ExtractrsDataArrayAccessor
¶
xarray DataArray accessor for extractrs zonal statistics.
zonal_stats(gdf, stat='mean', id_col=None, method='exact')
¶
Compute zonal statistics for each geometry in a GeoDataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gdf
|
GeoDataFrame
|
Polygons to compute statistics for. Must be in the same CRS as the raster. |
required |
stat
|
str
|
Statistic to compute: "mean", "sum", "count", "min", "max", "variance", "stdev", "weighted_mean", "weighted_sum". |
'mean'
|
id_col
|
str
|
Column in gdf to use as basin IDs. If None, uses integer index. |
None
|
method
|
str
|
Coverage method: "exact" for fractional pixel coverage, "center" for cell-center point-in-polygon (binary 0/1). |
"exact"
|
Returns:
| Type | Description |
|---|---|
DataArray
|
DataArray with spatial dims replaced by basin/id dimension. |
Low-Level API¶
For workflows that process many timesteps or multiple variables against the same geometries, use the low-level API to build the cache once and apply statistics repeatedly. This avoids redundant cache builds — the accessor rebuilds the cache on every call.
The functions come in four variants:
| Single timestep (2D array) | Batch (3D array) | |
|---|---|---|
| Unweighted | apply_stat |
apply_stat_batch |
| Weighted | apply_stat_weights |
apply_stat_batch_weights |
Example¶
import numpy as np
import geopandas as gpd
import xarray as xr
from extractrs import build_cache, apply_stat_batch
ds = xr.open_dataset("daymet_tmin_2024.nc")
basins = gpd.read_file("basins.shp")
# Prepare inputs for build_cache
wkb_list = [geom.wkb for geom in basins.geometry]
id_list = basins["COMID"].astype(np.int64).tolist()
# Compute grid extent from cell centers (add half-cell padding for cell edges)
x = ds["x"].values.astype(np.float64)
y = ds["y"].values.astype(np.float64)
dx = abs(float(x[1] - x[0]))
dy = abs(float(y[1] - y[0]))
xmin = float(x.min()) - dx / 2
xmax = float(x.max()) + dx / 2
ymin = float(y.min()) - dy / 2
ymax = float(y.max()) + dy / 2
# Build cache once
cache = build_cache(wkb_list, id_list, xmin, ymin, xmax, ymax, dx, dy)
# Apply to multiple variables using the same cache
tmin = ds["tmin"].values.astype(np.float64) # shape: (time, y, x)
tmax = ds["tmax"].values.astype(np.float64)
nodata = -9999.0
tmin_means = apply_stat_batch(cache, tmin, nodata, "mean") # shape: (time, n_basins)
tmax_means = apply_stat_batch(cache, tmax, nodata, "mean")
Grid extent is cell edges, not cell centers
xmin, ymin, xmax, ymax passed to build_cache define the grid extent at cell edges. If your coordinates are cell centers (as is typical in xarray), add half a cell width: xmin = x_centers.min() - dx / 2. The xarray accessor does this automatically.
build_cache¶
build_cache(
wkb_list: list[bytes],
id_list: list[int],
xmin: float,
ymin: float,
xmax: float,
ymax: float,
dx: float,
dy: float,
method: str = "exact",
) -> CoverageCache
Build a coverage cache from WKB geometries and grid parameters.
Parameters:
- wkb_list — List of WKB-encoded geometry bytes (one per polygon). Obtain from a GeoDataFrame with
[geom.wkb for geom in gdf.geometry]. Supports Polygon and MultiPolygon types. - id_list — List of integer IDs, same length as
wkb_list. - xmin, ymin, xmax, ymax — Grid extent at cell edges (not cell centers).
- dx, dy — Cell size in the x and y directions.
- method —
"exact"(default) for fractional pixel coverage,"center"for cell-center point-in-polygon.
Returns: A CoverageCache object.
Raises: ValueError if wkb_list and id_list have different lengths, or if method is not recognized.
apply_stat¶
apply_stat(
cache: CoverageCache,
raster: numpy.ndarray, # shape: (rows, cols), dtype: float64
nodata: float,
stat: str,
) -> numpy.ndarray # shape: (n_basins,), dtype: float64
Apply a statistic to a single 2D raster using the coverage cache.
Parameters:
- cache —
CoverageCachefrombuild_cache(). - raster — 2D numpy array of raster values. Must be
float64. - nodata — Nodata value to exclude from statistics. NaN values are always excluded regardless.
- stat — Statistic name (see Supported Statistics).
Returns: 1D numpy array of length n_basins with one result per zone.
apply_stat_batch¶
apply_stat_batch(
cache: CoverageCache,
raster: numpy.ndarray, # shape: (time, rows, cols), dtype: float64
nodata: float,
stat: str,
) -> numpy.ndarray # shape: (time, n_basins), dtype: float64
Apply a statistic to a 3D raster (time, rows, cols) using the coverage cache. Processes all timesteps in a single call.
Parameters:
- cache —
CoverageCachefrombuild_cache(). - raster — 3D numpy array. Must be
float64. - nodata — Nodata value to exclude.
- stat — Statistic name.
Returns: 2D numpy array of shape (time, n_basins).
apply_stat_weights¶
apply_stat_weights(
cache: CoverageCache,
raster: numpy.ndarray, # shape: (rows, cols), dtype: float64
weights: numpy.ndarray, # shape: (rows, cols), dtype: float64
nodata: float,
weight_nodata: float,
stat: str,
) -> numpy.ndarray # shape: (n_basins,), dtype: float64
Apply a weighted statistic to a single 2D raster. Use this for weighted_mean or weighted_sum with an external weight raster (e.g., cell-area weights for geographic grids).
Parameters:
- cache —
CoverageCachefrombuild_cache(). - raster — 2D numpy array of raster values. Must be
float64. - weights — 2D numpy array of weight values, same shape as
raster. Must befloat64. - nodata — Nodata value for the raster.
- weight_nodata — Nodata value for the weights array.
- stat — Statistic name.
Returns: 1D numpy array of length n_basins.
apply_stat_batch_weights¶
apply_stat_batch_weights(
cache: CoverageCache,
raster: numpy.ndarray, # shape: (time, rows, cols), dtype: float64
weights: numpy.ndarray, # shape: (rows, cols), dtype: float64
nodata: float,
weight_nodata: float,
stat: str,
) -> numpy.ndarray # shape: (time, n_basins), dtype: float64
Apply a weighted statistic to a 3D raster with 2D weights. The same weight array is applied to every timestep.
Parameters:
- cache —
CoverageCachefrombuild_cache(). - raster — 3D numpy array. Must be
float64. - weights — 2D numpy array, same spatial shape as each raster slice. Must be
float64. - nodata — Nodata value for the raster.
- weight_nodata — Nodata value for the weights.
- stat — Statistic name.
Returns: 2D numpy array of shape (time, n_basins).
CoverageCache¶
An opaque object returned by build_cache. Pass it to the apply_stat* functions. It stores the precomputed coverage fractions for each polygon against the grid.
- Tied to a specific grid geometry — reuse it only with rasters on the same grid
- Cannot be serialized to disk or modified after creation
- Not thread-safe
Properties:
n_basins(int) — Number of basins in the cache. May be less than the number of input geometries if some polygons fall outside the grid extent or have empty geometry.
Methods:
ids()→numpy.ndarray[int64]— Basin IDs in cache order.
Supported Statistics¶
| Stat | Formula | Notes |
|---|---|---|
mean |
sum(x * c) / sum(c) |
Coverage-weighted mean |
sum |
sum(x * c) |
Coverage-weighted sum |
count |
sum(c) |
Sum of coverage fractions (returns a float, not an integer cell count) |
min |
min(x) where c > 0 |
Minimum value of any covered cell |
max |
max(x) where c > 0 |
Maximum value of any covered cell |
variance |
coverage-weighted | Population variance (divides by total weight, not N-1) |
stdev |
coverage-weighted | Population standard deviation |
weighted_mean |
sum(x * c * w) / sum(c * w) |
Requires apply_stat_weights or apply_stat_batch_weights |
weighted_sum |
sum(x * c * w) |
Requires apply_stat_weights or apply_stat_batch_weights |
Where x = cell value, c = coverage fraction, w = external weight.
Cells with nodata or NaN values are excluded from all statistics. A zone where all covered cells are nodata returns NaN.
weighted_mean and weighted_sum are only available through the low-level API (apply_stat_weights, apply_stat_batch_weights). Passing these stat names to the xarray accessor will run with all weights set to 1.0, producing results identical to mean and sum.