Binsparse COO Format¶
DDR uses a zarr-based storage format for sparse COO (Coordinate) matrices, inspired by the binsparse specification and binsparse-python. This format efficiently stores river network connectivity for routing computations.
Format Overview¶
Each adjacency matrix is stored as a zarr v3 group containing arrays and metadata attributes.
Arrays¶
| Array | Type | Description |
|---|---|---|
indices_0 |
int32 | Row indices (downstream segment indices) |
indices_1 |
int32 | Column indices (upstream segment indices) |
values |
uint8 | Matrix values (1 for connected, 0 otherwise) |
order |
int32 | Topological sort order as domain-specific IDs |
Attributes¶
| Attribute | Type | Description |
|---|---|---|
format |
str | Always "COO" |
shape |
[int, int] | Matrix dimensions [rows, cols] |
geodataset |
str | Geodataset type (e.g., "merit", "lynker") for auto-detection |
data_types |
dict | Dtype strings for each array |
gage_catchment |
int/str | Origin catchment ID (gauge subsets only) |
gage_idx |
int | CONUS matrix index (gauge subsets only) |
Matrix Structure¶
The adjacency matrix is lower triangular, where A[i, j] = 1 indicates that flow goes from segment j (column) to segment i (row). This structure ensures topological ordering: upstream segments always have lower indices than downstream segments.
0 1 2 3 4 (upstream)
┌───────────────┐
0 │ 0 │ Flow direction: column → row
1 │ 1 0 │ Example: A[1,0]=1 means 0→1
2 │ 0 1 0 │ A[2,1]=1 means 1→2
3 │ 0 0 1 0 │ A[4,3]=1 means 3→4
4 │ 0 0 1 1 0 │ A[4,2]=1 means 2→4
└───────────────┘
(downstream)
Geodataset Types¶
Different geodatasets use different ID formats. The geodataset attribute stored in zarr metadata enables automatic detection when reading.
Supported Geodatasets¶
| Name | ID Format | Example IDs |
|---|---|---|
merit |
Integer COMIDs | 12345, 12346, 12347 |
lynker |
String wb-* IDs | "wb-123", "wb-456" |
hydrofabric_v2.2 |
Alias for lynker |
Same as lynker |
Listing Available Geodatasets¶
from ddr_engine import list_geodatasets
print(list_geodatasets()) # ['hydrofabric_v2.2', 'lynker', 'merit']
Registering Custom Geodatasets¶
from ddr_engine import register_converter
class MyConverter:
def to_zarr(self, ids):
return np.array(ids, dtype=np.int32)
def from_zarr(self, order):
return order.tolist()
register_converter("my_geodataset", MyConverter())
Reading Adjacency Matrices¶
Auto-Detection (Recommended)¶
The simplest way to read a COO matrix - the geodataset type is automatically detected from metadata:
from pathlib import Path
from ddr_engine import coo_from_zarr
# Auto-detects hydrofabric from metadata
coo, ts_order = coo_from_zarr(Path("data/merit_conus_adjacency.zarr"))
# coo: scipy.sparse.coo_matrix
# ts_order: list of domain-specific IDs (int for MERIT, str for Lynker)
Dataset-Specific Functions¶
For type-hinted return values, use the dataset-specific functions:
from pathlib import Path
from ddr_engine.merit.io import coo_from_zarr
# MERIT - returns COMIDs as integers
coo, ts_order = coo_from_zarr(Path("data/merit_conus_adjacency.zarr"))
# ts_order: list[int]
from ddr_engine.lynker_hydrofabric.io import coo_from_zarr
# Lynker - returns wb-* strings
coo, ts_order = coo_from_zarr(Path("data/hydrofabric_v2.2_conus_adjacency.zarr"))
# ts_order: list[str]
Reading Gauge Subsets¶
Gauge subsets are stored in a zarr group with one subgroup per gauge:
import zarr
# Open the gauge zarr store
root = zarr.open_group("data/merit_gages_conus_adjacency.zarr", mode="r")
# Each gauge is a subgroup keyed by station ID
gauge_group = root["01570500"]
# Access arrays
row = gauge_group["indices_0"][:]
col = gauge_group["indices_1"][:]
data = gauge_group["values"][:]
order = gauge_group["order"][:]
# Access metadata
shape = tuple(gauge_group.attrs["shape"])
gage_catchment = gauge_group.attrs["gage_catchment"]
gage_idx = gauge_group.attrs["gage_idx"]
Writing Adjacency Matrices¶
CONUS Full Network¶
from pathlib import Path
from scipy import sparse
from ddr_engine import coo_to_zarr
# Create a COO matrix (example)
row = [1, 2, 3, 4, 4]
col = [0, 1, 2, 2, 3]
data = [1, 1, 1, 1, 1]
coo = sparse.coo_matrix((data, (row, col)), shape=(5, 5), dtype="uint8")
# Topological order as COMIDs
ts_order = [12345, 12346, 12347, 12348, 12349]
# Write to zarr - pass geodataset name
coo_to_zarr(coo, ts_order, Path("output/merit_conus_adjacency.zarr"), "merit")
Gauge Subsets¶
import zarr
from ddr_engine import coo_to_zarr_group
# Create/open the gauge zarr store
store = zarr.storage.LocalStore(root="output/merit_gages_adjacency.zarr")
root = zarr.create_group(store=store)
# Create a subgroup for each gauge
gauge_group = root.create_group("01570500")
# Write the subset COO matrix - pass geodataset name
coo_to_zarr_group(
coo=subset_coo,
ts_order=[12345, 12346], # COMIDs in subset
origin=12346, # Gauge catchment COMID
gauge_root=gauge_group,
mapping={12345: 0, 12346: 1}, # COMID → CONUS index
geodataset="merit",
)
File Structure¶
CONUS Network¶
merit_conus_adjacency.zarr/
├── zarr.json # Group metadata
├── indices_0/ # Row indices
│ ├── zarr.json
│ └── c/0
├── indices_1/ # Column indices
│ ├── zarr.json
│ └── c/0
├── values/ # Matrix values
│ ├── zarr.json
│ └── c/0
└── order/ # Topological order
├── zarr.json
└── c/0
Per-Gauge Subsets¶
merit_gages_conus_adjacency.zarr/
├── zarr.json # Root group metadata
├── 01570500/ # Gauge station ID
│ ├── zarr.json # Subgroup metadata with geodataset, gage_catchment, gage_idx
│ ├── indices_0/
│ ├── indices_1/
│ ├── values/
│ └── order/
├── 01563500/
│ └── ...
└── ...
Creating Adjacency Matrices¶
The engine provides scripts to build adjacency matrices from raw hydrofabric data:
MERIT Hydro¶
uv run python -m ddr_engine.merit /path/to/riv_pfaf_X_MERIT_Hydro.shp \
--path data/ \
--gages references/gage_info/dhbv2_gages.csv
Lynker Hydrofabric v2.2¶
uv run python -m ddr_engine.lynker_hydrofabric /path/to/conus_nextgen.gpkg \
--path data/ \
--gages references/gage_info/dhbv2_gages.csv
Both commands create:
- *_conus_adjacency.zarr: Full CONUS river network
- *_gages_conus_adjacency.zarr: Per-gauge upstream subnetworks
API Reference¶
Primary Functions (Recommended)¶
::: ddr_engine.core.coo_to_zarr ::: ddr_engine.core.coo_from_zarr ::: ddr_engine.core.coo_to_zarr_group
Converter Registry¶
::: ddr_engine.core.get_converter ::: ddr_engine.core.register_converter ::: ddr_engine.core.list_geodatasets
Generic Functions (Low-Level)¶
::: ddr_engine.core.coo_to_zarr_generic ::: ddr_engine.core.coo_from_zarr_generic ::: ddr_engine.core.coo_to_zarr_group_generic