Benchmarks
The ddr-benchmarks package provides tools for comparing DDR against other routing models on identical input data. This enables rigorous, apples-to-apples performance evaluation.
Note
Benchmarking is currently only supported on the MERIT geodataset.
Overview
Benchmarking routing models requires:
- Identical input data - Same lateral inflows (Q'), network topology, and time period
- Consistent evaluation - Same evaluation criteria applied to the same observations
- Fair comparison - Account for differences in model formulations and parameters
The benchmarks package addresses all three by reusing DDR's existing data infrastructure while providing adapters for other routing models.
Supported Models
| Model | Type | Status | Description |
|---|---|---|---|
| DDR | Differentiable Muskingum-Cunge | Baseline | Physics-based with learned parameters |
| DiffRoute | Differentiable LTI Routing | Supported | Linear Time-Invariant routing with multiple IRF options |
| Summed Q' | Lateral flow summation | Supported | Optional non-routed baseline comparison |
| RAPID | Muskingum | Planned | Traditional non-differentiable routing |
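For readers unfamiliar with the Muskingum family of schemes named in the table, the following is a minimal sketch of the classic single-reach Muskingum recurrence. It is a textbook illustration only, not DDR's Muskingum-Cunge solver or RAPID's implementation. The function names and the toy inflow series are hypothetical, and k and dt are assumed to share the same time unit.

```python
def muskingum_coefficients(k: float, x: float, dt: float) -> tuple[float, float, float]:
    """Classic Muskingum coefficients (k and dt in the same time unit)."""
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom
    return c0, c1, c2


def route_reach(inflow: list[float], k: float, x: float, dt: float) -> list[float]:
    """Route an inflow series through one reach."""
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    # Assumption: the reach starts at steady state (outflow equals inflow).
    outflow = [inflow[0]]
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[t - 1])
    return outflow


# Toy inflow pulse (hypothetical values, arbitrary flow units)
inflow = [1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
outflow = route_reach(inflow, k=1.0, x=0.25, dt=1.0)
```

The three coefficients always sum to one, so a constant inflow produces the same constant outflow, while a pulse is delayed and attenuated.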
Architecture
The benchmark runs in two phases:
- Phase 1 (DDR): Runs the full time-batched DataLoader loop (same as scripts/test.py), accumulating predictions across all gages.
- Phase 2 (DiffRoute): Iterates over each gage independently, building a connected NetworkX graph from its zarr subgroup. This avoids the disconnected-graph problem that arises from the full CONUS adjacency matrix.
An optional Summed Q' baseline can be included. This loads pre-computed lateral flow sums (from scripts/summed_q_prime.py) and compares them alongside DDR and DiffRoute.
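To make the idea of a non-routed baseline concrete, here is a minimal sketch of how a summed Q' series could be computed for one gage: at each time step, the lateral inflows of every reach upstream of the gage are added together with no travel-time delay. This is an illustration under assumptions, not the logic of scripts/summed_q_prime.py. The function names, the `upstream_of` mapping, and the toy network are hypothetical.

```python
def upstream_reaches(outlet: str, upstream_of: dict[str, list[str]]) -> set[str]:
    """Return the outlet reach plus every reach that drains into it."""
    seen: set[str] = set()
    stack = [outlet]
    while stack:
        reach = stack.pop()
        if reach in seen:
            continue
        seen.add(reach)
        stack.extend(upstream_of.get(reach, []))
    return seen


def summed_q_prime(
    outlet: str,
    upstream_of: dict[str, list[str]],
    q_prime: dict[str, list[float]],
) -> list[float]:
    """Sum lateral inflows over the upstream area, with no routing delay."""
    reaches = upstream_reaches(outlet, upstream_of)
    n_steps = len(q_prime[outlet])
    return [sum(q_prime[r][t] for r in reaches) for t in range(n_steps)]


# Toy network (hypothetical): A and B drain into C, and C drains into D.
upstream_of = {"D": ["C"], "C": ["A", "B"]}
q_prime = {"A": [1.0, 2.0], "B": [0.5, 0.5], "C": [1.0, 1.0], "D": [2.0, 0.0]}

at_d = summed_q_prime("D", upstream_of, q_prime)
at_c = summed_q_prime("C", upstream_of, q_prime)
```

Because no attenuation or lag is applied, this kind of baseline shows how much skill comes from the lateral inflows alone, before any routing model is involved.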
Installation
The benchmarks package is installed as an optional dependency:
# Install with benchmarks support
pip install "ddr[benchmarks]"  # quotes keep shells such as zsh from expanding the brackets
# Or install separately
pip install ddr-benchmarks
Note: DiffRoute requires CUDA. The benchmarks will skip DiffRoute comparisons on CPU-only systems.
Quick Start
# Copy the example config and customize paths
cp benchmarks/config/example_benchmark.yaml benchmarks/config/benchmark.yaml
# Run benchmark
cd benchmarks
uv run python scripts/benchmark.py
# Override configuration options
uv run python scripts/benchmark.py \
experiment.checkpoint=/path/to/model.pt \
diffroute.k=0.1 \
diffroute.x=0.25
# Include summed Q' baseline
uv run python scripts/benchmark.py \
summed_q_prime=/path/to/summed_q_prime.zarr
Output
The benchmark produces publication-quality plots and console diagnostics:
Plots (saved to output/<run>/plots/)
| File | Description |
|---|---|
| nse_cdf_comparison.png | CDF of NSE across all gauges |
| kge_cdf_comparison.png | CDF of KGE across all gauges |
| metric_boxplot_comparison.png | 6-panel boxplot (Bias, RMSE, FHV, FLV, NSE, KGE) |
| gauge_map_ddr_NSE.png | Map of gauges colored by DDR NSE |
| gauge_map_diffroute_NSE.png | Map of gauges colored by DiffRoute NSE |
| gauge_map_sqp_NSE.png | Map of gauges colored by summed Q' NSE (if enabled) |
| hydrographs/*.png | Per-gage time series with all models overlaid |
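As a reminder of what the NSE and KGE scores in these plots measure, the sketch below implements the standard Nash-Sutcliffe efficiency and the original (2009) Kling-Gupta efficiency for a single gauge. It is a plain-Python illustration; DDR's Metrics class may differ in details such as missing-value handling or the KGE variant, and the function names and sample values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def nse(obs: list[float], sim: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def kge(obs: list[float], sim: list[float]) -> float:
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    std_obs, std_sim = pstdev(obs), pstdev(sim)
    cov = fmean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical observed and simulated discharge for one gauge
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.1, 1.9, 3.2, 3.8, 5.1]
print(f"NSE={nse(obs, sim):.3f}  KGE={kge(obs, sim):.3f}")
```

Both scores have an upper bound of 1, which is why the CDF plots compare models by how far each curve sits toward the right.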
Results (saved to output/<run>/benchmark_results.zarr)
import xarray as xr
ds = xr.open_zarr("output/<run>/benchmark_results.zarr")
# <xarray.Dataset>
# Dimensions: (gage_ids: N, time: T)
# Data variables:
# ddr_predictions (gage_ids, time) float64
# diffroute_predictions (gage_ids, time) float64
# observations (gage_ids, time) float64
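Given the layout above, per-gage scores can be recomputed directly from the saved arrays. The sketch below builds a small synthetic dataset with the same variable and dimension names (so it runs without a benchmark output on disk) and computes NSE along the time dimension. The helper `nse_per_gage` and the synthetic values are hypothetical, and masking time steps where either series is missing is an assumption about how gaps should be treated.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in that mirrors the variable and dimension names above.
rng = np.random.default_rng(0)
time = np.arange(100)
obs = 10.0 + np.sin(time / 10.0)[None, :] * np.array([[1.0], [2.0], [3.0]])
ds = xr.Dataset(
    {
        "observations": (("gage_ids", "time"), obs),
        "ddr_predictions": (("gage_ids", "time"), obs + rng.normal(0.0, 0.1, obs.shape)),
        "diffroute_predictions": (("gage_ids", "time"), obs + rng.normal(0.0, 0.5, obs.shape)),
    },
    coords={"gage_ids": ["a", "b", "c"], "time": time},
)


def nse_per_gage(ds: xr.Dataset, pred_name: str) -> xr.DataArray:
    """NSE along time for each gage, using only steps where both series exist."""
    obs, pred = ds["observations"], ds[pred_name]
    valid = obs.notnull() & pred.notnull()
    obs, pred = obs.where(valid), pred.where(valid)
    err = ((pred - obs) ** 2).sum("time")
    var = ((obs - obs.mean("time")) ** 2).sum("time")
    return 1.0 - err / var


ddr_nse = nse_per_gage(ds, "ddr_predictions")
diffroute_nse = nse_per_gage(ds, "diffroute_predictions")
print(ddr_nse.to_series())
```

With a real run, replacing the synthetic `ds` with the result of `xr.open_zarr(...)` on the benchmark output should be enough, provided the stored names match the listing above.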
Package Structure
benchmarks/
├── scripts/
│ └── benchmark.py # Entry point (Hydra CLI)
├── src/ddr_benchmarks/
│ ├── __init__.py # Package exports
│ ├── benchmark.py # Benchmark runner and plotting
│ ├── diffroute_adapter.py # COO → NetworkX conversion for DiffRoute
│ └── validation/
│ ├── __init__.py
│ ├── benchmark.py # BenchmarkConfig (DDR + model configs)
│ └── diffroute.py # DiffRouteConfig
├── config/
│ ├── benchmark.yaml # Active Hydra configuration
│ ├── example_benchmark.yaml # Example config template (fully commented)
│ └── hydra/
│ └── settings.yaml
└── pyproject.toml
Key Components
Benchmark Runner (benchmark.py)
The main benchmark script follows the same pattern as scripts/test.py:
- Load dataset using DDR's geodataset.get_dataset_class()
- Initialize DDR models (KAN, DMC, StreamflowReader)
- Phase 1: Run DDR on time-batched DataLoader, accumulate predictions
- Phase 2: Run DiffRoute per-gage using zarr subgroup graphs
- Optionally load summed Q' predictions for baseline comparison
- Evaluate predictions using DDR's Metrics class
- Generate comparison plots (CDF, boxplots, gauge maps, hydrographs)
- Save results to zarr
DiffRoute Adapter (diffroute_adapter.py)
Converts DDR's sparse COO adjacency matrices to DiffRoute-compatible NetworkX graphs:
- zarr_group_to_networkx() - Load zarr subgroup → NetworkX DiGraph
- create_param_df() - Create parameter DataFrame for DiffRoute
- build_diffroute_inputs() - End-to-end conversion utility
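The general shape of a COO-to-graph conversion, and why a per-gage subgraph is connected when the full network is not, can be sketched as follows. This is not the adapter's actual code: `coo_to_digraph` and `gage_subgraph` are hypothetical helpers, the reach IDs are made up, and the orientation (a nonzero at (row, col) meaning reach `col` drains into reach `row`) is an assumption that should be checked against DDR's adjacency convention.

```python
import networkx as nx


def coo_to_digraph(rows: list[int], cols: list[int], reach_ids: list[int]) -> nx.DiGraph:
    """Build a directed river graph from COO index arrays.

    Assumption: a nonzero at (row, col) means reach `col` drains into reach `row`.
    """
    graph = nx.DiGraph()
    graph.add_nodes_from(reach_ids)
    graph.add_edges_from((reach_ids[c], reach_ids[r]) for r, c in zip(rows, cols))
    return graph


def gage_subgraph(graph: nx.DiGraph, outlet: int) -> nx.DiGraph:
    """Keep only the outlet reach and everything upstream of it."""
    nodes = nx.ancestors(graph, outlet) | {outlet}
    return graph.subgraph(nodes).copy()


# Two separate toy basins: 10 -> 11 -> 12 and 20 -> 21 (hypothetical reach IDs)
reach_ids = [10, 11, 12, 20, 21]
rows = [1, 2, 4]
cols = [0, 1, 3]

full = coo_to_digraph(rows, cols, reach_ids)
sub = gage_subgraph(full, outlet=12)

print(nx.is_weakly_connected(full))  # the two basins never join
print(nx.is_weakly_connected(sub))   # a single gage's upstream network
```

Restricting the graph to one outlet's ancestors guarantees a single connected component, which is the property the per-gage Phase 2 loop relies on.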
Next Steps
- DiffRoute Comparison - Detailed guide to comparing DDR vs DiffRoute
- Configuration Reference - Full configuration options