Configuration Reference

All configuration fields for DDR, generated from the Pydantic models in src/ddr/validation/configs.py.

Config (top-level)

The base-level configuration for the dMC (differentiable Muskingum-Cunge) model

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | | yes | Unique identifier for this model run, used in output file naming |
| data_sources | DataSources | | yes | Configuration of all data source paths required by the model |
| experiment | ExperimentConfig | | | Experiment settings controlling training behavior and data selection |
| geodataset | GeoDataset | | yes | The geospatial dataset used in predictions and routing |
| mode | Mode | | yes | Operating mode: training, testing, or routing |
| params | Params | | yes | Physical and numerical parameters for the routing model |
| kan | Kan | | yes | Architecture and configuration settings for the Kolmogorov-Arnold Network |
| np_seed | integer | 1 | | Random seed for NumPy operations to ensure reproducibility |
| seed | integer | 0 | | Random seed for PyTorch operations to ensure reproducibility |
| device | integer or string | 0 | | Compute device specification (GPU index number, 'cpu', 'cuda', or 'mps') |
| s3_region | string | us-east-2 | | AWS S3 region for accessing cloud-stored datasets |
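As a rough illustration of how the required fields and defaults above fit together, the sketch below checks a plain dictionary and fills in the documented defaults. It does not use the actual Pydantic models; the helper `apply_top_level_defaults`, the placeholder paths, the attribute names, and the `"merit"` spelling of the geodataset value are all assumptions made for the example. The `experiment` field is left out because the table lists neither a default nor a requirement for it.

```python
# Field names and defaults follow the Config fields above; all values are hypothetical.
REQUIRED_FIELDS = ("name", "data_sources", "geodataset", "mode", "params", "kan")
OPTIONAL_DEFAULTS = {"np_seed": 1, "seed": 0, "device": 0, "s3_region": "us-east-2"}


def apply_top_level_defaults(raw: dict) -> dict:
    """Check the required top-level fields and fill in the documented defaults."""
    missing = [field for field in REQUIRED_FIELDS if field not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Values supplied by the user take precedence over the defaults.
    return {**OPTIONAL_DEFAULTS, **raw}


raw_config = {
    "name": "example_run",
    "mode": "training",  # one of: training, testing, routing
    "geodataset": "merit",  # assumed spelling of the dataset identifier
    "data_sources": {
        "geospatial_fabric_gpkg": "/path/to/fabric.gpkg",  # placeholder path
        "conus_adjacency": "/path/to/conus_adjacency",  # placeholder path
    },
    "params": {},
    "kan": {"input_var_names": ["attr_a", "attr_b"]},  # placeholder names
    "device": "cpu",  # overrides the default GPU index 0
}

config = apply_top_level_defaults(raw_config)
print(config["device"], config["seed"], config["s3_region"])
```

In the real package the Pydantic models perform this validation, so a helper like this would only be useful for quick checks outside of DDR.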

DataSources

Represents the data path sources for the model

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| attributes | string | s3://mhpi-spatial/hydrofabric_v2.2_at... | | Path to the icechunk store containing catchment attribute data |
| geospatial_fabric_gpkg | string | | yes | Path to the geospatial fabric geopackage containing network topology |
| conus_adjacency | string | | yes | Path to the CONUS adjacency matrix created by engine/adjacency.py |
| statistics | string | data | | Path to the folder where normalization statistics files are saved |
| streamflow | string | s3://mhpi-spatial/hydrofabric_v2.2_dh... | | Path to the icechunk store containing modeled streamflow data |
| observations | string | s3://mhpi-spatial/usgs_streamflow_obs... | | Path to the USGS streamflow observations for model validation |
| gages | string or None | | | Path to CSV file containing gauge metadata, or None to use all segments |
| gages_adjacency | string or None | | | Path to the gages adjacency matrix (required if gages is provided) |
| target_catchments | array or None | | | Optional list of specific catchment IDs to route to (overrides gages) |
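The last three fields interact: `target_catchments` overrides `gages`, `gages` needs `gages_adjacency`, and leaving everything as None routes all segments. A minimal sketch of that precedence is shown below. The function `select_routing_targets` and its return labels are hypothetical, and the assumption that `gages_adjacency` is not needed when `target_catchments` is set is an interpretation of the descriptions, not confirmed behavior.

```python
from typing import Optional, Sequence


def select_routing_targets(
    gages: Optional[str] = None,
    gages_adjacency: Optional[str] = None,
    target_catchments: Optional[Sequence[str]] = None,
) -> str:
    """Return a label for which target selection would apply (illustrative only)."""
    if target_catchments is not None:
        # Documented: target_catchments overrides gages.
        return "target_catchments"
    if gages is not None:
        # Documented: gages_adjacency is required if gages is provided.
        if gages_adjacency is None:
            raise ValueError("gages_adjacency is required when gages is provided")
        return "gages"
    # Documented: gages=None means all segments are used.
    return "all_segments"


print(select_routing_targets())
print(select_routing_targets(gages="gages.csv", gages_adjacency="gages_adj"))
print(select_routing_targets(gages="gages.csv", target_catchments=["cat-1"]))
```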

Params

Parameters configuration

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| attribute_minimums | dict | | | Minimum values for physical routing components to ensure numerical stability |
| parameter_ranges | dict | | | The parameter space bounds [min, max] to project learned physical values to |
| log_space_parameters | list[string] | | | Parameters to denormalize in log-space for right-skewed distributions |
| defaults | dict | | | Default parameter values for physical processes when not learned |
| tau | integer | 3 | | Routing time step adjustment parameter to handle double routing and timezone differences |
| save_path | string | . | | Directory path where model outputs and checkpoints will be saved |
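To make the roles of `parameter_ranges`, `log_space_parameters`, and `attribute_minimums` more concrete, the sketch below projects a normalized network output in [0, 1] onto a [min, max] range, either linearly or in log-space, and then applies a lower bound. The formulas are a common way to do this and are an assumption here, not taken from the DDR source. The parameter names and bounds are placeholders.

```python
import math


def denormalize(value: float, bounds: list, log_space: bool = False) -> float:
    """Map a normalized value in [0, 1] onto [min, max] (assumed formulas)."""
    lower, upper = bounds
    if log_space:
        # Interpolate between log(min) and log(max), then exponentiate.
        log_lower, log_upper = math.log(lower), math.log(upper)
        return math.exp(log_lower + value * (log_upper - log_lower))
    return lower + value * (upper - lower)


# Placeholder names and values, for illustration only.
parameter_ranges = {"param_a": [0.0, 2.0], "param_b": [1.0, 100.0]}
log_space_parameters = ["param_b"]
attribute_minimums = {"param_a": 0.5}

raw_outputs = {"param_a": 0.1, "param_b": 0.5}
physical = {}
for name, value in raw_outputs.items():
    projected = denormalize(value, parameter_ranges[name], name in log_space_parameters)
    # Clamp to the configured minimum, if one exists, for numerical stability.
    physical[name] = max(projected, attribute_minimums.get(name, projected))

print(physical)
```

Log-space interpolation spends half of the normalized range on each order of magnitude, which is why it suits right-skewed parameters.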

ExperimentConfig

Experiment configuration for training and testing

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| batch_size | integer | 1 | | Number of gauge catchments processed simultaneously in each batch |
| start_time | string | 1981/10/01 | | Start date for time period selection in YYYY/MM/DD format |
| end_time | string | 1995/09/30 | | End date for time period selection in YYYY/MM/DD format |
| checkpoint | string or None | | | Path to checkpoint file (.pt) for resuming the model from a previous state |
| epochs | integer | 1 | | Number of complete passes through the training dataset |
| learning_rate | dict | | | Learning rate schedule mapping epoch numbers to learning rate values |
| rho | integer or None | | | Number of consecutive days selected in each training batch |
| shuffle | boolean | True | | Whether to randomize the order of samples in the dataloader |
| warmup | integer | 3 | | Number of days excluded from loss calculation because routing starts from dry conditions |
| max_area_diff_sqkm | number or None | 50 | | Maximum absolute drainage area difference (km²) between USGS gage and COMID. Gages exceeding this threshold are excluded from training/evaluation. None disables filtering. For MERIT geodataset, the DA_VALID column in gage CSVs is preferred. |
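Three of these fields are easier to read with a small example: the YYYY/MM/DD date format, the epoch-keyed `learning_rate` dict, and the `max_area_diff_sqkm` filter. The sketch below is illustrative only. The helper names are hypothetical, and the rule that a schedule entry applies from its epoch until the next entry is an assumption about how the mapping is interpreted.

```python
from datetime import datetime
from typing import Optional


def parse_config_date(text: str) -> datetime:
    """Parse a date in the documented YYYY/MM/DD format."""
    return datetime.strptime(text, "%Y/%m/%d")


def learning_rate_for_epoch(schedule: dict, epoch: int) -> float:
    """Pick the rate of the latest schedule entry at or before `epoch` (assumed rule)."""
    eligible = [start for start in schedule if start <= epoch]
    if not eligible:
        raise ValueError(f"no learning rate defined for epoch {epoch}")
    return schedule[max(eligible)]


def keep_gage(gage_area: float, comid_area: float,
              max_area_diff_sqkm: Optional[float] = 50.0) -> bool:
    """Keep a gage unless its drainage area differs by more than the threshold."""
    if max_area_diff_sqkm is None:
        return True  # None disables filtering
    return abs(gage_area - comid_area) <= max_area_diff_sqkm


start = parse_config_date("1981/10/01")
end = parse_config_date("1995/09/30")
print((end - start).days + 1, "days in the default period")

schedule = {1: 0.005, 3: 0.001}  # placeholder schedule
print([learning_rate_for_epoch(schedule, epoch) for epoch in (1, 2, 3, 4)])

print(keep_gage(120.0, 100.0), keep_gage(200.0, 100.0), keep_gage(200.0, 100.0, None))
```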

Kan

KAN (Kolmogorov-Arnold Network) configuration

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| hidden_size | integer | 11 | | Number of neurons in each hidden layer of the KAN. This should be 2n+1, where n is the number of input attributes |
| input_var_names | list[string] | | yes | Names of catchment attributes used as network inputs |
| num_hidden_layers | integer | 1 | | Number of hidden layers in the KAN architecture |
| learnable_parameters | list[string] | | | Names of physical parameters the network will learn to predict |
| grid | integer | 3 | | Grid size for KAN spline basis functions |
| k | integer | 3 | | Order of B-spline basis functions in KAN layers |
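The 2n+1 guideline for `hidden_size` can be checked with a one-line helper. The function name and the attribute names below are placeholders. With five input attributes the rule gives 11, which matches the documented default.

```python
def recommended_hidden_size(input_var_names: list) -> int:
    """Apply the documented 2n+1 rule, where n is the number of input attributes."""
    return 2 * len(input_var_names) + 1


# Placeholder attribute names, for illustration only.
input_var_names = ["attr_a", "attr_b", "attr_c", "attr_d", "attr_e"]
kan_config = {
    "input_var_names": input_var_names,
    "hidden_size": recommended_hidden_size(input_var_names),
    "num_hidden_layers": 1,  # documented default
    "grid": 3,  # documented default
    "k": 3,  # documented default
}
print(kan_config["hidden_size"])
```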