Configuration Reference

All configuration fields for DDR, generated from the Pydantic models in src/ddr/validation/configs.py.

Config (top-level)

The base-level configuration for the dMC (differentiable Muskingum-Cunge) model.

Field	Type	Default	Required	Description
`name`	`string`	—	yes	Unique identifier name for this model run used in output file naming
`data_sources`	`DataSources`	—	yes	Configuration of all data source paths required by the model
`experiment`	`ExperimentConfig`	—		Experiment settings controlling training behavior and data selection
`geodataset`	`GeoDataset`	—	yes	The geospatial dataset used in predictions and routing
`mode`	`Mode`	—	yes	Operating mode: training, testing, or routing
`params`	`Params`	—	yes	Physical and numerical parameters for the routing model
`kan`	`Kan`	—	yes	Architecture and configuration settings for the Kolmogorov-Arnold Network
`np_seed`	`integer`	`1`		Random seed for NumPy operations to ensure reproducibility
`seed`	`integer`	`0`		Random seed for PyTorch operations to ensure reproducibility
`device`	`integer` \| `string`	`0`		Compute device specification (GPU index number, 'cpu', 'cuda', or 'mps')
`s3_region`	`string`	`us-east-2`		AWS S3 region for accessing cloud-stored datasets
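
As a rough illustration of the table above, the following sketch checks a plain dictionary for the required top-level fields and fills in the documented defaults. It uses only the standard library rather than the actual Pydantic models, and the helper names (`missing_required`, `with_defaults`) and the example values are hypothetical.

```python
from typing import Dict, List

# Required top-level fields, per the table above.
REQUIRED_FIELDS = ("name", "data_sources", "geodataset", "mode", "params", "kan")

# Documented defaults for the optional scalar fields.
DEFAULTS = {"np_seed": 1, "seed": 0, "device": 0, "s3_region": "us-east-2"}


def missing_required(config: Dict) -> List[str]:
    """Return the required top-level fields that are absent from `config`."""
    return [field for field in REQUIRED_FIELDS if field not in config]


def with_defaults(config: Dict) -> Dict:
    """Return a copy of `config` with the documented defaults filled in."""
    return {**DEFAULTS, **config}


# Hypothetical, deliberately incomplete values for illustration only.
example = {
    "name": "demo_run",
    "mode": "training",
    "geodataset": "merit",  # assumed spelling of a GeoDataset value
    "device": "cpu",
}

print(missing_required(example))
print(with_defaults(example)["device"])
```

In the real package the Pydantic models perform this validation; the sketch only mirrors which fields the table marks as required.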

DataSources

Represents the data path sources for the model.

Field	Type	Default	Required	Description
`attributes`	`string`	`s3://mhpi-spatial/hydrofabric_v2.2_at...`		Path to the icechunk store containing catchment attribute data
`geospatial_fabric_gpkg`	`string`	—	yes	Path to the geospatial fabric geopackage containing network topology
`conus_adjacency`	`string`	—	yes	Path to the CONUS adjacency matrix created by engine/adjacency.py
`statistics`	`string`	`data`		Path to the folder where normalization statistics files are saved
`streamflow`	`string`	`s3://mhpi-spatial/hydrofabric_v2.2_dh...`		Path to the icechunk store containing modeled streamflow data
`observations`	`string`	`s3://mhpi-spatial/usgs_streamflow_obs...`		Path to the USGS streamflow observations for model validation
`gages`	`string` \| None	—		Path to CSV file containing gage metadata, or None to use all segments
`gages_adjacency`	`string` \| None	—		Path to the gages adjacency matrix (required if gages is provided)
`target_catchments`	`array` \| None	—		Optional list of specific catchment IDs to route to (overrides gages)
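
The relationships among `gages`, `gages_adjacency`, and `target_catchments` described in the table can be sketched as follows. This is a minimal standard-library illustration, not the validation code in `configs.py`; the function names and the returned labels are hypothetical.

```python
from typing import List, Optional


def check_gage_paths(gages: Optional[str], gages_adjacency: Optional[str]) -> None:
    """Raise ValueError if `gages` is set without `gages_adjacency`."""
    if gages is not None and gages_adjacency is None:
        raise ValueError("gages_adjacency is required when gages is provided")


def select_targets(gages: Optional[str], target_catchments: Optional[List[str]]) -> str:
    """Return a label for which selection applies (labels are hypothetical)."""
    if target_catchments is not None:
        # target_catchments overrides gages, per the table.
        return "target_catchments"
    if gages is not None:
        return "gages"
    # With neither set, all segments are used.
    return "all_segments"
```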

Params

Parameters configuration.

Field	Type	Default	Description
`attribute_minimums`	`dict`	—	Minimum values for physical routing components to ensure numerical stability
`parameter_ranges`	`dict`	—	The parameter space bounds [min, max] to project learned physical values to
`log_space_parameters`	list[`string`]	—	Parameters to denormalize in log-space for right-skewed distributions
`defaults`	`dict`	—	Default parameter values for physical processes when not learned
`tau`	`integer`	`3`	Routing time step adjustment parameter to handle double routing and timezone differences
`save_path`	`string`	`.`	Directory path where model outputs and checkpoints will be saved
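
To make the roles of `parameter_ranges` and `log_space_parameters` concrete, here is one common way a normalized value in [0, 1] could be projected onto [min, max] bounds, linearly or in log-space. The formula is an assumption for illustration and may differ from what DDR actually does; the function name and the bounds used below are hypothetical.

```python
import math
from typing import Tuple


def denormalize(x: float, bounds: Tuple[float, float], log_space: bool = False) -> float:
    """Project a normalized value x in [0, 1] onto [lo, hi].

    With log_space=True the interpolation happens between log(lo) and log(hi),
    which spreads resolution more evenly for right-skewed parameters.
    Log-space requires lo > 0.
    """
    lo, hi = bounds
    if log_space:
        log_lo, log_hi = math.log(lo), math.log(hi)
        return math.exp(log_lo + x * (log_hi - log_lo))
    return lo + x * (hi - lo)


# Hypothetical bounds: the midpoint lands at the arithmetic mean in linear
# space and at the geometric mean in log-space.
linear_mid = denormalize(0.5, (0.0, 10.0))
log_mid = denormalize(0.5, (0.01, 1.0), log_space=True)
```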

ExperimentConfig

Experiment configuration for training and testing.

Field	Type	Default	Description
`batch_size`	`integer`	`1`	Number of gage catchments processed simultaneously in each batch
`start_time`	`string`	`1981/10/01`	Start date for time period selection in YYYY/MM/DD format
`end_time`	`string`	`1995/09/30`	End date for time period selection in YYYY/MM/DD format
`checkpoint`	`string` \| None	—	Path to checkpoint file (.pt) for resuming model from previous state
`epochs`	`integer`	`1`	Number of complete passes through the training dataset
`learning_rate`	`dict`	—	Learning rate schedule mapping epoch numbers to learning rate values
`rho`	`integer` \| None	—	Number of consecutive days selected in each training batch
`shuffle`	`boolean`	`True`	Whether to randomize the order of samples in the dataloader
`warmup`	`integer`	`3`	Number of days excluded from loss calculation because routing starts from dry conditions
`max_area_diff_sqkm`	`number` \| None	`50`	Maximum absolute drainage area difference (km²) between USGS gage and COMID. Gages exceeding this threshold are excluded from training/evaluation. None disables filtering. For the MERIT geodataset, the DA_VALID column in gage CSVs is preferred.
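
Two of the fields above lend themselves to a short sketch: looking up a learning rate from an epoch-keyed schedule, and applying the `max_area_diff_sqkm` filter. The lookup rule (use the entry with the largest epoch key not exceeding the current epoch) is an assumption about how such a schedule might be read, and both function names are hypothetical.

```python
from typing import Dict, Optional


def lr_for_epoch(schedule: Dict[int, float], epoch: int) -> float:
    """Return the rate whose epoch key is the largest one <= `epoch` (assumed rule)."""
    eligible = [start for start in schedule if start <= epoch]
    if not eligible:
        raise ValueError(f"no learning rate defined at or before epoch {epoch}")
    return schedule[max(eligible)]


def keep_gage(
    gage_area_sqkm: float,
    comid_area_sqkm: float,
    max_area_diff_sqkm: Optional[float] = 50,
) -> bool:
    """Return True if the gage passes the drainage-area filter.

    Gages whose absolute area difference exceeds the threshold are excluded;
    None disables filtering.
    """
    if max_area_diff_sqkm is None:
        return True
    return abs(gage_area_sqkm - comid_area_sqkm) <= max_area_diff_sqkm


# Hypothetical schedule: one rate from epoch 1, a smaller one from epoch 3.
schedule = {1: 0.005, 3: 0.001}
```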

Kan

KAN (Kolmogorov-Arnold Network) configuration.

Field	Type	Default	Required	Description
`hidden_size`	`integer`	`11`		Number of neurons in each hidden layer of the KAN. This should be 2n+1, where n is the number of input attributes
`input_var_names`	list[`string`]	—	yes	Names of catchment attributes used as network inputs
`num_hidden_layers`	`integer`	`1`		Number of hidden layers in the KAN architecture
`learnable_parameters`	list[`string`]	—		Names of physical parameters the network will learn to predict
`grid`	`integer`	`3`		Grid size for KAN spline basis functions
`k`	`integer`	`3`		Order of B-spline basis functions in KAN layers
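
The 2n+1 guidance for `hidden_size` can be checked with a one-line helper. The function name and the attribute names below are hypothetical placeholders; only the 2n+1 rule and the default of 11 come from the table.

```python
from typing import List


def recommended_hidden_size(input_var_names: List[str]) -> int:
    """Return 2n+1, where n is the number of input attributes."""
    return 2 * len(input_var_names) + 1


# Five placeholder attribute names give 2*5 + 1 = 11, matching the default.
names = ["attr_a", "attr_b", "attr_c", "attr_d", "attr_e"]
print(recommended_hidden_size(names))
```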