Configuration Reference
All configuration fields for DDR, generated from the Pydantic models in
src/ddr/validation/configs.py.
Config (top-level)
The base-level configuration for the dMC (differentiable Muskingum-Cunge) model.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | — | yes | Unique identifier for this model run, used in output file naming |
| data_sources | DataSources | — | yes | Configuration of all data source paths required by the model |
| experiment | ExperimentConfig | — | no | Experiment settings controlling training behavior and data selection |
| geodataset | GeoDataset | — | yes | The geospatial dataset used in predictions and routing |
| mode | Mode | — | yes | Operating mode: training, testing, or routing |
| params | Params | — | yes | Physical and numerical parameters for the routing model |
| kan | Kan | — | yes | Architecture and configuration settings for the Kolmogorov-Arnold Network |
| np_seed | integer | 1 | no | Random seed for NumPy operations to ensure reproducibility |
| seed | integer | 0 | no | Random seed for PyTorch operations to ensure reproducibility |
| device | integer \| string | 0 | no | Compute device specification (GPU index number, 'cpu', 'cuda', or 'mps') |
| s3_region | string | us-east-2 | no | AWS S3 region for accessing cloud-stored datasets |
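
As a rough illustration of the table above, the sketch below checks a plain dictionary for the required top-level keys. It is not the Pydantic validation in src/ddr/validation/configs.py; the helper `missing_required`, the example paths, the attribute names, and the `"merit"` spelling of the geodataset value are all assumptions made for illustration.

```python
# Required top-level keys, taken from the "Required" column of the table above.
REQUIRED_TOP_LEVEL = {"name", "data_sources", "geodataset", "mode", "params", "kan"}


def missing_required(cfg: dict) -> set:
    """Return the required top-level keys that are absent from cfg."""
    return REQUIRED_TOP_LEVEL - set(cfg)


# All values below are placeholders chosen for illustration only.
example_cfg = {
    "name": "example_run",
    "mode": "training",      # one of: training, testing, routing
    "geodataset": "merit",   # assumed spelling of the GeoDataset value
    "data_sources": {
        "geospatial_fabric_gpkg": "/path/to/fabric.gpkg",   # hypothetical path
        "conus_adjacency": "/path/to/conus_adjacency",      # hypothetical path
    },
    "params": {},
    "kan": {"input_var_names": ["attr_a", "attr_b"]},       # hypothetical names
    # Optional fields, shown here with their documented defaults.
    "seed": 0,
    "np_seed": 1,
    "device": "cpu",
}

print(missing_required(example_cfg))
print(missing_required({"name": "only_a_name"}))
```

The first call reports no missing keys; the second reports every required key except `name`.
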
DataSources
Paths to the data sources used by the model.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| attributes | string | s3://mhpi-spatial/hydrofabric_v2.2_at... | no | Path to the icechunk store containing catchment attribute data |
| geospatial_fabric_gpkg | string | — | yes | Path to the geospatial fabric geopackage containing network topology |
| conus_adjacency | string | — | yes | Path to the CONUS adjacency matrix created by engine/adjacency.py |
| statistics | string | data | no | Path to the folder where normalization statistics files are saved |
| streamflow | string | s3://mhpi-spatial/hydrofabric_v2.2_dh... | no | Path to the icechunk store containing modeled streamflow data |
| observations | string | s3://mhpi-spatial/usgs_streamflow_obs... | no | Path to the USGS streamflow observations for model validation |
| gages | string \| None | — | no | Path to CSV file containing gauge metadata, or None to use all segments |
| gages_adjacency | string \| None | — | no | Path to the gages adjacency matrix (required if gages is provided) |
| target_catchments | array \| None | — | no | Optional list of specific catchment IDs to route to (overrides gages) |
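
The interaction between gages, gages_adjacency, and target_catchments described above can be sketched as follows. This is a stand-alone illustration using a dataclass, not the project's model: the class `GageSources`, the function `resolve_selection`, and the returned labels are hypothetical, and whether the real validator checks the adjacency requirement before applying the target_catchments override is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GageSources:
    """Hypothetical stand-in for the three optional DataSources fields."""
    gages: Optional[str] = None
    gages_adjacency: Optional[str] = None
    target_catchments: Optional[List[str]] = None


def resolve_selection(src: GageSources) -> str:
    """Return which selection applies, enforcing the documented rules."""
    # Documented rule: gages_adjacency is required if gages is provided.
    if src.gages is not None and src.gages_adjacency is None:
        raise ValueError("gages_adjacency is required when gages is provided")
    # Documented rule: target_catchments overrides gages.
    if src.target_catchments is not None:
        return "target_catchments"
    if src.gages is not None:
        return "gages"
    # Documented rule: gages=None means all segments are used.
    return "all_segments"


print(resolve_selection(GageSources()))
print(resolve_selection(GageSources(gages="gages.csv", gages_adjacency="adj")))
```
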
Params
Physical and numerical parameters for the routing model.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| attribute_minimums | dict | — | no | Minimum values for physical routing components to ensure numerical stability |
| parameter_ranges | dict | — | no | The parameter space bounds [min, max] to project learned physical values to |
| log_space_parameters | list[string] | — | no | Parameters to denormalize in log-space for right-skewed distributions |
| defaults | dict | — | no | Default parameter values for physical processes when not learned |
| tau | integer | 3 | no | Routing time step adjustment parameter to handle double routing and timezone differences |
| save_path | string | . | no | Directory path where model outputs and checkpoints will be saved |
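
One way to read parameter_ranges and log_space_parameters together is sketched below. It assumes the network emits a normalized value in [0, 1] and that projection is a linear (or log-linear) interpolation between the bounds; neither assumption is confirmed by this page, and the function `denormalize` and the example bounds are hypothetical.

```python
import math


def denormalize(x: float, bounds: tuple, log_space: bool = False) -> float:
    """Map a normalized value x in [0, 1] onto [min, max].

    Assumed behavior: linear interpolation by default, or interpolation
    between log(min) and log(max) when log_space is True (bounds must be > 0).
    """
    lo, hi = bounds
    if log_space:
        return math.exp(math.log(lo) + x * (math.log(hi) - math.log(lo)))
    return lo + x * (hi - lo)


# Hypothetical bounds, used only to show the difference between the two modes.
linear_mid = denormalize(0.5, (0.0, 2.0))
log_mid = denormalize(0.5, (1.0, 100.0), log_space=True)
print(linear_mid, log_mid)
```

In log space the midpoint of [1, 100] lands near 10 instead of 50.5, which is why this kind of mapping suits right-skewed parameters.
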
ExperimentConfig
Experiment configuration for training and testing.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| batch_size | integer | 1 | no | Number of gauge catchments processed simultaneously in each batch |
| start_time | string | 1981/10/01 | no | Start date for time period selection in YYYY/MM/DD format |
| end_time | string | 1995/09/30 | no | End date for time period selection in YYYY/MM/DD format |
| checkpoint | string \| None | — | no | Path to checkpoint file (.pt) for resuming model from previous state |
| epochs | integer | 1 | no | Number of complete passes through the training dataset |
| learning_rate | dict | — | no | Learning rate schedule mapping epoch numbers to learning rate values |
| rho | integer \| None | — | no | Number of consecutive days selected in each training batch |
| shuffle | boolean | True | no | Whether to randomize the order of samples in the dataloader |
| warmup | integer | 3 | no | Number of days excluded from loss calculation as routing starts from dry conditions |
| max_area_diff_sqkm | number \| None | 50 | no | Maximum absolute drainage area difference (km²) between USGS gage and COMID. Gages exceeding this threshold are excluded from training/evaluation. None disables filtering. For MERIT geodataset, the DA_VALID column in gage CSVs is preferred. |
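
The start_time/end_time format and the learning_rate mapping can be illustrated with a short sketch. The date parsing uses the documented YYYY/MM/DD format. The schedule lookup assumes that each key is the epoch at which a rate takes effect and that the rate persists until the next key; that interpretation, the helper names, and the example schedule values are assumptions, not documented behavior.

```python
from datetime import date, datetime


def parse_config_date(value: str) -> date:
    """Parse a date written in the documented YYYY/MM/DD format."""
    return datetime.strptime(value, "%Y/%m/%d").date()


def learning_rate_for(epoch: int, schedule: dict) -> float:
    """Return the rate of the latest schedule entry at or before epoch.

    Assumed semantics: a key marks the epoch where its rate starts to apply.
    """
    eligible = [start for start in schedule if start <= epoch]
    if not eligible:
        raise ValueError(f"no learning rate defined for epoch {epoch}")
    return schedule[max(eligible)]


start = parse_config_date("1981/10/01")   # documented default start_time
end = parse_config_date("1995/09/30")     # documented default end_time

schedule = {1: 0.005, 3: 0.001}           # hypothetical schedule values
print(start, end, learning_rate_for(2, schedule), learning_rate_for(5, schedule))
```
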
Kan
KAN (Kolmogorov-Arnold Network) configuration.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| hidden_size | integer | 11 | no | Number of neurons in each hidden layer of the KAN. This should be 2n+1 where n is the number of input attributes |
| input_var_names | list[string] | — | yes | Names of catchment attributes used as network inputs |
| num_hidden_layers | integer | 1 | no | Number of hidden layers in the KAN architecture |
| learnable_parameters | list[string] | — | no | Names of physical parameters the network will learn to predict |
| grid | integer | 3 | no | Grid size for KAN spline basis functions |
| k | integer | 3 | no | Order of B-spline basis functions in KAN layers |
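
The 2n+1 guidance for hidden_size can be checked with a small helper. The function name and the attribute names below are hypothetical; only the 2n+1 rule and the default of 11 come from the table above.

```python
def recommended_hidden_size(input_var_names: list) -> int:
    """Apply the documented 2n + 1 rule, where n is the number of inputs."""
    return 2 * len(input_var_names) + 1


# Five placeholder attribute names; real names depend on the attribute store.
inputs = ["attr_a", "attr_b", "attr_c", "attr_d", "attr_e"]
print(recommended_hidden_size(inputs))
```

With five input attributes the rule gives 11, which matches the documented default for hidden_size.
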