|
| 1 | +# Function Documentation Summary |
| 2 | + |
| 3 | +## YAML and Configuration Functions |
| 4 | + |
| 5 | +### `load_yaml_dictionary(yaml_path: str) -> dict` |
| 6 | +**Purpose**: Loads a YAML file and extracts the COVARIATE_DICT section. |
| 7 | + |
| 8 | +**Inputs**: |
| 9 | +- `yaml_path`: String path to the YAML file |
| 10 | + |
| 11 | +**Outputs**: |
| 12 | +- Dictionary containing the COVARIATE_DICT from the YAML file |
| 13 | + |
| 14 | +**Functions it uses**: None within this module (uses `yaml.safe_load` from the third-party PyYAML package) |
| 15 | + |
| 16 | +**Functions that use it**: `parse_yaml_dictionary()` |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +### `parse_yaml_dictionary(covariate: str) -> dict` |
| 21 | +**Purpose**: Parses covariate-specific configuration from the YAML dictionary and calculates derived values. |
| 22 | + |
| 23 | +**Inputs**: |
| 24 | +- `covariate`: String name of the covariate to extract configuration for |
| 25 | + |
| 26 | +**Outputs**: |
| 27 | +- Dictionary with parsed covariate configuration including: |
| 28 | + - `covariate_name`: Name of the covariate |
| 29 | + - `covariate_resolution`: Calculated resolution (numerator/denominator) |
| 30 | + - `years`: List of years from start to end |
| 31 | + - `synoptic`: Synoptic flag |
| 32 | + - `cc_sensitive`: Climate change sensitivity flag |
| 33 | + - `summary_statistic`: Summary statistic method |
| 34 | + - `path`: File path |
| 35 | + |
| 36 | +**Functions it uses**: `load_yaml_dictionary()` |
| 37 | + |
| 38 | +**Functions that use it**: Not directly called by other functions in this module |
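The derived values listed above can be illustrated with a minimal sketch. The raw key names (`resolution_numerator`, `resolution_denominator`, `year_start`, `year_end`) and the helper name `parse_covariate_entry` are assumptions made for illustration, as is treating the year range as inclusive; the actual YAML layout is not shown in this summary.

```python
def parse_covariate_entry(covariate, covariate_dict):
    """Build the documented output fields from one raw entry (hypothetical keys)."""
    entry = covariate_dict[covariate]
    return {
        "covariate_name": covariate,
        # Resolution is stored as a fraction and reduced to a single number.
        "covariate_resolution": entry["resolution_numerator"]
        / entry["resolution_denominator"],
        # Assumption: the year range includes both endpoints.
        "years": list(range(entry["year_start"], entry["year_end"] + 1)),
        "synoptic": entry["synoptic"],
        "cc_sensitive": entry["cc_sensitive"],
        "summary_statistic": entry["summary_statistic"],
        "path": entry["path"],
    }


# Made-up entry standing in for one item of COVARIATE_DICT.
example_dict = {
    "precipitation": {
        "resolution_numerator": 1,
        "resolution_denominator": 2,
        "year_start": 2000,
        "year_end": 2002,
        "synoptic": False,
        "cc_sensitive": True,
        "summary_statistic": "mean",
        "path": "example/precipitation",
    }
}
parsed = parse_covariate_entry("precipitation", example_dict)
```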
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +## Data Merging and Reading Functions |
| 43 | + |
| 44 | +### `merge_dataframes(model_df, dfs)` |
| 45 | +**Purpose**: Merges each DataFrame in `dfs` into a base model DataFrame, joining on `location_id` and `year_id`. |
| 46 | + |
| 47 | +**Inputs**: |
| 48 | +- `model_df`: Base pandas DataFrame |
| 49 | +- `dfs`: Dictionary of DataFrames to merge |
| 50 | + |
| 51 | +**Outputs**: |
| 52 | +- Merged pandas DataFrame with suffixes added for duplicate columns |
| 53 | + |
| 54 | +**Functions it uses**: None (uses pandas merge) |
| 55 | + |
| 56 | +**Functions that use it**: Not directly called by other functions in this module |
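One way such a merge could look is sketched below. The left join and the choice of the dictionary key as the column suffix are assumptions; the summary only states that suffixes are added for duplicate columns.

```python
import pandas as pd


def merge_dataframes_sketch(model_df, dfs):
    """Merge every DataFrame in `dfs` onto `model_df` by location and year."""
    merged = model_df.copy()
    for name, df in dfs.items():
        merged = merged.merge(
            df,
            on=["location_id", "year_id"],
            how="left",  # assumption: keep every row of the base DataFrame
            suffixes=("", f"_{name}"),  # assumption: suffix clashes with the dict key
        )
    return merged


# Made-up example data.
model_df = pd.DataFrame(
    {"location_id": [1, 2], "year_id": [2000, 2000], "value": [0.1, 0.2]}
)
dfs = {
    "income": pd.DataFrame(
        {"location_id": [1, 2], "year_id": [2000, 2000], "value": [10.0, 20.0]}
    )
}
merged = merge_dataframes_sketch(model_df, dfs)
```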
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +### `read_income_paths(income_paths, rcp_scenario, VARIABLE_DATA_PATH)` |
| 61 | +**Purpose**: Reads multiple income data files, filters by RCP scenario, and processes them. |
| 62 | + |
| 63 | +**Inputs**: |
| 64 | +- `income_paths`: Dictionary of file paths |
| 65 | +- `rcp_scenario`: RCP scenario to filter by |
| 66 | +- `VARIABLE_DATA_PATH`: Base path for variable data |
| 67 | + |
| 68 | +**Outputs**: |
| 69 | +- Dictionary of filtered pandas DataFrames (scenario column dropped) |
| 70 | + |
| 71 | +**Functions it uses**: `read_parquet_with_integer_ids()` |
| 72 | + |
| 73 | +**Functions that use it**: Not directly called by other functions in this module |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +### `read_urban_paths(urban_paths, VARIABLE_DATA_PATH)` |
| 78 | +**Purpose**: Reads multiple urban data files and standardizes column names. |
| 79 | + |
| 80 | +**Inputs**: |
| 81 | +- `urban_paths`: Dictionary of file paths |
| 82 | +- `VARIABLE_DATA_PATH`: Base path for variable data |
| 83 | + |
| 84 | +**Outputs**: |
| 85 | +- Dictionary of processed pandas DataFrames with standardized column names |
| 86 | + |
| 87 | +**Functions it uses**: None (uses pandas read_parquet) |
| 88 | + |
| 89 | +**Functions that use it**: Not directly called by other functions in this module |
| 90 | + |
| 91 | +--- |
| 92 | + |
| 93 | +## Data Type and I/O Utility Functions |
| 94 | + |
| 95 | +### `ensure_id_columns_are_integers(df)` |
| 96 | +**Purpose**: Converts every column whose name ends in `_id` to an integer dtype. |
| 97 | + |
| 98 | +**Inputs**: |
| 99 | +- `df`: pandas DataFrame |
| 100 | + |
| 101 | +**Outputs**: |
| 102 | +- DataFrame with ID columns converted to integers |
| 103 | + |
| 104 | +**Functions it uses**: None (uses pandas type operations) |
| 105 | + |
| 106 | +**Functions that use it**: `read_parquet_with_integer_ids()` |
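A minimal sketch of the conversion, assuming string column names and no missing values in the ID columns (`astype("int64")` raises if a column contains NaN). The function name is hypothetical.

```python
import pandas as pd


def ensure_id_columns_are_integers_sketch(df):
    """Return a copy of `df` in which every `*_id` column has dtype int64."""
    out = df.copy()
    for col in out.columns:
        if col.endswith("_id"):
            out[col] = out[col].astype("int64")
    return out


# Parquet round trips can leave IDs as floats; this made-up frame mimics that.
raw = pd.DataFrame(
    {"location_id": [1.0, 2.0], "year_id": [2000.0, 2001.0], "value": [0.5, 1.5]}
)
clean = ensure_id_columns_are_integers_sketch(raw)
```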
| 107 | + |
| 108 | +--- |
| 109 | + |
| 110 | +### `read_parquet_with_integer_ids(path, **kwargs)` |
| 111 | +**Purpose**: Reads a parquet file and ensures ID columns are integers. |
| 112 | + |
| 113 | +**Inputs**: |
| 114 | +- `path`: File path to parquet file |
| 115 | +- `**kwargs`: Additional arguments for pd.read_parquet |
| 116 | + |
| 117 | +**Outputs**: |
| 118 | +- pandas DataFrame with integer ID columns |
| 119 | + |
| 120 | +**Functions it uses**: `ensure_id_columns_are_integers()` |
| 121 | + |
| 122 | +**Functions that use it**: `read_income_paths()` |
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +### `write_parquet(df, filepath, max_retries=3, compression='snappy', index=False, **kwargs)` |
| 127 | +**Purpose**: Writes parquet files with validation and retry logic for robustness. |
| 128 | + |
| 129 | +**Inputs**: |
| 130 | +- `df`: pandas DataFrame to write |
| 131 | +- `filepath`: Destination file path |
| 132 | +- `max_retries`: Number of retry attempts (default: 3) |
| 133 | +- `compression`: Compression method (default: 'snappy') |
| 134 | +- `index`: Whether to include index (default: False) |
| 135 | +- `**kwargs`: Additional arguments for to_parquet |
| 136 | + |
| 137 | +**Outputs**: |
| 138 | +- Boolean indicating success/failure |
| 139 | + |
| 140 | +**Functions it uses**: None (uses pandas and os operations) |
| 141 | + |
| 142 | +**Functions that use it**: |
| 143 | +- `rake_aa_count_lsae_to_gbd()` |
| 144 | +- `make_aa_rate_variable()` |
| 145 | +- `aggregate_aa_count_lsae_to_gbd()` |
| 146 | +- `make_full_aa_rate_df_from_aa_count_df()` |
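The retry-and-validate pattern might be sketched as follows. What counts as "validation" is not specified in this summary, so the check used here (the file exists and is non-empty) is an assumption, as are the back-off delay and the cleanup of partial files. The demo uses a stand-in object with a `to_parquet` method so that the retry path can be exercised without a parquet engine.

```python
import os
import tempfile
import time


def write_parquet_sketch(df, filepath, max_retries=3, compression="snappy",
                         index=False, **kwargs):
    """Try to write `df` up to `max_retries` times; return True on success."""
    for attempt in range(1, max_retries + 1):
        try:
            df.to_parquet(filepath, compression=compression, index=index, **kwargs)
            # Assumed validation: the file must exist and be non-empty.
            if os.path.exists(filepath) and os.path.getsize(filepath) > 0:
                return True
        except Exception as exc:
            print(f"write attempt {attempt} failed: {exc}")
        # Remove any partial file before the next attempt.
        if os.path.exists(filepath):
            os.remove(filepath)
        if attempt < max_retries:
            time.sleep(0.1 * attempt)
    return False


class FlakyFrame:
    """Stand-in for a DataFrame whose first write fails (demo only)."""

    def __init__(self):
        self.calls = 0

    def to_parquet(self, path, **kwargs):
        self.calls += 1
        if self.calls == 1:
            raise OSError("transient failure")
        with open(path, "wb") as handle:
            handle.write(b"placeholder bytes")


flaky = FlakyFrame()
target = os.path.join(tempfile.mkdtemp(), "demo.parquet")
succeeded = write_parquet_sketch(flaky, target)
```

Returning a Boolean rather than raising lets callers decide whether a failed write should stop the pipeline.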
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Raking Functions |
| 151 | + |
| 152 | +### `prep_df(df, hierarchy_df)` |
| 153 | +**Purpose**: Prepares DataFrame by adding level column and removing parent_id if present. |
| 154 | + |
| 155 | +**Inputs**: |
| 156 | +- `df`: pandas DataFrame to prepare |
| 157 | +- `hierarchy_df`: Hierarchy DataFrame containing location_id and level mappings |
| 158 | + |
| 159 | +**Outputs**: |
| 160 | +- Prepared DataFrame with level column added and parent_id removed |
| 161 | + |
| 162 | +**Functions it uses**: None (uses pandas merge and drop) |
| 163 | + |
| 164 | +**Functions that use it**: |
| 165 | +- `rake_aa_count_lsae_to_gbd()` |
| 166 | +- `aggregate_aa_count_lsae_to_gbd()` |
| 167 | +- `aggregate_aa_rate_lsae_to_gbd()` |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +### `rake_level(count_variable, level_df, level_m1_df, hierarchy_df)` |
| 172 | +**Purpose**: Rakes data at one level: rescales the values so that they sum to the corresponding totals at the next higher level. |
| 173 | + |
| 174 | +**Inputs**: |
| 175 | +- `count_variable`: Name of the count variable to rake |
| 176 | +- `level_df`: DataFrame for current level |
| 177 | +- `level_m1_df`: DataFrame for the level above (level minus 1) |
| 178 | +- `hierarchy_df`: Hierarchy DataFrame |
| 179 | + |
| 180 | +**Outputs**: |
| 181 | +- DataFrame with raked values that sum to the higher level totals |
| 182 | + |
| 183 | +**Functions it uses**: None (uses pandas operations) |
| 184 | + |
| 185 | +**Functions that use it**: `rake_aa_count_lsae_to_gbd()` |
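Raking one level can be sketched as proportional scaling: each child value is multiplied by the ratio of its parent's total to the sum of that parent's children. The sketch below assumes `hierarchy_df` has `location_id` and `parent_id` columns and that the data are keyed by `location_id` and `year_id`; it is not taken from the module itself and does not handle child totals of zero.

```python
import pandas as pd


def rake_level_sketch(count_variable, level_df, level_m1_df, hierarchy_df):
    """Scale child counts so they sum to their parent's count in each year."""
    # Attach each location's parent.
    df = level_df.merge(
        hierarchy_df[["location_id", "parent_id"]], on="location_id", how="left"
    )
    # Current sum of the children under each parent and year.
    child_totals = (
        df.groupby(["parent_id", "year_id"], as_index=False)[count_variable]
        .sum()
        .rename(columns={count_variable: "child_total"})
    )
    # Target totals from the level above.
    parent_totals = level_m1_df[["location_id", "year_id", count_variable]].rename(
        columns={"location_id": "parent_id", count_variable: "parent_total"}
    )
    df = df.merge(child_totals, on=["parent_id", "year_id"])
    df = df.merge(parent_totals, on=["parent_id", "year_id"])
    # Proportional adjustment (undefined when child_total is zero).
    df[count_variable] = df[count_variable] * df["parent_total"] / df["child_total"]
    return df.drop(columns=["parent_id", "child_total", "parent_total"])


# Made-up data: two children (ids 10, 11) under parent 1.
hierarchy = pd.DataFrame({"location_id": [10, 11], "parent_id": [1, 1]})
children = pd.DataFrame(
    {"location_id": [10, 11], "year_id": [2000, 2000], "cases": [10, 30]}
)
parent = pd.DataFrame({"location_id": [1], "year_id": [2000], "cases": [80]})
raked = rake_level_sketch("cases", children, parent, hierarchy).sort_values(
    "location_id"
)
```

In this example the children sum to 40 but the parent reports 80, so every child value is doubled.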
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +### `rake_aa_count_lsae_to_gbd(count_variable, hierarchy_df, gbd_aa_count_df, lsae_aa_count_df, full_aa_count_df_path, return_full_df=False)` |
| 190 | +**Purpose**: Rakes LSAE age-aggregated count data to match GBD totals across hierarchy levels. |
| 191 | + |
| 192 | +**Inputs**: |
| 193 | +- `count_variable`: Name of count variable |
| 194 | +- `hierarchy_df`: Hierarchy DataFrame |
| 195 | +- `gbd_aa_count_df`: GBD age-aggregated count data |
| 196 | +- `lsae_aa_count_df`: LSAE age-aggregated count data |
| 197 | +- `full_aa_count_df_path`: Output file path |
| 198 | +- `return_full_df`: Whether to return the DataFrame (default: False) |
| 199 | + |
| 200 | +**Outputs**: |
| 201 | +- The full raked DataFrame, returned only when `return_full_df=True` |
| 202 | + |
| 203 | +**Functions it uses**: |
| 204 | +- `prep_df()` |
| 205 | +- `rake_level()` |
| 206 | +- `write_parquet()` |
| 207 | + |
| 208 | +**Functions that use it**: Not directly called by other functions in this module |
| 209 | + |
| 210 | +--- |
| 211 | + |
| 212 | +### `make_aa_rate_variable(count_variable, full_aa_count_df, aa_population_df, full_lsae_aa_rate_df_path, return_full_df=False)` |
| 213 | +**Purpose**: Converts age-aggregated count data to rate data using population denominators. |
| 214 | + |
| 215 | +**Inputs**: |
| 216 | +- `count_variable`: Name of count variable |
| 217 | +- `full_aa_count_df`: Full age-aggregated count DataFrame |
| 218 | +- `aa_population_df`: Age-aggregated population DataFrame |
| 219 | +- `full_lsae_aa_rate_df_path`: Output file path |
| 220 | +- `return_full_df`: Whether to return DataFrame (default: False) |
| 221 | + |
| 222 | +**Outputs**: |
| 223 | +- The rate DataFrame, returned only when `return_full_df=True` |
| 224 | + |
| 225 | +**Functions it uses**: `write_parquet()` |
| 226 | + |
| 227 | +**Functions that use it**: Not directly called by other functions in this module |
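The count-to-rate conversion amounts to dividing each count by the matching population. A minimal sketch, assuming a `population` column and join keys `location_id` and `year_id`; the explicit `rate_variable` argument is added here for illustration and is not part of the documented signature.

```python
import pandas as pd


def make_rate_sketch(count_variable, rate_variable, count_df, population_df):
    """Divide counts by population, matching rows on location and year."""
    keys = ["location_id", "year_id"]
    df = count_df.merge(population_df[keys + ["population"]], on=keys, how="left")
    df[rate_variable] = df[count_variable] / df["population"]
    return df[keys + [rate_variable]]


# Made-up example data.
counts = pd.DataFrame(
    {"location_id": [1, 2], "year_id": [2000, 2000], "cases": [50, 30]}
)
population = pd.DataFrame(
    {"location_id": [1, 2], "year_id": [2000, 2000], "population": [100, 120]}
)
rates = make_rate_sketch("cases", "case_rate", counts, population)
```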
| 228 | + |
| 229 | +--- |
| 230 | + |
| 231 | +## Aggregation Functions |
| 232 | + |
| 233 | +### `aggregate_level(count_variable, level_df, hierarchy_df)` |
| 234 | +**Purpose**: Aggregates count data from one hierarchy level to the next higher level. |
| 235 | + |
| 236 | +**Inputs**: |
| 237 | +- `count_variable`: Name of count variable to aggregate |
| 238 | +- `level_df`: DataFrame for current level |
| 239 | +- `hierarchy_df`: Hierarchy DataFrame |
| 240 | + |
| 241 | +**Outputs**: |
| 242 | +- DataFrame with aggregated counts at the parent level |
| 243 | + |
| 244 | +**Functions it uses**: None (uses pandas operations) |
| 245 | + |
| 246 | +**Functions that use it**: `aggregate_aa_count_lsae_to_gbd()` |
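Aggregating one level can be sketched as a group-by sum over each location's parent. The column names `location_id`, `parent_id`, and `year_id` are assumptions about the hierarchy and data layout.

```python
import pandas as pd


def aggregate_level_sketch(count_variable, level_df, hierarchy_df):
    """Sum child counts to produce one row per parent location and year."""
    df = level_df.merge(
        hierarchy_df[["location_id", "parent_id"]], on="location_id", how="left"
    )
    parent_df = df.groupby(["parent_id", "year_id"], as_index=False)[
        count_variable
    ].sum()
    # The parent's ID becomes the location ID at the next level up.
    return parent_df.rename(columns={"parent_id": "location_id"})


# Made-up data: two children (ids 10, 11) under parent 1.
hierarchy = pd.DataFrame({"location_id": [10, 11], "parent_id": [1, 1]})
children = pd.DataFrame(
    {"location_id": [10, 11], "year_id": [2000, 2000], "cases": [10, 30]}
)
parents = aggregate_level_sketch("cases", children, hierarchy)
```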
| 247 | + |
| 248 | +--- |
| 249 | + |
| 250 | +### `aggregate_aa_count_lsae_to_gbd(count_variable, hierarchy_df, lsae_aa_count_df, full_aa_count_df_path, return_full_df=False)` |
| 251 | +**Purpose**: Aggregates LSAE age-aggregated count data up through all hierarchy levels (5 to 0). |
| 252 | + |
| 253 | +**Inputs**: |
| 254 | +- `count_variable`: Name of count variable |
| 255 | +- `hierarchy_df`: Hierarchy DataFrame |
| 256 | +- `lsae_aa_count_df`: LSAE age-aggregated count data |
| 257 | +- `full_aa_count_df_path`: Output file path |
| 258 | +- `return_full_df`: Whether to return DataFrame (default: False) |
| 259 | + |
| 260 | +**Outputs**: |
| 261 | +- The full aggregated DataFrame, returned only when `return_full_df=True` |
| 262 | + |
| 263 | +**Functions it uses**: |
| 264 | +- `prep_df()` |
| 265 | +- `aggregate_level()` |
| 266 | +- `write_parquet()` |
| 267 | + |
| 268 | +**Functions that use it**: `aggregate_aa_rate_lsae_to_gbd()` |
| 269 | + |
| 270 | +--- |
| 271 | + |
| 272 | +### `make_full_aa_rate_df_from_aa_count_df(rate_variable, count_variable, full_aa_count_df, aa_population_df, full_aa_rate_df_path=None, return_full_df=False)` |
| 273 | +**Purpose**: Converts aggregated count data to rate data using population. |
| 274 | + |
| 275 | +**Inputs**: |
| 276 | +- `rate_variable`: Name of rate variable to create |
| 277 | +- `count_variable`: Name of count variable |
| 278 | +- `full_aa_count_df`: Full age-aggregated count DataFrame |
| 279 | +- `aa_population_df`: Age-aggregated population DataFrame |
| 280 | +- `full_aa_rate_df_path`: Optional output file path |
| 281 | +- `return_full_df`: Whether to return DataFrame (default: False) |
| 282 | + |
| 283 | +**Outputs**: |
| 284 | +- The rate DataFrame, returned only when `return_full_df=True` |
| 285 | + |
| 286 | +**Functions it uses**: `write_parquet()` |
| 287 | + |
| 288 | +**Functions that use it**: `aggregate_aa_rate_lsae_to_gbd()` |
| 289 | + |
| 290 | +--- |
| 291 | + |
| 292 | +### `aggregate_aa_rate_lsae_to_gbd(rate_variable, hierarchy_df, lsae_aa_rate_df, aa_population_df, full_aa_rate_df_path=None, return_full_df=False)` |
| 293 | +**Purpose**: Aggregates LSAE age-aggregated rate data by first converting to counts, aggregating, then converting back to rates. |
| 294 | + |
| 295 | +**Inputs**: |
| 296 | +- `rate_variable`: Name of rate variable |
| 297 | +- `hierarchy_df`: Hierarchy DataFrame |
| 298 | +- `lsae_aa_rate_df`: LSAE age-aggregated rate data |
| 299 | +- `aa_population_df`: Age-aggregated population DataFrame |
| 300 | +- `full_aa_rate_df_path`: Optional output file path |
| 301 | +- `return_full_df`: Whether to return DataFrame (default: False) |
| 302 | + |
| 303 | +**Outputs**: |
| 304 | +- The full aggregated rate DataFrame, returned only when `return_full_df=True` |
| 305 | + |
| 306 | +**Functions it uses**: |
| 307 | +- `prep_df()` |
| 308 | +- `aggregate_aa_count_lsae_to_gbd()` |
| 309 | +- `make_full_aa_rate_df_from_aa_count_df()` |
| 310 | + |
| 311 | +**Functions that use it**: Not directly called by other functions in this module |
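Rates cannot be summed directly, which is why the function goes through counts. The single-level sketch below shows the round trip: rate times population gives a count, counts are summed per parent, and the sum is divided by the parent's population. Here the parent population is taken as the sum of its children's populations, which is an assumption; the column names are also illustrative.

```python
import pandas as pd


def aggregate_rate_sketch(rate_variable, level_rate_df, population_df, hierarchy_df):
    """Aggregate rates one level up via counts (population-weighted)."""
    keys = ["location_id", "year_id"]
    df = level_rate_df.merge(population_df, on=keys)
    df = df.merge(hierarchy_df[["location_id", "parent_id"]], on="location_id")
    # Step 1: rate -> count.
    df["count"] = df[rate_variable] * df["population"]
    # Step 2: sum counts (and populations) per parent and year.
    parent = df.groupby(["parent_id", "year_id"], as_index=False)[
        ["count", "population"]
    ].sum()
    # Step 3: count -> rate at the parent level.
    parent[rate_variable] = parent["count"] / parent["population"]
    parent = parent.rename(columns={"parent_id": "location_id"})
    return parent[keys + [rate_variable]]


# Made-up data: two children with different populations under parent 1.
hierarchy = pd.DataFrame({"location_id": [10, 11], "parent_id": [1, 1]})
child_rates = pd.DataFrame(
    {"location_id": [10, 11], "year_id": [2000, 2000], "case_rate": [0.5, 0.25]}
)
population = pd.DataFrame(
    {"location_id": [10, 11], "year_id": [2000, 2000], "population": [100, 300]}
)
parent_rates = aggregate_rate_sketch("case_rate", child_rates, population, hierarchy)
```

In this example the parent rate is 125 / 400 = 0.3125, whereas a plain average of the two child rates would give 0.375.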
| 312 | + |
| 313 | +--- |
| 314 | + |
| 315 | +## Function Dependency Tree |
| 316 | + |
| 317 | +``` |
| 318 | +Configuration Functions: |
| 319 | +├── load_yaml_dictionary() |
| 320 | +└── parse_yaml_dictionary() → uses load_yaml_dictionary() |
| 321 | + |
| 322 | +Data I/O Functions: |
| 323 | +├── ensure_id_columns_are_integers() |
| 324 | +├── read_parquet_with_integer_ids() → uses ensure_id_columns_are_integers() |
| 325 | +├── read_income_paths() → uses read_parquet_with_integer_ids() |
| 326 | +├── read_urban_paths() |
| 327 | +├── merge_dataframes() |
| 328 | +└── write_parquet() → used by multiple functions |
| 329 | + |
| 330 | +Raking Functions: |
| 331 | +├── prep_df() → used by multiple raking/aggregation functions |
| 332 | +├── rake_level() → used by rake_aa_count_lsae_to_gbd() |
| 333 | +├── rake_aa_count_lsae_to_gbd() → uses prep_df(), rake_level(), write_parquet() |
| 334 | +└── make_aa_rate_variable() → uses write_parquet() |
| 335 | +
|
| 336 | +Aggregation Functions: |
| 337 | +├── aggregate_level() → used by aggregate_aa_count_lsae_to_gbd() |
| 338 | +├── aggregate_aa_count_lsae_to_gbd() → uses prep_df(), aggregate_level(), write_parquet() |
| 339 | +├── make_full_aa_rate_df_from_aa_count_df() → uses write_parquet() |
| 340 | +└── aggregate_aa_rate_lsae_to_gbd() → uses prep_df(), aggregate_aa_count_lsae_to_gbd(), make_full_aa_rate_df_from_aa_count_df() |
| 341 | +``` |