Commit 07e3a2d

Merge pull request #69 from NREL/yep/modularization
Modularization and rewrite of the main functionality
2 parents 2b3f047 + 3efd59b commit 07e3a2d

26 files changed

Lines changed: 3026 additions & 4328 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
 .idea
 demos/outputs/simulation/**/*
 demos/outputs/figures
-.DS_Store
+.DS_Store
+.vscode

demos/TODO.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
1+
# TODOs
2+
- Why were there so many problems with file ownership?
3+
- What is in the runs folder?
4+
- Looks like there is some kind of intermediate output.
5+
- What steps from the bottom of `models.py` do we want to keep?
6+
- Refactor the model calibration steps
7+
- Why do the laborforce models not return a value for every person?
8+
- All computed columns should return a series with a valid index for the corresponding table
9+
- I noticed the number of households with multiple partners increases
10+
11+
- `models.py`
12+
- `work_location` seems obsolete
13+
14+
- `laborforce.py`
15+
- Fix the filter in the estimated model for exiting the workforce (`worker == 0` should be `worker == 1`)
16+
- students are currently selected for work
17+
18+
- `education.py`
19+
- Is there no way for students to enter school?
20+
- School progression is faulty because of the order of the calculation
21+
- Double check the logic for transitioning from 14,15,16
22+
23+
- `household_reorg.py`
24+
- <del>Check if the three models interfere with themselves (they are applied according to filters to the same dataset)</del>
25+
- Models do not seem to intersect
26+
- Household 382474 has 4 people labeled as relate == 1 and some of them have MAR == 5
27+
- This means the outputs will not be exactly the same after the refactoring
28+
- `hh_income` is incorrectly computed: There is a hard-coded 30_000 and the rest are 60, 100, etc (not thousands)
29+
- `income_bin`s are inconsistent (income bin 1 is less than 250_000 instead of 25_000)
30+
- I believe the only reason `fix_erroneous_households` exists is in case people are flagged by two models at once
31+
32+
- `marriage.py`
33+
- There is a filter so that, when the number of people getting married is too low, the module does nothing
34+
- Filter for <= 10 weddings
35+
- There was also this code `if (min_mar == 0) or (min_mar == 0):`
36+
- <del>Discuss `CONDITIONS` part of the code</del>
37+
- `MAR` is not correctly being updated because final is filtered to those that move
38+
- If both new partners are head of household, one could potentially leave dependents behind.
39+
- I think the current code is just making the person that earns the most head of household
40+
- In fact at the moment there are children labeled as head of household (9 year olds earning 0 for instance)
41+
- I ignored the "marriage_table" and "divorce_table", consider re-implementing it after the refactor
42+
- What is `member_id`, and why is it set to 1 for the person leaving in a divorce but to "relate" for those staying?
43+
44+
- `kids_moving`
45+
- The filters in the estimated model treat relate values of 7 and 9 as children as well
46+
- I ignored the "kids_move_table", consider re-implementing it after the refactor
47+
48+
- `mortaility.py`
49+
- I ignored the "mortalities" table, consider re-implementing it after the refactor
50+
- In `rel_map` table, `6,6 = 1`, which assumes marriage?
51+
- In `rez` function, if spouse or partner becomes head, `relate` is not updated.
52+
- If `relate==13` dies, the head is also labeled as `MAR=3` (I thought that value necessarily meant widowed from a marriage).
53+
54+
- `birth.py`
55+
- I ignored the "btable" table, consider re-implementing it after the refactor
56+
- Review the values of `education_group`, `age_group`, etc.
57+
- Default `MAR == 5`?
58+
- There is duplication of information between `race_id` and `race`
59+
- `race` ignores `asian` values (it only maps `white` and `black`)
60+
- Check for the need to add the logic of `hispanic`, `hispanic.1`, ...
61+
62+
- `transition`
63+
- If we need to increase the number of households and have none, skip
64+
- What are all the `yaml` configuration files that were updated in the other branch? (They start with elcm)
65+
- Are we using estimated models that are specific for each area?
66+
67+
68+
### Transition model summary
69+
- We get as input a dataframe with tuples (year, geoID (county/TAZ), hh_size (1-4+), # of households)
70+
- We **randomly** select households in each geoID to be added or removed.
71+
- How do we select the county in which new households are created?
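
The summary above can be sketched as follows. This is only an illustration under stated assumptions, not the project's transition code: the column names `geo_id`, `hh_size`, and `n_households`, and the function `plan_transition`, are hypothetical, and the targets are assumed to be already filtered to a single year. Cells that need more households but have none to copy are skipped, as suggested in the `transition` notes.

```python
import numpy as np
import pandas as pd


def plan_transition(households, targets, seed=0):
    """Pick households to remove or copy so each cell reaches its target count.

    households: indexed by household id, with columns geo_id and hh_size.
    targets: columns geo_id, hh_size, n_households (one year only).
    Returns (ids_to_remove, ids_to_copy).
    """
    rng = np.random.default_rng(seed)
    groups = households.groupby(["geo_id", "hh_size"]).groups
    to_remove, to_copy = [], []
    for row in targets.itertuples(index=False):
        ids = groups.get((row.geo_id, row.hh_size), pd.Index([]))
        diff = int(row.n_households) - len(ids)
        if diff < 0:
            # Too many households: drop a random subset without replacement.
            to_remove.extend(rng.choice(ids, size=-diff, replace=False))
        elif diff > 0 and len(ids) > 0:
            # Too few households: copy random donors, with replacement.
            to_copy.extend(rng.choice(ids, size=diff, replace=True))
        # diff > 0 with no donor households in the cell: skip.
    return to_remove, to_copy
```

Returning ids rather than mutating the table keeps the random selection separate from the bookkeeping (new household ids, copied persons), which the real model still has to handle.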
72+
73+
## Ideas for checking sanity of input data
74+
- Check there is only one head of household
75+
- We should fix the households that don't have a head at the start
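
A minimal sketch of the head-of-household check, assuming a `persons` table with `household_id` and `relate` columns. The head code `HEAD_CODE = 0` and the function name are assumptions made for illustration; substitute whatever value the data actually uses for the head.

```python
import pandas as pd

HEAD_CODE = 0  # assumption: the `relate` value that marks the head of household


def households_without_single_head(persons, head_code=HEAD_CODE):
    """Return the number of heads for every household that does not have exactly one."""
    is_head = persons["relate"].eq(head_code)
    heads_per_household = is_head.groupby(persons["household_id"]).sum()
    return heads_per_household[heads_per_household != 1]
```

An empty result means every household has exactly one head; a count of 0 flags households that need a head assigned before the simulation starts.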
76+
77+
# Commit history review
78+
August 8th, 2024
79+
- ⬜️ `cc558e4`: Edits `README.md`
80+
- 🟥 `03efa62`: Changes to cohabitation and marriage YAML files as well as changes to `models.py` and `datasources.py`
81+
- ⬜️ `f382ca0`: Edits to `README.md`
82+
- ⬜️ `67d3969`: Edits to `README.md`
83+
- ⬜️ `5e3d53e`: Edits to `README.md`
84+
85+
----
86+
August 7th, 2024
87+
- 🟥 `d5aae57`: Changes to marriage YAML files (changed a table name) as well as changes to `models.py`
88+
- 🟥 `2f29b7a`: Changes to cohabitation and marriage YAML files as well as changes to `models.py`, `datasources.py` and `multinomial_logit.py`
89+
90+
----
91+
August 5th, 2024
92+
- 🟥 `f3f3c8d`: Too much to describe. Commit message says "detach all urbansim packages from demos"
93+
----
94+
July 31st, 2024
95+
- 🟥 `747a8aa`: Too much to describe. Seems that most of it is moving files around
96+
- 🟥 `9474012`: Too much to describe. Commit message says "detach all urbansim packages from demos" (again)
97+
- 🟥 `5d4f8a5`: "Import urbansim template as templates"
98+
----
99+
July 30th, 2024
100+
- ⬜️ `2b73d21`: Small change to function that creates a directory
101+
- ⬜️ `dbf8dbc`: Small change to function that creates a directory
102+
----
103+
July 29th, 2024
104+
- 🟧 `394d459`: Changes to `plotting.py` and a bunch of csv files
105+
----
106+
July 26th, 2024
107+
- 🟧 `bac973e`: Changes to `plotting.py`
108+
- ⬜️ `082ba53`: Changes a bunch of csv files
109+
- ⬜️ `180ab57`: Remove `process_skims.py` and `settings.yaml` from `demos_urbansim`
110+
- ⬜️ `11bea38`: Same as `180ab57`
111+
- 🟥 `bda827d`: "Removing unused models in configs"
112+
----
113+
July 25th, 2024
114+
- 🟥 `f35650d`: Same as `bda827d`
115+
----
116+
July 23rd, 2024
117+
- ⬜️ `994852f`: Changes `.gitignore` and a hardcoded string to an `.h5` file
118+
- ⬜️ `9960309`: Same as `994852f`
119+
----
120+
July 22nd, 2024
121+
- ⬜️ `a3e8656`: Removing `utils.py`
122+
----
123+
July 20th, 2024
124+
- ⬜️ `9c2dd6e`: Same as `a3e8656`
125+
- 🟧 `3b9781c`: Changes to `plotting.py`
126+
----
127+
July 19th, 2024
128+
- 🟧 `fb78d01`: Changes to `plotting.py`
129+
- 🟦 `2de236f`: Changes to how `simulate.py` loads some parameters
130+
- 🟦 `985dff3`: Same as `2de236f`
131+
- 🟥 `85a61de`: A lot happening. Seems more like a refactoring
132+
----
133+
July 18th, 2024
134+
- 🟥 `0af091e`: Same as `85a61de`
135+
----
136+
July 1st, 2024
137+
- ⬜️ `261fdeb`: Removing files from `demos_urbansim`
138+
- 🟧 `231bfc0`: Changes to printing statements in `models.py`
139+
- ⬜️ `9820535`: Removing files from `demos_urbansim`
140+
- 🟧 `1c9b154`: Changes to printing statements in `models.py`
141+
----
142+
June 12th, 2024
143+
- ⬜️ `1bf7356`: Added `cmp_hdf5_files.py`
144+
----
145+
June 10th, 2024
146+
- ⬜️ `bcb6a7d`: Same as `1bf7356`. Apparently this function compares outputs
147+
----
148+
June 6th, 2024
149+
- ⬜️ `fb2e4eb`: Removing unused imports
150+
- ⬜️ `006b48e`: Removing a call to `os.chown`
151+
----
152+
June 5th, 2024
153+
- ⬜️ `2758a74`: Removing unused imports
154+
- 🟥 `e1849a4`: Commenting out a bunch of steps from the model
155+
----
156+
May 28th, 2024
157+
- 🟦 `974daee`: Very similar to `e1849a4`. Commit message says "Fix code to run"
158+
- ⬜️ `49935ef`: pycache handling
159+
----
160+
May 27th, 2024
161+
- 🟩 `58c5fe2`: First commit

demos/cmp_hdf5_files.py

Lines changed: 12 additions & 9 deletions
@@ -1,16 +1,19 @@
-from pandas import HDFStore, DataFrame
 import time
+import numpy as np
+from pandas import HDFStore, DataFrame
 
 def compare_datasets(dset1, dset2):
     df1 = DataFrame(dset1).sort_index(axis=1)
     df2 = DataFrame(dset2).sort_index(axis=1)
-
-    if not df1.equals(df2):
-        print(f"Datasets are different.")
-        return False
+    comparison = df1.compare(df2)
+
+    if len(comparison) > 0:
+        print(comparison)
+        return np.allclose(comparison.swaplevel(axis=1)['self'],comparison.swaplevel(axis=1)['other'], equal_nan=True)
 
     return True
 
+
 def compare_hdf5_files(file1_path, file2_path):
     with HDFStore(file1_path, 'r') as store1, HDFStore(file2_path, 'r') as store2:
         keys1 = set(store1.keys())
@@ -22,25 +25,25 @@ def compare_hdf5_files(file1_path, file2_path):
 
         if only_in_store1:
             print(f"Keys only in {file1_path}: {only_in_store1}")
-            return False
+            # return False
         if only_in_store2:
             print(f"Keys only in {file2_path}: {only_in_store2}")
-            return False
+            # return False
 
         for key in common_keys:
             print(f"Comparing dataset {key}......", end=" ")
             dset1 = store1[key]
             dset2 = store2[key]
             if not compare_datasets(dset1, dset2):
                 print("Not Equal.")
-                return False
+                # return False
             else:
                 print("Equal.")
         return True
 
 # Example usage:
 start = time.time()
-file1_path = 'data/model_data_origin_win.h5' #you may change file path here, like 'data/model_data_origin_linux.h5' if you're in Linux
+file1_path = 'data/model_data_2011_yamil_version.h5' #you may change file path here, like 'data/model_data_origin_linux.h5' if you're in Linux
 file2_path = 'data/model_data_2011.h5' #you may change file path here
 if compare_hdf5_files(file1_path, file2_path):
     print("All output datasets are equal.")
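
Two things about this change may be worth a second look. With every early `return False` commented out, `compare_hdf5_files` appears to reach `return True` even when tables differ. Also, `DataFrame.compare` raises a `ValueError` when the two frames are not identically labeled, and `np.allclose` only applies to numeric data. The sketch below shows one possible alternative: a tolerance-based check that also handles non-numeric columns, plus a loop that reports every difference but still returns the overall result. The names `frames_close` and `compare_tables` are hypothetical, and the tables are passed as plain mappings so the example runs without HDF5 files.

```python
import numpy as np
import pandas as pd


def frames_close(df1, df2, rtol=1e-5, atol=1e-8):
    """Return True when two frames match, allowing float round-off."""
    df1 = df1.sort_index(axis=1)
    df2 = df2.sort_index(axis=1)
    # Frames with different labels cannot be equal.
    if not (df1.index.equals(df2.index) and df1.columns.equals(df2.columns)):
        return False
    for col in df1.columns:
        a, b = df1[col], df2[col]
        if pd.api.types.is_numeric_dtype(a) and pd.api.types.is_numeric_dtype(b):
            # Numeric columns: compare within a tolerance, NaN equals NaN.
            same = np.allclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float),
                               rtol=rtol, atol=atol, equal_nan=True)
        else:
            # Everything else: require exact equality.
            same = a.equals(b)
        if not same:
            return False
    return True


def compare_tables(tables1, tables2):
    """Compare every shared key, report differences, return the overall result."""
    all_equal = set(tables1.keys()) == set(tables2.keys())
    for key in sorted(set(tables1.keys()) & set(tables2.keys())):
        if not frames_close(tables1[key], tables2[key]):
            print(f"{key}: not equal")
            all_equal = False
    return all_equal
```

Accumulating the result in `all_equal` keeps the behavior the commit seems to want (keep going after the first mismatch) without losing the final verdict.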

demos/configs/calibrated_configs/custom/06197001/demos_out_labor_force.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 modelmanager_version: 0.2.dev9
 saved_object:
   filters:
-  - (worker == 0) & (age >= 18)
+  - (worker == 1) & (age >= 18)
   fitted_parameters:
   - -1.554846
   - 0.100934
@@ -16,7 +16,7 @@ saved_object:
   name: exit_labor_force
   out_column: null
   out_filters:
-  - (worker == 0) & (age >= 18)
+  - (worker == 1) & (age >= 18)
   out_tables: persons
   out_transform: null
   out_value_false: 0
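
To see what the filter change does, the sketch below applies both filter strings to a toy `persons` table. It assumes the filters are evaluated in the style of `DataFrame.query`; the sample data is made up for illustration.

```python
import pandas as pd

# Toy data: two workers and two non-workers, one of the workers is a minor.
persons = pd.DataFrame(
    {"worker": [1, 0, 1, 0], "age": [45, 30, 16, 70]},
    index=[101, 102, 103, 104],
)

old_filter = "(worker == 0) & (age >= 18)"
new_filter = "(worker == 1) & (age >= 18)"

# Old filter: adults who are NOT working were candidates to exit the labor force.
old_selection = persons.query(old_filter)

# New filter: only adults who are currently working can exit the labor force.
new_selection = persons.query(new_filter)
```

With this data the old filter selects persons 102 and 104, while the new one selects only person 101, which matches the intent noted for `laborforce.py` in the TODO list.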

demos/datasources.py

Lines changed: 16 additions & 2 deletions
@@ -1,4 +1,5 @@
 import glob
+import time
 import os
 from itertools import product
 
@@ -212,7 +213,7 @@
 age_intervals = [0, 20, 30, 40, 50, 65, 900]
 education_intervals = [0, 18, 22, 200]
 # Define the labels for age and education groups
-age_labels = ['lte20', '21-29', '30-39', '40-49', '50-64', 'gte65']
+age_labels = ['lte19', '20-29', '30-39', '40-49', '50-64', 'gte65']
 education_labels = ['lte17', '18-21', 'gte22']
 # Create age and education groups with labels
 persons['age_group'] = pd.cut(persons['age'], bins=age_intervals, labels=age_labels, include_lowest=True).astype(str)
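
It may be worth checking that the renamed labels agree with the bin edges. By default `pd.cut` uses right-closed intervals, so with `include_lowest=True` the bins are `[0, 20]`, `(20, 30]`, and so on, which places age 20 in the first group. The sketch below compares that default with `right=False`, which produces `[0, 20)`, `[20, 30)`, and so on. The sample ages are made up; the intervals and labels are the ones from the hunk above.

```python
import pandas as pd

age_intervals = [0, 20, 30, 40, 50, 65, 900]
age_labels = ['lte19', '20-29', '30-39', '40-49', '50-64', 'gte65']

# Ages chosen to sit on or next to the bin edges.
ages = pd.Series([19, 20, 29, 30, 64, 65])

# Default right-closed bins, as in the committed call.
right_closed = pd.cut(ages, bins=age_intervals, labels=age_labels,
                      include_lowest=True).astype(str)

# Left-closed bins: each label now describes exactly the ages it contains.
left_closed = pd.cut(ages, bins=age_intervals, labels=age_labels,
                     right=False).astype(str)

print(pd.DataFrame({"age": ages, "right_closed": right_closed,
                    "left_closed": left_closed}))
```

Under the default, ages 20, 30, and 65 land in `lte19`, `20-29`, and `50-64` respectively; with `right=False` they land in `20-29`, `30-39`, and `gte65`. Whether to switch depends on how the estimated models were fitted, which the diff does not show.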
@@ -523,9 +524,14 @@ def add_missing_combinations(df):
     "school_locations",
     "work_locations"
 ]
+# TODO: This apparently does nothing
 for table in demos_tables:
     orca.add_table(table, pd.DataFrame())
 
+# Tables for rebalancing process
+orca.add_table("rebalanced_households", pd.DataFrame(columns=orca.get_table("households").local_columns))
+orca.add_table("rebalanced_persons", pd.DataFrame(columns=orca.get_table("persons").local_columns))
+
 print("Register persons and households columns.")
 orca.add_injectable("persons_local_cols", orca.get_table("persons").local.columns)
 orca.add_injectable("households_local_cols", orca.get_table("households").local.columns)
@@ -586,4 +592,12 @@ def read_yaml(path):
 configs_folder = os.path.join('configs', calibrated_path if orca.get_injectable('calibrated') else 'estimated_configs')
 print("Models' folder: ", configs_folder)
 
-print("********** End importing datasources **********")
+print("********** End importing datasources **********")
+
+def log_execution_time(start_time, year, module_name):
+    now = time.time()
+    run_table = orca.get_table('run_times')
+    run_table.local = pd.concat([run_table.local,
+                                 pd.DataFrame([[year, module_name, now - start_time]],
+                                              columns=["year", "module", "walltime"])
+                                 ])
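
The timing helper above grows the `run_times` table with one `pd.concat` per call. A possible alternative, sketched here without orca, is to collect rows in a list and build the frame once at the end. The `RunTimer` class and its method names are hypothetical; only the column names `year`, `module`, and `walltime` come from the diff.

```python
import time
from contextlib import contextmanager

import pandas as pd


class RunTimer:
    """Collect wall-clock timings per (year, module) and export them as a frame."""

    def __init__(self):
        self.rows = []

    @contextmanager
    def track(self, year, module_name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the timing even if the wrapped step raises.
            self.rows.append({"year": year, "module": module_name,
                              "walltime": time.perf_counter() - start})

    def to_frame(self):
        return pd.DataFrame(self.rows, columns=["year", "module", "walltime"])
```

Usage would look like `with timer.track(2011, "birth"): run_birth_step()`, where `run_birth_step` stands in for any model step. The context manager removes the need to pass `start_time` around by hand.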
