NatLabRockies
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 1 deletion
diff --git a/demos/TODO.md b/demos/TODO.md
Lines changed: 161 additions & 0 deletions
diff --git a/demos/cmp_hdf5_files.py b/demos/cmp_hdf5_files.py
Lines changed: 12 additions & 9 deletions
diff --git a/demos/configs/calibrated_configs/custom/06197001/demos_out_labor_force.yaml b/demos/configs/calibrated_configs/custom/06197001/demos_out_labor_force.yaml
Lines changed: 2 additions & 2 deletions
diff --git a/demos/datasources.py b/demos/datasources.py
Lines changed: 16 additions & 2 deletions
@@ -2,4 +2,5 @@
 .idea
 demos/outputs/simulation/**/*
 demos/outputs/figures
-.DS_Store
+.DS_Store
+.vscode
@@ -0,0 +1,161 @@
+# TODOs
+- Why were there so many problems with file ownership?
+- What is in the runs folder?
+    - Looks like there is some kind of intermediate output.
+- What steps from the bottom of `models.py` do we want to keep?
+- Refactor the model calibration steps
+- Why do the laborforce models not return a value for every person?
+- All computed columns should return a series with a valid index for the corresponding table
+- I noticed the number of households with multiple partners increases
+
+- `models.py`
+    - `work_location` seems obsolete
+
+- `laborforce.py`
+    - Fix the filter in the estimated model for exiting the workforce (`worker == 0` should be `worker == 1`)
+    - students are currently selected for work
+
+- `education.py`
+    - Is there no way for students to enter school?
+    - School progression is faulty because of the order of the calculation
+    - Double check the logic for transitioning from 14,15,16
+
+- `household_reorg.py`
+    - <del>Check if the three models interfere with themselves (they are applied according to filters to the same dataset)</del>
+        - Models do not seem to intersect
+    - Household 382474 has 4 people labeled as relate == 1 and some of them have MAR == 5
+        - This means the outputs will not be exactly the same after the refactoring
+    - `hh_income` is incorrectly computed: There is a hard-coded 30_000 and the rest are 60, 100, etc (not thousands)
+    - `income_bin`s are inconsistent (income bin 1 is less than 250_000 instead of 25_000)
+    - I believe the only reason `fix_erroneous_households` exists is in case people are flagged by two models at once
+
+- `marriage.py`
+    - There is a filter that makes the module do nothing when the number of people getting married is too low
+    - Filter for <= 10 weddings
+    - There was also this code `if (min_mar == 0) or (min_mar == 0):`, which tests the same condition twice
+    - <del>Discuss `CONDITIONS` part of the code</del>
+    - `MAR` is not correctly being updated because final is filtered to those that move
+    - If both new partners are head of household, one could potentially leave dependents behind.
+        - I think the current code is just making the person that earns the most head of household
+        - In fact at the moment there are children labeled as head of household (9 year olds earning 0 for instance)
+    - I ignored the "marriage_table" and "divorce_table", consider re-implementing it after the refactor
+    - What is `member_id`, and why is it set to 1 for the person leaving in a divorce but to `relate` for those staying?
+
+- `kids_moving`
+    - The filters in the estimated model also treat `relate` values of 7 and 9 as children
+    - I ignored the "kids_move_table", consider re-implementing it after the refactor
+
+- `mortality.py`
+    - I ignored the "mortalities" table, consider re-implementing it after the refactor
+    - In `rel_map` table, `6,6 = 1`, which assumes marriage?
+    - In `rez` function, if spouse or partner becomes head, `relate` is not updated.
+    - If `relate==13` dies, the head is also labeled as `MAR=3` (I thought `MAR=3` necessarily meant widowed from a marriage).
+
+- `birth.py`
+    - I ignored the "btable" table, consider re-implementing it after the refactor
+    - Review the values of `education_group`, `age_group`, etc.
+    - Default `MAR == 5`?
+    - There is duplication of information between `race_id` and `race`
+    - `race` ignores `asian` values (it only maps `white` and `black`)
+    - Check for the need to add the logic of `hispanic`, `hispanic.1`, ... 
+
+- `transition`
+    - If we need to increase the number of households and have none, skip
+    - What are all the `yaml` configuration files that were updated in the other branch? (They start with elcm)
+    - Are we using estimated models that are specific for each area?
+
+
+### Transition model summary
+- We get as input a dataframe with tuples (year, geoID (county/TAZ), hh_size (1-4+), # of households)
+- We select households in each geoID to be added / removed **randomly**.
+- How do we select the county in which new households are created?
+
+## Ideas for checking sanity of input data
+- Check there is only one head of household
+    - We should fix the households that don't have a head at the start
+
+# Commit history review
+August 8th, 2024
+- ⬜️ `cc558e4`: Edits `README.md`
+- 🟥 `03efa62`: Changes to cohabitation and marriage YAML files as well as changes to `models.py` and `datasources.py`
+- ⬜️ `f382ca0`: Edits to `README.md`
+- ⬜️ `67d3969`: Edits to `README.md`
+- ⬜️ `5e3d53e`: Edits to `README.md`
+
+----
+August 7th, 2024
+- 🟥 `d5aae57`: Changes to marriage YAML (changed a table name) files as well as changes to `models.py`
+- 🟥 `2f29b7a`: Changes to cohabitation and marriage YAML files as well as changes to `models.py`, `datasources.py` and `multinomial_logit.py`
+
+----
+August 5th, 2024
+- 🟥 `f3f3c8d`: Too much to describe. Commit message says "detach all urbansim packages from demos"
+----
+July 31st, 2024
+- 🟥 `747a8aa`: Too much to describe. Seems that most of it is moving files around
+- 🟥 `9474012`: Too much to describe. Commit message says "detach all urbansim packages from demos" (again)
+- 🟥 `5d4f8a5`: "Import urbansim template as templates"
+----
+July 30th, 2024
+- ⬜️ `2b73d21`: Small change to function that creates a directory
+- ⬜️ `dbf8dbc`: Small change to function that creates a directory
+----
+July 29th, 2024
+- 🟧 `394d459`: Changes to `plotting.py` and a bunch of csv files
+----
+July 26th, 2024
+- 🟧 `bac973e`: Changes to `plotting.py`
+- ⬜️ `082ba53`: Changes a bunch of csv files
+- ⬜️ `180ab57`: Remove `process_skims.py` and `settings.yaml` from `demos_urbansim`
+- ⬜️ `11bea38`: Same as `180ab57`
+- 🟥 `bda827d`: "Removing unused models in configs"
+----
+July 25th, 2024
+- 🟥 `f35650d`: Same as `bda827d`
+----
+July 23rd, 2024
+- ⬜️ `994852f`: Changes `.gitignore` and a hardcoded string to an `.h5` file
+- ⬜️ `9960309`: Same as `994852f`
+----
+July 22nd, 2024
+- ⬜️ `a3e8656`: Removing `utils.py`
+----
+July 20th, 2024
+- ⬜️ `9c2dd6e`: Same as `a3e8656`
+- 🟧 `3b9781c`: Changes to `plotting.py`
+----
+July 19th, 2024
+- 🟧 `fb78d01`: Changes to `plotting.py`
+- 🟦 `2de236f`: Changes to how `simulate.py` loads some parameters
+- 🟦 `985dff3`: Same as `2de236f`
+- 🟥 `85a61de`: A lot happening. Seems more like a refactoring
+----
+July 18th, 2024
+- 🟥 `0af091e`: Same as `85a61de`
+----
+July 1st, 2024
+- ⬜️ `261fdeb`: Removing files from `demos_urbansim`
+- 🟧 `231bfc0`: Changes to printing statements in `models.py`
+- ⬜️ `9820535`: Removing files from `demos_urbansim`
+- 🟧 `1c9b154`: Changes to printing statements in `models.py`
+----
+June 12th, 2024
+- ⬜️ `1bf7356`: Added `cmp_hdf5_files.py`
+----
+June 10th, 2024
+- ⬜️ `bcb6a7d`: Same as `1bf7356`. Apparently this function compares outputs
+----
+June 6th, 2024
+- ⬜️ `fb2e4eb`: Removing unused imports
+- ⬜️ `006b48e`: Removing a call to `os.chown`
+----
+June 5th, 2024
+- ⬜️ `2758a74`: Removing unused imports
+- 🟥 `e1849a4`: Commenting out a bunch of steps from the model
+----
+May 28th, 2024
+- 🟦 `974daee`: Very similar to `e1849a4`. Commit message says "Fix code to run"
+- ⬜️ `49935ef`: pycache handling
+----
+May 27th, 2024
+- 🟩 `58c5fe2`: First commit
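The input-data sanity check listed in `demos/TODO.md` (exactly one head per household) could be sketched roughly as follows. This is only an illustration: the column names `household_id` and `relate` and the head code `0` are assumptions and should be adjusted to the actual persons table.

```python
import pandas as pd

# Assumption: `relate == 0` marks the head of household; adjust to the real coding.
HEAD_CODE = 0


def households_without_single_head(persons: pd.DataFrame) -> pd.Series:
    """Return the head count of every household that does not have exactly one head."""
    is_head = persons["relate"].eq(HEAD_CODE)
    heads_per_household = is_head.groupby(persons["household_id"]).sum()
    return heads_per_household[heads_per_household != 1]


# Toy data, made up for illustration only.
persons = pd.DataFrame({
    "household_id": [1, 1, 2, 2, 3],
    "relate": [0, 1, 0, 0, 2],
})
bad = households_without_single_head(persons)
print(bad)  # household 2 has two heads, household 3 has none
```

Running a check like this once on the base-year data would show how many households need fixing before the simulation starts.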
@@ -1,16 +1,19 @@
-from pandas import HDFStore, DataFrame
 import time
+import numpy as np
+from pandas import HDFStore, DataFrame
 
 def compare_datasets(dset1, dset2):
     df1 = DataFrame(dset1).sort_index(axis=1)
     df2 = DataFrame(dset2).sort_index(axis=1)
-
-    if not df1.equals(df2):
-        print(f"Datasets are different.")
-        return False
+    comparison = df1.compare(df2)
+    
+    if len(comparison) > 0:
+        print(comparison)
+        return np.allclose(comparison.swaplevel(axis=1)['self'],comparison.swaplevel(axis=1)['other'], equal_nan=True)
 
     return True
 
+
 def compare_hdf5_files(file1_path, file2_path):
     with HDFStore(file1_path, 'r') as store1, HDFStore(file2_path, 'r') as store2:
         keys1 = set(store1.keys())
@@ -22,25 +25,25 @@ def compare_hdf5_files(file1_path, file2_path):
 
         if only_in_store1:
             print(f"Keys only in {file1_path}: {only_in_store1}")
-            return False
+            # return False
         if only_in_store2:
             print(f"Keys only in {file2_path}: {only_in_store2}")
-            return False
+            # return False
 
         for key in common_keys:
             print(f"Comparing dataset {key}......", end=" ")
             dset1 = store1[key]
             dset2 = store2[key]
             if not compare_datasets(dset1, dset2):
                 print("Not Equal.")
-                return False
+                # return False
             else:
                 print("Equal.")
         return True
 
 # Example usage:
 start = time.time()
-file1_path = 'data/model_data_origin_win.h5' #you may change file path here, like 'data/model_data_origin_linux.h5' if you're in Linux
+file1_path = 'data/model_data_2011_yamil_version.h5' #you may change file path here, like 'data/model_data_origin_linux.h5' if you're in Linux
 file2_path = 'data/model_data_2011.h5' #you may change file path here
 if compare_hdf5_files(file1_path, file2_path):
     print("All output datasets are equal.")
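With the early `return False` statements commented out, the hunk above appears to always reach the final `return True`, so the closing message is printed even when a dataset differs. In addition, `DataFrame.compare` raises a `ValueError` when the two frames are not identically labeled. A minimal sketch of one way to keep checking every key while still reporting an overall result is shown below. The helper names `frames_match` and `all_tables_match` are hypothetical, and plain dictionaries stand in for the two `HDFStore` objects.

```python
import numpy as np
import pandas as pd


def frames_match(df1: pd.DataFrame, df2: pd.DataFrame,
                 rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Compare two frames, tolerating floating-point noise in numeric columns."""
    df1 = df1.sort_index(axis=1)
    df2 = df2.sort_index(axis=1)
    # Check labels first so that differently shaped frames do not raise.
    if not (df1.index.equals(df2.index) and df1.columns.equals(df2.columns)):
        return False
    for col in df1.columns:
        a, b = df1[col], df2[col]
        if pd.api.types.is_numeric_dtype(a) and pd.api.types.is_numeric_dtype(b):
            if not np.allclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float),
                               rtol=rtol, atol=atol, equal_nan=True):
                return False
        elif not a.equals(b):
            return False
    return True


def all_tables_match(tables1: dict, tables2: dict) -> bool:
    """Check every shared key and report all mismatches instead of stopping early."""
    all_equal = set(tables1) == set(tables2)
    for key in sorted(set(tables1) & set(tables2)):
        if not frames_match(tables1[key], tables2[key]):
            print(f"{key}: not equal")
            all_equal = False
    return all_equal
```

Accumulating the result in a flag keeps the behavior the change seems to be after (every dataset is still compared and printed) without losing the final verdict.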
 
@@ -1,7 +1,7 @@
 modelmanager_version: 0.2.dev9
 saved_object:
   filters:
-    - (worker == 0) & (age >= 18)
+    - (worker == 1) & (age >= 18)
   fitted_parameters:
     - -1.554846
     - 0.100934
@@ -16,7 +16,7 @@ saved_object:
   name: exit_labor_force
   out_column: null
   out_filters:
-    - (worker == 0) & (age >= 18)
+    - (worker == 1) & (age >= 18)
   out_tables: persons
   out_transform: null
   out_value_false: 0
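The effect of the filter change can be illustrated on a toy persons table. This sketch assumes the filter strings are evaluated as pandas query expressions; how the model template actually applies them is not shown in this diff, and the data is made up.

```python
import pandas as pd

# Toy persons table, made up for illustration only.
persons = pd.DataFrame({
    "age": [17, 25, 40, 63],
    "worker": [1, 1, 0, 1],
})

old_filter = "(worker == 0) & (age >= 18)"
new_filter = "(worker == 1) & (age >= 18)"

old_rows = persons.query(old_filter).index.tolist()
new_rows = persons.query(new_filter).index.tolist()
print(old_rows)  # adults who are not working
print(new_rows)  # adults who are working, i.e. those who can exit the labor force
```

Only people who currently work can exit the labor force, which is why the `exit_labor_force` model should be restricted to `worker == 1`.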
 
@@ -1,4 +1,5 @@
 import glob
+import time
 import os
 from itertools import product
 
@@ -212,7 +213,7 @@
 age_intervals = [0, 20, 30, 40, 50, 65, 900]
 education_intervals = [0, 18, 22, 200]
 # Define the labels for age and education groups
-age_labels = ['lte20', '21-29', '30-39', '40-49', '50-64', 'gte65']
+age_labels = ['lte19', '20-29', '30-39', '40-49', '50-64', 'gte65']
 education_labels = ['lte17', '18-21', 'gte22']
 # Create age and education groups with labels
 persons['age_group'] = pd.cut(persons['age'], bins=age_intervals, labels=age_labels, include_lowest=True).astype(str)
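One thing worth checking about the relabeling: `pd.cut` uses right-closed intervals by default (`right=True`), so with the bin edges above an age of exactly 20 lands in the first bin and an age of 30 in the second. The sketch below compares the default behavior with `right=False` on a few boundary ages; whether the downstream tables expect left-closed bins is an assumption to verify against the data.

```python
import pandas as pd

age_intervals = [0, 20, 30, 40, 50, 65, 900]
age_labels = ['lte19', '20-29', '30-39', '40-49', '50-64', 'gte65']
ages = pd.Series([0, 19, 20, 29, 30, 64, 65])

# Default: intervals are (a, b], with the first one made [0, 20] by include_lowest.
right_closed = pd.cut(ages, bins=age_intervals, labels=age_labels,
                      include_lowest=True).astype(str)
# Alternative: intervals are [a, b), which matches labels such as '20-29'.
left_closed = pd.cut(ages, bins=age_intervals, labels=age_labels,
                     right=False).astype(str)

print(pd.DataFrame({"age": ages,
                    "right_closed": right_closed,
                    "left_closed": left_closed}))
```

With the default settings, ages 20, 30 and 65 receive the labels `lte19`, `20-29` and `50-64` respectively, so the new labels only describe the bins accurately if `right=False` is used. The same consideration applies to the education bins.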
@@ -523,9 +524,14 @@ def add_missing_combinations(df):
     "school_locations",
     "work_locations"
 ]
+# TODO: This apparently does nothing
 for table in demos_tables:
     orca.add_table(table, pd.DataFrame())
 
+# Tables for rebalancing process
+orca.add_table("rebalanced_households", pd.DataFrame(columns=orca.get_table("households").local_columns))
+orca.add_table("rebalanced_persons", pd.DataFrame(columns=orca.get_table("persons").local_columns))
+
 print("Register persons and households columns.")
 orca.add_injectable("persons_local_cols", orca.get_table("persons").local.columns)
 orca.add_injectable("households_local_cols", orca.get_table("households").local.columns)
@@ -586,4 +592,12 @@ def read_yaml(path):
 configs_folder = os.path.join('configs', calibrated_path if orca.get_injectable('calibrated') else 'estimated_configs')
 print("Models' folder: ", configs_folder)
 
-print("********** End importing datasources **********")
+print("********** End importing datasources **********")
+
+def log_execution_time(start_time, year, module_name):
+    now = time.time()
+    run_table = orca.get_table('run_times')
+    run_table.local = pd.concat([run_table.local,
+                                 pd.DataFrame([[year, module_name, now - start_time]],
+                                              columns=["year", "module", "walltime"])
+                                 ])
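`log_execution_time` reads an orca table named `run_times`, whose registration is not visible in the hunks shown here, and it copies the whole table on every call through `pd.concat`. A possible alternative, sketched without orca, is to collect plain records and build the frame once at the end. The extra `records` parameter and the example values are hypothetical.

```python
import time

import pandas as pd


def log_execution_time(records: list, start_time: float, year: int, module_name: str) -> None:
    """Append one timing record; the DataFrame is built once, after the run."""
    records.append({
        "year": year,
        "module": module_name,
        "walltime": time.time() - start_time,
    })


# Usage with made-up values.
records = []
start = time.time()
sum(range(1000))  # stand-in for a simulation step
log_execution_time(records, start, 2011, "example_step")

run_times = pd.DataFrame(records, columns=["year", "module", "walltime"])
print(run_times)
```

The resulting frame has the same three columns as in the diff, so it could be registered as a table at the end of the run if that is still needed.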