Description
DFS currently builds features for each entity only once. This is a problem because different features may be built depending on which path is taken to reach an entity. For example, the following test currently fails:
def test_makes_direct_of_agg_on_all_paths(diamond_es):
    dfs_obj = DeepFeatureSynthesis(target_entity_id='transactions',
                                   entityset=diamond_es,
                                   max_depth=3,
                                   agg_primitives=[Count],
                                   trans_primitives=[])
    features = dfs_obj.build_features()

    # These two pass
    assert feature_with_name(features, 'stores.regions.COUNT(stores)')
    assert feature_with_name(features, 'stores.regions.COUNT(customers)')

    # These two fail
    assert feature_with_name(features, 'customers.regions.COUNT(stores)')
    assert feature_with_name(features, 'customers.regions.COUNT(customers)')
This is because the customers features are built before the aggregations on regions. The execution looks something like this:
_run_dfs(transactions)
    _run_dfs(stores)
        build_agg_features
        _run_dfs(regions)
            _run_dfs(customers)
                build_agg_features
                build_direct_features
            build_agg_features
            build_direct_features
        build_direct_features
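The two routes from transactions to regions can be listed with a few lines of plain Python. This is only a sketch: the PARENTS mapping is an assumption inferred from the feature names in the test (the diamond_es fixture is not shown here), and forward_paths is a hypothetical helper, not part of Featuretools.

```python
# Assumed child -> parents relationships of the diamond entity set,
# inferred from the feature names in the test, not copied from the fixture.
PARENTS = {
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}


def forward_paths(start, goal, path=()):
    """Yield every chain of parent links that leads from start to goal."""
    path = path + (start,)
    if start == goal:
        yield path
        return
    for parent in PARENTS[start]:
        yield from forward_paths(parent, goal, path)


paths = list(forward_paths("transactions", "regions"))
for p in paths:
    print(" -> ".join(p))
```

Under that assumption, regions is reachable both through stores and through customers, which is the situation the test exercises.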
When there is only one path to each entity, this is fine, because you don't want features like regions.MEAN(customers.regions.COUNT(stores)). But when there are multiple paths, the features on customers may be used by entities other than regions.
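To make the effect of the visit-once rule concrete, here is a toy model of the traversal. It is not the Featuretools implementation. It assumes the diamond relationships implied by the feature names in the test, follows the step order of the trace above (child entities, aggregations, parent entities, direct features), builds only COUNT aggregations, and ignores max_depth. The names PARENTS, CHILDREN, run_dfs, and the visit_once flag are made up for this sketch.

```python
# Assumed child -> parents relationships of the diamond entity set.
PARENTS = {
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}
# Inverse mapping: parent -> children.
CHILDREN = {e: [c for c, ps in PARENTS.items() if e in ps] for e in PARENTS}


def run_dfs(entity, features, path=(), visit_once=True):
    """Toy traversal: fill features[entity] with feature names and return them.

    path holds the entities currently being processed (cycle guard).
    With visit_once=True an entity that already has an entry in
    features is never processed again, mimicking the behaviour described
    in this issue.
    """
    path = path + (entity,)
    features.setdefault(entity, [])

    def should_visit(other):
        if other in path:
            return False
        return not (visit_once and other in features)

    def add(name):
        if name not in features[entity]:
            features[entity].append(name)

    # Recurse into child entities, then build COUNT aggregations.
    for child in CHILDREN[entity]:
        if should_visit(child):
            run_dfs(child, features, path, visit_once)
    for child in CHILDREN[entity]:
        add(f"COUNT({child})")

    # Recurse into parent entities, then build direct features from them.
    for parent in PARENTS[entity]:
        if should_visit(parent):
            run_dfs(parent, features, path, visit_once)
    for parent in PARENTS[entity]:
        if parent not in path:
            for name in list(features[parent]):
                add(f"{parent}.{name}")
    return features[entity]


once = run_dfs("transactions", {}, visit_once=True)
per_path = run_dfs("transactions", {}, visit_once=False)
print("customers.regions.COUNT(stores)" in once)
print("customers.regions.COUNT(stores)" in per_path)
```

In this toy model the visit-once run produces the stores.regions.* names but not the customers.regions.* names, matching the failing assertions, while letting each path visit entities independently produces both. A real fix would still need to avoid features like regions.MEAN(customers.regions.COUNT(stores)), which this sketch does not model.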