DFS does not build some features when there are multiple paths to an entity #652

Open
@CJStadler

Description

DFS currently builds features for each entity only once. This is a problem because different features may be built depending on which path is taken to an entity. For example, the following test currently fails:

def test_makes_direct_of_agg_on_all_paths(diamond_es):
    dfs_obj = DeepFeatureSynthesis(target_entity_id='transactions',
                                   entityset=diamond_es,
                                   max_depth=3,
                                   agg_primitives=[Count],
                                   trans_primitives=[])

    features = dfs_obj.build_features()
    # These two pass
    assert feature_with_name(features, 'stores.regions.COUNT(stores)')
    assert feature_with_name(features, 'stores.regions.COUNT(customers)')
    # These two fail
    assert feature_with_name(features, 'customers.regions.COUNT(stores)')
    assert feature_with_name(features, 'customers.regions.COUNT(customers)')

This is because the features on customers are built before the aggregations on regions. The execution looks something like this:

_run_dfs(transactions)
    _run_dfs(stores)
        build_agg_features
        _run_dfs(regions)
            _run_dfs(customers)
                build_agg_features
                build_direct_features
            build_agg_features
            build_direct_features
    build_direct_features
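
The ordering problem can be reproduced with a small toy model. This is only a sketch: the entity names come from the test above, but the relationship dictionaries and the `run_dfs` function are hypothetical simplifications, not the featuretools implementation (which also handles depth limits and filters out some features). It assumes a diamond in which transactions has the parents stores and customers, and both of those have the parent regions.

```python
# Toy model: entity names come from the issue; the graph layout and
# run_dfs below are hypothetical simplifications, not featuretools code.
FORWARD = {  # child entity -> parent entities
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}
BACKWARD = {  # parent entity -> child entities
    "regions": ["stores", "customers"],
    "stores": ["transactions"],
    "customers": ["transactions"],
    "transactions": [],
}


def run_dfs(entity, features, visited):
    """Build feature names for `entity`, visiting every entity only once."""
    visited.add(entity)
    features.setdefault(entity, [])

    # Recurse into children that have not been visited yet.
    for child in BACKWARD[entity]:
        if child not in visited:
            run_dfs(child, features, visited)

    # "Aggregation" features: one COUNT per child entity.
    for child in BACKWARD[entity]:
        features[entity].append(f"COUNT({child})")

    # Recurse into parents that have not been visited yet.
    for parent in FORWARD[entity]:
        if parent not in visited:
            run_dfs(parent, features, visited)

    # "Direct" features: copy whatever the parent has *at this moment*.
    for parent in FORWARD[entity]:
        for name in list(features.get(parent, [])):
            features[entity].append(f"{parent}.{name}")


features = {}
run_dfs("transactions", features, set())
built = set(features["transactions"])

print("stores.regions.COUNT(stores)" in built)     # True
print("customers.regions.COUNT(stores)" in built)  # False
```

In this model customers is reached from inside the call for regions, so its direct features are copied while regions still has no aggregations, and they are never revisited.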

When there is only a single path to each entity this is fine, because you don't want features like regions.MEAN(customers.regions.COUNT(stores)). But when there are multiple paths, the features on customers may be used by entities other than regions (here, by transactions).
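
To make "multiple paths" concrete, the sketch below enumerates every forward path from transactions to regions and derives the feature names the test expects. It is illustrative only: the `FORWARD` dictionary is an assumed layout of `diamond_es`, and `forward_paths` is a hypothetical helper, not part of featuretools.

```python
# Assumed layout of the diamond entity set (child -> parents).
FORWARD = {
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}


def forward_paths(start, graph):
    """Yield every forward path from `start` as a tuple of entity ids."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        for parent in graph[path[-1]]:
            new_path = path + (parent,)
            yield new_path
            stack.append(new_path)


# Every distinct route from the target entity to regions.
paths_to_regions = sorted(
    p for p in forward_paths("transactions", FORWARD) if p[-1] == "regions"
)

# One direct-of-aggregation feature per (path, child of regions) pair.
expected = sorted(
    ".".join(path[1:]) + f".COUNT({child})"
    for path in paths_to_regions
    for child in ("stores", "customers")
)

for name in expected:
    print(name)
```

Two paths times two aggregations gives the four features asserted in the test, so building features once per entity rather than once per path can cover at most one of the two routes.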


Labels: needs design (issues requiring design documentation)
