Support entitysets with cycles in their relationship graph #601

Open
@CJStadler

For example, a self-loop:

Entities:
  employees
Relationships:
  employees.manager_id -> employees.id

Or, a cycle involving multiple entities:

Entities:
  users
  roles
Relationships:
  users.role_id -> roles.id
  roles.creator_id -> users.id
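
Both shapes can be detected mechanically. The following is a minimal sketch in plain Python, not Featuretools code: it models each relationship as a (child_entity, parent_entity) pair and checks for a cycle with a depth-first search. The function name `has_cycle` and the pair representation are assumptions made for illustration.

```python
def has_cycle(relationships):
    """Return True if the directed child -> parent graph contains a cycle.

    `relationships` is a list of (child_entity, parent_entity) pairs.
    This representation is hypothetical, not the Featuretools data model.
    """
    graph = {}
    for child, parent in relationships:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])

    # Standard three-state DFS: an edge into an "in progress" node is a cycle.
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {entity: UNVISITED for entity in graph}

    def visit(entity):
        state[entity] = IN_PROGRESS
        for parent in graph[entity]:
            if state[parent] == IN_PROGRESS:
                return True
            if state[parent] == UNVISITED and visit(parent):
                return True
        state[entity] = DONE
        return False

    return any(state[e] == UNVISITED and visit(e) for e in graph)


self_loop = [("employees", "employees")]
multi_entity = [("users", "roles"), ("roles", "users")]
acyclic = [("transactions", "sessions"), ("sessions", "customers")]

print(has_cycle(self_loop), has_cycle(multi_entity), has_cycle(acyclic))
```

The self-loop counts as a cycle here because the DFS sees an edge back into the node it is currently visiting.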

We should be able to create features that traverse these cycles one or more times. For example, "The average salary of the direct reports of an employee's direct reports": MEAN(employees.employees.salary).
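
To make the intended semantics concrete, here is a small sketch that computes that feature by hand on a toy table. It uses plain Python rather than Featuretools, and the data, the `direct_reports` helper, and the function name are all made up for illustration.

```python
from statistics import mean

# Toy "employees" table keyed by id. All values are invented.
employees = {
    1: {"manager_id": None, "salary": 200},
    2: {"manager_id": 1, "salary": 120},
    3: {"manager_id": 1, "salary": 100},
    4: {"manager_id": 2, "salary": 80},
    5: {"manager_id": 2, "salary": 60},
    6: {"manager_id": 3, "salary": 70},
}


def direct_reports(emp_id):
    """Follow employees.manager_id -> employees.id backward by one step."""
    return [eid for eid, row in employees.items() if row["manager_id"] == emp_id]


def mean_salary_two_levels_down(emp_id):
    """Mean salary of the direct reports of emp_id's direct reports.

    Returns None when there are no such employees.
    """
    salaries = [
        employees[grandchild]["salary"]
        for child in direct_reports(emp_id)
        for grandchild in direct_reports(child)
    ]
    return mean(salaries) if salaries else None


print(mean_salary_two_levels_down(1))  # employees 4, 5 and 6 are two levels below 1
```

Each call to `direct_reports` corresponds to one traversal of the self-loop, so this feature traverses the cycle twice.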

To support this, at least two places in the code that assume there are no cycles will need to be updated:

  1. EntitySet.has_unique_forward_path: When searching for paths, this skips entities that have already been seen, so each cycle is traversed at most once. For the "employees" entityset above, it would report a unique path from employees to employees, even though there are infinitely many.
  2. DeepFeatureSynthesis.build_features: This currently gets stuck in an infinite loop when run on an entityset with a cycle (in the call to EntitySet.get_backward_entities(eid, deep=True)). To fix this, we could add a max_relationship_depth param that limits the number of relationships traversed. This would change existing behavior even for entitysets without cycles, because there is currently no such limit (max_depth only limits the nesting of features, not the length of their relationship paths).
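
The effect of such a limit can be sketched outside Featuretools. The code below is an illustration only: `forward_paths` is a hypothetical helper, relationships are modeled as (child_entity, parent_entity) pairs, and `max_relationship_depth` simply mirrors the parameter name proposed above. It enumerates every forward path of at least one relationship, up to the limit, without the "already seen" shortcut.

```python
def forward_paths(relationships, start, end, max_relationship_depth):
    """List all forward paths from start to end.

    A path is a list of (child_entity, parent_entity) pairs with at least
    one and at most max_relationship_depth relationships. Entities may
    repeat, so cycles are followed until the depth limit is reached.
    """
    paths = []

    def walk(entity, path):
        if path and entity == end:
            paths.append(list(path))
        if len(path) == max_relationship_depth:
            return  # the limit is what guarantees termination on cycles
        for child, parent in relationships:
            if child == entity:
                path.append((child, parent))
                walk(parent, path)
                path.pop()

    walk(start, [])
    return paths


self_loop = [("employees", "employees")]
multi_entity = [("users", "roles"), ("roles", "users")]
acyclic = [("transactions", "sessions"), ("sessions", "customers")]

# With a cycle, the number of paths grows with the limit.
print(len(forward_paths(self_loop, "employees", "employees", 3)))
print(len(forward_paths(multi_entity, "users", "users", 4)))
# Without a cycle, the limit only matters if it is shorter than the path.
print(len(forward_paths(acyclic, "transactions", "customers", 10)))
```

In this model a path is unique only for a given depth limit, which is why a uniqueness check on a cyclic entityset would need to take the limit (or the exact path) into account.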

Labels: needs design (issues requiring design documentation), new feature (suggestions for new functionality)