Design of subgraph_sum and subtree_sum leads to very suboptimal performance #145

@ilumsden

Rule number 1 of any dataframe library is "don't do operations by iterating over rows." However, this is exactly what we do in subgraph_sum and subtree_sum. We need to refactor this to use a better mechanism (e.g., DataFrame.apply).

To get a sense of the performance impact: anecdotally, subgraph_sum is 3-4x slower than the query language, even though the query language is solving a version of subgraph isomorphism, an NP-hard problem.
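
As a rough illustration of the kind of refactor meant here, the sketch below contrasts a row-by-row accumulation with one that builds the node relationships once and hands the summation to pandas. It is not the current implementation of subgraph_sum or subtree_sum: the toy tree, the `parent` mapping, the `time` column, and both function names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical toy tree: child -> parent (None marks the root).
parent = {"main": None, "foo": "main", "bar": "main", "baz": "foo"}

# Hypothetical exclusive metric, one row per node.
df = pd.DataFrame(
    {"time": [1.0, 2.0, 3.0, 4.0]},
    index=["main", "foo", "bar", "baz"],
)


def subtree_sum_rowwise(df, parent, col):
    """Slow pattern: visit every row in Python and update one cell at a time."""
    out = pd.Series(0.0, index=df.index)
    for node, row in df.iterrows():
        cur = node
        while cur is not None:
            out.loc[cur] += row[col]  # one scalar dataframe write per ancestor
            cur = parent[cur]
    return out


def subtree_sum_grouped(df, parent, col):
    """Build (ancestor, node) pairs once, then let pandas do the summation."""
    pairs = []
    for node in df.index:
        cur = node
        while cur is not None:
            pairs.append((cur, node))
            cur = parent[cur]
    closure = pd.DataFrame(pairs, columns=["ancestor", "node"])
    # Look up each node's value in bulk, then aggregate per ancestor.
    values = closure["node"].map(df[col])
    return values.groupby(closure["ancestor"]).sum().reindex(df.index)


rowwise = subtree_sum_rowwise(df, parent, "time")
grouped = subtree_sum_grouped(df, parent, "time")
```

Both functions produce the same inclusive values for this toy tree; the difference is that the second one touches the dataframe a constant number of times instead of once per (node, ancestor) pair. The graph walk itself still happens in Python, and a real subgraph_sum on a DAG would additionally need to avoid counting a node twice when it is reachable along multiple paths, which this sketch does not handle.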
