Skip to content

Enabled flexible feature selection and config portability#24

Open
MihirT906 wants to merge 46 commits intoUnravelSports:mainfrom
MihirT906:feat/polar-flex
Open

Enabled flexible feature selection and config portability#24
MihirT906 wants to merge 46 commits intoUnravelSports:mainfrom
MihirT906:feat/polar-flex

Conversation

@MihirT906
Copy link

@MihirT906 MihirT906 commented Apr 3, 2025

Overview

This PR enhances the GraphConverter class by introducing flexible feature selection and configuration portability. Users can now define feature_specs as a dictionary to specify which graph features to include and configure their parameters. With this update, the GraphConverter is more adaptable to different datasets, research objectives, and analytical approaches. Additionally, configurations can be imported from or exported to a JSON file, ensuring reproducibility and easy sharing.

# converter can be configured with customized feature specifications
converter = SoccerGraphConverterPolars(
        dataset=kloppy_polars_dataset,
        feature_specs = my_feature_specs,
)

# converter configurations can be exported and imported
converter.save("./config_file.json")
converter.load_from_json("./config_file.json")

Summary of Changes

  • Added support for customizable feature_specs attribute in GraphConverter
  • Implemented validation to ensure the provided feature_specs configuration is accurate
  • Created feature function maps in unravel/soccer/graphs/features/edge_feature_func_map.py and unravel/soccer/graphs/features/node_feature_func_map.py to compute graph features based on feature_specs
  • Added export functionality that saves "package_version", "graph_converter_attributes", "graph_settings", "graph_feature_cols" and "dataset_features". A sample of this can be seen in tests/files/default_feature_specs.json
  • Added import functionality that imports all the above configurations from a JSON

Testing

These functionalities can be validated by running pytest tests/test_polar_flex

Using new features:

from unravel.soccer import SoccerGraphConverterPolars, KloppyPolarsDataset
from unravel.utils import CustomSpektralDataset

from kloppy import sportec

# Load Kloppy dataset
kloppy_dataset = sportec.load_open_tracking_data(only_alive=True, limit=500)
kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=kloppy_dataset,
)
kloppy_polars_dataset.add_dummy_labels()
kloppy_polars_dataset.add_graph_ids(by=["frame_id"])

# Initialize desired feature specs
my_feature_specs={
        "node_features": {
                "x_normed": {},
                "y_normed": {"max_value": 100.0},
                "v_cos_normed": {},
                "normed_dist_to_goal": {"max_distance": 50.0},
        },
        "edge_features": {
                "dist_matrix_normed": {"max_distance": 100.0},
                "speed_diff_matrix_normed": {},
        },
}

# Initialize the Graph Converter with custom feature specs
converter = SoccerGraphConverterPolars(
        dataset=kloppy_polars_dataset,
        chunk_size=2_0000,
        non_potential_receiver_node_value=0.1,
        self_loop_ball=True,
        adjacency_matrix_connect_type="ball",
        adjacency_matrix_type="split_by_team",
        label_type="binary",
        defending_team_node_value=0.0,
        random_seed=False,
        pad=False,
        verbose=False,
        feature_specs = my_feature_specs
    )
    
# Export Graph Converter configuration
converter.save("./save_test.json")

# Import Graph Converter from JSON
converter.load_from_json("./save_test.json")

@MihirT906 MihirT906 changed the title Feat/polar flex Enabled flexible feature selection and config portability Apr 3, 2025
@UnravelSports
Copy link
Owner

Hi @MihirT906, apologies for the long reply. I think this is really starting to look good, but here are some things I think we need to change to make this as seamless as possible.


From JSON
The from_json functionality should be a parameter in the GraphConverter itself. This will avoid confusion on initializing and then overwriting parameters.

converter = SoccerGraphConverterPolars(
        dataset=kloppy_polars_dataset,
        from_json="tests/files/default_feature_specs.json"
)

To make this work properly we should borrow:
from kloppy.io import FileLike, open_as_file (kloppy is already a dependency)

This would then allow us to do something like this, although we probably need to verify that from_json is indeed a FileLike object.

if self.from_json is not None:
      with open_as_file(self.from_json) as json_config:
          self.load_from_json(json.load(json_config))

When we now change def load_from_json(self, file_path: str) -> None: to def load_from_json(self, configuration: dict) -> None: and run this I get a RecursionError: maximum recursion depth exceeded.

The load_from_json should be made private to avoid people calling it outside the class. We want to avoid converter.from_json as mentioned above, since it's confusing with the other provided settings.


Validating Feature Specs
The 'global' feature spec validation that we do now inside __post_init__ should probably be put in it's own private function. Something like this:

def _validate_feature_specs_general(self):
        if self.feature_specs is None or not self.feature_specs:
            self.feature_specs = DEFAULT_SOCCER_FEATURE_SPECS
                    
        .... etc

        if (
            self.feature_specs["node_features"] == {}
            and self.feature_specs["edge_features"] == {}
        ):
            raise ValueError(
                "Please provide feature_specs for either 'node_features' or 'edge_features' or both..."
            )

Then, inside this one function we can call all the other things related to feature specs validation too (i.e. self._validate_feature_specs and self._populate_feature_specs. This cleans up the __post_init__


None > {}
For features that have no parameters (and where the specs are always empty, e.g. "is_gk": {}, should always be None over {} or event False. Just to make it explicit we can't add anything there.

It might even be a good idea to differentiate between 'is not set(None) orcannot have any parameters` False. Curious to hear what you think.


Default Settings
The default feature spec should be stored somewhere outside of the class as a "constant".

DEFAULT_SOCCER_FEATURE_SPECS = {
    "edge_features": {
        "x_normed": None,
        "y_normed": None,,
        "s_normed": None,
        "v_sin_normed": None,
        "v_cos_normed": None,
        "normed_dist_to_goal": None,
        "normed_dist_to_ball": None,
        "is_possession_team": None,
        "is_gk": False,
        "is_ball": False,
        "goal_sin_normed": None,
        "goal_cos_normed": None,
        "ball_sin_normed":None,
        "ball_cos_normed":None,
        "ball_carrier": None,
    },
    "node_features": {
        "dist_matrix_normed": None,
        "speed_diff_matrix_normed": None,
        "pos_cos_matrix": None,
        "pos_sin_matrix": None,
        "vel_cos_matrix": None,
        "vel_sin_matrix": None, (or False, not sure)
    },
}

Settings Output
From looking at the new_feature_specs.json file and the default_feature_specs.json it seems like we are storing some irrelevant data, although I'm not sure.

When we use "graph_settings" do we, from scratch, create the PitchDimensions object with all it's settings? Because I think that should/could simply come from the KloppyPolarsDataset? Additionally, the x_dim min/max values are different for each game, so I don't think this should be stored and used in this way here.

However, I do think 'orientation' is important, and we should verify the orientation is the same.


Settings Output 2
I also think the "graph_feature_cols" is incorrect. We're storing this as "graph_feature_cols": self.dataset_checkpoint.data.columns + (self.graph_feature_cols or []). However, the graph_feature_cols specifically is a list of columns in the self.dataset (or self.dataset_checkpoint?) that are explicitly joined to kloppy_polars_dataset. We should simply store "graph_feature_cols"as self.graph_feature_cols, and when initializing from json we should check if these columns exist within the dataset. If they don't exist we throw an error telling the user they need to add these columns to theirkloppy_polars_dataset`.

I don't mind storing all the dataset columns btw, but they should probably be it's own key in the json, and then we can verify that we have the same columns after loading from json. That's probably a check that's always going to be correct, but it's good to verify we have all the columns in our kloppy_polars_dataset.


American Football
In my local copy of your PR I've added a check for the big data bowl data that throws an error when we try to specify feature_specs for the AmericanFootballGraph converter, because it's currently not supported. I've also updated the test. You don't need to do anything with this, I'll implement this after merging this PR is complete.

@MihirT906
Copy link
Author

Hi @UnravelSports, please review the following:

From JSON

I have modified the implementation to take the json file as a parameter during the initialization. Now, it will be passed as such:

converter = SoccerGraphConverterPolars(
        dataset=kloppy_polars_dataset,
        from_json="tests/files/default_feature_specs.json"
)

This calls def _load_from_json(self, configuration: dict) (a private function that takes a dictionary instead of a FileLike object)

Validating Feature Specs
Your above suggestions have been implemented to clean up __post_init__

None > {}
Above suggestion was implemented. Now, all the feature specs are of the format: "feature_name": None instead of "feature_name": {}
I don't think it's a good idea to differentiate between None and False based on whether a feature can accept parameters. That would require users to already know whether a feature accepts parameters. Using None consistently keeps it simple. A user can use None to configure any parameter without worrying about whether it can be configured or not

Default Settings
Default feature specs stored as a constant

Settings Output
The PitchDimentions object comes from KloppyPolarsDataset now. Orientation is verified

I am confused about x_dim min/max. Aren't all the pitch dimensions different for each game?

Settings Ouptut 2
Fixed graph_feature_cols and added a validation to see if columns exist in the dataset. Dataset columns now has its own keys in the json and is being verified to have al columns in kloppy_polars_dataset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants