Use dataclass to annotate typing of SVM+PCA instead of using raw dictionary #12

@rafaeljacov

@GizakiF

📦 IMPORTANT
svm_with_pca_fold_4.pkl contains a dictionary of the form {"svm": clf.best_estimator_, "pca": pca}.

However, this approach is not reliable or future-proof, especially if the structure of the dictionary changes over time. Relying only on external documentation (like this README) means there's no way for the code or your IDE to automatically understand the structure of the loaded data.

Solution: Define a Python @dataclass that clearly specifies the expected structure of the data. This enables:

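  • Static type checking (tools such as mypy or Pyright can verify attribute access; a dataclass does not validate types at runtime)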
  • Better IDE support with autocompletion and type hints (via LSP)
  • Easier debugging and code maintenance

By converting the loaded .pkl content into a dataclass instance, you make your code more robust and self-documenting.

🧠 Example:

from dataclasses import dataclass
from sklearn.svm import SVC
from sklearn.decomposition import PCA
import pickle

@dataclass
class SVMWithPCA:
    svm: SVC
    pca: PCA

# Load the raw dictionary from the pickle file
with open("svm_with_pca_fold_4.pkl", "rb") as f:
    raw_data = pickle.load(f)

# `raw_data` is the original dictionary loaded from the file
# which looks like: { "svm": <SVC object>, "pca": <PCA object> }

# Build a dataclass instance from the dictionary (this constructs a new object; it is not a cast)
typed_data = SVMWithPCA(**raw_data)
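
One caveat on the example above: a dataclass records its annotations but does not check them when an instance is created, so the typing benefit is for static tools and the IDE. If a runtime check is also wanted, one option is a `__post_init__` hook. The sketch below is only an illustration under assumptions: `FakeSVC` and `FakePCA` are hypothetical stand-ins for `SVC` and `PCA` so it runs without scikit-learn, and `CheckedSVMWithPCA` is not a name from this repo.

```python
from dataclasses import dataclass, fields


class FakeSVC:
    """Hypothetical stand-in for sklearn.svm.SVC."""


class FakePCA:
    """Hypothetical stand-in for sklearn.decomposition.PCA."""


@dataclass
class CheckedSVMWithPCA:
    svm: FakeSVC
    pca: FakePCA

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so check each field by hand.
        # This relies on the annotations being real classes, not strings.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


# A well-formed dictionary converts cleanly.
bundle = CheckedSVMWithPCA(**{"svm": FakeSVC(), "pca": FakePCA()})

# A wrong value type is rejected at construction time.
try:
    CheckedSVMWithPCA(svm="not a model", pca=FakePCA())
    rejected = False
except TypeError:
    rejected = True
```

With the real classes, the same pattern would apply by swapping the stand-ins for `SVC` and `PCA`.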

About the **raw_data syntax:

The ** operator in Python is used to unpack a dictionary into keyword arguments. In this case:

SVMWithPCA(**raw_data)

# is equivalent to:

SVMWithPCA(svm=raw_data["svm"], pca=raw_data["pca"])
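
A side effect of the `**` unpacking is that it also acts as a structure check: if the pickled dictionary ever gains, loses, or renames a key, the constructor call fails right at load time instead of somewhere later in the code. A minimal sketch of that behavior follows, again using hypothetical stand-ins (`FakeSVC`, `FakePCA`) so it runs without scikit-learn; `try_build` and the `"model"` / `"scaler"` keys are made up for illustration.

```python
from dataclasses import dataclass


class FakeSVC:
    """Hypothetical stand-in for sklearn.svm.SVC."""


class FakePCA:
    """Hypothetical stand-in for sklearn.decomposition.PCA."""


@dataclass
class SVMWithPCA:
    svm: FakeSVC
    pca: FakePCA


def try_build(raw: dict) -> str:
    """Return "ok", or "TypeError" if SVMWithPCA(**raw) rejects the keys."""
    try:
        SVMWithPCA(**raw)
    except TypeError:
        return "TypeError"
    return "ok"


good = {"svm": FakeSVC(), "pca": FakePCA()}
renamed = {"model": FakeSVC(), "pca": FakePCA()}  # "svm" key was renamed
extended = {**good, "scaler": object()}           # an extra key was added

results = [try_build(good), try_build(renamed), try_build(extended)]
# results == ["ok", "TypeError", "TypeError"]
```

So a change to the dictionary layout shows up as an immediate `TypeError` naming the offending keyword, which is easier to trace than a `KeyError` deep inside inference code.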
