GMM test #12

Draft
Haffi112 wants to merge 1 commit into main from gmm_sanity_check

@Haffi112
Collaborator

This is just a sanity check to see whether GMMs produce more stable results than the kernel density estimator. The aim is not necessarily to merge this into main, only to check whether it produces better results.

Copilot AI review requested due to automatic review settings August 19, 2025 16:25

Copilot AI left a comment

Pull Request Overview

This PR replaces the kernel density estimator with a Gaussian Mixture Model (GMM) for generative modeling of EU survey data. The change aims to evaluate whether GMMs provide more stable results than the previous approach.

  • Replaces KernelDensity with GaussianMixture from sklearn
  • Adds component selection methodology with configurable parameters for GMM optimization
  • Introduces validation-based model selection using BIC/AIC criteria
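
For reference, the BIC/AIC criteria named above can be sketched without scikit-learn. The helper names below (`gmm_n_parameters`, `bic`, `aic`) are hypothetical and not part of this PR. The sketch assumes the standard free-parameter count for a Gaussian mixture with k components in d dimensions, and that the log-likelihood passed in is the total (not per-sample) log-likelihood of the data being scored.

```python
import math


def gmm_n_parameters(n_components: int, n_features: int, covariance_type: str = "full") -> int:
    """Count the free parameters of a Gaussian mixture (hypothetical helper)."""
    k, d = n_components, n_features
    # Covariance parameters depend on how the covariance matrices are constrained.
    cov_params = {
        "full": k * d * (d + 1) // 2,  # one symmetric d x d matrix per component
        "tied": d * (d + 1) // 2,      # one symmetric matrix shared by all components
        "diag": k * d,                 # one variance per feature per component
        "spherical": k,                # one variance per component
    }
    if covariance_type not in cov_params:
        raise ValueError(f"unknown covariance_type: {covariance_type!r}")
    mean_params = k * d
    weight_params = k - 1  # mixture weights sum to 1
    return cov_params[covariance_type] + mean_params + weight_params


def bic(total_log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * total_log_likelihood + n_params * math.log(n_samples)


def aic(total_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower is better."""
    return -2.0 * total_log_likelihood + 2.0 * n_params
```

Under this count, a `full` covariance contributes k·d(d+1)/2 parameters against k·d for `diag`, so the penalty term grows much faster with the number of features when `full` is used.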

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/scripts/train_generative_model.py | Passes new GMM configuration parameters to the training function |
| src/european_values/generative_training.py | Core implementation change from KernelDensity to GaussianMixture with component selection |
| config/config.yaml | Adds GMM-specific configuration parameters with default values |

plot_component_selection,
save_evaluation_results,
)

Copilot AI Aug 19, 2025

The import of gmm_component_selection module will fail as this module is not included in the diff and likely doesn't exist yet. The functions evaluate_gmm_components, plot_component_selection, and save_evaluation_results are being imported but their implementation is missing.

Suggested change
# --- Implementations for missing functions ---
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def evaluate_gmm_components(
    X_train,
    X_val,
    max_components,
    criterion="bic",
    covariance_type="full",
    random_state=None,
    n_init=1,
    reg_covar=1e-6,
):
    """Evaluate GMMs with different numbers of components and select the best one."""
    # Reject unknown criteria instead of silently falling back to AIC.
    if criterion not in ("bic", "aic"):
        raise ValueError(f"criterion must be 'bic' or 'aic', got {criterion!r}")
    scores = []
    n_components_range = range(1, max_components + 1)
    for n in n_components_range:
        gmm = GaussianMixture(
            n_components=n,
            covariance_type=covariance_type,
            random_state=random_state,
            n_init=n_init,
            reg_covar=reg_covar,
        )
        # Fit on the training split, then score the criterion on the validation split.
        gmm.fit(X_train)
        score = gmm.bic(X_val) if criterion == "bic" else gmm.aic(X_val)
        scores.append(float(score))
    # Lower BIC/AIC is better, so pick the component count with the minimum score.
    optimal_n = n_components_range[int(np.argmin(scores))]
    evaluation_results = {
        "n_components": list(n_components_range),
        "scores": scores,
        "criterion": criterion,
    }
    return optimal_n, evaluation_results


def plot_component_selection(evaluation_results, output_path=None):
    """Plot the selection criterion scores for different numbers of GMM components."""
    plt.figure()
    plt.plot(
        evaluation_results["n_components"],
        evaluation_results["scores"],
        marker="o",
    )
    plt.xlabel("Number of GMM components")
    plt.ylabel(evaluation_results.get("criterion", "Score").upper())
    plt.title("GMM Component Selection")
    # Save to disk when a path is given; otherwise show the figure interactively.
    if output_path:
        plt.savefig(output_path)
    else:
        plt.show()
    plt.close()


def save_evaluation_results(evaluation_results, output_path):
    """Save evaluation results to a CSV file."""
    df = pd.DataFrame({
        "n_components": evaluation_results["n_components"],
        "score": evaluation_results["scores"],
    })
    df.to_csv(output_path, index=False)

Comment thread config/config.yaml
selection_criterion: bic # 'bic' or 'aic'
covariance_type: diag # 'full', 'tied', 'diag', 'spherical' - using 'diag' for stability
n_init: 5 # Number of GMM initializations
reg_covar: 1e-5 # Regularization for covariance matrices (increased for extra stability)

Copilot AI Aug 19, 2025

The default value for reg_covar in the function signature (1e-6) differs from the config file value (1e-5). This inconsistency could lead to confusion about which value is actually used.

Suggested change
- reg_covar: 1e-5 # Regularization for covariance matrices (increased for extra stability)
+ reg_covar: 1e-6 # Regularization for covariance matrices (matches function default)
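
Whichever value is kept, it may help to coerce and validate the setting when the config is loaded. Some YAML 1.1 loaders (PyYAML among them) read an exponent literal without a decimal point, such as `1e-5`, as a string rather than a float. Whether that applies here depends on the config loader this project uses, which the PR does not show. A minimal sketch, using the hypothetical helper `coerce_reg_covar`:

```python
def coerce_reg_covar(value) -> float:
    """Return reg_covar as a non-negative float, accepting strings such as '1e-5'.

    Hypothetical helper, not part of this PR.
    """
    try:
        reg_covar = float(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"reg_covar must be numeric, got {value!r}") from exc
    # A negative regularization term makes no sense for covariance matrices.
    if reg_covar < 0:
        raise ValueError(f"reg_covar must be non-negative, got {reg_covar}")
    return reg_covar
```

Calling `coerce_reg_covar("1e-5")` and `coerce_reg_covar(1e-5)` yields the same float, so the training function receives a number regardless of how the loader typed the value.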

seed: int,
n_components_max: int = 50,
selection_criterion: str = "bic",
covariance_type: str = "full",

Copilot AI Aug 19, 2025

The default value for covariance_type in the function signature is 'full' but the config file sets it to 'diag'. This mismatch could cause unexpected behavior when the function is called without explicit parameters.

Suggested change
- covariance_type: str = "full",
+ covariance_type: str = "diag",
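
Mismatches between signature defaults and config values, like the ones flagged in this review, could also be caught mechanically. A minimal sketch, assuming the GMM config section is available as a plain dict; `find_default_mismatches` and the stand-in `train` function are hypothetical and not part of this PR:

```python
import inspect


def find_default_mismatches(func, config: dict) -> dict:
    """Map parameter name -> (signature default, config value) where the two differ."""
    mismatches = {}
    for name, param in inspect.signature(func).parameters.items():
        # Skip parameters without a default and parameters the config does not set.
        if param.default is inspect.Parameter.empty or name not in config:
            continue
        if param.default != config[name]:
            mismatches[name] = (param.default, config[name])
    return mismatches


# Stand-in function carrying the defaults quoted in this review (hypothetical name).
def train(seed: int, covariance_type: str = "full", reg_covar: float = 1e-6):
    ...


# Values quoted from config/config.yaml in this review.
config = {"covariance_type": "diag", "reg_covar": 1e-5, "n_init": 5}
mismatches = find_default_mismatches(train, config)
print(mismatches)
```

Run as a unit test, a check like this would fail whenever the function defaults and the config file drift apart, regardless of which side is treated as authoritative.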

@saattrupdan marked this pull request as draft August 19, 2025 18:26