Added print statements for probabilities without .max(axis=1), made various changes in training script, evaluation script and data processing script. by AnnikaSimonsen · Pull Request #9 · alexandrainst/european_values

AnnikaSimonsen · 2025-08-05T21:05:44Z

No description provided.


Copilot

Pull Request Overview

This PR refactors the generative model training and evaluation pipeline to use scikit-learn pipelines with integrated normalization, adds a new evaluation script for LLM benchmarking, and includes debugging output for probability analysis.

Key changes:

- Modified data processing to optionally skip normalization and return fitted scalers
- Refactored training to use sklearn pipelines combining MinMaxScaler and GaussianMixture
- Added comprehensive evaluation script with probability analysis and debugging output
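
For readers unfamiliar with the pattern named in the key changes, here is a minimal sketch of a scikit-learn pipeline that chains MinMaxScaler and GaussianMixture. It is not the code from this PR: the synthetic data, the step names "scaler" and "gmm", and the choice of two components are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for survey responses (hypothetical, not the EVS/WVS data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 4))

# Scaling lives inside the pipeline, so the scaler fitted during training
# is reused automatically whenever the pipeline scores new data.
pipeline = Pipeline(
    steps=[
        ("scaler", MinMaxScaler()),  # step names are assumptions
        ("gmm", GaussianMixture(n_components=2, random_state=0)),
    ]
)
pipeline.fit(X)

# Per-sample log-likelihood under the fitted mixture, computed on scaled inputs.
log_likelihoods = pipeline.score_samples(X)

# Posterior probability of each mixture component, shape (n_samples, n_components).
component_probs = pipeline.predict_proba(X)
```

Keeping the scaler inside the pipeline is presumably why the data processing step can now skip normalization: the raw features go in, and the pipeline applies the same fitted scaling at both training and evaluation time.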

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/scripts/train_generative_model.py	Simplified data loading logic and integrated with new pipeline approach
src/scripts/evaluate_llm_benchmark.py	New evaluation script for LLM benchmarking with probability analysis
src/european_values/llm_evaluation.py	New module implementing GMM-based evaluation with pipeline support
src/european_values/generative_training.py	Refactored to use sklearn pipelines and added extensive probability debugging
src/european_values/data_processing.py	Modified to support optional normalization and return fitted scalers


saattrupdan · 2025-08-06T14:30:49Z

@AnnikaSimonsen Please mark my comments as resolved if you've fixed them by now. Also, a code check is failing because the processing function now returns a tuple. This needs to be fixed in the train_discriminative_classifier, optimise_survey, and create_plot scripts.


- Update train_discriminative_classifier.py to handle (df, scaler) return
- Update optimise_survey.py to handle (df, scaler) return
- Update create_plot.py to handle (df, scaler) return
- Fixes failing CI code check
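
The (df, scaler) return mentioned above can be illustrated with a small sketch. The function `process_data_sketch` and its `normalize` parameter are hypothetical stand-ins, not the real `process_data` signature from this repository; the point is only how call sites need to unpack a tuple once a function stops returning a bare data frame.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def process_data_sketch(df, normalize=True):
    """Hypothetical stand-in for process_data: returns (processed_df, scaler)."""
    processed = df.dropna()
    if not normalize:
        # No scaler was fitted, so the second element is None.
        return processed, None
    scaler = MinMaxScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(processed),
        columns=processed.columns,
        index=processed.index,
    )
    return scaled, scaler


raw = pd.DataFrame({"q1": [1.0, 2.0, 3.0, None], "q2": [10.0, 20.0, 30.0, 40.0]})

# Callers that only need the data frame unpack the tuple and discard the scaler.
df, _ = process_data_sketch(raw)

# Callers that scale later (for example inside a pipeline) can skip normalization.
df_raw, scaler = process_data_sketch(raw, normalize=False)
```

A call site that still writes `df = process_data_sketch(raw)` would bind the whole tuple to `df`, which is the kind of mismatch a static code check tends to flag.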

…llm_benchmark.py

- Remove unnecessary load_gmm_pipeline function, use joblib.load directly
- Simplify process_responses function with minimal NaN handling
- Try pipeline.predict_proba() first, fallback to component access
- Add flexible data loading to support both EVS trend and EVS/WVS datasets
- Fix tuple unpacking for process_data return value

Addresses reviewer feedback
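
The "try pipeline.predict_proba() first, fallback to component access" item could look roughly like the sketch below. This is an assumption about the approach, not the code in evaluate_llm_benchmark.py: the helper name `component_probabilities`, the step names, the synthetic data, and the choice of AttributeError as the failure to catch are all illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def component_probabilities(pipeline, X):
    """Return posterior component probabilities, shape (n_samples, n_components)."""
    try:
        # Preferred path: the pipeline scales X and queries the mixture in one call.
        return pipeline.predict_proba(X)
    except AttributeError:
        # Fallback: run every step except the last, then query the final estimator.
        X_scaled = pipeline[:-1].transform(X)
        return pipeline[-1].predict_proba(X_scaled)


# Synthetic data and a small fitted pipeline (both hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
pipeline = Pipeline(
    [("scaler", MinMaxScaler()), ("gmm", GaussianMixture(n_components=3, random_state=0))]
).fit(X)

# Full probability matrix: one column per mixture component.
probs = component_probabilities(pipeline, X)

# Collapsing with .max(axis=1) keeps only the top component's probability per row.
top_prob = probs.max(axis=1)
```

Printing `probs` rather than `top_prob` corresponds to what the PR title calls probabilities without .max(axis=1): the full matrix shows how mass is spread across components, which the row-wise maximum hides.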

saattrupdan · 2025-08-07T11:05:12Z

+    """Main evaluation function."""
+    # Load data
+    logger.info("Loading data...")
+    df = load_evs_wvs_data()


Still missing this one

saattrupdan · 2025-08-07T11:06:00Z

 from european_values.data_processing import process_data
-from european_values.generative_training import train_generative_model
+from european_values.generative_training import (
+    train_generative_model,  # <-- This was missing!


Nit: No internal comments in the code please 🙂

Suggested change

-    train_generative_model, # <-- This was missing!
+    train_generative_model

saattrupdan

Looks good now!

AnnikaSimonsen added 2 commits August 4, 2025 16:52

Fix remaining line length issues in print statements

b156496

Added print statements for probabilities without .max(axis=1), made v…

33700f1

…arious changes in training script, evaluation script and data processing script.

AnnikaSimonsen requested a review from saattrupdan August 5, 2025 21:05

saattrupdan marked this pull request as ready for review August 6, 2025 10:09

Copilot AI review requested due to automatic review settings August 6, 2025 10:09

Copilot AI reviewed Aug 6, 2025


Comment thread src/european_values/generative_training.py Outdated

Comment thread src/european_values/generative_training.py

Comment thread src/european_values/generative_training.py

Comment thread src/european_values/llm_evaluation.py

Comment thread src/european_values/llm_evaluation.py

saattrupdan requested changes Aug 6, 2025


saattrupdan assigned AnnikaSimonsen Aug 6, 2025

AnnikaSimonsen and others added 9 commits August 6, 2025 10:36

Update src/european_values/data_processing.py

5261120

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/european_values/generative_training.py

437ef04

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/european_values/generative_training.py

a666c21

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/european_values/generative_training.py

58f0c6f

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/scripts/evaluate_llm_benchmark.py

205725b

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/european_values/generative_training.py

2def0f6

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/scripts/evaluate_llm_benchmark.py

96021dd

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/scripts/train_generative_model.py

3136d90

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/scripts/evaluate_llm_benchmark.py

803b806

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

AnnikaSimonsen requested a review from saattrupdan August 6, 2025 12:35

Address review feedback: flexible data loading, evaluation fixes, and…

a9aba95

… config updates

AnnikaSimonsen force-pushed the annika branch from c7bc8f8 to a9aba95 Compare August 6, 2025 12:37

AnnikaSimonsen and others added 5 commits August 7, 2025 08:01

Update src/european_values/generative_training.py

0bab4dc

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Update src/european_values/generative_training.py

6aae6a8

Co-authored-by: Dan Saattrup Smart <47701536+saattrupdan@users.noreply.github.com>

Apply ruff formatting

ae00d74

saattrupdan requested changes Aug 7, 2025


Update data loading patterns and fix tuple unpacking

012ded9

saattrupdan approved these changes Aug 7, 2025


saattrupdan merged commit 0bda7c4 into main Aug 7, 2025
2 checks passed

saattrupdan deleted the annika branch August 7, 2025 11:37
