Ekman Emotions – Technical Handbook

📌 Overview

This repository contains a training and experimentation workflow for multi-label emotion classification on the Ekman emotions dataset (thethinkmachine/ekman-emotions). The pipeline fine-tunes a transformer encoder either with full-parameter updates or with Parameter-Efficient Fine-Tuning (LoRA) adapters. Experiments can be tracked with Weights & Biases, evaluation metrics are computed with scikit-learn, and trained artefacts can be pushed directly to the Hugging Face Hub.

Main components:

  • train.py – orchestrates configuration loading, data preparation, model initialisation, training, evaluation, and hub upload.
  • config.yaml – single source of truth for model, LoRA, training, data, logging, and hub settings.
  • utils.py – helper utilities for run naming, hub identifier generation, config sanity checks, and dataset re-splitting.
  • eval.py – defines benchmark, the metric callback used by the trainer.

🧱 Repository layout

ekman-emotions/
├── config.yaml          # Main experiment configuration
├── train.py             # Training / evaluation entry point
├── eval.py              # Metric computation helper
├── utils.py             # Run-name, hub-id, sanity checks, dataset utilities
├── requirements.txt     # Python dependencies
├── data/                # Optional scripts + prepared datasets
├── checkpoints/, logs/  # Trainer outputs (created at runtime)
└── notebooks/           # Exploratory work and reporting

⚙️ Configuration reference (config.yaml)

All experiments are driven entirely by config.yaml. Edit values there to change behaviour; no direct code modifications are required.
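The snippets in the subsections below assume a config dict parsed from config.yaml near the top of train.py, roughly as follows (a sketch; the actual variable names in train.py may differ):

import yaml

# Parse the single source of truth for all experiment settings.
with open("config.yaml") as f:
    config = yaml.safe_load(f)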

wandb

  • entity, project: Identifiers for logging to Weights & Biases. Credentials must be available in the environment (e.g., WANDB_API_KEY).
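A minimal sketch of how these fields feed wandb.init, assuming the config dict loaded above and a WANDB_API_KEY already present in the environment:

import wandb

run = wandb.init(
    entity=config["wandb"]["entity"],
    project=config["wandb"]["project"],
    name=config["training"].get("run_name") or None,  # a blank run_name falls back to an auto-generated one
)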

model

  • base_checkpoint: Hugging Face model ID used as the starting point. train.py loads its tokenizer and sequence classification head (with ignore_mismatched_sizes=True for label-count changes).
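A hedged sketch of the corresponding load, mirroring the description above; the exact keyword arguments used in train.py may differ slightly:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

base_checkpoint = config["model"]["base_checkpoint"]
labels = config["data"]["labels"]

tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    base_checkpoint,
    num_labels=len(labels),
    problem_type="multi_label_classification",
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
    ignore_mismatched_sizes=True,  # tolerate a classification head resized to the new label count
)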

data

  • dataset_name: Hugging Face dataset ID used by datasets.load_dataset.
  • labels: Ordered emotion list for multi-label classification.
  • Optional keys mirroring the legacy dataset block can also live here:
    • resplit: When true, train.py will call utils.resplit_dataset.
    • custom_split_ratio: Colon-delimited ratios (e.g. "0.8:0.1:0.1").
    • shuffle_before_resplit: Shuffle before re-splitting.
    • random_seed: Controls deterministic shuffles.
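A sketch of how these keys might be consumed; the keyword arguments passed to utils.resplit_dataset here are illustrative rather than its exact signature:

from datasets import load_dataset
from utils import resplit_dataset

data_cfg = config["data"]
dataset = load_dataset(data_cfg["dataset_name"])

if data_cfg.get("resplit", False):
    # "0.8:0.1:0.1" -> [0.8, 0.1, 0.1]
    ratios = [float(x) for x in data_cfg["custom_split_ratio"].split(":")]
    dataset = resplit_dataset(
        dataset,
        ratios=ratios,
        shuffle=data_cfg.get("shuffle_before_resplit", True),
        seed=data_cfg.get("random_seed", 42),
    )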

lora

  • use_lora: Toggles adapter-based fine-tuning.
  • target_modules: List of module names to receive LoRA adapters. utils.validate_config_sanity currently enforces this list even if use_lora=False—keep it populated or adjust the sanity check.
  • r, lora_alpha, lora_dropout, bias: Fed into peft.LoraConfig when adapters are enabled. The helper functions also bake r and alpha into run names and hub IDs for traceability.
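When adapters are enabled, the wrapping looks roughly like the following; task_type=TaskType.SEQ_CLS is an assumption that matches the sequence-classification head:

from peft import LoraConfig, TaskType, get_peft_model

lora_cfg = config["lora"]
if lora_cfg["use_lora"]:
    peft_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=lora_cfg["r"],
        lora_alpha=lora_cfg["lora_alpha"],
        lora_dropout=lora_cfg["lora_dropout"],
        bias=lora_cfg["bias"],
        target_modules=lora_cfg["target_modules"],
    )
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()  # logs adapter vs. frozen parameter counts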

training

Mapped directly into transformers.TrainingArguments (see the call in train.py). Highlights:

  • num_train_epochs, learning_rate, weight_decay, adam_epsilon, max_grad_norm: Core optimisation knobs.
  • warmup_ratio, warmup_steps, lr_scheduler_type: Scheduler behaviour.
  • auto_find_batch_size: When true, the trainer first tries the configured per_device_*_batch_size (64 by default) and automatically halves it whenever an out-of-memory (OOM) error occurs, retrying until a batch fits. Start with an upper bound you expect your GPU to handle; the Trainer handles the back-off.
  • optim: Optimiser choice. Defaults to adamw_torch; alternatives such as adamw_bnb_8bit require matching dependencies (bitsandbytes).
  • gradient_accumulation_steps, gradient_checkpointing, group_by_length: Memory/performance trade-offs.
  • fp16, bf16, tf32: Mixed-precision toggles. Only enable one of fp16 or bf16.
  • eval_strategy / save_strategy: Evaluation/checkpoint cadence (supports "no", "steps", "epoch"). When set to "steps", ensure the matching eval_steps / save_steps are > 0. If load_best_model_at_end=True, strategies must match and cannot be "no".
  • per_device_train_batch_size, per_device_eval_batch_size: Upper bounds tried by the automatic batch-size search; used as-is when auto_find_batch_size=False.
  • push_to_hub, hub_private_repo, hub_strategy, hub_model_id: Configure Hub uploads; blank hub_model_id lets utilities autogenerate a descriptive ID.
  • run_name: Optional override for W&B run names. Defaults to the auto-generated string from utils.make_run_name.
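A condensed sketch of the mapping into TrainingArguments; only a subset of the keys listed above is shown, and the key names are assumed to mirror config.yaml:

from transformers import TrainingArguments

train_cfg = config["training"]
training_args = TrainingArguments(
    output_dir="checkpoints",
    logging_dir="logs",
    num_train_epochs=train_cfg["num_train_epochs"],
    learning_rate=train_cfg["learning_rate"],
    weight_decay=train_cfg["weight_decay"],
    warmup_ratio=train_cfg["warmup_ratio"],
    lr_scheduler_type=train_cfg["lr_scheduler_type"],
    per_device_train_batch_size=train_cfg["per_device_train_batch_size"],
    per_device_eval_batch_size=train_cfg["per_device_eval_batch_size"],
    auto_find_batch_size=train_cfg["auto_find_batch_size"],
    gradient_accumulation_steps=train_cfg["gradient_accumulation_steps"],
    eval_strategy=train_cfg["eval_strategy"],
    save_strategy=train_cfg["save_strategy"],
    load_best_model_at_end=train_cfg.get("load_best_model_at_end", False),
    metric_for_best_model=train_cfg.get("metric_for_best_model", "f1_macro"),
    push_to_hub=train_cfg["push_to_hub"],
    save_safetensors=True,  # enforce safetensors checkpoints
    report_to=config["logging"]["report_to"],
    run_name=train_cfg.get("run_name"),
)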

logging

  • logging_steps, logging_strategy, report_to: Logging cadence and destinations. report_to typically includes "wandb" when W&B tracking is required.

🚀 Training pipeline (train.py)

  1. Configuration load & validation

    • Loads config.yaml, runs utils.validate_config_sanity. The current sanity check ensures LoRA settings are well-formed and basic training parameters are positive.
  2. Experiment tracking

    • Initialises Weights & Biases via wandb.init, with a descriptive run name from utils.make_run_name (encodes base model, LoRA/full-finetune, LR, epochs, weight decay, warmup, timestamp).
  3. Tokenizer & model

    • Loads tokenizer + sequence classification head using AutoTokenizer and AutoModelForSequenceClassification. The classification head is configured for problem_type="multi_label_classification" and inherits label mappings.
  4. Dataset ingestion

    • Pulls the dataset referenced in config['data']['dataset_name'] via datasets.load_dataset.
    • Applies tokenisation and optional re-splitting (utils.resplit_dataset).
    • Uses DataCollatorWithPadding for dynamic padding.
  5. LoRA adapters (optional)

    • When lora.use_lora=True, wraps the base model with PEFT’s get_peft_model. Adapter hyperparameters mirror YAML values. Console logs confirm the configuration.
  6. TrainingArguments assembly

    • Builds TrainingArguments from the training section, with file-system paths rooted under the project directory (checkpoints/, logs/). save_safetensors=True enforces safetensor checkpoints.
  7. Trainer setup

    • Instantiates transformers.Trainer with:
      • benchmark metric function from eval.py (computes macro/micro F1, per-label F1s, and Hamming loss).
      • Tokeniser + data collator for consistent padding.
  8. Train / evaluate / hub upload

    • Runs .train() and .evaluate() on the held-out test split.
    • Pushes model + tokenizer to the Hugging Face Hub using the descriptive ID from utils.make_hub_id.
    • Finishes the W&B run with wandb.finish().
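Putting the pieces together, the core of the pipeline reads roughly like the sketch below; the "text" column and the split names are assumptions about the dataset layout, not guarantees about train.py:

from transformers import DataCollatorWithPadding, Trainer
from eval import benchmark

def tokenize(batch):
    # Assumes the dataset exposes a "text" column; adjust to the real column name.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    compute_metrics=benchmark,
    tokenizer=tokenizer,  # newer transformers versions accept processing_class=tokenizer instead
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

trainer.train()
test_metrics = trainer.evaluate(tokenized["test"])
print(test_metrics)

if train_cfg["push_to_hub"]:
    trainer.push_to_hub()  # pushes model + tokenizer to the repo configured via hub_model_id

wandb.finish()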

📊 Metrics (eval.py)

The benchmark function receives raw logits and one-hot label vectors, applies a sigmoid → 0.5 threshold, and returns:

  • f1_macro, f1_micro
  • hamming_loss
  • f1_<label> for each Ekman emotion

Use these keys for metric_for_best_model in config.yaml.
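An approximate re-implementation of benchmark under those definitions; the LABELS list below is illustrative and should come from config['data']['labels']:

import numpy as np
from sklearn.metrics import f1_score, hamming_loss

LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]  # illustrative order

def benchmark(eval_pred):
    logits, labels = eval_pred.predictions, eval_pred.label_ids
    probs = 1 / (1 + np.exp(-logits))   # sigmoid over raw logits
    preds = (probs >= 0.5).astype(int)  # 0.5 decision threshold
    labels = np.asarray(labels).astype(int)

    metrics = {
        "f1_macro": f1_score(labels, preds, average="macro", zero_division=0),
        "f1_micro": f1_score(labels, preds, average="micro", zero_division=0),
        "hamming_loss": hamming_loss(labels, preds),
    }
    for name, score in zip(LABELS, f1_score(labels, preds, average=None, zero_division=0)):
        metrics[f"f1_{name}"] = score
    return metrics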

🧰 Utilities (utils.py)

  • make_run_name(config): Generates consistent W&B/HF run names embedding base model, adaptation mode, LR, epochs, weight decay, warmup ratio, timestamp, and (if applicable) LoRA r/alpha.
  • make_hub_id(config): Mirrors the above to create unique, self-describing Hub repo IDs.
  • validate_config_sanity(cfg): Lightweight assertions for early config errors (positive epochs, matching eval/save strategies, etc.). Adjust as your config schema evolves—the current implementation assumes LoRA fields are always present.
  • resplit_dataset(...): Concatenates existing splits and re-divides according to custom ratios, with optional shuffle.
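For orientation, a sketch of what resplit_dataset does conceptually; the real helper in utils.py may differ in signature and edge-case handling:

from datasets import DatasetDict, concatenate_datasets

def resplit_dataset(dataset, ratios=(0.8, 0.1, 0.1), shuffle=True, seed=42):
    # Pool every existing split, optionally shuffle, then slice new splits by ratio.
    pooled = concatenate_datasets([dataset[split] for split in dataset])
    if shuffle:
        pooled = pooled.shuffle(seed=seed)

    n = len(pooled)
    train_end = int(ratios[0] * n)
    val_end = train_end + int(ratios[1] * n)
    return DatasetDict({
        "train": pooled.select(range(train_end)),
        "validation": pooled.select(range(train_end, val_end)),
        "test": pooled.select(range(val_end, n)),
    })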

🧪 Running an experiment

  1. Create environment

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1    # Windows PowerShell
    # source .venv/bin/activate     # macOS / Linux
    pip install -r requirements.txt
  2. Prepare credentials

    • Hugging Face Hub token (HUGGINGFACE_HUB_TOKEN) for pushing models.
    • W&B API key (WANDB_API_KEY) if logging is enabled.
    • Store them in .env (loaded via python-dotenv) or export them in the shell; see the credential-loading sketch after this list.
  3. Edit config.yaml

    • Update dataset, training, logging, or LoRA parameters as needed.
    • Ensure labels matches the dataset's label column order; the problem type stays multi-label classification.
  4. Launch training

    python train.py
  5. Outputs

    • Checkpoints: checkpoints/
    • Logs (for TensorBoard/W&B syncing): logs/
    • Final evaluation metrics: printed to stdout and available via Trainer logs.
    • Model artefacts: Pushed to the Hub if enabled; run name encodes settings for reproducibility.
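The credential step above can be scripted; a minimal sketch, assuming the environment variable names listed in step 2:

import os
import wandb
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                                     # reads .env if present
login(token=os.environ["HUGGINGFACE_HUB_TOKEN"])  # Hub write access for pushes
wandb.login(key=os.environ["WANDB_API_KEY"])      # only needed when W&B logging is enabled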

📈 Logging & experiment tracking

  • Weights & Biases: Controlled by the logging.report_to list and wandb section. Run names are auto-generated unless overridden (training.run_name).
  • Gradient accumulation & auto batch sizing: Large effective batch sizes can be achieved through gradient_accumulation_steps; auto_find_batch_size halves the per-device batch on OOM and retries until success.
  • Hub pushes: hub_strategy controls cadence ("end", "every_save", etc.). Make sure your token has write access; private repos are supported when hub_private_repo=True.
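As a quick sanity check on memory trade-offs, the effective batch size per optimiser step is the product of the per-device batch size, the accumulation steps, and the device count (a sketch using the config dict from earlier):

import torch

per_device = config["training"]["per_device_train_batch_size"]
accum_steps = config["training"]["gradient_accumulation_steps"]
n_devices = max(torch.cuda.device_count(), 1)

effective_batch_size = per_device * accum_steps * n_devices
print(f"Effective batch size per optimiser step: {effective_batch_size}")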

🩺 LoRA Troubleshooting

  • LoRA checks when disabled: The sanity validator still requires target_modules to be populated. Either keep the list filled or relax the check if you foresee pure full-finetune runs.
  • Adapter target names: Ensure entries in lora.target_modules match modules inside the chosen transformer (inspect model.named_modules() as needed).
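A quick way to list candidate target modules; the substring filters below are just common attention-projection names and will vary by architecture:

# Print module names that look like attention projections (common LoRA targets).
candidates = ("query", "key", "value", "q_proj", "k_proj", "v_proj", "dense")
for name, module in model.named_modules():
    if any(part in name for part in candidates):
        print(name, type(module).__name__)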

👥 Credits
