Training framework for deep learning models that leverage image data to improve tabular data imputation via cross-attention.
```shell
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

The framework expects a single CSV file containing all tabular features, image references, and split assignments. Columns must follow a strict ordering convention.
Columns must appear in this exact order:

```
[categorical features] | [continuous features] | image_name | [taxonomy] | split
```
Place all categorical columns first. Each value must be integer-encoded (label-mapped). Missing values are supported and represented as empty cells.
⚠️ You must declare the number of classes for each categorical column, in the same order they appear in the CSV, via the `field_lengths_tabular` parameter in your experiment config:

```yaml
data:
  field_lengths_tabular: [5, 12, 3]  # one entry per categorical column
```
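The label mapping can be built in plain Python before the CSV is written. The column names and values below (`habitat`, `diet`) are hypothetical; the sketch shows how the `field_lengths_tabular` entries follow directly from the label maps, in CSV column order:

```python
# Sketch: integer-encode categorical columns and derive field_lengths_tabular.
# Column names and values are hypothetical, not from the real dataset.
rows = [
    {"habitat": "river", "diet": "omnivore"},
    {"habitat": "lake",  "diet": ""},          # missing value stays an empty cell
    {"habitat": "river", "diet": "carnivore"},
]

categorical_cols = ["habitat", "diet"]
field_lengths = []
for col in categorical_cols:
    # Build a stable label map over the observed (non-missing) values.
    values = sorted({r[col] for r in rows if r[col] != ""})
    mapping = {v: i for i, v in enumerate(values)}
    for r in rows:
        r[col] = mapping[r[col]] if r[col] != "" else ""  # keep missing cells empty
    field_lengths.append(len(mapping))

print(field_lengths)  # number of classes per column, in CSV order → [2, 2]
```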
Place all continuous columns after the categorical ones. Missing values are supported (empty cells).
Recommended preprocessing:
- Z-score normalization — zero mean, unit variance (required for stable training)
- Log₁₀ scaling — recommended for heavily skewed or scale-varying features (e.g. body length, weight). Apply before z-scoring and store both the raw and log-transformed columns if needed.
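Applied in that order, the two steps look like this; the values and the `body_length` name are made up for illustration:

```python
import math
from statistics import mean, pstdev

# Hypothetical skewed feature (e.g. body length in mm, spanning two decades).
body_length = [12.0, 35.0, 120.0, 800.0]

# Step 1: log10 first, to tame the scale variation.
logged = [math.log10(x) for x in body_length]

# Step 2: z-score the logged values (zero mean, unit variance).
mu, sigma = mean(logged), pstdev(logged)
zscored = [(x - mu) / sigma for x in logged]

print([round(z, 3) for z in zscored])
```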
A column named `image_name` containing the filename of the associated image (e.g. `Salmo_trutta.png`). The directory containing all images is set separately in the config:

```yaml
data:
  dataset:
    dir_image_path: datasets/phenofish/images/
```

For now, the dataset classes expect images of size 256×256.
Be careful: to speed up training, the dataset's `__init__` function caches all images in RAM.
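Because everything is cached, make sure the dataset fits in memory. A back-of-envelope estimate, assuming uint8 RGB images at the expected 256×256 size and a hypothetical dataset of 10,000 images:

```python
# Rough RAM estimate for the in-memory image cache.
n_images = 10_000                 # hypothetical dataset size
bytes_per_image = 256 * 256 * 3   # H x W x RGB channels, one byte each
total_gib = n_images * bytes_per_image / 1024**3
print(f"~{total_gib:.1f} GiB")    # → ~1.8 GiB
```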
This column is only required when using the image or multimodal modalities.
To include taxonomic classification in the loss, set the `with_taxonomy` parameter to `True` and add the following three columns (in this order), each integer-encoded:

| Column | Description |
|---|---|
| `order_idx` | Taxonomic order index |
| `family_idx` | Taxonomic family index |
| `genus_idx` | Taxonomic genus index |
Declare the total number of classes for each level in your experiment config:

```yaml
data:
  taxonomy:
    n_order: 37
    n_family: 213
    n_genus: 1899
    train_genus: true  # or false
```

The last column must be `split`, with values `train` or `val`. Rows with an empty `split` value are ignored during training.
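If your CSV does not have the `split` column yet, one simple way to add it is a seeded random assignment. The 80/20 ratio and the seed below are arbitrary illustrative choices, not framework defaults:

```python
import random

random.seed(0)                                   # reproducible assignment
rows = [{"trait": float(i)} for i in range(10)]  # stand-in for the real rows
for r in rows:
    r["split"] = "train" if random.random() < 0.8 else "val"

print([r["split"] for r in rows])
```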
The example below has no categorical features, 30 z-scored continuous traits, one log₁₀-transformed body length column, image filenames, taxonomy indices, and split assignment:

```
diet_troph,...,body_elongation,image_name,order_idx,family_idx,genus_idx,split
,-0.3762,...,-0.3675,Petroleuciscus_borysthenicus.png,10,69,1343,train
,...,1.3433,Rineloricaria_heteroptera.png,33,123,1605,val
```
In this example, `field_lengths_tabular: []` since there are no categorical columns.
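A quick sanity check of the column-ordering convention can catch mistakes before training. A minimal stdlib-only sketch, using a hypothetical in-memory CSV (the framework itself does not provide this check):

```python
import csv
import io

# Hypothetical CSV following the convention:
# [features] | image_name | order_idx family_idx genus_idx | split
csv_text = """diet_troph,body_elongation,image_name,order_idx,family_idx,genus_idx,split
,-0.3762,Petroleuciscus_borysthenicus.png,10,69,1343,train
"""

header = next(csv.reader(io.StringIO(csv_text)))
assert header[-1] == "split", "last column must be split"
assert header[-4:-1] == ["order_idx", "family_idx", "genus_idx"], "taxonomy columns out of order"
assert header.index("image_name") == len(header) - 5, "image_name must precede taxonomy columns"
print("header OK")
```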
Set the following fields in your dataset config (or experiment override) to match your CSV:
```yaml
data:
  dataset:
    data_frame_path: datasets/phenofish/train_val.csv
    dir_image_path: datasets/phenofish/images/
    n_features: 30            # number of continuous + categorical features
    batch_size: 124
    num_workers: 8
    multi_img_per_line: False
    with_taxonomy: True
    field_lengths_tabular: [] # number of classes per categorical column, in order
  taxonomy:
    n_order: 37
    n_family: 213
    n_genus: 1899
    train_genus: True
```

Training is handled by `train.py` using Hydra for configuration. You must always specify a modality and, optionally, an experiment override.
| Modality | Description |
|---|---|
| `image` | Train image encoder only |
| `tabular` | Train tabular encoder only |
| `multimodal` | Train full cross-attention model (image + tabular) |
Train tabular encoder:

```shell
python train.py data=phenofish modality=tabular
```

Train image encoder:

```shell
python train.py data=phenofish modality=image
```

Train multimodal model:

```shell
python train.py data=phenofish modality=multimodal
```

With a specific experiment config:

```shell
python train.py data=phenofish modality=multimodal experiment=phenofish
```

Override specific parameters at runtime:

```shell
python train.py data=phenofish modality=multimodal experiment=phenofish model.multimodal.lr=1e-4 model.multimodal.freeze_encoders=True
```

Configs are organized under `conf/` and follow Hydra's config group structure:
```
conf/
├── main_config.yaml
├── modality/
│   ├── image.yaml
│   ├── tabular.yaml
│   └── multimodal.yaml
├── model/
│   ├── image.yaml
│   ├── tabular.yaml
│   └── multimodal.yaml
├── trainer.yaml
├── config.yaml
└── experiment/
    └── phenofish.yaml
```
Both data and modality are mandatory — Hydra will raise an error if either is not specified.
Experiment configs (under `conf/experiment/`) are the recommended way to customize a run without modifying the base configs. They use `# @package _global_` and can override any parameter from any config group:
```yaml
# conf/experiment/phenofish.yaml
# @package _global_
data:
  dataset:
    data_frame_path: datasets/phenofish/dataset_multimodal_v1/train_val_dataset_30_traits_zscore_preprocess.csv
    dir_image_path: datasets/phenofish/dataset_multimodal_v1/preprocess_fishmorph_1_2
    cpm_aug: 1
    n_features: 30
    batch_size: 124
    num_workers: 8
    # multi_img_per_line: False
    field_lengths_tabular: []
  # data specific
  taxonomy:
    n_order: 37
    n_family: 213
    n_genus: 1899
    train_genus: true
model:
  tabular:
    ckpt_path: # runs/phenofish/tabular/best_try_1/checkpoints/best-epoch=971.ckpt
  image:
    ckpt_path: # runs/phenofish/image/best_try_1/checkpoints/best-epoch=00.ckpt
  multimodal:
    ckpt_path: runs/phenofish/multimodal/now_way_from_pt_best_try_1_reprod_lambda/checkpoints/last-v1.ckpt
    hparams:
      lr: 0.0001
      weight_decay: 1.0e-05
      lambda_classif: 1
      lambda_recon: 3.0
      lambda_clip: 0.5
      lambda_encoders: 1
      freeze_encoders: False
```

This keeps base configs untouched and makes each experiment fully reproducible and self-contained.