# A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models
Official code of "A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models" by Alejandro Alonso, Sawaiz A. Chaudhry, Juan C. San Miguel, Álvaro García Martín, Pablo Ayuso Albizu and Pablo Carballeira.
Link to Paper | Link to Supplementary Material
In this paper, we propose a data-centric approach to improve Pedestrian Attribute Recognition (PAR) through synthetic data augmentation guided by textual descriptions. Specifically, our approach comprises three main steps:
- First, we define a protocol to systematically identify weakly recognized attributes across multiple datasets.
- Second, we propose a prompt-driven pipeline that leverages diffusion models to generate synthetic pedestrian images while preserving the consistency of PAR datasets.
- Finally, we derive a strategy to seamlessly incorporate synthetic samples into training data, which considers prompt-based annotation rules and modifies the loss function.
This repository implements the approach as four practical steps:

- Step 1: Obtain Baseline Results and Select Target Attributes
- Step 2: Generate Synthetic Data via ComfyUI
- Step 3: Label Synthetic Data
- Step 4: Train and Evaluate with Augmented Data
## Step 1: Obtain Baseline Results and Select Target Attributes

This step uses the Rethinking-of-PAR baseline to establish initial performance and identify attributes for augmentation.
- Setup Rethinking-of-PAR (if not already done):
  - Our work builds upon the Rethinking-of-PAR framework. If you plan to reproduce the full training and evaluation pipeline, you'll need to clone their repository and follow their setup instructions.
  - Note: If you only intend to use our data generation scripts and not the training/evaluation parts, you do not need the Rethinking-of-PAR setup. However, the subsequent steps assume Rethinking-of-PAR is correctly set up and its directory structure is accessible.
- Obtain Baseline Results: Utilize the Rethinking-of-PAR codebase to evaluate its performance; these results will serve as the baseline.
- Identify Target Attributes for Augmentation: Based on the baseline results, select the weakly recognized attributes. Our criteria for selecting attributes are detailed in Sec. 3.1 of the paper; an illustrative selection sketch follows this list.
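As a rough illustration of this selection (not the exact criterion from Sec. 3.1), the snippet below picks the lowest-scoring attributes from per-attribute baseline scores. The metric, the value of `K`, and the score dictionary are all assumptions for illustration.

```python
# Illustrative sketch only: select the K attributes with the weakest baseline
# performance as augmentation targets. The per-attribute scores below are
# hypothetical placeholders for the results obtained with Rethinking-of-PAR.
baseline_scores = {
    "hs-BaldHead": 0.31,
    "lb-ShortSkirt": 0.28,
    "AgeLess16": 0.35,
    "ub-SuitUp": 0.42,
    "attach-PlasticBag": 0.33,
    "ub-TShirt": 0.78,
}

K = 5  # number of weakly recognized attributes to target
target_attributes = sorted(baseline_scores, key=baseline_scores.get)[:K]
print("Attributes selected for augmentation:", target_attributes)
```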
## Step 2: Generate Synthetic Data via ComfyUI

This step uses ComfyUI for text-to-image diffusion to generate synthetic pedestrian images.
- Setup ComfyUI:
  - Ensure you have a working installation of ComfyUI.
  - After installing ComfyUI, also install the additional dependencies listed in `requirements_generation.txt` in this repo.
  - Note: While we use ComfyUI, the general principles of prompt-driven generation should be adaptable to other diffusion model interfaces if you implement the necessary logic.
- Load Workflow and Wildcards in ComfyUI:
  - Launch ComfyUI.
  - Import our provided `generation_workflow.json`. This file defines the pipeline for prompt generation, image diffusion, and initial post-processing.
  - Ensure any custom nodes referenced in the workflow are installed in your ComfyUI setup. None of the nodes used in our workflow were created by us; they should all be publicly available. For direct references, please see our supplementary material.
  - To use our wildcard setup, place the wildcard files (e.g., those in the `data_augmentaton/wildcards/` directory) where ComfyUI can access them (typically its `ComfyUI/custom_nodes/ComfyUI-DynamicPrompts/wildcards/` directory).
  - Important: The files under `text_files/` in this repository are not directly for ComfyUI's wildcard system; they are used by our `add_synthetic_labels.py` script in Step 3. If you modify the wildcards for ComfyUI, make the corresponding changes in `text_files/` for consistent labeling (a small consistency-check sketch is given at the end of this step).
- Generate Images:
  - Configure the number of samples, noise levels, etc., in ComfyUI according to your needs.
  - For guidance on how many synthetic images to create (e.g., 3× or 5× per real sample), as well as on specific generation parameters and the generation process, refer to the supplementary material.
- Optional: Manual Verification and Cleaning:
  - In our experiments, we checked every image used for augmentation, so we highly recommend manually reviewing the generated images as well.
  - Verify that the intended attribute is clearly present in each synthetic image.
  - Discard images with significant generation artifacts (e.g., extra limbs, distorted faces, impossible poses).
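Because the ComfyUI wildcards and the `text_files/` mappings must stay in sync (see the Important note above), a quick consistency check can be useful. The sketch below is an assumption-laden illustration: the file names, the one-option-per-line wildcard format, and the comma-separated mapping layout are hypothetical and may differ from the actual files.

```python
# Hedged sketch: check that every option in a ComfyUI wildcard file has a
# corresponding entry in the text_files/ mapping used for labeling in Step 3.
# File names and formats are hypothetical; adapt them to the real files.
from pathlib import Path

wildcard_file = Path("data_augmentaton/wildcards/upper_body.txt")  # hypothetical wildcard file
mapping_file = Path("text_files/upper_body.txt")                   # hypothetical labeling map

wildcard_options = {
    line.strip()
    for line in wildcard_file.read_text(encoding="utf-8").splitlines()
    if line.strip()
}
mapped_options = {
    line.split(",")[0].strip()  # assume the prompt component is the first comma-separated field
    for line in mapping_file.read_text(encoding="utf-8").splitlines()
    if line.strip()
}

missing = wildcard_options - mapped_options
if missing:
    print("Wildcard options without a labeling entry:", sorted(missing))
else:
    print("Wildcards and text_files/ mappings look consistent.")
```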
## Step 3: Label Synthetic Data

This step involves assigning PAR labels to the generated synthetic images and merging them with an existing dataset.
- Run Labeling Script:
  - Set the parameters in the `add_synthetic_labels.py` script from this repository and execute it. This script uses the `.txt` files in the `text_files/` directory (which map prompt components to attribute presence) to assign extended labels (-1, 0, 1, 2, 3) to the synthetic images.
  - For a detailed explanation of these label values and their significance, please refer to Sec. 3.3 of our paper.
  - The script will merge your original dataset annotations (e.g., from a `.pkl` file) with the new synthetic images and their labels, outputting a new `.pkl` file for training; a rough sketch of this merge is shown after this list.
  - Note: The current `add_synthetic_labels.py` script is tailored to the RAP dataset structure. You may need to adapt it if you are using other PAR datasets with different annotation formats.
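For orientation, the sketch below shows the general shape of the merge that `add_synthetic_labels.py` performs. The pickle field names (`image_name`, `label`, `attr_name`), the file names, and the label values used are assumptions for illustration; refer to the script and Sec. 3.3 of the paper for the actual logic.

```python
# Hedged sketch of merging a RAP-style annotation pickle with synthetic samples.
# Field names, file names, and label values are illustrative assumptions;
# add_synthetic_labels.py implements the real labeling rules (Sec. 3.3).
import pickle
import numpy as np

with open("dataset.pkl", "rb") as f:            # original RAP-style annotation file
    dataset = pickle.load(f)

synthetic_names = ["synth_baldhead_0001.png", "synth_baldhead_0002.png"]  # hypothetical images
num_attrs = len(dataset["attr_name"])

# one extended label vector per synthetic image, using values from {-1, 0, 1, 2, 3}
synthetic_labels = np.full((len(synthetic_names), num_attrs), -1, dtype=int)
synthetic_labels[:, list(dataset["attr_name"]).index("hs-BaldHead")] = 1  # target attribute present

dataset["image_name"] = list(dataset["image_name"]) + synthetic_names
dataset["label"] = np.vstack([np.asarray(dataset["label"]), synthetic_labels])

with open("dataset_augmented.pkl", "wb") as f:  # new annotation file used in Step 4
    pickle.dump(dataset, f)
```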
## Step 4: Train and Evaluate with Augmented Data

This step uses the augmented dataset created in Step 3 to train and evaluate the Rethinking-of-PAR model; it assumes Rethinking-of-PAR is already set up (from Step 1).
- Update Baseline Codebase (Rethinking-of-PAR):
  - Configuration Files:
    - Copy the `default.py` from our repository into the Rethinking-of-PAR `configs/` subdirectory.
  - Loss Function:
    - Copy our augmented loss function file into the appropriate loss module directory within the Rethinking-of-PAR codebase (`Rethinking_of_PAR/losses/`).
  - Dataset Path in Config:
    - In your Rethinking-of-PAR training configuration file, update the dataset path to point to the new `.pkl` file generated in Step 3.
    - Ensure the configuration also references the augmented loss function you copied.
    - An example of a full training configuration (`example_augmented_config.yaml`) demonstrating the necessary changes is provided in this repository.
- Train and Evaluate:
  - Train the model using the updated configuration file and the augmented dataset.
  - Compare the results against the baseline performance (obtained in Step 1) to quantify the impact of the synthetic data augmentation; a small comparison sketch is given below.
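The sketch below is one way to tabulate the per-attribute change between the two runs. It assumes you have exported per-attribute scores from the baseline and the augmented training as JSON dictionaries, which is an assumption about your workflow rather than something our scripts produce.

```python
# Hedged sketch: compare per-attribute scores between the baseline run (Step 1)
# and the augmented run (Step 4). The JSON exports are assumed, not produced
# by our scripts; adapt the loading to however you store your results.
import json

with open("baseline_results.json") as f:    # hypothetical {attribute: score} export
    baseline = json.load(f)
with open("augmented_results.json") as f:   # hypothetical {attribute: score} export
    augmented = json.load(f)

print(f"{'attribute':<24}{'baseline':>10}{'augmented':>11}{'delta':>9}")
for attr in sorted(baseline, key=lambda a: augmented.get(a, baseline[a]) - baseline[a], reverse=True):
    aug = augmented.get(attr, float("nan"))
    print(f"{attr:<24}{baseline[attr]:>10.3f}{aug:>11.3f}{aug - baseline[attr]:>9.3f}")
```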
## Generated Synthetic Data

We provide our generated synthetic images to facilitate reproduction or quick testing. You can download them from the following links:
- hs-BaldHead: Download Link_BaldHead
- lb-ShortSkirt: Download Link_ShortSkirt
- AgeLess16: Download Link_Age16
- ub-SuitUp: Download Link_SuitUp
- attach-PlasticBag: Download Link_PlasticBag
Backup/alternative link: Download Link all data
## Citation

If you find this work useful in your research, please consider citing our paper:
@InProceedings{Alonso_2025_AVSS,
author = {Alejandro Alonso and Sawaiz A. Chaudhry and Juan C. SanMiguel and \'{A}lvaro Garc\'{i}a-Mart\'{i}n and Pablo Ayuso-Albizu and Pablo Carballeira},
title = {A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models},
booktitle = {Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
month = {August},
year = {2025},
pages = {1-6}
}