A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models

Official code of "A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models" by Alejandro Alonso, Sawaiz A. Chaudhry, Juan C. San Miguel, Álvaro García Martín, Pablo Ayuso Albizu and Pablo Carballeira.

Workflow Diagram

Link to Paper | Link to Supplementary Material

Summary

In this paper, we propose a data-centric approach to improve Pedestrian Attribute Recognition through synthetic data augmentation guided by textual descriptions. Specifically, our approach comprises three main steps:

  • First, we define a protocol to systematically identify weakly recognized attributes across multiple datasets.
  • Second, we propose a prompt-driven pipeline that leverages diffusion models to generate synthetic pedestrian images while preserving the consistency of PAR datasets.
  • Finally, we derive a strategy to seamlessly incorporate synthetic samples into training data, which considers prompt-based annotation rules and modifies the loss function.

Getting started


Step 1: Obtain Baseline Results and Select Target Attributes

This step involves using the Rethinking-of-PAR baseline to establish initial performance and identify attributes for augmentation.

  1. Setup Rethinking-of-PAR (if not already done):

    • Our work builds upon the Rethinking-of-PAR framework. If you plan to reproduce the full training and evaluation pipeline, you'll need to clone their repository and follow their setup instructions.
    • Note: If you only intend to use our data generation scripts and not the training/evaluation parts, you do not need the Rethinking-of-PAR setup. However, the subsequent steps assume Rethinking-of-PAR is correctly set up and its directory structure is accessible.
  2. Obtain Baseline Results: Use the Rethinking-of-PAR codebase to evaluate performance on your chosen PAR dataset. These results will serve as the baseline.

  3. Identify Target Attributes for Augmentation: Based on the baseline results, select the weakly recognized attributes. Our criteria for selecting attributes are detailed in Sec. 3.1 of the paper.
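
As a hedged illustration of this selection step, the sketch below reads per-attribute baseline metrics from a hypothetical CSV (the file name, attribute/f1 columns, and threshold are assumptions, not part of this repository) and lists the weakest attributes; the actual criteria follow Sec. 3.1 of the paper.

```python
# Illustrative sketch only: select weakly recognized attributes from baseline
# metrics. The CSV layout (attribute,f1) and the threshold are assumptions;
# the real selection criteria are described in Sec. 3.1 of the paper.
import csv

def select_weak_attributes(metrics_csv: str, f1_threshold: float = 0.60):
    weak = []
    with open(metrics_csv, newline="") as f:
        for row in csv.DictReader(f):
            f1 = float(row["f1"])
            if f1 < f1_threshold:
                weak.append((row["attribute"], f1))
    return sorted(weak, key=lambda item: item[1])  # worst attributes first

if __name__ == "__main__":
    for name, f1 in select_weak_attributes("baseline_metrics.csv"):
        print(f"{name}: F1 = {f1:.3f}")
```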

Step 2: Generate Synthetic Data via ComfyUI

This step uses ComfyUI for text-to-image diffusion to generate synthetic pedestrian images.

  1. Setup ComfyUI:

    • Ensure you have a working installation of ComfyUI.
    • After installing ComfyUI, also install the additional dependencies from requirements_generation.txt in this repo.
    • Note: While we use ComfyUI, the general principles of prompt-driven generation should be adaptable to other diffusion model interfaces if you implement the necessary logic.
  2. Load Workflow and Wildcards in ComfyUI:

    • Launch ComfyUI.
    • Import our provided generation_workflow.json. This file defines the pipeline for prompt generation, image diffusion, and initial post-processing.
    • Ensure any custom nodes referenced in the workflow are installed in your ComfyUI setup. None of the nodes used in our workflow were created by us; they should all be publicly available. For direct references, please see our supplementary material.
    • To use our wildcard setup, place the wildcard files (e.g., those in the data_augmentaton/wildcards/ directory) where ComfyUI can access them (typically under ComfyUI/custom_nodes/ComfyUI-DynamicPrompts/wildcards/); a sketch of how wildcards compose prompts is shown after this list.
    • Important: The files under text_files/ in this repository are not directly for ComfyUI's wildcard system; they are used by our add_synthetic_labels.py script in Step 3. If you modify wildcards for ComfyUI, ensure corresponding changes are made in text_files/ for consistent labeling.
  3. Generate Images:

    • Configure the number of samples, noise levels, etc., in ComfyUI according to your needs.
    • For guidance on how many synthetic images to create (e.g., 3× or 5× per real sample), as well as specific parameters and details of the generation process, refer to the supplementary material.
  4. Optional: Manual Verification and Cleaning:

    • In our experiments, we checked every image used for augmentation, so we highly recommend manually reviewing the generated images.
    • Verify that the intended attribute is clearly present in each synthetic image.
    • Discard images with significant generation artifacts (e.g., extra limbs, distorted faces, impossible poses).
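
As a rough, ComfyUI-independent sketch of how wildcard files drive prompt composition (the prompt template, wildcard names, and substitution logic here are assumptions for illustration; the actual sampling is performed by the ComfyUI-DynamicPrompts nodes in generation_workflow.json):

```python
# Minimal sketch of wildcard-based prompt composition, independent of ComfyUI.
# The prompt template and wildcard names are hypothetical; real prompts and
# sampling are defined inside generation_workflow.json via ComfyUI-DynamicPrompts.
import random
import re
from pathlib import Path

WILDCARD_DIR = Path("data_augmentaton/wildcards")  # wildcard files from this repo

def expand(template: str, rng: random.Random) -> str:
    """Replace each __name__ token with a random non-empty line from wildcards/name.txt."""
    def pick(match: re.Match) -> str:
        lines = (WILDCARD_DIR / f"{match.group(1)}.txt").read_text().splitlines()
        return rng.choice([line for line in lines if line.strip()])
    return re.sub(r"__(\w+)__", pick, template)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical template and wildcard names, for illustration only.
    template = "full-body photo of a pedestrian, __clothing__, __accessory__, street scene"
    for _ in range(3):
        print(expand(template, rng))
```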

Step 3: Label Synthetic Data

This step involves assigning PAR labels to the generated synthetic images and merging them with an existing dataset.

  1. Run Labeling Script:
    • Set the parameters in the add_synthetic_labels.py script from this repository and execute it. This script uses the .txt files in the text_files/ directory (which map prompt components to attribute presence) to assign extended labels (-1, 0, 1, 2, 3) to the synthetic images; a simplified sketch of this labeling logic is shown after this list.
    • For a detailed explanation of these label values and their significance, please refer to Sec. 3.3 of our paper.
    • The script will merge your original dataset annotations (e.g., from a .pkl file) with the new synthetic images and their labels, outputting a new .pkl file for training.
    • Note: The current add_synthetic_labels.py script is tailored for RAP dataset structures. You may need to adapt it if you are using other PAR datasets with different annotation formats.
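
Purely as a simplified sketch of the labeling and merging idea behind add_synthetic_labels.py (the annotation field names, keyword-file format, and the specific label codes used below are assumptions; use the provided script and Sec. 3.3 of the paper for the real rules):

```python
# Hedged sketch only, not the actual add_synthetic_labels.py. Assumed here:
# keyword files with one prompt keyword per line, a RAP-style annotation pickle
# with "image_name" and "label" entries, and extended label codes -1 (unknown)
# and 2 (attribute asserted by the prompt); see Sec. 3.3 for the real scheme.
import pickle
from pathlib import Path

import numpy as np

def load_keywords(txt_path: str) -> list:
    """Read one lowercase keyword per non-empty line."""
    return [ln.strip().lower() for ln in Path(txt_path).read_text().splitlines() if ln.strip()]

def label_from_prompt(prompt: str, attr_keywords: dict, num_attrs: int) -> np.ndarray:
    """Mark attributes whose keywords appear in the generation prompt; leave the rest unknown."""
    labels = np.full(num_attrs, -1, dtype=np.int8)  # -1: unknown
    prompt = prompt.lower()
    for attr_idx, keywords in attr_keywords.items():
        if any(kw in prompt for kw in keywords):
            labels[attr_idx] = 2  # synthetic sample, attribute asserted by the prompt
    return labels

def merge_annotations(original_pkl: str, synthetic: list, output_pkl: str) -> None:
    """Append synthetic (image_name, labels) pairs to an existing annotation pickle."""
    with open(original_pkl, "rb") as f:
        data = pickle.load(f)
    for name, labels in synthetic:
        data["image_name"].append(name)
        data["label"] = np.vstack([data["label"], labels])
    with open(output_pkl, "wb") as f:
        pickle.dump(data, f)
```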

Step 4: Train and Evaluate with Augmented Data

This step uses the augmented dataset created in Step 3 to train and evaluate the Rethinking-of-PAR model. This step assumes you have Rethinking-of-PAR set up (from Step 1).

  1. Update Baseline Codebase (Rethinking-of-PAR):
    • Configuration Files:
      • Copy the default.py from our repository into the Rethinking-of-PAR configs/ subdirectory.
    • Loss Function:
      • Copy our augmented loss function file into the appropriate loss module directory within the Rethinking-of-PAR codebase (Rethinking_of_PAR/losses/); an illustrative sketch of one way extended labels can be handled in a loss is shown after this step.
    • Dataset Path in Config:
      • In your Rethinking-of-PAR training configuration file, update the dataset path to point to the new .pkl file generated in Step 3.
      • Ensure the configuration also references the augmented loss function you copied.
      • An example of a full training configuration (example_augmented_config.yaml) demonstrating necessary changes is provided in this repository.
  2. Train and Evaluate:
    • Train the model using the updated configuration file and the augmented dataset.
    • Compare the results against the baseline performance (obtained in Step 1) to quantify the impact of the synthetic data augmentation.
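
For context only, below is a hedged sketch of one way a loss can handle extended labels, masking unknown entries (-1) and collapsing synthetic positive codes to 1 for binary cross-entropy; the augmented loss file shipped with this repository and Sec. 3.3 of the paper define the actual behavior, which may treat labels 2 and 3 differently.

```python
# Illustrative only: a masked binary cross-entropy that ignores unknown (-1)
# entries and maps extended positive codes (e.g. 2, 3) to 1. The real augmented
# loss provided with this repository may weight these codes differently.
import torch
import torch.nn.functional as F

def masked_bce_loss(logits: torch.Tensor, targets: torch.Tensor, unknown_value: int = -1) -> torch.Tensor:
    """logits, targets: (batch, num_attributes); targets may contain -1 for unknown."""
    mask = (targets != unknown_value).float()
    bce_targets = (targets > 0).float()  # collapse 1/2/3 to positive, 0 to negative
    loss = F.binary_cross_entropy_with_logits(logits, bce_targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```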

Pre-generated Synthetic Data

We provide our generated synthetic images to facilitate reproduction or quick testing. You can download them from the following links:

Backup/alternative link: Download Link (all data)

Citation

If you find this work useful in your research, please consider citing our paper:

@InProceedings{Alonso_2025_AVSS,
    author    = {Alejandro Alonso and Sawaiz A. Chaudhry and Juan C. SanMiguel and \'{A}lvaro Garc\'{i}a-Mart\'{i}n and Pablo Ayuso-Albizu and Pablo Carballeira},
    title     = {A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models},
    booktitle = {Proceedings of the IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS)},
    month     = {August},
    year      = {2025},
    pages     = {1-6}
}

About

Code for paper accepted at AVSS 2025
