PoseLLM: Enhancing Language-Guided Human Pose Estimation with Multilayer Perceptron Alignment

[arXiv]

Installation

1. Clone code

    git clone https://github.com/Ody-trek/PoseLLM
    cd ./PoseLLM

2. Create a conda environment for this repo

    conda create -n PoseLLM python=3.10
    conda activate PoseLLM

3. Install CUDA 11.7 (other versions may not work)

    conda install -c conda-forge cudatoolkit-dev

4. Install PyTorch following the official instructions (the build must match the CUDA version)

    conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
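
Since the PyTorch build has to match CUDA 11.7, a quick check after this step can catch a mismatch early. The following is a minimal sketch, not part of the repository: the helper name `cuda_matches` is hypothetical, and the script relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
from typing import Optional


def cuda_matches(build_cuda: Optional[str], expected: str = "11.7") -> bool:
    """Return True if a PyTorch build's CUDA version equals `expected`.

    `build_cuda` is the value of torch.version.cuda, which is None for
    CPU-only builds.
    """
    return build_cuda is not None and build_cuda.strip() == expected


if __name__ == "__main__":
    try:
        import torch  # imported here so the helper above works without PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch version:", torch.__version__)
        print("Built for CUDA:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
        if not cuda_matches(torch.version.cuda):
            print("Warning: this PyTorch build does not target CUDA 11.7.")
```

Run it inside the activated PoseLLM environment. A `Built for CUDA` value other than 11.7, or `GPU visible: False`, suggests revisiting the CUDA and PyTorch steps.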

5. Install the other Python dependencies (do not change the package versions)

    pip install pycocotools
    pip install opencv-python
    pip install accelerate==0.21.0
    pip install sentencepiece==0.1.99
    pip install transformers==4.31.0

6. Prepare the datasets

Download COCO, MPII, and Human-Art from their official websites and place the files in the directory structure below. A name in parentheses, such as (person_keypoints_train2017.json), is the file's original name.

./data
├── coco
│   ├── annotations
│   │   ├── coco_train.json (person_keypoints_train2017.json)
│   │   └── coco_val.json (person_keypoints_val2017.json)
│   └── images
│       ├── train2017
│       │   └── 000000000009.jpg
│       └── val2017
│           └── 000000000139.jpg
├── HumanArt
│   ├── annotations
│   │   └── validation_humanart.json
│   └── images
│       └── 2D_virtual_human
└── mpii
    ├── annot
    │   ├── valid.json
    │   └── gt_valid.mat
    └── images
        └── 000001163.jpg
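
A misplaced or unrenamed annotation file is easy to miss, so it may be worth checking the layout before running anything. The sketch below is one possible way to do that. The names `EXPECTED_PATHS` and `missing_paths` are hypothetical, the path list is copied from the tree above, and the check only tests that each entry exists (it does not validate file contents or individual images).

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "coco/annotations/coco_train.json",
    "coco/annotations/coco_val.json",
    "coco/images/train2017",
    "coco/images/val2017",
    "HumanArt/annotations/validation_humanart.json",
    "HumanArt/images/2D_virtual_human",
    "mpii/annot/valid.json",
    "mpii/annot/gt_valid.mat",
    "mpii/images",
]


def missing_paths(data_root: str = "./data") -> List[str]:
    """Return the expected entries that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing under ./data:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root so that `./data` resolves correctly, or pass another root to `missing_paths`.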

Usage

1. Download trained model

    git lfs install

    git clone https://huggingface.co/KTrek/PoseLLM

    mkdir checkpoints
    mkdir checkpoints/ckpts
    mv PoseLLM/coco checkpoints/ckpts
    # for training
    mkdir checkpoints/model_weights
    mv PoseLLM/pretrained/dinov2_vitl14_pretrain.pth checkpoints/model_weights
    # clone Vicuna-7B v1.5
    cd checkpoints/model_weights
    git clone https://huggingface.co/lmsys/vicuna-7b-v1.5
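
The commands above move several items into place, and a wrong nesting level only shows up later as a load error. As a rough sketch, the locations implied by those commands can be checked as follows. The names `EXPECTED_CHECKPOINT_PATHS` and `checkpoint_status` are hypothetical, and the paths are inferred from the `mv` and `git clone` commands rather than from the repository's loading code.

```python
from pathlib import Path
from typing import Dict

# Locations produced by the commands above, relative to the repository root.
EXPECTED_CHECKPOINT_PATHS = [
    "checkpoints/ckpts/coco",
    "checkpoints/model_weights/dinov2_vitl14_pretrain.pth",
    "checkpoints/model_weights/vicuna-7b-v1.5",
]


def checkpoint_status(repo_root: str = ".") -> Dict[str, bool]:
    """Map each expected checkpoint path to whether it exists."""
    root = Path(repo_root)
    return {rel: (root / rel).exists() for rel in EXPECTED_CHECKPOINT_PATHS}


if __name__ == "__main__":
    for rel, found in checkpoint_status().items():
        print(("ok      " if found else "MISSING ") + rel)
```

Run it from the repository root. Note that the commands above finish inside checkpoints/model_weights, so change back to the root first.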

2. Evaluate Model

Set the IDX option in each script to the GPU IDs to use for evaluation; listing multiple IDs runs the evaluation on multiple GPUs.

    # evaluate on the COCO val set
    bash scripts/valid_coco.sh
    # evaluate on the Human-Art set
    bash scripts/valid_humanart.sh
    # evaluate on the MPII set
    bash scripts/valid_mpii.sh

3. Train Model

    # train on coco
    bash scripts/train_coco.sh

Note that GPU memory should be at least 24 GB. Training on 2 RTX A6000 GPUs takes about 4 days.
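
To see whether the local GPUs meet the memory requirement before starting a multi-day run, something like the following sketch could be used. It queries `nvidia-smi` for each GPU's total memory in MiB. The helper names and the `MIN_MEMORY_MIB` threshold are assumptions for illustration: cards sold as 24 GB can report slightly less than 24576 MiB, so the threshold is rounded down to 24000 MiB.

```python
import subprocess
from typing import List, Tuple

# Assumed threshold: rounded down because 24 GB cards can report
# slightly less than 24576 MiB of total memory.
MIN_MEMORY_MIB = 24000


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'index, memory.total' CSV rows into (gpu_index, total_mib) pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, total_mib = (field.strip() for field in line.split(","))
        rows.append((int(index), int(total_mib)))
    return rows


def undersized_gpus(rows: List[Tuple[int, int]],
                    min_mib: int = MIN_MEMORY_MIB) -> List[int]:
    """Return the indices of GPUs whose total memory is below `min_mib`."""
    return [index for index, total_mib in rows if total_mib < min_mib]


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is unavailable; cannot query GPU memory.")
    else:
        rows = parse_gpu_memory(output)
        for index, total_mib in rows:
            print(f"GPU {index}: {total_mib} MiB")
        too_small = undersized_gpus(rows)
        if too_small:
            print("Below the assumed 24 GB threshold:", too_small)
```

The parsing is kept separate from the `nvidia-smi` call so that it can be exercised on a machine without a GPU.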

Citations

If you find this code useful for your research, please cite our paper:

@article{zhang2025posellm,
  title={PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment},
  author={Zhang, Dewen and Hussain, Tahir and An, Wangpeng and Shouno, Hayaru},
  journal={arXiv preprint arXiv:2507.09139},
  year={2025}
}

Acknowledgement

The code is mainly inspired by LocLLM.
