This repository provides methods for generating Keypoint-Integrated Instruction-Following Data to enhance multimodal models' understanding of human poses and actions. It is built on the LLaVA framework and is described in detail in our research paper. The repository is organized as follows:
```
Keypoint-Instruction-Tuning/
├── data_generation/
│   ├── conversation_gen.py
│   ├── detailed_description_gen.py
│   └── complex_reasoning_gen.py
├── datasets/
│   ├── generated_data_conversation.json
│   ├── generated_data_detailed.json
│   └── generated_data_reasoning.json
├── LLaVA/
│   └── [LLaVA original files here]
├── requirements.txt
└── README.md
```
Install dependencies and initialize the LLaVA submodule:

```bash
pip install -r requirements.txt
```

This repository integrates the original LLaVA as a git submodule for easy synchronization.
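If you cloned this repository without its submodules, the LLaVA submodule can be fetched with the standard git command:

```bash
# Pull the LLaVA submodule declared in .gitmodules
git submodule update --init --recursive
```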
Use the provided scripts to generate the three types of fine-tuning data described in our paper:

- Conversation: `python data_generation/conversation_gen.py`
- Detailed Description: `python data_generation/detailed_description_gen.py`
- Complex Reasoning: `python data_generation/complex_reasoning_gen.py`

All generated data is saved to the `datasets/` directory.
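For orientation, the sketch below illustrates the general pattern such generation scripts follow: serialize an image's human keypoint annotations into text and ask an OpenAI chat model to produce instruction-following samples. All names here (the annotation fields, the prompt wording, the model choice) are illustrative assumptions rather than the actual implementation in `data_generation/`, and the snippet uses the legacy pre-1.0 `openai` interface to match the code shown later in this README:

```python
import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# COCO-style 17-joint human keypoints, given as (x, y, visibility) triples.
# Joint names and sample values below are illustrative assumptions.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def keypoints_to_text(keypoints):
    """Serialize visible (x, y, v) triples into a plain-text joint list."""
    lines = []
    for name, (x, y, v) in zip(COCO_KEYPOINT_NAMES, keypoints):
        if v > 0:  # skip joints not labeled in the annotation
            lines.append(f"{name}: ({x}, {y})")
    return "\n".join(lines)


def generate_conversation(captions, keypoints, model="gpt-4"):
    """Ask a chat model for QA-style data grounded in the pose annotation."""
    caption_text = "\n".join(captions)
    prompt = (
        "You are given captions of an image and the pixel coordinates of "
        "one person's body joints.\n\n"
        f"Captions:\n{caption_text}\n\n"
        f"Keypoints:\n{keypoints_to_text(keypoints)}\n\n"
        "Write a short question-and-answer conversation about the person's "
        "pose and action, answering as if you could see the image."
    )
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Toy annotation: only the nose is labeled as visible.
    captions = ["A man swings a tennis racket on a court."]
    keypoints = [(320, 110, 2)] + [(0, 0, 0)] * 16
    print(generate_conversation(captions, keypoints))
```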
For fine-tuning and model training instructions, please visit the original [LLaVA repository](https://github.com/haotian-liu/LLaVA).
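The generation scripts call the OpenAI API, so an API key is required. A common way to provide one is through your shell environment (POSIX shell shown; the key value is a placeholder):

```bash
export OPENAI_API_KEY="sk-..."  # replace with your own key
```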
The scripts can then read the key at runtime instead of hard-coding it:

```python
import os
import openai

# Read the API key from the environment rather than embedding it in code
openai.api_key = os.getenv('OPENAI_API_KEY')
```

Contributions, discussions, and issues are welcome! Please feel free to open an issue or submit a pull request.