7 changes: 2 additions & 5 deletions docs/01-Intro.md
@@ -4,15 +4,12 @@ slug: /

# Introduction

CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [CogView](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [CogVideoX](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as text-to-image (T2I), text-to-video (T2V), and image-to-video (I2V). Users must comply with legal and ethical guidelines to ensure responsible implementation.

## Supported Models

Please refer to the [Model Card](./05-Model%20Card.mdx) for more details.

## Environment Testing

This repository has been tested in environments with `1×A100` and `8×A100` GPUs, using `CUDA 12.4, Python 3.10.16`.

- Cog series models typically do not support `FP16` precision (only `CogVideoX-2B` does); GPUs such as the `V100` cannot fine-tune them properly (training will, for example, produce `loss=nan`). At a minimum, an `A100` or another GPU supporting `BF16` precision should be used.
- We have not yet systematically tested the minimum GPU memory requirements for each model. For `LoRA` (batch size 1, with offload), a single `A100` GPU is sufficient. For `SFT`, our tests have passed in an `8×A100` environment.
This repository has been tested in environments with 8×A100 GPUs, using CUDA 12.4 and Python 3.10.16.
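Since the BF16 requirement above is a common stumbling block, a short preflight check (not part of CogKit — a standalone sketch assuming PyTorch is installed) can confirm the local GPU is usable before launching a long fine-tuning run:

```python
# Preflight sketch: verify the local GPU supports BF16 before fine-tuning.
# Most Cog models produce loss=nan under FP16, so BF16 support is essential.
def bf16_ready() -> bool:
    try:
        import torch
    except ImportError:  # PyTorch not installed; training is impossible anyway
        return False
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()

if __name__ == "__main__":
    print("BF16-capable GPU:", bf16_ready())
```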
118 changes: 34 additions & 84 deletions docs/04-Finetune/01-Prerequisites.mdx
@@ -3,109 +3,69 @@

# Prerequisites

Before starting fine-tuning, please ensure your machine meets the minimum hardware requirements listed in the tables below. The tables show the minimum VRAM (GPU memory) requirements for different models under various configurations.
Before starting fine-tuning, please ensure your machine meets the minimum hardware requirements listed in the tables below. The tables show the minimum VRAM requirements for different models under various configurations (tested on 8×A100).

## CogVideo Series

<table style={{ textAlign: "center" }}>
<thead>
<tr>
<th style={{ textAlign: "center" }}>Model</th>
<th style={{ textAlign: "center" }}>Training Type</th>
<th style={{ textAlign: "center" }}>Distribution Strategy</th>
<th style={{ textAlign: "center" }}>Training Resolution (FxHxW)</th>
<th style={{ textAlign: "center" }}>Type</th>
<th style={{ textAlign: "center" }}>Strategy</th>
<th style={{ textAlign: "center" }}>Resolution <br /> (FxHxW)</th>
<th style={{ textAlign: "center" }}>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="6">cogvideox-t2v-2b</td>
<td rowSpan="2">cogvideox-t2v-2b</td>
<td>lora</td>
<td>DDP</td>
<td>49x480x720</td>
<td>16GB VRAM</td>
<td>1 GPU with <br /> 12GB VRAM</td>
</tr>
<tr>
<td rowspan="5">sft</td>
<td rowSpan="1">sft</td>
<td>DDP</td>
<td>49x480x720</td>
<td>36GB VRAM</td>
<td>1 GPU with <br /> 25GB VRAM</td>
</tr>
<tr>
<td>1-GPU zero-2 + opt offload</td>
<td>49x480x720</td>
<td>17GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-2</td>
<td>49x480x720</td>
<td>17GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3</td>
<td>49x480x720</td>
<td>19GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3 + opt and param offload</td>
<td>49x480x720</td>
<td>14GB VRAM</td>
</tr>
<tr>
<td rowspan="5">cogvideox-\{t2v,i2v\}-5b</td>
<td rowSpan="3">cogvideox-\{t2v,i2v\}-5b</td>
<td>lora</td>
<td>DDP</td>
<td>49x480x720</td>
<td>24GB VRAM</td>
</tr>
<tr>
<td rowspan="4">sft</td>
<td>1-GPU zero-2 + opt offload</td>
<td>49x480x720</td>
<td>42GB VRAM</td>
<td>1 GPU with <br /> 24GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-2</td>
<td rowSpan="2">sft</td>
<td>FSDP fullshard</td>
<td>49x480x720</td>
<td>42GB VRAM</td>
<td>8 GPUs with <br /> 20GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3</td>
<td>FSDP fullshard + offload</td>
<td>49x480x720</td>
<td>43GB VRAM</td>
<td>1 GPU with <br /> 16GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3 + opt and param offload</td>
<td>49x480x720</td>
<td>28GB VRAM</td>
</tr>
<tr>
<td rowspan="5">cogvideox1.5-\{t2v,i2v\}-5b</td>
<td rowSpan="3">cogvideox1.5-\{t2v,i2v\}-5b</td>
<td>lora</td>
<td>DDP</td>
<td>81x768x1360</td>
<td>35GB VRAM</td>
</tr>
<tr>
<td rowspan="4">sft</td>
<td>1-GPU zero-2 + opt offload</td>
<td>81x768x1360</td>
<td>56GB VRAM</td>
<td>1 GPU with <br /> 32GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-2</td>
<td rowSpan="2">sft</td>
<td>FSDP fullshard</td>
<td>81x768x1360</td>
<td>55GB VRAM</td>
<td>8 GPUs with <br /> 31GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3</td>
<td>FSDP fullshard + offload</td>
<td>81x768x1360</td>
<td>55GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3 + opt and param offload</td>
<td>81x768x1360</td>
<td>40GB VRAM</td>
<td>8 GPUs with <br /> 27GB VRAM</td>
</tr>
</tbody>
</table>
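The per-GPU figures above bundle weights, gradients, optimizer state, and activations; as a rough intuition for why full-sharding lowers the per-GPU floor, the weight portion alone divides across ranks. The sketch below is a back-of-envelope estimate under stated assumptions (BF16 weights, even sharding), not a substitute for the measured numbers in the table:

```python
# Back-of-envelope sketch: parameter-memory footprint per GPU under FSDP
# full-sharding. Weights only -- gradients, optimizer state, and activations
# add substantially more, so treat this as a loose lower bound.
def shard_param_gib(n_params: float, bytes_per_param: int = 2, n_gpus: int = 8) -> float:
    """Approximate GiB of weight memory per GPU (BF16 = 2 bytes/param)."""
    return n_params * bytes_per_param / n_gpus / 2**30

if __name__ == "__main__":
    # e.g. a 5B-parameter model in BF16 sharded over 8 GPUs
    print(f"{shard_param_gib(5e9):.2f} GiB of weights per GPU")
```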
@@ -116,46 +76,36 @@ Before starting fine-tuning, please ensure your machine meets the minimum hardwa
<thead>
<tr>
<th style={{ textAlign: "center" }}>Model</th>
<th style={{ textAlign: "center" }}>Training Type</th>
<th style={{ textAlign: "center" }}>Distribution Strategy</th>
<th style={{ textAlign: "center" }}>Training Resolution (HxW)</th>
<th style={{ textAlign: "center" }}>Type</th>
<th style={{ textAlign: "center" }}>Strategy</th>
<th style={{ textAlign: "center" }}>Resolution <br /> (HxW)</th>
<th style={{ textAlign: "center" }}>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="6">CogView4-6B</td>
<td>qlora + param offload <br />(`--low_vram`)</td>
<td rowSpan="4">CogView4-6B</td>
<td>qlora + offload <br />(enable `--low_vram`)</td>
<td>DDP</td>
<td>1024x1024</td>
<td>9GB VRAM</td>
<td>1 GPU with <br /> 9GB VRAM</td>
</tr>
<tr>
<td>lora</td>
<td>DDP</td>
<td>1024x1024</td>
<td>30GB VRAM</td>
</tr>
<tr>
<td rowspan="4">sft</td>
<td>1-GPU zero-2 + opt offload</td>
<td>1024x1024</td>
<td>42GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-2</td>
<td>1024x1024</td>
<td>50GB VRAM</td>
<td>1 GPU with <br /> 20GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3</td>
<td rowSpan="2">sft</td>
<td>FSDP fullshard</td>
<td>1024x1024</td>
<td>47GB VRAM</td>
<td>8 GPUs with <br /> 28GB VRAM</td>
</tr>
<tr>
<td>8-GPU zero-3 + opt and param offload</td>
<td>FSDP fullshard + offload</td>
<td>1024x1024</td>
<td>28GB VRAM</td>
<td>8 GPUs with <br /> 22GB VRAM</td>
</tr>
</tbody>
</table>
29 changes: 14 additions & 15 deletions docs/04-Finetune/02-Quick Start.md
@@ -27,36 +27,35 @@ We recommend that you read the corresponding [model card](../05-Model%20Card.mdx
:::

1. Navigate to the `CogKit/` directory after cloning the repository

```bash
cd CogKit/
```

2. Choose the appropriate training script from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task

3. Review and adjust the parameters in the selected training script (e.g., `--data_root`, `--output_dir`, etc.)
2. Choose the appropriate subdirectory from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `t2i` corresponds to the text-to-image task

4. [Optional] If you are using ZeRO strategy, refer to `quickstart/configs/accelerate_config.yaml` to confirm your ZeRO config file and number of GPUs.
3. Review and adjust the parameters in `config.yaml` in the selected training directory

5. Run the script, for example:
4. Run the script in the selected directory:

```bash
cd quickstart/scripts
bash train_ddp_t2i.sh
bash start_train.sh
```
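Step 3's `config.yaml` differs per task; as a hypothetical illustration of the kind of fields typically adjusted (the names below are placeholders, not CogKit's actual schema — consult the file shipped in your chosen subdirectory):

```yaml
# Hypothetical sketch only -- field names are illustrative, not CogKit's schema.
data_root: /path/to/your/dataset   # where your training data lives
output_dir: /path/to/checkpoints   # where checkpoints will be written
```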

## Load Fine-tuned Model

### LoRA

After fine-tuning with LoRA, you can load your trained weights during inference using the `--lora_model_id_or_path` option or parameter. For more details, please refer to the inference guide.
### Merge Checkpoint

### ZeRO

After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `quickstart/tools/converters` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
After fine-tuning, you need to use the `merge.py` script to merge the distributed checkpoint weights into a single checkpoint (**except for QLoRA fine-tuning**).
The script can be found in the `quickstart/tools/converters` directory.
For example:

```bash
cd quickstart/tools/converters
python zero2diffusers.py checkpoint_dir/ output_dir/ --bfloat16
python merge.py --checkpoint_dir ckpt/ --output_dir output_dir/
# Add --lora option if you are using LoRA fine-tuning
```

During inference, pass the `output_dir/` to the `--transformer_path` option or parameter. For more details, please refer to the inference guide.
### Load Checkpoint

You can pass the `output_dir` to the `--lora_model_id_or_path` option if you are using LoRA fine-tuning, or to the `--transformer_path` option if you are using FSDP fine-tuning. For more details, please refer to the inference guide.
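The flag choice above can be summarized as a simple decision rule. The sketch below is purely illustrative — the function and the returned strings are placeholders, not CogKit's real API:

```python
# Illustrative decision rule for which flag receives output_dir at inference.
# Placeholder logic only -- not CogKit's actual implementation.
def resolve_weights(args: dict) -> str:
    if args.get("lora_model_id_or_path"):   # LoRA fine-tuning
        return f"base model + LoRA adapter from {args['lora_model_id_or_path']}"
    if args.get("transformer_path"):        # FSDP fine-tuning
        return f"full transformer weights from {args['transformer_path']}"
    return "pretrained weights from the hub"
```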
5 changes: 2 additions & 3 deletions pyproject.toml
@@ -18,7 +18,6 @@ dependencies = [
"pydantic~=2.10",
"sentencepiece==0.2.0",
"transformers~=4.49",
"wandb~=0.19.8",
"fastapi[standard]~=0.115.11",
"fastapi_cli~=0.0.7",
"openai~=1.67",
@@ -31,10 +30,10 @@ dependencies = [
[project.optional-dependencies]
finetune = [
"datasets~=3.4",
"deepspeed~=0.16.4",
"wandb~=0.19.8",
"av~=14.2.0",
"bitsandbytes~=0.45.4",
"tensorboard~=2.19",
"pyyaml>=6.0.2",
]

[project.urls]
26 changes: 0 additions & 26 deletions quickstart/configs/accelerate_config.yaml

This file was deleted.

38 changes: 0 additions & 38 deletions quickstart/configs/zero/zero2.yaml

This file was deleted.

42 changes: 0 additions & 42 deletions quickstart/configs/zero/zero2_offload.yaml

This file was deleted.
