
Commit aedd37e

Merge pull request #7 from zRzRzRzRzRzRzR/docs-finetune
[docs] Update finetune documentation and related guides
2 parents 5ad85ab + 3f1f5d9

6 files changed (+38 additions, -40 deletions)


docs/01-Intro.md

Lines changed: 1 addition & 7 deletions
@@ -4,7 +4,7 @@ slug: /

 # Introduction

-CogKit is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
+CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.

 ## Supported Models

@@ -16,9 +16,3 @@ This repository has been tested in environments with `1×A100` and `8×A100` GPU

 - Cog series models typically do not support `FP16` precision (only `CogVideoX-2B` does); they cannot be fine-tuned properly on GPUs like the `V100` (this will cause `loss=nan`, for example). At a minimum, an `A100` or another GPU supporting `BF16` precision should be used.
 - We have not yet systematically tested the minimum GPU memory requirements for each model. For `LORA(bs=1 with offload)`, a single `A100` GPU is sufficient. For `SFT`, our tests have passed in an `8×A100` environment.
-
-## Roadmap
-
-- [ ] Add support for CogView4 ControlNet model
-- [ ] Docker Image for easy deployment
-- [ ] Embedding Cache to Reduce GPU Memory Usage

docs/02-Installation.md

Lines changed: 13 additions & 10 deletions
@@ -6,27 +6,30 @@
 ## Requirements

 - Python 3.10 or higher
-- PyTorch
+- PyTorch, OpenCV, decord

 ## Installation Steps

 ### PyTorch

 Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.

-### CogKit
+### OpenCV
+
+Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install opencv-python. In most cases, you can simply install it with `pip install opencv-python-headless`.

-1. Install `cogkit`:
+### decord

-```bash
-pip install "cogkit@git+https://github.com/THUDM/cogkit.git"
-```
+Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install the decord dependencies. If you don't need GPU acceleration, you can simply install it with `pip install decord`.
+
+### CogKit

-2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
+Install `cogkit` from the GitHub source:
+
+```bash
+pip install "cogkit@git+https://github.com/THUDM/cogkit.git"
+```

-```bash
-pip install "cogkit[video]@git+https://github.com/THUDM/cogkit.git"
-```

 ### Verify installation

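Taken together, the updated installation steps can be sanity-checked with a quick import test. This is only an illustrative sketch, assuming the package imports as `cogkit` and that OpenCV and decord were installed as described above; it is not the doc's own verification section:

```bash
# Illustrative check: confirm PyTorch, OpenCV, decord and CogKit all import cleanly
python -c "import torch, cv2, decord, cogkit; print('CogKit environment looks OK')"
```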
docs/03-Inference/01-CLI.md

Lines changed: 4 additions & 7 deletions
@@ -1,7 +1,6 @@
 ---
 ---

-<!-- TODO: check this doc -->
 # Command-Line Interface

 CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
@@ -36,28 +35,26 @@ cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
 cogkit inference "a cat playing with a ball" "THUDM/CogVideoX1.5-5B"
 ```

-<!-- TODO: Add example for i2v -->
-
 :::tip
 See `cogkit inference --help` for more information.
 :::

-<!-- TODO: add docs for launch server -->
 ## Launch Command

 The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:

+<!-- FIXME: check url -->
 ```bash
-pip install "cogkit[api]@git+https://github.com/THUDM/cogkit.git"
+pip install "cogkit[api]@git+https://github.com/THUDM/CogKit.git"
 ```

-<!-- FIXME: correct url -->
+<!-- FIXME: check url -->
 Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.

 To configure the model paths:

 1. Create a `.env` file in your working directory
-2. Refer to the [environment template]() and add needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
+2. Refer to the [environment template](https://github.com/THUDM/CogKit/blob/main/.env.template) and add the needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:

 ```bash
 # /your/workdir/.env
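# For example, serving CogView4-6B would need an entry along these lines
# (the COGVIEW4_PATH variable name comes from the doc above; the path value is a placeholder):
COGVIEW4_PATH=/path/to/CogView4-6B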

docs/03-Inference/02-API.md

Lines changed: 1 addition & 2 deletions
@@ -36,5 +36,4 @@ video = generate_video(
 video.save("cat_video.mp4")
 ```

-<!-- FIXME: correct url -->
-See function signatures in [generation.py]() for more details.
+See the function signatures in `generation.py` for more details.

docs/04-Finetune/02-Quick Start.md

Lines changed: 17 additions & 13 deletions
@@ -2,9 +2,14 @@

 ## Setup

-Please refer to the [installation guide](../02-Installation.md) to setup your environment
+* Please refer to the [installation guide](../02-Installation.md) to set up your environment first

-<!-- TODO: clone the repo to finetune? clone -->
+<!-- FIXME: change to THUDM later -->
+* We provide various training scripts and example datasets in the `CogKit/quickstart/` directory, so you need to clone our repository before training:
+
+```bash
+git clone https://github.com/zRzRzRzRzRzRzR/CogKit.git
+```

 ## Data

@@ -16,23 +21,22 @@ Before fine-tuning, you need to prepare your dataset according to the expected f
 We recommend that you read the corresponding [model card](../05-Model%20Card.mdx) before starting training to follow the parameter settings requirements and fine-tuning best practices
 :::

-<!-- TODO: move training script to cli folder? -->
-<!-- TODO: add link to corresponding folder -->
-1. Navigate to the `src/cogkit/finetune/diffusion` directory
+1. Navigate to the `CogKit/` directory after cloning the repository
+```bash
+cd CogKit/
+```

-<!-- TODO: add link to training script folder -->
-<!-- TODO: add link to train_ddp_t2i.sh -->
-2. Choose the appropriate training script from the `scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task
+2. Choose the appropriate training script from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task

 3. Review and adjust the parameters in the selected training script (e.g., `--data_root`, `--output_dir`, etc.)

-<!-- TODO: add link to accelerate config -->
-4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs
+4. [Optional] If you are using ZeRO strategy, refer to `quickstart/configs/accelerate_config.yaml` to confirm your ZeRO config file and number of GPUs.

 5. Run the script, for example:

 ```bash
-bash scripts/train_ddp_t2i.sh
+cd quickstart/scripts
+bash train_ddp_t2i.sh
 ```

 ## Load Fine-tuned Model
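Step 3 of the updated quick start asks you to review the script parameters before launching. A minimal sketch of that editing step, with placeholder paths (only the `--data_root` and `--output_dir` flag names come from the guide; how the script wires them up may differ):

```bash
# Placeholder paths for the two flags named in the guide; edit the matching
# entries in quickstart/scripts/train_ddp_t2i.sh before running it.
DATA_ROOT="/path/to/your/t2i-dataset"     # value for --data_root
OUTPUT_DIR="/path/to/save/checkpoints"    # value for --output_dir
```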
@@ -43,10 +47,10 @@ After fine-tuning with LoRA, you can load your trained weights during inference

 ### ZeRO

-After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `scripts` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
+After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `quickstart/tools/converters` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:

-<!-- FIXME: path to zero2diffusers.py? -->
 ```bash
+cd quickstart/tools/converters
 python zero2diffusers.py checkpoint_dir/ output_dir/ --bfloat16
 ```

docs/04-Finetune/03-Data Format.md

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
 # Dataset Format

 <!-- TODO: add link to data dir -->
-`src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:
+The `CogKit/quickstart/data` directory contains various dataset templates for fine-tuning different models; please refer to the corresponding dataset template based on your task type:

 ## Text-to-Image Conversion Dataset (t2i)

@@ -42,4 +42,5 @@
 ## Notes

 - Training sets (`train/`) are used for model training; test sets (`test/`) are used for evaluating model performance
+
 - Each dataset will generate a `.cache/` directory during training, used to store preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain.
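The `.cache/` note above implies a small housekeeping step whenever the dataset changes. A minimal sketch, assuming your dataset lives at `./my_dataset` (a placeholder path):

```bash
# If the dataset has changed, delete the stale preprocessed cache so it is
# rebuilt from the new data on the next training run.
rm -rf ./my_dataset/.cache
```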
