docs/01-Intro.md (1 addition & 7 deletions)
@@ -4,7 +4,7 @@ slug: /
# Introduction
- CogKit is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
+ CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
## Supported Models
@@ -16,9 +16,3 @@ This repository has been tested in environments with `1×A100` and `8×A100` GPU
- Cog series models typically do not support `FP16` precision (only `CogVideoX-2B` supports it); GPUs such as the `V100` cannot fine-tune them properly (training will produce `loss=nan`, for example). At a minimum, an `A100` or another GPU supporting `BF16` precision should be used.
- We have not yet systematically tested the minimum GPU memory requirements for each model. For `LoRA` (`bs=1` with offload), a single `A100` GPU is sufficient. For `SFT`, our tests have passed in an `8×A100` environment.
docs/02-Installation.md (13 additions & 10 deletions)
@@ -6,27 +6,30 @@
## Requirements
- Python 3.10 or higher
- - PyTorch
+ - PyTorch, OpenCV, decord
## Installation Steps
### PyTorch
Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.
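For instance, a CUDA 12.1 build can be installed as follows. This is a non-authoritative sketch: the wheel index and build depend on your OS and CUDA version, so prefer the selector on the PyTorch site.

```bash
# Example only: choose the wheel index that matches your CUDA/CPU setup
pip install torch --index-url https://download.pytorch.org/whl/cu121
```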
- ### CogKit
+ ### OpenCV
+ Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install opencv-python. In most cases, you can simply install it with `pip install opencv-python-headless`.
+ Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install the decord dependencies. If you don't need GPU acceleration, you can simply install it with `pip install decord`.
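Taken together, a minimal CPU-only setup based on the two commands above would be:

```bash
# CPU-only variants, as recommended above
pip install opencv-python-headless
pip install decord
```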
+ ### CogKit
- 2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
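The body of the new CogKit section is cut off by this hunk. As a hedged sketch, assuming installation from the cloned repository (an assumption; this diff does not show the official command), it might look like:

```bash
# Assumption: install CogKit from source; check the repository README for the official command
git clone https://github.com/THUDM/CogKit.git
cd CogKit
pip install -e .
```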
docs/03-Inference/01-CLI.md (4 additions & 7 deletions)
@@ -1,7 +1,6 @@
---
---
- <!-- TODO: check this doc -->
# Command-Line Interface
CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
@@ -36,28 +35,26 @@ cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
cogkit inference "a cat playing with a ball""THUDM/CogVideoX1.5-5B"
```
- <!-- TODO: Add example for i2v -->
:::tip
See `cogkit inference --help` for more information.
:::
- <!-- TODO: add docs for launch server -->
## Launch Command
The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:
Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.
To configure the model paths:
1. Create a `.env` file in your working directory
- 2. Refer to the [environment template]() and add needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
+ 2. Refer to the [environment template](https://github.com/THUDM/CogKit/blob/main/.env.template) and add the needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
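A minimal `.env` sketch: the `COGVIEW4_PATH` name comes from the text above, while the path itself is a placeholder.

```bash
# .env: placeholder path, point it at your local CogView4-6B weights
COGVIEW4_PATH="/path/to/CogView4-6B"
```

Once the variables are set, the server is started with the `launch` command described above (its exact flags are not shown in this diff).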
@@ -16,23 +21,22 @@ Before fine-tuning, you need to prepare your dataset according to the expected f
We recommend that you read the corresponding [model card](../05-Model%20Card.mdx) before starting training, so that you follow the parameter-setting requirements and fine-tuning best practices.
:::
- <!-- TODO: move training script to cli folder? -->
- <!-- TODO: add link to corresponding folder -->
- 1. Navigate to the `src/cogkit/finetune/diffusion` directory
+ 1. Navigate to the `CogKit/` directory after cloning the repository
```bash
cd CogKit/
```
- <!-- TODO: add link to training script folder -->
- <!-- TODO: add link to train_ddp_t2i.sh -->
- 2. Choose the appropriate training script from the `scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task
+ 2. Choose the appropriate training script from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to the DDP strategy + text-to-image task.
3. Review and adjust the parameters in the selected training script (e.g., `--data_root`, `--output_dir`, etc.); see the parameter sketch after this list.
- <!-- TODO: add link to accelerate config -->
- 4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs
+ 4. [Optional] If you are using the ZeRO strategy, refer to `quickstart/configs/accelerate_config.yaml` to confirm your ZeRO config file and number of GPUs.
5. Run the script, for example:
```bash
-bash scripts/train_ddp_t2i.sh
+cd quickstart/scripts
+bash train_ddp_t2i.sh
```
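As referenced in step 3, here is a hypothetical excerpt of the values typically edited inside `train_ddp_t2i.sh`. Only `--data_root` and `--output_dir` are named on this page; the variable layout below is an assumption.

```bash
# Hypothetical excerpt of train_ddp_t2i.sh: adjust these paths before launching
DATA_ROOT="/path/to/your/t2i-dataset"     # forwarded to --data_root
OUTPUT_DIR="/path/to/save/checkpoints"    # forwarded to --output_dir
```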
## Load Fine-tuned Model
@@ -43,10 +47,10 @@ After fine-tuning with LoRA, you can load your trained weights during inference
### ZeRO
- After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `scripts` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
+ After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `quickstart/tools/converters` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
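The concrete example falls outside this hunk. A sketch of a typical invocation, assuming the script follows DeepSpeed's standard `zero_to_fp32.py` interface (check the script's `--help` for the actual arguments):

```bash
# Assumed arguments: ZeRO checkpoint directory, then the output location for merged weights
python quickstart/tools/converters/zero_to_fp32.py /path/to/checkpoint-dir /path/to/output
```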
docs/04-Finetune/03-Data Format.md (2 additions & 1 deletion)
@@ -4,7 +4,7 @@
# Dataset Format
<!-- TODO: add link to data dir -->
- `src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:
+ The `CogKit/quickstart/data` directory contains various dataset templates for fine-tuning different models; please refer to the corresponding dataset template based on your task type:
## Text-to-Image Conversion Dataset (t2i)
@@ -42,4 +42,5 @@
## Notes
- Training sets (`train/`) are used for model training; test sets (`test/`) are used for evaluating model performance.
- Each dataset will generate a `.cache/` directory during training, used to store preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain.
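For example (placeholder path), removing the stale cache before retraining:

```bash
# Delete the preprocessed-data cache after changing the dataset, then retrain
rm -rf /path/to/your-dataset/.cache/
```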