8 changes: 1 addition & 7 deletions docs/01-Intro.md
@@ -4,7 +4,7 @@ slug: /

# Introduction

CogKit is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.

## Supported Models

@@ -16,9 +16,3 @@ This repository has been tested in environments with `1×A100` and `8×A100` GPUs

- Cog series models generally do not support `FP16` precision (only `CogVideoX-2B` does); GPUs such as the `V100` cannot fine-tune them properly (training will produce `loss=nan`, for example). Use at least an `A100` or another GPU that supports `BF16` precision (a quick check is shown below).
- We have not yet systematically tested the minimum GPU memory requirements for each model. For LoRA (`bs=1` with offload), a single `A100` GPU is sufficient. For SFT, our tests have passed in an `8×A100` environment.
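
To quickly confirm whether your GPU supports `BF16`, you can query PyTorch directly; a minimal check from the shell (this only verifies hardware support, not memory headroom):

```bash
# Prints True if the active CUDA device supports BF16 compute
python -c "import torch; print(torch.cuda.is_bf16_supported())"
```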

## Roadmap

- [ ] Add support for CogView4 ControlNet model
- [ ] Docker Image for easy deployment
- [ ] Embedding Cache to Reduce GPU Memory Usage
23 changes: 13 additions & 10 deletions docs/02-Installation.md
@@ -6,27 +6,30 @@
## Requirements

- Python 3.10 or higher
- PyTorch
- PyTorch, OpenCV, decord

## Installation Steps

### PyTorch

Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.
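
On most platforms the default wheel is sufficient; when you need a specific CUDA version, copy the exact command for your setup from the guide above. A minimal example:

```bash
# Installs the default PyTorch wheel; use the platform-specific command from the PyTorch guide if needed
pip install torch
```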

### CogKit
### OpenCV

Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install `opencv-python`. In most cases, you can simply install it with `pip install opencv-python-headless`.

1. Install `cogkit`:
### decord

```bash
pip install "cogkit@git+https://github.com/THUDM/cogkit.git"
```
Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install the decord dependencies. If you don't need GPU acceleration, you can simply install it with `pip install decord`.

### CogKit

2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
Install `cogkit` from the GitHub source:

```bash
pip install "cogkit@git+https://github.com/THUDM/cogkit.git"
```

```bash
pip install "cogkit[video]@git+https://github.com/THUDM/cogkit.git"
```

### Verify installation
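
A minimal sanity check, assuming only that the package should import cleanly once installed:

```bash
# Fails with an ImportError if cogkit (or one of its required dependencies) is missing
python -c "import cogkit"
```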

11 changes: 4 additions & 7 deletions docs/03-Inference/01-CLI.md
@@ -1,7 +1,6 @@
---
---

<!-- TODO: check this doc -->
# Command-Line Interface

CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
@@ -36,28 +35,26 @@ cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
cogkit inference "a cat playing with a ball" "THUDM/CogVideoX1.5-5B"
```

<!-- TODO: Add example for i2v -->

:::tip
See `cogkit inference --help` for more information.
:::

<!-- TODO: add docs for launch server -->
## Launch Command

The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:

<!-- FIXME: check url -->
```bash
pip install "cogkit[api]@git+https://github.com/THUDM/cogkit.git"
pip install "cogkit[api]@git+https://github.com/THUDM/CogKit.git"
```

<!-- FIXME: correct url -->
<!-- FIXME: check url -->
Before starting the server, configure the paths of the models you want to serve; these paths determine which models are available through the API server.

To configure the model paths:

1. Create a `.env` file in your working directory
2. Refer to the [environment template]() and add needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
2. Refer to the [environment template](https://github.com/THUDM/CogKit/blob/main/.env.template) and add the environment variables that specify your model paths. For example, to serve `CogView4-6B`, you must set `COGVIEW4_PATH` in your `.env` file:

```bash
# /your/workdir/.env
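# The variable name comes from step 2 above; the path shown is only an example
COGVIEW4_PATH=/path/to/CogView4-6B
```

With the model path configured, the server can then be started with the `launch` command (presumably `cogkit launch`; check its `--help` for available options).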
3 changes: 1 addition & 2 deletions docs/03-Inference/02-API.md
@@ -36,5 +36,4 @@ video = generate_video(
video.save("cat_video.mp4")
```

<!-- FIXME: correct url -->
See function signatures in [generation.py]() for more details.
See the function signatures in `generation.py` for more details.
30 changes: 17 additions & 13 deletions docs/04-Finetune/02-Quick Start.md
@@ -2,9 +2,14 @@

## Setup

Please refer to the [installation guide](../02-Installation.md) to setup your environment
* Please refer to the [installation guide](../02-Installation.md) to set up your environment first

<!-- TODO: clone the repo to finetune? clone -->
<!-- FIXME: change to THUDM later -->
* We provide various training scripts and example datasets in the `CogKit/quickstart/` directory, so you need to clone our repository before training:

```bash
git clone https://github.com/zRzRzRzRzRzRzR/CogKit.git
```

## Data

@@ -16,23 +21,22 @@ Before fine-tuning, you need to prepare your dataset according to the expected f
We recommend that you read the corresponding [model card](../05-Model%20Card.mdx) before starting training to follow its parameter requirements and fine-tuning best practices.
:::

<!-- TODO: move training script to cli folder? -->
<!-- TODO: add link to corresponding folder -->
1. Navigate to the `src/cogkit/finetune/diffusion` directory
1. Navigate to the `CogKit/` directory after cloning the repository
```bash
cd CogKit/
```

<!-- TODO: add link to training script folder -->
<!-- TODO: add link to train_ddp_t2i.sh -->
2. Choose the appropriate training script from the `scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task
2. Choose the appropriate training script from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task

3. Review and adjust the parameters in the selected training script (e.g., `--data_root`, `--output_dir`); a sketch of typical values is shown after these steps

<!-- TODO: add link to accelerate config -->
4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs
4. [Optional] If you are using the ZeRO strategy, refer to `quickstart/configs/accelerate_config.yaml` to confirm your ZeRO config file and the number of GPUs.

5. Run the script, for example:

```bash
bash scripts/train_ddp_t2i.sh
cd quickstart/scripts
bash train_ddp_t2i.sh
```

## Load Fine-tuned Model
@@ -43,10 +47,10 @@ After fine-tuning with LoRA, you can load your trained weights during inference

### ZeRO

After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `scripts` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
After fine-tuning with the ZeRO strategy, you need to use the `zero2diffusers.py` script provided in the `quickstart/tools/converters` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:

<!-- FIXME: path to zero2diffusers.py? -->
```bash
cd quickstart/tools/converters
python zero2diffusers.py checkpoint_dir/ output_dir/ --bfloat16
```

3 changes: 2 additions & 1 deletion docs/04-Finetune/03-Data Format.md
@@ -4,7 +4,7 @@
# Dataset Format

<!-- TODO: add link to data dir -->
`src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:
The `CogKit/quickstart/data` directory contains dataset templates for fine-tuning the different models; refer to the template that matches your task type:

## Text-to-Image Conversion Dataset (t2i)

@@ -42,4 +42,5 @@
## Notes

- Training sets (`train/`) are used for model training, while test sets (`test/`) are used for evaluating model performance

- Each dataset generates a `.cache/` directory during training, which stores preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain, as shown below.
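
For example, to clear a stale cache before retraining (the dataset path is a placeholder):

```bash
# Remove the preprocessed cache so it is rebuilt from the updated dataset
rm -rf /path/to/your/dataset/.cache/
```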