docs/01-Intro.md (1 addition & 7 deletions)
@@ -4,7 +4,7 @@ slug: /
# Introduction
- CogKit is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
+ CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
## Supported Models
@@ -16,9 +16,3 @@ This repository has been tested in environments with `1×A100` and `8×A100` GPU
- Cog series models typically do not support `FP16` precision (only `CogVideoX-2B` supports it); GPUs such as the `V100` cannot fine-tune them properly (training will produce `loss=nan`, for example). At a minimum, an `A100` or another GPU supporting `BF16` precision should be used.
- We have not yet systematically tested the minimum GPU memory requirements for each model. For `LoRA` (`bs=1` with offload), a single `A100` GPU is sufficient. For `SFT`, our tests have passed in an `8×A100` environment.
docs/02-Installation.md (13 additions & 10 deletions)
@@ -6,27 +6,30 @@
## Requirements
- Python 3.10 or higher
- - PyTorch
+ - PyTorch, OpenCV, decord
## Installation Steps
### PyTorch
Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.
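For instance, a CUDA 12.1 build can be installed as follows. This is a non-authoritative sketch: the wheel index and build depend on your OS and CUDA version, so prefer the selector on the PyTorch site.

```bash
# Example only: choose the wheel index that matches your CUDA/CPU setup
pip install torch --index-url https://download.pytorch.org/whl/cu121
```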
- ### CogKit
+ ### OpenCV
+ Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install opencv-python. In most cases, you can simply install it with `pip install opencv-python-headless`.
+ Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install the decord dependencies. If you don't need GPU acceleration, you can simply install it with `pip install decord`.
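Taken together, a minimal CPU-only setup based on the two commands above would be:

```bash
# CPU-only variants, as recommended above
pip install opencv-python-headless
pip install decord
```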
+ ### CogKit
- 2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
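The body of the new CogKit section is cut off by this hunk. As a hedged sketch, assuming installation from the cloned repository (an assumption; this diff does not show the official command), it might look like:

```bash
# Assumption: install CogKit from source; check the repository README for the official command
git clone https://github.com/THUDM/CogKit.git
cd CogKit
pip install -e .
```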
docs/03-Inference/01-CLI.md (4 additions & 7 deletions)
@@ -1,7 +1,6 @@
---
---
- <!-- TODO: check this doc -->
# Command-Line Interface
CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
@@ -36,28 +35,26 @@ cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
cogkit inference "a cat playing with a ball""THUDM/CogVideoX1.5-5B"
```
- <!-- TODO: Add example for i2v -->
:::tip
See `cogkit inference --help` for more information.
:::
- <!-- TODO: add docs for launch server -->
## Launch Command
The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:
Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.
To configure the model paths:
1. Create a `.env` file in your working directory
- 2. Refer to the [environment template]() and add needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
+ 2. Refer to the [environment template](https://github.com/THUDM/CogKit/blob/main/.env.template) and add the needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
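A minimal `.env` sketch: the `COGVIEW4_PATH` name comes from the text above, while the path itself is a placeholder.

```bash
# .env: placeholder path, point it at your local CogView4-6B weights
COGVIEW4_PATH="/path/to/CogView4-6B"
```

Once the variables are set, the server is started with the `launch` command described above (its exact flags are not shown in this diff).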
@@ -16,23 +21,22 @@ Before fine-tuning, you need to prepare your dataset according to the expected f
We recommend that you read the corresponding [model card](../05-Model%20Card.mdx) before starting training, so that you follow the parameter-setting requirements and fine-tuning best practices.
:::
- <!-- TODO: move training script to cli folder? -->
- <!-- TODO: add link to corresponding folder -->
- 1. Navigate to the `src/cogkit/finetune/diffusion` directory
+ 1. Navigate to the `CogKit/` directory after cloning the repository
```bash
cd CogKit/
```
- <!-- TODO: add link to training script folder -->
- <!-- TODO: add link to train_ddp_t2i.sh -->
- 2. Choose the appropriate training script from the `scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to DDP strategy + text-to-image task
+ 2. Choose the appropriate training script from the `quickstart/scripts` directory based on your task type and distribution strategy. For example, `train_ddp_t2i.sh` corresponds to the DDP strategy + text-to-image task.
3. Review and adjust the parameters in the selected training script (e.g., `--data_root`, `--output_dir`, etc.); see the parameter sketch after this list.
- <!-- TODO: add link to accelerate config -->
- 4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs
+ 4. [Optional] If you are using the ZeRO strategy, refer to `quickstart/configs/accelerate_config.yaml` to confirm your ZeRO config file and number of GPUs.
5. Run the script, for example:
```bash
-bash scripts/train_ddp_t2i.sh
+cd quickstart/scripts
+bash train_ddp_t2i.sh
```
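As referenced in step 3, here is a hypothetical excerpt of the values typically edited inside `train_ddp_t2i.sh`. Only `--data_root` and `--output_dir` are named on this page; the variable layout below is an assumption.

```bash
# Hypothetical excerpt of train_ddp_t2i.sh: adjust these paths before launching
DATA_ROOT="/path/to/your/t2i-dataset"     # forwarded to --data_root
OUTPUT_DIR="/path/to/save/checkpoints"    # forwarded to --output_dir
```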
## Load Fine-tuned Model
@@ -43,10 +47,10 @@ After fine-tuning with LoRA, you can load your trained weights during inference
### ZeRO
- After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `scripts` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
+ After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `quickstart/tools/converters` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
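The concrete example falls outside this hunk. A sketch of a typical invocation, assuming the script follows DeepSpeed's standard `zero_to_fp32.py` interface (check the script's `--help` for the actual arguments):

```bash
# Assumed arguments: ZeRO checkpoint directory, then the output location for merged weights
python quickstart/tools/converters/zero_to_fp32.py /path/to/checkpoint-dir /path/to/output
```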
docs/04-Finetune/03-Data Format.md (2 additions & 1 deletion)
@@ -4,7 +4,7 @@
# Dataset Format
<!-- TODO: add link to data dir -->
- `src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:
+ The `CogKit/quickstart/data` directory contains various dataset templates for fine-tuning different models; please refer to the corresponding dataset template based on your task type:
## Text-to-Image Conversion Dataset (t2i)
@@ -42,4 +42,5 @@
## Notes
- Training sets (`train/`) are used for model training; test sets (`test/`) are used for evaluating model performance.
- Each dataset will generate a `.cache/` directory during training, used to store preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain.
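For example (placeholder path), removing the stale cache before retraining:

```bash
# Delete the preprocessed-data cache after changing the dataset, then retrain
rm -rf /path/to/your-dataset/.cache/
```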