
Commit 6c77619

[docs] Pre-release
1 parent b40581f commit 6c77619

9 files changed: +31 / -140 lines changed


docs/01-Intro.md

Lines changed: 1 addition & 9 deletions
@@ -5,6 +5,7 @@ slug: /
 # Introduction
 Cogkit is a powerful framework for working with cognitive AI models, focusing on multimodal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
 
+<!-- TODO: key features? -->
 ## Key Features
 
 - **Command-line Interface**: Easy-to-use CLI and Python API for both inference and fine-tuning
@@ -19,12 +20,3 @@ Please refer to the [Model Card](./05-Model%20Card.mdx) for more details.
 
 <!-- FIXME: add link to the issues pages -->
 For more detailed troubleshooting (bug related issues), please refer to our GitHub issues page.
-
-<!-- FIXME: add link to wechat? discord? or github discussions? -->
-For general discussions and support, please join our [Discord server](https://discord.gg/cogmodels).
-
-## License
-
-<!-- FIXME: LICENSE file is not present in the repo -->
-
-<!-- Cogkit is licensed under the [MIT License](./LICENSE). -->

docs/02-Installation.md

Lines changed: 6 additions & 11 deletions
@@ -32,20 +32,14 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 ### Install Cogkit
 <!-- FIXME: Install via pip install cogkit or via clone&local install? -->
 
-1. Clone the repository:
+1. Install Cogkit:
 
-<!-- FIXME: add link to the repo -->
+<!-- TODO: add github link -->
 ```bash
-git clone https://github.com/yourusername/cogkit.git
+pip install cogkit@git+https:
 ```
 
-2. Install Cogkit:
-
-```bash
-pip install -e .
-```
-
-3. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
+2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
 
 ```bash
 pip install -e .[video]
@@ -57,9 +51,10 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 You can verify that cogkit is installed correctly by running:
 
 ```bash
-python -c "import cogkit; print(cogkit.__version__)"
+python -c "import cogkit"
 ```
 
+<!-- TODO: add in roadmap -->
 ## [Optional] Install via docker
 
 If you have any issues with the installation, you can install Cogkit via Docker. We provide a Docker image that includes all dependencies. You can pull the image from Docker Hub:

docs/03-Inference/01-CLI.md

Lines changed: 5 additions & 34 deletions
@@ -34,15 +34,15 @@ cogmodels inference [OPTIONS] PROMPT MODEL_ID_OR_PATH
 
 ### Examples
 
+<!-- FIXME: Add example for i2v -->
+
 ```bash
 # Generate an image from text
-cogmodels inference "a beautiful sunset over mountains" runwayml/stable-diffusion-v1-5 --task t2i
+cogmodels inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
 
 # Generate a video from text
-cogmodels inference "a cat playing with a ball" stabilityai/stable-video-diffusion-img2vid --task t2v
+cogmodels inference "a cat playing with a ball" "THUDM/CogVideoX1.5-5B"
 
-# Generate a video from an image
-cogmodels inference "extend this image into a video" stabilityai/stable-video-diffusion-img2vid --task i2v --image_file input.png
 ```
 
 <!-- FIXME: remove this? -->
@@ -56,6 +56,7 @@ cogmodels finetune [OPTIONS]
 
 > Note: The fine-tuning command is currently under development. Please check back for updates.
 
+<!-- TODO: add docs for launch server -->
 ## Launch Command
 
 The `launch` command starts a web UI for interactive use:
@@ -84,34 +85,4 @@ This launches a web interface where you can:
 # Launch the web UI on the default port
 cogmodels launch
 
-# Launch the web UI with a public URL
-cogmodels launch --share
-```
-
-## Logging and Debugging
-
-CogModels CLI provides different verbosity levels for logging:
-
-```bash
-# Normal output
-cogmodels inference "prompt" model_id
-
-# Verbose output (info level)
-cogmodels -v inference "prompt" model_id
-
-# Very verbose output (debug level)
-cogmodels -vv inference "prompt" model_id
-```
-
-## Environment Variables
-
-The CLI behavior can be modified with environment variables:
-
-- `COGMODELS_CACHE_DIR`: Directory to store cached models and data
-- `COGMODELS_OFFLINE`: Set to "1" to run in offline mode
-- `COGMODELS_VERBOSE`: Set verbosity level (0-2)
-
-Example:
-```bash
-COGMODELS_CACHE_DIR=/path/to/cache cogmodels inference "prompt" model_id
 ```

docs/03-Inference/02-API.md

Lines changed: 8 additions & 6 deletions
@@ -15,24 +15,26 @@ from cogkit.generation import generate_image, generate_video
 # Text-to-Image generation
 image = generate_image(
     prompt="a beautiful sunset over mountains",
-    model_id_or_path="runwayml/stable-diffusion-v1-5",
-    num_inference_steps=50,
-    seed=42
+    model_id_or_path="THUDM/CogView4-6B",
+    lora_model_id_or_path=None,
+    transformer_path=None,
 )
 image.save("sunset.png")
 
 # Text-to-Video generation
 video = generate_video(
     prompt="a cat playing with a ball",
-    model_id_or_path="stabilityai/stable-video-diffusion-img2vid",
+    model_id_or_path="THUDM/CogVideoX1.5-5B",
+    lora_model_id_or_path=None,
+    transformer_path=None,
     num_frames=81,
     fps=16,
-    num_inference_steps=50,
-    seed=42
 )
 video.save("cat_video.mp4")
 ```
 
 ## API Server
 
 <!-- FIXME: add docs for the API server -->
+
+<!-- TODO: add examples -->
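
The hunk above adds `lora_model_id_or_path` and `transformer_path` parameters to the generation calls without showing how they are used. As a rough illustration only (not part of the commit; the checkpoint path is hypothetical, and the parameters are assumed to accept local paths as described in the fine-tuning quick start):

```python
# Illustrative sketch: load fine-tuned weights through the parameters shown above.
# "path/to/lora_checkpoint" is a hypothetical local directory produced by LoRA fine-tuning.
from cogkit.generation import generate_image

image = generate_image(
    prompt="a beautiful sunset over mountains",
    model_id_or_path="THUDM/CogView4-6B",
    lora_model_id_or_path="path/to/lora_checkpoint",  # defaults to None (base weights only)
)
image.save("sunset_lora.png")
```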

docs/04-Finetune/01-Prerequisites.mdx

Lines changed: 0 additions & 1 deletion
@@ -129,7 +129,6 @@ Before starting fine-tuning, please ensure your machine meets the minimum hardwa
 
 ## CogView Series
 
-{/* <!-- TODO: add table for Cogview Series --> */}
 <table style={{ textAlign: "center" }}>
 <thead>
 <tr>

docs/04-Finetune/02-Quick Start.md

Lines changed: 11 additions & 8 deletions
@@ -4,14 +4,12 @@
 
 Please refer to the [installation guide](../02-Installation.md) to setup your environment
 
-<!-- TODO: clone the repo to finetune? -->
+<!-- TODO: clone the repo to finetune? clone -->
 
 ## Data
 
 Before fine-tuning, you need to prepare your dataset according to the expected format. See the [data format](./03-Data%20Format.md) documentation for details on how to structure your data
 
-<!-- TODO: add link to data format-->
-
 ## Training
 
 :::info
@@ -39,12 +37,17 @@ We recommend that you read the corresponding [model card](../05-Model%20Card.mdx
 
 ## Load Fine-tuned Model
 
-<!-- TODO: A script for merging ZeRO weights is missing (after merging there is only a single transformer weight; should users replace this weight in the pipeline files themselves, or should the CLI/API directly accept a transformer weight path?) -->
-
 ### LoRA
 
-After fine-tuning with LoRA, you can load your trained weights during inference using the `--lora_model_id_or_path` parameter. For more details, please refer to the inference guide.
-
+After fine-tuning with LoRA, you can load your trained weights during inference using the `--lora_model_id_or_path` option or parameter. For more details, please refer to the inference guide.
 
 ### ZeRO
+
+After fine-tuning with ZeRO strategy, you need to use the `zero_to_fp32.py` script provided in the `scripts` directory to convert the ZeRO checkpoint weights into Diffusers format. For example:
+
+<!-- FIXME: path to zero2diffusers.py? -->
+```bash
+python zero2diffusers.py checkpoint_dir/ output_dir/ --bfloat16
+```
+
+During inference, pass the `output_dir/` to the `--transformer_path` option or parameter. For more details, please refer to the inference guide.
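
To illustrate the loading step described in this hunk, a minimal sketch of the inference side (not part of the commit), assuming the `generate_image` parameters documented in the inference API doc; `output_dir/` is the converted ZeRO output from the command above, and `path/to/lora_checkpoint` is a hypothetical LoRA checkpoint directory:

```python
# Illustrative sketch: pass fine-tuned weights to the generation API
# described in docs/03-Inference/02-API.md.
from cogkit.generation import generate_image

# LoRA fine-tuning: point lora_model_id_or_path at the trained checkpoint (hypothetical path).
image_lora = generate_image(
    prompt="a beautiful sunset over mountains",
    model_id_or_path="THUDM/CogView4-6B",
    lora_model_id_or_path="path/to/lora_checkpoint",
)

# ZeRO fine-tuning: point transformer_path at the converted output_dir/ produced above.
image_zero = generate_image(
    prompt="a beautiful sunset over mountains",
    model_id_or_path="THUDM/CogView4-6B",
    transformer_path="output_dir/",
)
image_zero.save("finetuned_sample.png")
```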

src/cogmodels/finetune/data/README.md

Lines changed: 0 additions & 71 deletions
This file was deleted.

src/cogmodels/finetune/diffusion/models/cogvideo/README.md

Whitespace-only changes.

src/cogmodels/finetune/diffusion/models/cogview/README.md

Whitespace-only changes.
