Commit 8d7fe91

Merge pull request #17 from THUDM/dev
Refactor | Fix | Deps | Docs
2 parents 390ac3b + e71185d commit 8d7fe91

36 files changed: +225 -147 lines

docs/02-Installation.md

Lines changed: 1 addition & 5 deletions

```diff
@@ -6,7 +6,7 @@
 ## Requirements
 
 - Python 3.10 or higher
-- PyTorch, OpenCV, decord
+- PyTorch, OpenCV
 
 ## Installation Steps
 
@@ -18,10 +18,6 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 
 Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install opencv-python. In most cases, you can simply install by `pip install opencv-python-headless`
 
-### decord
-
-Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install decord dependencies. If you don't need GPU acceleration, you can simply install by `pip install decord`
-
 ### CogKit
 
 Install `cogkit` from github source:
```
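With decord dropped, the documented requirements reduce to Python 3.10+, PyTorch, and OpenCV. A minimal sanity check for the resulting environment (a sketch based only on the requirements list above; CogKit itself is not imported here):

```python
import sys

# The requirements list above asks for Python 3.10 or higher.
assert sys.version_info >= (3, 10), "CogKit requires Python 3.10 or higher"

import torch  # installed per the PyTorch installation guide
import cv2    # installed via `pip install opencv-python-headless`

print(f"python {sys.version.split()[0]}, torch {torch.__version__}, opencv {cv2.__version__}")
```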

docs/03-Inference/01-CLI.md

Lines changed: 1 addition & 9 deletions

````diff
@@ -1,7 +1,7 @@
 ---
 ---
 
-# Command-Line Interface
+# CLI
 
 CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
 
@@ -44,14 +44,6 @@ See `cogkit inference --help` for more information.
 
 ## Launch Command
 
-The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:
-
-<!-- FIXME: check url -->
-```bash
-pip install "cogkit[api]@git+https://github.com/THUDM/CogKit.git"
-```
-
-<!-- FIXME: check url -->
 Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.
 
 To configure the model paths:
````

docs/03-Inference/02-API.md

Lines changed: 46 additions & 21 deletions

````diff
@@ -1,41 +1,66 @@
 ---
 ---
 
+<!-- TODO: refactor the Python API as a unique document (and redirect related chapters, like Quick_Start.md, to the Python API document) -->
 # API
 
-CogKit provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
 
-## Python API
+<!-- TODO: list all supported operations in the Python API, rather than presenting a demo -->
+## Python
 
-You can also use `cogkit` programmatically in your Python code:
+We provide a Python API for CogKit, covering model loading and inference operations.
 
 ```python
-from cogkit.generation import generate_image, generate_video
+import torch
+from PIL import Image
 
-# Text-to-Image generation
-image = generate_image(
-    prompt="a beautiful sunset over mountains",
-    model_id_or_path="THUDM/CogView4-6B",
-    lora_model_id_or_path=None,
+from cogkit import (
+    load_pipeline,
+    load_lora_checkpoint,
+    unload_lora_checkpoint,
+
+    generate_image,
+    generate_video,
+)
+from diffusers.utils import export_to_video
+
+
+model_id_or_path = "THUDM/CogView4-6B"  # t2i generation task, for example.
+pipeline = load_pipeline(
+    model_id_or_path,
     transformer_path=None,
-    output_file="sunset.png",  # Images will be saved here.
+    dtype=torch.bfloat16,
+)
+
+###### [Optional] Load/Unload LoRA weights
+# lora_model_id_or_path = "/path/to/lora/checkpoint"
+# load_lora_checkpoint(pipeline, lora_model_id_or_path)
+# ...
+# unload_lora_checkpoint(pipeline)
+
+
+###### Text-to-Image generation
+batched_image = generate_image(
+    prompt="a beautiful sunset over mountains",
+    pipeline=pipeline,
     height=1024,
     width=1024,
+    output_type="pil",
 )
+batched_image[0].save("output.png")
 
 
-# Text/Image-to-Video generation
-video = generate_video(
+###### Text/Image-to-Video generation
+batched_video, fps = generate_video(
     prompt="a cat playing with a ball",
-    image_file="path/to/image.png",  # Needed for Image-to-Video task
-    model_id_or_path="THUDM/CogVideoX1.5-5B",
-    lora_model_id_or_path=None,
-    transformer_path=None,
-    output_file="cat.mp4",  # Videos will be saved here.
-    num_frames=81,
-    fps=16,
+    pipeline=pipeline,
+    # input_image=Image.open("/path/to/image.png"),  # only for i2v generation
+    output_type="pil",
 )
-
+export_to_video(batched_video[0], "output.mp4", fps=fps)
 ```
 
-See function signatures in for more details.
+See function signatures for more details.
+
+<!-- TODO: Add documentation for API server endpoints -->
+## API Server Endpoints
````
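For the image-to-video task that the removed snippet hard-coded, the equivalent flow under the new API would look roughly like this (a sketch assembled from the added lines above; the `THUDM/CogVideoX1.5-5B` model id is carried over from the removed example, and the keyword names are exactly those introduced in this diff):

```python
import torch
from PIL import Image

from cogkit import load_pipeline, generate_video
from diffusers.utils import export_to_video

# Load a video pipeline instead of the t2i one.
pipeline = load_pipeline(
    "THUDM/CogVideoX1.5-5B",  # model id taken from the removed example
    transformer_path=None,
    dtype=torch.bfloat16,
)

batched_video, fps = generate_video(
    prompt="a cat playing with a ball",
    pipeline=pipeline,
    input_image=Image.open("/path/to/image.png"),  # the i2v conditioning frame
    output_type="pil",
)
export_to_video(batched_video[0], "output.mp4", fps=fps)
```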

docs/04-Finetune/02-Quick Start.md

Lines changed: 7 additions & 2 deletions

````diff
@@ -4,8 +4,13 @@
 
 * Please refer to the [installation guide](../02-Installation.md) to set up your environment first
 
-<!-- FIXME: change to THUDM later -->
-* We provide various training scripts and example datasets in the `CogKit/quickstart` directory, so you need to clone our repository before training:
+* Install finetune dependencies:
+
+  ```bash
+  pip install "cogkit[finetune]@git+https://github.com/THUDM/CogKit.git"
+  ```
+
+* We provide various training scripts and example datasets in the `CogKit/quickstart` directory. Please clone the repository before training:
 
 ```bash
 git clone https://github.com/THUDM/CogKit.git
````

docs/04-Finetune/03-Data Format.md

Lines changed: 26 additions & 13 deletions

````diff
@@ -3,7 +3,6 @@
 
 # Dataset Format
 
-<!-- TODO: add link to data dir -->
 The `CogKit/quickstart/data` directory contains various dataset templates for fine-tuning different models; please refer to the corresponding dataset template based on your task type:
 
 ## Text-to-Image Conversion Dataset (t2i)
@@ -26,18 +25,32 @@
 
 ## Image-to-Video (i2v)
 
-- Each directory contains video files (`.mp4`) and **optional** corresponding image files (`.png`)
-- The `metadata.jsonl` file contains metadata information for each sample
-
-```json
-{"file_name": "example.mp4", "id": 0, "prompt": "Detailed video description text..."}
-{"file_name": "example.png", "id": 0} // optional
-```
-
-:::info
-- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
-- When image files are provided, they are associated with the video file of the same name through the id field
-:::
+- The dataset is organized with the following structure:
+  - `train/` and `test/` directories, each containing:
+    - a `videos/` directory for video files (`.mp4`)
+    - an `images/` directory for input image files (`.png`)
+    - a `metadata.jsonl` file in the root containing prompt descriptions
+
+- The main `metadata.jsonl` file in the root directory contains prompt information for each sample:
+  ```json
+  {"id": 0, "prompt": "Detailed video description text..."}
+  {"id": 1, "prompt": "Detailed video description text..."}
+  ```
+
+- The `videos/metadata.jsonl` file maps video files to their corresponding IDs:
+  ```json
+  {"file_name": "example.mp4", "id": 0}
+  ```
+
+- The `images/metadata.jsonl` file maps image files to their corresponding IDs:
+  ```json
+  {"file_name": "example.png", "id": 0}
+  ```
+
+:::info
+- Image and video files are linked by sharing the same ID
+- If image files are not provided, the system will default to using the first frame of the corresponding video as the input image
+:::
 
 ## Notes
 
````
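To make the new id-based linking concrete, the three `metadata.jsonl` files can be joined into (video, image, prompt) samples as in the sketch below (illustrative only, not the loader CogKit actually uses; the `quickstart/data/i2v/train` path matches the renamed files further down):

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """One JSON record per line, as in the metadata.jsonl examples above."""
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]


root = Path("quickstart/data/i2v/train")
prompts = {r["id"]: r["prompt"] for r in read_jsonl(root / "metadata.jsonl")}
videos = {r["id"]: r["file_name"] for r in read_jsonl(root / "videos" / "metadata.jsonl")}
images = {r["id"]: r["file_name"] for r in read_jsonl(root / "images" / "metadata.jsonl")}

for sample_id, prompt in prompts.items():
    video = root / "videos" / videos[sample_id]
    # Images are optional: missing ids fall back to the video's first frame.
    image = root / "images" / images[sample_id] if sample_id in images else None
    print(sample_id, video, image, prompt[:40])
```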

pyproject.toml

Lines changed: 9 additions & 8 deletions

```diff
@@ -11,8 +11,6 @@ authors = [{ name = 'ZhipuAI', email = '[email protected]' }]
 # maintainers = []
 dependencies = [
     "click~=8.1",
-    "datasets~=3.4",
-    "deepspeed~=0.16.4",
     "diffusers @ git+https://github.com/huggingface/diffusers.git",
     "imageio-ffmpeg~=0.6.0",
     "imageio~=2.37",
@@ -21,20 +19,23 @@
     "sentencepiece==0.2.0",
     "transformers~=4.49",
     "wandb~=0.19.8",
-]
 
-[project.optional-dependencies]
-torch = ["numpy", "torch", "torchvision"]
-api = [
     "fastapi[standard]~=0.115.11",
     "fastapi_cli~=0.0.7",
     "openai~=1.67",
     "pydantic_settings~=2.8.1",
     "python-dotenv~=1.0",
 ]
 
-# TODO: adds project urls
-# [project.urls]
+[project.optional-dependencies]
+finetune = [
+    "datasets~=3.4",
+    "deepspeed~=0.16.4",
+    "av~=14.2.0",
+]
+
+[project.urls]
+"Repository" = "https://github.com/THUDM/CogKit"
 
 
 [project.scripts]
```
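A note on the version pins above: `~=` is the PEP 440 compatible-release operator, so `"deepspeed~=0.16.4"` accepts 0.16.x patch releases but excludes 0.17. This can be checked with the `packaging` library (illustrative only; `packaging` is not a dependency declared in this file):

```python
from packaging.specifiers import SpecifierSet

spec = SpecifierSet("~=0.16.4")  # compatible release: >=0.16.4, <0.17
print("0.16.9" in spec)  # True  -- patch upgrades allowed
print("0.17.0" in spec)  # False -- minor bumps excluded
```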

quickstart/data/i2v/train/0bb5f6dbf8ed2e0060f0ac4164b24847.png renamed to quickstart/data/i2v/train/images/0bb5f6dbf8ed2e0060f0ac4164b24847.png

File renamed without changes.

quickstart/data/i2v/train/1d50a3d9703f152758d5422c8b48010f.png renamed to quickstart/data/i2v/train/images/1d50a3d9703f152758d5422c8b48010f.png

File renamed without changes.

quickstart/data/i2v/train/2c1ed5408882479b06681f7cf372916a.png renamed to quickstart/data/i2v/train/images/2c1ed5408882479b06681f7cf372916a.png

File renamed without changes.

quickstart/data/i2v/train/3f0979e6cae25447f416372c49ad5e07.png renamed to quickstart/data/i2v/train/images/3f0979e6cae25447f416372c49ad5e07.png

File renamed without changes.
