Commit 8d7fe91

Merge pull request #17 from THUDM/dev
Refactor | Fix | Deps | Docs
2 parents 390ac3b + e71185d commit 8d7fe91

36 files changed: +225 -147 lines

docs/02-Installation.md

Lines changed: 1 addition & 5 deletions

```diff
@@ -6,7 +6,7 @@
 ## Requirements
 
 - Python 3.10 or higher
-- PyTorch, OpenCV, decord
+- PyTorch, OpenCV
 
 ## Installation Steps
 
@@ -18,10 +18,6 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 
 Please refer to the [OpenCV installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) to install opencv-python. In most cases, you can simply install by `pip install opencv-python-headless`
 
-### decord
-
-Please refer to the [decord installation guide](https://github.com/dmlc/decord?tab=readme-ov-file#installation) to install decord dependencies. If you don't need GPU acceleration, you can simply install by `pip install decord`
-
 ### CogKit
 
 Install `cogkit` from github source:
```
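With decord dropped, the documented requirements reduce to Python 3.10+, PyTorch, and OpenCV. A minimal sanity check for the resulting environment (a sketch based only on the requirements list above; CogKit itself is not imported here):

```python
import sys

# The requirements list above asks for Python 3.10 or higher.
assert sys.version_info >= (3, 10), "CogKit requires Python 3.10 or higher"

import torch  # installed per the PyTorch installation guide
import cv2    # installed via `pip install opencv-python-headless`

print(f"python {sys.version.split()[0]}, torch {torch.__version__}, opencv {cv2.__version__}")
```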

docs/03-Inference/01-CLI.md

Lines changed: 1 addition & 9 deletions

````diff
@@ -1,7 +1,7 @@
 ---
 ---
 
-# Command-Line Interface
+# CLI
 
 CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
 
@@ -44,14 +44,6 @@ See `cogkit inference --help` for more information.
 
 ## Launch Command
 
-The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:
-
-<!-- FIXME: check url -->
-```bash
-pip install "cogkit[api]@git+https://github.com/THUDM/CogKit.git"
-```
-
-<!-- FIXME: check url -->
 Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.
 
 To configure the model paths:
````

docs/03-Inference/02-API.md

Lines changed: 46 additions & 21 deletions

````diff
@@ -1,41 +1,66 @@
 ---
 ---
 
+<!-- TODO: refactor the Python API as a unique document (and redirect related chapters, like Quick_Start.md, to the Python API document) -->
 # API
 
-CogKit provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
 
-## Python API
+<!-- TODO: list all supported operations in the Python API, rather than presenting a demo -->
+## Python
 
-You can also use `cogkit` programmatically in your Python code:
+We provide a Python API for CogKit, covering model loading and inference operations.
 
 ```python
-from cogkit.generation import generate_image, generate_video
+import torch
+from PIL import Image
 
-# Text-to-Image generation
-image = generate_image(
-    prompt="a beautiful sunset over mountains",
-    model_id_or_path="THUDM/CogView4-6B",
-    lora_model_id_or_path=None,
+from cogkit import (
+    load_pipeline,
+    load_lora_checkpoint,
+    unload_lora_checkpoint,
+
+    generate_image,
+    generate_video,
+)
+from diffusers.utils import export_to_video
+
+
+model_id_or_path = "THUDM/CogView4-6B"  # t2i generation task, for example.
+pipeline = load_pipeline(
+    model_id_or_path,
     transformer_path=None,
-    output_file="sunset.png",  # Images will be saved here.
+    dtype=torch.bfloat16,
+)
+
+###### [Optional] Load/Unload LoRA weights
+# lora_model_id_or_path = "/path/to/lora/checkpoint"
+# load_lora_checkpoint(pipeline, lora_model_id_or_path)
+# ...
+# unload_lora_checkpoint(pipeline)
+
+
+###### Text-to-Image generation
+batched_image = generate_image(
+    prompt="a beautiful sunset over mountains",
+    pipeline=pipeline,
     height=1024,
     width=1024,
+    output_type="pil",
 )
+batched_image[0].save("output.png")
 
 
-# Text/Image-to-Video generation
-video = generate_video(
+###### Text/Image-to-Video generation
+batched_video, fps = generate_video(
     prompt="a cat playing with a ball",
-    image_file="path/to/image.png",  # Needed for Image-to-Video task
-    model_id_or_path="THUDM/CogVideoX1.5-5B",
-    lora_model_id_or_path=None,
-    transformer_path=None,
-    output_file="cat.mp4",  # Videos will be saved here.
-    num_frames=81,
-    fps=16,
+    pipeline=pipeline,
+    # input_image=Image.open("/path/to/image.png"),  # only for i2v generation
+    output_type="pil",
 )
-
+export_to_video(batched_video[0], "output.mp4", fps=fps)
 ```
 
-See function signatures in for more details.
+See function signatures for more details.
+
+<!-- TODO: Add documentation for API server endpoints -->
+## API Server Endpoints
````
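For the image-to-video task that the removed snippet hard-coded, the equivalent flow under the new API would look roughly like this (a sketch assembled from the added lines above; the `THUDM/CogVideoX1.5-5B` model id is carried over from the removed example, and the keyword names are exactly those introduced in this diff):

```python
import torch
from PIL import Image

from cogkit import load_pipeline, generate_video
from diffusers.utils import export_to_video

# Load a video pipeline instead of the t2i one.
pipeline = load_pipeline(
    "THUDM/CogVideoX1.5-5B",  # model id taken from the removed example
    transformer_path=None,
    dtype=torch.bfloat16,
)

batched_video, fps = generate_video(
    prompt="a cat playing with a ball",
    pipeline=pipeline,
    input_image=Image.open("/path/to/image.png"),  # the i2v conditioning frame
    output_type="pil",
)
export_to_video(batched_video[0], "output.mp4", fps=fps)
```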

docs/04-Finetune/02-Quick Start.md

Lines changed: 7 additions & 2 deletions

````diff
@@ -4,8 +4,13 @@
 
 * Please refer to the [installation guide](../02-Installation.md) to set up your environment first
 
-<!-- FIXME: change to THUDM later -->
-* We provide various training scripts and example datasets in the `CogKit/quickstart` directory, so you need to clone our repository before training:
+* Install finetune dependencies:
+
+  ```bash
+  pip install "cogkit[finetune]@git+https://github.com/THUDM/CogKit.git"
+  ```
+
+* We provide various training scripts and example datasets in the `CogKit/quickstart` directory. Please clone the repository before training:
 
 ```bash
 git clone https://github.com/THUDM/CogKit.git
````

docs/04-Finetune/03-Data Format.md

Lines changed: 26 additions & 13 deletions

````diff
@@ -3,7 +3,6 @@
 
 # Dataset Format
 
-<!-- TODO: add link to data dir -->
 The `CogKit/quickstart/data` directory contains various dataset templates for fine-tuning different models; please refer to the corresponding dataset template based on your task type:
 
 ## Text-to-Image Conversion Dataset (t2i)
@@ -26,18 +25,32 @@
 
 ## Image-to-Video (i2v)
 
-- Each directory contains video files (`.mp4`) and **optional** corresponding image files (`.png`)
-- The `metadata.jsonl` file contains metadata information for each sample
-
-```json
-{"file_name": "example.mp4", "id": 0, "prompt": "Detailed video description text..."}
-{"file_name": "example.png", "id": 0} // optional
-```
-
-:::info
-- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
-- When image files are provided, they are associated with the video file of the same name through the id field
-:::
+- The dataset is organized with the following structure:
+  - `train/` and `test/` directories, each containing:
+    - a `videos/` directory for video files (`.mp4`)
+    - an `images/` directory for input image files (`.png`)
+    - a `metadata.jsonl` file in the root containing prompt descriptions
+
+- The main `metadata.jsonl` file in the root directory contains prompt information for each sample:
+  ```json
+  {"id": 0, "prompt": "Detailed video description text..."}
+  {"id": 1, "prompt": "Detailed video description text..."}
+  ```
+
+- The `videos/metadata.jsonl` file maps video files to their corresponding IDs:
+  ```json
+  {"file_name": "example.mp4", "id": 0}
+  ```
+
+- The `images/metadata.jsonl` file maps image files to their corresponding IDs:
+  ```json
+  {"file_name": "example.png", "id": 0}
+  ```
+
+:::info
+- Image and video files are linked by sharing the same ID
+- If image files are not provided, the system will default to using the first frame of the corresponding video as the input image
+:::
 
 ## Notes
 
````
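To make the new id-based linking concrete, the three `metadata.jsonl` files can be joined into (video, image, prompt) samples as in the sketch below (illustrative only, not the loader CogKit actually uses; the `quickstart/data/i2v/train` path matches the renamed files further down):

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """One JSON record per line, as in the metadata.jsonl examples above."""
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]


root = Path("quickstart/data/i2v/train")
prompts = {r["id"]: r["prompt"] for r in read_jsonl(root / "metadata.jsonl")}
videos = {r["id"]: r["file_name"] for r in read_jsonl(root / "videos" / "metadata.jsonl")}
images = {r["id"]: r["file_name"] for r in read_jsonl(root / "images" / "metadata.jsonl")}

for sample_id, prompt in prompts.items():
    video = root / "videos" / videos[sample_id]
    # Images are optional: missing ids fall back to the video's first frame.
    image = root / "images" / images[sample_id] if sample_id in images else None
    print(sample_id, video, image, prompt[:40])
```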

pyproject.toml

Lines changed: 9 additions & 8 deletions

```diff
@@ -11,8 +11,6 @@ authors = [{ name = 'ZhipuAI', email = '[email protected]' }]
 # maintainers = []
 dependencies = [
     "click~=8.1",
-    "datasets~=3.4",
-    "deepspeed~=0.16.4",
     "diffusers @ git+https://github.com/huggingface/diffusers.git",
     "imageio-ffmpeg~=0.6.0",
     "imageio~=2.37",
@@ -21,20 +19,23 @@
     "sentencepiece==0.2.0",
     "transformers~=4.49",
     "wandb~=0.19.8",
-]
 
-[project.optional-dependencies]
-torch = ["numpy", "torch", "torchvision"]
-api = [
     "fastapi[standard]~=0.115.11",
     "fastapi_cli~=0.0.7",
     "openai~=1.67",
     "pydantic_settings~=2.8.1",
     "python-dotenv~=1.0",
 ]
 
-# TODO: adds project urls
-# [project.urls]
+[project.optional-dependencies]
+finetune = [
+    "datasets~=3.4",
+    "deepspeed~=0.16.4",
+    "av~=14.2.0",
+]
+
+[project.urls]
+"Repository" = "https://github.com/THUDM/CogKit"
 
 
 [project.scripts]
```
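A note on the version pins above: `~=` is the PEP 440 compatible-release operator, so `"deepspeed~=0.16.4"` accepts 0.16.x patch releases but excludes 0.17. This can be checked with the `packaging` library (illustrative only; `packaging` is not a dependency declared in this file):

```python
from packaging.specifiers import SpecifierSet

spec = SpecifierSet("~=0.16.4")  # compatible release: >=0.16.4, <0.17
print("0.16.9" in spec)  # True  -- patch upgrades allowed
print("0.17.0" in spec)  # False -- minor bumps excluded
```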

quickstart/data/i2v/train/0bb5f6dbf8ed2e0060f0ac4164b24847.png renamed to quickstart/data/i2v/train/images/0bb5f6dbf8ed2e0060f0ac4164b24847.png

File renamed without changes.

quickstart/data/i2v/train/1d50a3d9703f152758d5422c8b48010f.png renamed to quickstart/data/i2v/train/images/1d50a3d9703f152758d5422c8b48010f.png

File renamed without changes.

quickstart/data/i2v/train/2c1ed5408882479b06681f7cf372916a.png renamed to quickstart/data/i2v/train/images/2c1ed5408882479b06681f7cf372916a.png

File renamed without changes.

quickstart/data/i2v/train/3f0979e6cae25447f416372c49ad5e07.png renamed to quickstart/data/i2v/train/images/3f0979e6cae25447f416372c49ad5e07.png

File renamed without changes.
