Merged
Changes from all commits (25 commits)
9c5021d
feat: implements finetune API
OleehyO Mar 17, 2025
4eaabc2
Merge branch 'dev-lhy' into 'dev'
Mar 17, 2025
23bd035
feat: adds unit testing
Mar 17, 2025
6fbdf61
Merge branch 'dev-hxw' into 'dev'
Mar 17, 2025
af861b9
Merge remote-tracking branch 'test/main' into refactor
zhangch9 Mar 18, 2025
c1ed583
nit: resolves conflicts
zhangch9 Mar 18, 2025
8e47011
nit: `CogKit` -> `cogkit`
zhangch9 Mar 18, 2025
02abb3c
nit: adds params checking
zhangch9 Mar 18, 2025
714feb4
fix: failed to import `LoraBaseMixin`
zhangch9 Mar 18, 2025
511323f
[fix] Correct logical error
OleehyO Mar 18, 2025
6817196
[feat] Add auto-inference for num_frames and fps with resolution vali…
OleehyO Mar 18, 2025
236050a
[docs] Align function docstring with CLI documentation
OleehyO Mar 18, 2025
80d6f14
[WIP][docs] Wait for fix
OleehyO Mar 18, 2025
5572ef4
Merge branch 'main' into refactor
zRzRzRzRzRzRzR Mar 18, 2025
7d4d302
feat: implements openai-compatible image generation API
zhangch9 Mar 19, 2025
183c021
add guess validate dimensions (#3)
sixsixcoder Mar 19, 2025
a454a2e
chore: cleanup
zhangch9 Mar 19, 2025
9ed13d9
add image generation api
Mar 19, 2025
421aed2
add image generation api
sixsixcoder Mar 19, 2025
11f5e0e
add image generation api
sixsixcoder Mar 19, 2025
b18db86
del route response_class
sixsixcoder Mar 19, 2025
f3b8576
chore: updates pyproject metadata
zhangch9 Mar 19, 2025
15089b5
Merge remote-tracking branch 'test/main' into refactor
zhangch9 Mar 19, 2025
634c835
nit: removes unused files
zhangch9 Mar 19, 2025
cf91ec8
nit: adds `video` deps
zhangch9 Mar 19, 2025
1 change: 1 addition & 0 deletions .env.template
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
COGVIEW4_PATH=THUDM/CogView4-6B
7 changes: 6 additions & 1 deletion .gitignore
@@ -241,7 +241,13 @@ Temporary Items

# End of https://www.toptal.com/developers/gitignore/api/macos

# * hatch-vcs
src/cogkit/_version.py

# * pdm
.pdm-python

# * a temporary directory to store files you do not wish to share.
tmp/

**/*.safetensor
@@ -254,7 +260,6 @@ tmp/
**/*foo*
**/train_result

**/uv.lock

webdoc/
**/wandb/
4 changes: 2 additions & 2 deletions README.md
@@ -2,9 +2,9 @@

## Introduction

**CogKit** is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
**`cogkit`** is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.

Visit our [**Docs**](https://thudum.github.io/CogKit) to start.
Visit our [**Docs**](https://thudm.github.io/CogKit) to start.

## Features

5 changes: 3 additions & 2 deletions docker/dockerfile → docker/Dockerfile
@@ -12,11 +12,12 @@ RUN apt-get update && \


###### install cogkit ######
REPO_NAME=cogkit
WORKDIR /app

RUN git https://github.com/thudm/CogKit
# refactor url later (and maybe repo name)
RUN git clone https://github.com/THUDM/CogKit.git
WORKDIR CogKit

# TODO: use `pdm sync`
RUN pip install uv
RUN uv pip install . --system
8 changes: 1 addition & 7 deletions docs/01-Intro.md
@@ -4,13 +4,7 @@ slug: /

# Introduction

CogKit is a powerful framework for working with ZhipuAI Cog Series models, focusing on multimodal generation and fine-tuning capabilities.
It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.

## Key Features

- **Command-line Interface**: Easy-to-use CLI and Python API for both inference and fine-tuning
- **Fine-tuning Support**: With LoRA or full model fine-tuning support to customize models with your own data
`cogkit` is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.

## Supported Models

27 changes: 9 additions & 18 deletions docs/02-Installation.md
@@ -3,37 +3,29 @@

# Installation

CogKit can be installed using pip. We recommend using a virtual environment to avoid conflicts with other packages.
`cogkit` can be installed using pip. We recommend using a virtual environment to avoid conflicts with other packages.

## Requirements

- Python 3.10 or higher
- CUDA-compatible GPU (for optimal performance)
- At least 8GB of GPU memory for inference, 16GB+ recommended for fine-tuning
- OpenCV and PyTorch

## Installation Steps

### Create a virtual environment (recommended)
### OpenCV

```bash
# Using venv
python -m venv cogkit-env
source cogkit-env/bin/activate

# Or using conda
conda create -n cogkit-env python=3.10
conda activate cogkit-env
```
Please refer to the [opencv-python installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) for instructions on installing OpenCV according to your system.

### Install PyTorch
### PyTorch

Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.

### Install Cogkit
### CogKit

1. Install `cogkit`:

1. Install Cogkit:
```bash
pip install cogkit@git+https://github.com/thudm/cogkit.git
pip install cogkit@git+https://github.com/THUDM/cogkit.git
```

2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
@@ -42,7 +34,6 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
pip install -e .[video]
```


### Verify installation

You can verify that cogkit is installed correctly by running:
55 changes: 16 additions & 39 deletions docs/03-Inference/01-CLI.md
@@ -4,7 +4,7 @@
<!-- TODO: check this doc -->
# Command-Line Interface

CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
`cogkit` provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.

## Overview

@@ -15,68 +15,45 @@ cogkit [OPTIONS] COMMAND [ARGS]...
```

Available commands:

- `inference`: Generate images or videos using AI models
- `launch`: Launch a web UI for interactive use
- `launch`: Launch an API server

Global options:

- `-v, --verbose`: Increase verbosity (can be used multiple times)

## Inference Command

The `inference` command allows you to generate images and videos:

```bash
cogkit inference [OPTIONS] PROMPT MODEL_ID_OR_PATH
```
The `inference` command allows you to generate images or videos:

### Examples

```bash
# Generate an image from text
cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"

# Generate a video from text
cogkit inference "a cat playing with a ball" "THUDM/CogVideoX1.5-5B"

```

## Fine-tuning Command

The `finetune` command allows you to fine-tune models with your own data:

```bash
cogkit finetune [OPTIONS]
```
<!-- TODO: Add example for i2v -->

> Note: The fine-tuning command is currently under development. Please check back for updates.
:::tip
See `cogkit inference --help` for more information.
:::

<!-- TODO: add docs for launch server -->
## Launch Command

The `launch` command starts a web UI for interactive use:
The `launch` command starts an API server:

<!-- FIXME: Add examples -->
```bash
cogkit launch [OPTIONS]
...
```

This launches a web interface where you can:
- Generate images and videos interactively
- Upload images for image-to-video generation
- Adjust generation parameters
- View and download results

### Options

| Option | Description |
|--------|-------------|
| `--host TEXT` | Host to bind the server to (default: 127.0.0.1) |
| `--port INTEGER` | Port to bind the server to (default: 7860) |
| `--share` | Create a public URL |
Please refer to [API](./02-API.md#api-server) for details on how to interact with the API server using client interfaces.

### Example

```bash
# Launch the web UI on the default port
cogkit launch

```
:::tip
See `cogkit launch --help` for more information.
:::
8 changes: 6 additions & 2 deletions docs/03-Inference/02-API.md
@@ -3,11 +3,11 @@

# API

CogKit provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
`cogkit` provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.

## Python API

You can also use Cogkit programmatically in your Python code:
You can also use `cogkit` programmatically in your Python code:

```python
from cogkit.generation import generate_image, generate_video
@@ -32,6 +32,10 @@ video = generate_video(
)
video.save("cat_video.mp4")
```
<!-- TODO: add examples for i2v -->

<!-- FIXME: correct url -->
See function signatures in [generation.py](...) for more details.

## API Server

2 changes: 1 addition & 1 deletion docs/04-Finetune/02-Quick Start.md
@@ -30,11 +30,11 @@ We recommend that you read the corresponding [model card](../05-Model%20Card.mdx
4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs

5. Run the script, for example:

```bash
bash scripts/train_ddp_t2i.sh
```


## Load Fine-tuned Model

### LoRA
10 changes: 5 additions & 5 deletions docs/04-Finetune/03-Data Format.md
@@ -7,6 +7,7 @@
`src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:

## Text-to-Image Conversion Dataset (t2i)

- Each directory contains a set of image files (`.png`)
- The `metadata.jsonl` file contains text descriptions for each image

@@ -34,12 +35,11 @@
```

:::info
- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
- When image files are provided, they are associated with the video file of the same name through the id field
- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
- When image files are provided, they are associated with the video file of the same name through the id field
:::

## Notes

- Training sets (`train/`) are used for model training
- Test sets (`test/`) are used for evaluating model performance
- Each dataset will generate a `.cache/` directory during training, used to store preprocessed cache data. If the dataset changes, you need to **manually delete this directory** and retrain.
- Training sets (`train/`) are used for model training, test sets (`test/`) are used for evaluating model performance
- Each dataset will generate a `.cache/` directory during training, used to store preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain.
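The t2i layout described above (a directory of `.png` files plus a `metadata.jsonl` carrying one text description per image) can be sketched as follows. The field names `file_name` and `prompt` are assumptions for illustration only; the authoritative schema is in the templates under `src/cogkit/finetune/data`.

```python
import json
import tempfile
from pathlib import Path


def write_t2i_split(root: Path, records: list[dict]) -> Path:
    """Write a metadata.jsonl for a t2i split: one JSON object per line.

    The record schema here ("file_name", "prompt") is hypothetical --
    consult the dataset templates in src/cogkit/finetune/data.
    """
    root.mkdir(parents=True, exist_ok=True)
    meta = root / "metadata.jsonl"
    with meta.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return meta


# Build a throwaway train/ split next to where the .png files would live.
root = Path(tempfile.mkdtemp()) / "train"
meta = write_t2i_split(
    root,
    [{"file_name": "0001.png", "prompt": "a watercolor fox in a forest"}],
)
lines = meta.read_text(encoding="utf-8").splitlines()
```

Remember that after changing such a dataset, the generated `.cache/` directory must be deleted manually before retraining, as noted above.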
8 changes: 4 additions & 4 deletions docs/05-Model Card.mdx
@@ -3,7 +3,7 @@

# Model Card

Here is a detailed description of how CogKit supports models.
Here is a detailed description of how `cogkit` supports models.

All training requirements must be strictly followed as specified in the table below, including resolution, number of frames, prompt token limit, and video length requirements.

@@ -27,9 +27,9 @@ All training requirements must be strictly followed as specified in the table be
<td style={{ textAlign: "center" }}>September 19, 2024</td>
</tr>
<tr>
<td style={{ textAlign: "center" }}>Video Resolution</td>
<td style={{ textAlign: "center" }}>Video Resolution (W * H) </td>
<td colspan="1" style={{ textAlign: "center" }}>1360 * 768</td>
<td colspan="1" style={{ textAlign: "center" }}>Min(W, H) &#61 768 <br/> 768 ≤ Max(W, H) ≤ 1360 <br/> Max(W, H) % 16 &#61 0</td>
<td colspan="1" style={{ textAlign: "center" }}>Min(W, H) = 768 <br/> 768 ≤ Max(W, H) ≤ 1360 <br/> Max(W, H) % 16 = 0</td>
<td colspan="3" style={{ textAlign: "center" }}>720 * 480</td>
</tr>
<tr>
@@ -80,7 +80,7 @@ All training requirements must be strictly followed as specified in the table be
</tr>
<tr>
<td style={{ textAlign: "center" }}>Resolution</td>
<td style={{ textAlign: "center" }}>512 ≤ (W, H) ≤ 2048 <br/> H * W ≤ 2^{21} <br/> Max(W, H) % 32 &#61 0 </td>
<td style={{ textAlign: "center" }}>512 ≤ (W, H) ≤ 2048 <br/> H * W ≤ 2^{21} <br/> Max(W, H) % 32 = 0 </td>
</tr>
<tr>
<td style={{ textAlign: "center" }}>Prompt Language</td>
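The resolution constraints stated in the model card table above (CogVideoX1.5 video: Min(W, H) = 768, 768 ≤ Max(W, H) ≤ 1360, Max(W, H) % 16 = 0; CogView4 image: 512 ≤ W, H ≤ 2048, W * H ≤ 2^21, Max(W, H) % 32 = 0) translate directly into a small checker. These helpers are an illustrative sketch of the table's rules, not part of cogkit's API.

```python
def valid_cogvideox15_resolution(w: int, h: int) -> bool:
    """CogVideoX1.5: Min(W, H) = 768, 768 <= Max(W, H) <= 1360, Max(W, H) % 16 = 0."""
    lo, hi = min(w, h), max(w, h)
    return lo == 768 and 768 <= hi <= 1360 and hi % 16 == 0


def valid_cogview4_resolution(w: int, h: int) -> bool:
    """CogView4: 512 <= W, H <= 2048, W * H <= 2**21, Max(W, H) % 32 = 0."""
    return (
        512 <= w <= 2048
        and 512 <= h <= 2048
        and w * h <= 2 ** 21
        and max(w, h) % 32 == 0
    )
```

For instance, the table's 1360 * 768 default satisfies the CogVideoX1.5 rule, while the older 720 * 480 default does not (its shorter side is not 768).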
20 changes: 0 additions & 20 deletions hatch.toml

This file was deleted.
