Commit d51b855

Merge pull request #1 from zRzRzRzRzRzRzR/refactor
refactor: moves to `pdm` package manager
2 parents: 7539fef + cf91ec8

107 files changed, +3876 -1213 lines changed


.env.template

Lines changed: 1 addition & 0 deletions
````diff
@@ -0,0 +1 @@
+COGVIEW4_PATH=THUDM/CogView4-6B
````
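The new `.env.template` entry points the code at a local or Hugging Face Hub copy of CogView4-6B. A minimal sketch of consuming such a template (assuming plain `KEY=VALUE` lines; `load_env_template` is a hypothetical helper, not part of this repository):

```python
import os

def load_env_template(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

template = "COGVIEW4_PATH=THUDM/CogView4-6B\n"
env = load_env_template(template)
# Prefer a variable already set in the environment; fall back to the template default.
model_path = os.environ.get("COGVIEW4_PATH", env["COGVIEW4_PATH"])
```

Tools such as `python-dotenv` do the same job more robustly; the sketch only illustrates the file's format.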

.gitignore

Lines changed: 6 additions & 1 deletion
````diff
@@ -241,7 +241,13 @@ Temporary Items
 
 # End of https://www.toptal.com/developers/gitignore/api/macos
 
+# * hatch-vcs
 src/cogkit/_version.py
+
+# * pdm
+.pdm-python
+
+# * a temporary directory to store files you do not wish to share.
 tmp/
 
 **/*.safetensor
@@ -254,7 +260,6 @@ tmp/
 **/*foo*
 **/train_result
 
-**/uv.lock
 
 webdoc/
 **/wandb/
````

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -2,9 +2,9 @@
 
 ## Introduction
 
-**CogKit** is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
+**`cogkit`** is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
 
-Visit our [**Docs**](https://thudum.github.io/CogKit) to start.
+Visit our [**Docs**](https://thudm.github.io/CogKit) to start.
 
 ## Features
 
````

Lines changed: 3 additions & 2 deletions
````diff
@@ -12,11 +12,12 @@ RUN apt-get update && \
 
 
 ###### install cogkit ######
-REPO_NAME=cogkit
 WORKDIR /app
 
-RUN git https://github.com/thudm/CogKit
+# refactor url later (and maybe repo name)
+RUN git clone https://github.com/THUDM/CogKit.git
 WORKDIR CogKit
 
+# TODO: use `pdm sync`
 RUN pip install uv
 RUN uv pip install . --system
````
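Read as a whole, the install stage after this change looks roughly as follows (reconstructed from the hunk above; earlier stages of the Dockerfile and its filename are not shown in this view):

```dockerfile
###### install cogkit ######
WORKDIR /app

# refactor url later (and maybe repo name)
RUN git clone https://github.com/THUDM/CogKit.git
WORKDIR CogKit

# TODO: use `pdm sync`
RUN pip install uv
RUN uv pip install . --system
```

Note the commit fixes a broken `RUN git` invocation (missing the `clone` subcommand) and still installs with `uv` for now, with the `pdm sync` migration left as a TODO.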

docs/01-Intro.md

Lines changed: 1 addition & 7 deletions
````diff
@@ -4,13 +4,7 @@ slug: /
 
 # Introduction
 
-CogKit is a powerful framework for working with ZhipuAI Cog Series models, focusing on multimodal generation and fine-tuning capabilities.
-It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
-
-## Key Features
-
-- **Command-line Interface**: Easy-to-use CLI and Python API for both inference and fine-tuning
-- **Fine-tuning Support**: With LoRA or full model fine-tuning support to customize models with your own data
+`cogkit` is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
 
 ## Supported Models
 
````
docs/02-Installation.md

Lines changed: 9 additions & 18 deletions
````diff
@@ -3,37 +3,29 @@
 
 # Installation
 
-CogKit can be installed using pip. We recommend using a virtual environment to avoid conflicts with other packages.
+`cogkit` can be installed using pip. We recommend using a virtual environment to avoid conflicts with other packages.
 
 ## Requirements
 
 - Python 3.10 or higher
-- CUDA-compatible GPU (for optimal performance)
-- At least 8GB of GPU memory for inference, 16GB+ recommended for fine-tuning
+- OpenCV and PyTorch
 
 ## Installation Steps
 
-### Create a virtual environment (recommended)
+### OpenCV
 
-```bash
-# Using venv
-python -m venv cogkit-env
-source cogkit-env/bin/activate
-
-# Or using conda
-conda create -n cogkit-env python=3.10
-conda activate cogkit-env
-```
+Please refer to the [opencv-python installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) for instructions on installing OpenCV according to your system.
 
-### Install PyTorch
+### PyTorch
 
 Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.
 
-### Install Cogkit
+### CogKit
+
+1. Install `cogkit`:
 
-1. Install Cogkit:
 ```bash
-pip install cogkit@git+https://github.com/thudm/cogkit.git
+pip install cogkit@git+https://github.com/THUDM/cogkit.git
 ```
 
 2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
@@ -42,7 +34,6 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 pip install -e .[video]
 ```
 
-
 ### Verify installation
 
 You can verify that cogkit is installed correctly by running:
````

docs/03-Inference/01-CLI.md

Lines changed: 16 additions & 39 deletions
````diff
@@ -4,7 +4,7 @@
 <!-- TODO: check this doc -->
 # Command-Line Interface
 
-CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
+`cogkit` provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
 
 ## Overview
 
@@ -15,68 +15,45 @@ cogkit [OPTIONS] COMMAND [ARGS]...
 ```
 
 Available commands:
+
 - `inference`: Generate images or videos using AI models
-- `launch`: Launch a web UI for interactive use
+- `launch`: Launch a API server
 
 Global options:
+
 - `-v, --verbose`: Increase verbosity (can be used multiple times)
 
 ## Inference Command
 
-The `inference` command allows you to generate images and videos:
-
-```bash
-cogkit inference [OPTIONS] PROMPT MODEL_ID_OR_PATH
-```
+The `inference` command allows you to generate images or videos:
 
-### Examples
 
 ```bash
 # Generate an image from text
 cogkit inference "a beautiful sunset over mountains" "THUDM/CogView4-6B"
 
 # Generate a video from text
 cogkit inference "a cat playing with a ball" "THUDM/CogVideoX1.5-5B"
-
 ```
 
-## Fine-tuning Command
-
-The `finetune` command allows you to fine-tune models with your own data:
-
-```bash
-cogkit finetune [OPTIONS]
-```
+<!-- TODO: Add example for i2v -->
 
-> Note: The fine-tuning command is currently under development. Please check back for updates.
+:::tip
+See `cogkit inference --help` for more information.
+:::
 
 <!-- TODO: add docs for launch server -->
 ## Launch Command
 
-The `launch` command starts a web UI for interactive use:
+The `launch` command will starts a API server:
 
+<!-- FIXME: Add examples -->
 ```bash
-cogkit launch [OPTIONS]
+...
 ```
 
-This launches a web interface where you can:
-- Generate images and videos interactively
-- Upload images for image-to-video generation
-- Adjust generation parameters
-- View and download results
-
-### Options
-
-| Option | Description |
-|--------|-------------|
-| `--host TEXT` | Host to bind the server to (default: 127.0.0.1) |
-| `--port INTEGER` | Port to bind the server to (default: 7860) |
-| `--share` | Create a public URL |
+Please refer to [API](./02-API.md#api-server) for details on how to interact with the API server using client interfaces.
 
-### Example
-
-```bash
-# Launch the web UI on the default port
-cogkit launch
-
-```
+:::tip
+See `cogkit launch --help` for more information.
+:::
````

docs/03-Inference/02-API.md

Lines changed: 6 additions & 2 deletions
````diff
@@ -3,11 +3,11 @@
 
 # API
 
-CogKit provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
+`cogkit` provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
 
 ## Python API
 
-You can also use Cogkit programmatically in your Python code:
+You can also use `cogkit` programmatically in your Python code:
 
 ```python
 from cogkit.generation import generate_image, generate_video
@@ -32,6 +32,10 @@ video = generate_video(
 )
 video.save("cat_video.mp4")
 ```
+<!-- TODO: add examples for i2v -->
+
+<!-- FIXME: correct url -->
+See function signatures in [generation.py](...) for more details.
 
 ## API Server
 
````
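The hunk above shows only the tail of the doc's Python example (`generate_video(...)` and `video.save(...)`). The overall call pattern can be sketched with stand-in stubs so it runs without `cogkit` installed; the real functions live in `cogkit.generation`, and the parameter names here are assumptions inferred from the CLI, not the library's actual signatures:

```python
# Stand-in stubs illustrating the call pattern documented above; in real use,
# replace with: from cogkit.generation import generate_image, generate_video
class _Artifact:
    def __init__(self, kind: str):
        self.kind = kind

    def save(self, path: str) -> str:
        # The real objects write image/video data to disk; the stub echoes the path.
        return path

def generate_image(prompt: str, model_id_or_path: str) -> _Artifact:
    return _Artifact("image")

def generate_video(prompt: str, model_id_or_path: str) -> _Artifact:
    return _Artifact("video")

image = generate_image("a beautiful sunset over mountains", "THUDM/CogView4-6B")
video = generate_video("a cat playing with a ball", "THUDM/CogVideoX1.5-5B")
saved = video.save("cat_video.mp4")
```

Consult `generation.py` in the repository for the actual signatures before relying on any of these names.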

docs/04-Finetune/02-Quick Start.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -30,11 +30,11 @@ We recommend that you read the corresponding [model card](../05-Model%20Card.mdx
 4. If you are using ZeRO strategy, refer to `accelerate_config.yaml` to confirm your ZeRO level and number of GPUs
 
 5. Run the script, for example:
+
 ```bash
 bash scripts/train_ddp_t2i.sh
 ```
 
-
 ## Load Fine-tuned Model
 
 ### LoRA
````

docs/04-Finetune/03-Data Format.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -7,6 +7,7 @@
 `src/cogkit/finetune/data` directory contains various dataset templates for fine-tuning different models, please refer to the corresponding dataset template based on your task type:
 
 ## Text-to-Image Conversion Dataset (t2i)
+
 - Each directory contains a set of image files (`.png`)
 - The `metadata.jsonl` file contains text descriptions for each image
 
@@ -34,12 +35,11 @@
 ```
 
 :::info
-- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
-- When image files are provided, they are associated with the video file of the same name through the id field
+- Image files are optional; if not provided, the system will default to using the first frame of the video as the input image
+- When image files are provided, they are associated with the video file of the same name through the id field
 :::
 
 ## Notes
 
-- Training sets (`train/`) are used for model training
-- Test sets (`test/`) are used for evaluating model performance
-- Each dataset will generate a `.cache/` directory during training, used to store preprocessed cache data. If the dataset changes, you need to **manually delete this directory** and retrain.
+- Training sets (`train/`) are used for model training, test sets (`test/`) are used for evaluating model performance
+- Each dataset will generate a `.cache/` directory during training, used to store preprocessed data. If the dataset changes, you need to **manually delete this directory** and retrain.
````
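The t2i layout described above pairs `.png` files with per-image captions in `metadata.jsonl`, one JSON object per line. A minimal sketch of writing and reading such a file (the field names `id` and `prompt` are illustrative assumptions; the authoritative schema is in the templates under `src/cogkit/finetune/data`):

```python
import json

# Hypothetical records; each line of metadata.jsonl is one JSON object.
records = [
    {"id": "0001", "prompt": "a beautiful sunset over mountains"},
    {"id": "0002", "prompt": "a cat playing with a ball"},
]

def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line (the JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_jsonl("metadata.jsonl", records)
loaded = read_jsonl("metadata.jsonl")
```

Keeping one object per line (rather than one big JSON array) is what lets large datasets be streamed record by record during preprocessing.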
