Commit 0e8d2dd

OleehyO and zhangch9 authored
[fix] API server
* [feat] Add check
* Change to tools
* nit: handles exception
* [CI] Update ruff-pre-commit
* nit: fixes lint errors
* nit: makes `model` param optional in the `/v1/images/generations` endpoint
* [fix] Correct image encoding errors
* [fix] Pass environment file to settings properly
* [feat] Add supported models info in model not found error
* [deps] Add openai client
* nit: avoids multiple instances
* [docs] Add API server documentation
* nit: removes `Makefile`

---------

Co-authored-by: Chenhui Zhang <[email protected]>
1 parent cf91ec8 commit 0e8d2dd

15 files changed: +156 -82 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.4.5
+    rev: v0.11.0
     hooks:
       - id: ruff
         args: [--fix, --respect-gitignore, --config=pyproject.toml]

Makefile

Lines changed: 0 additions & 7 deletions
This file was deleted.

README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-**`cogkit`** is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
+CogKit is an open-source project that provides a user-friendly interface for researchers and developers to utilize ZhipuAI's [**CogView**](https://huggingface.co/collections/THUDM/cogview-67ac3f241eefad2af015669b) (image generation) and [**CogVideoX**](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce) (video generation) models. It streamlines multimodal tasks such as **text-to-image (T2I)**, **text-to-video (T2V)**, and **image-to-video (I2V)**. Users must comply with legal and ethical guidelines to ensure responsible implementation.
 
 Visit our [**Docs**](https://thudm.github.io/CogKit) to start.
 

docs/01-Intro.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ slug: /
 
 # Introduction
 
-`cogkit` is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
+CogKit is a powerful framework for working with cognitive AI models, focusing on multi-modal generation and fine-tuning capabilities. It provides a unified interface for various AI tasks including text-to-image, text-to-video, and image-to-video generation.
 
 ## Supported Models
 

docs/02-Installation.md

Lines changed: 3 additions & 24 deletions
@@ -3,19 +3,13 @@
 
 # Installation
 
-`cogkit` can be installed using pip. We recommend using a virtual environment to avoid conflicts with other packages.
-
 ## Requirements
 
 - Python 3.10 or higher
-- OpenCV and PyTorch
+- PyTorch
 
 ## Installation Steps
 
-### OpenCV
-
-Please refer to the [opencv-python installation guide](https://github.com/opencv/opencv-python?tab=readme-ov-file#installation-and-usage) for instructions on installing OpenCV according to your system.
-
 ### PyTorch
 
 Please refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for instructions on installing PyTorch according to your system.
@@ -25,13 +19,13 @@ Please refer to the [PyTorch installation guide](https://pytorch.org/get-started
 1. Install `cogkit`:
 
 ```bash
-pip install cogkit@git+https://github.com/THUDM/cogkit.git
+pip install "cogkit@git+https://github.com/THUDM/cogkit.git"
 ```
 
 2. Optional: for video tasks (e.g. text-to-video), install additional dependencies:
 
 ```bash
-pip install -e .[video]
+pip install "cogkit[video]@git+https://github.com/THUDM/cogkit.git"
 ```
 
 ### Verify installation
@@ -41,18 +35,3 @@ You can verify that cogkit is installed correctly by running:
 ```bash
 cogkit --help
 ```
-
-and will get:
-
-```text
-Usage: cogkit [OPTIONS] COMMAND [ARGS]...
-
-Options:
--v, --verbose Verbosity level (from 0 to 2) [default: 0; 0<=x<=2]
---help Show this message and exit.
-
-Commands:
-finetune
-inference Generates a video based on the given prompt and saves it to...
-launch
-```
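
The `cogkit --help` check kept in the installation guide can also be scripted. The sketch below uses only the standard library; `installation_report` is a hypothetical helper, and it assumes the import name is `cogkit` (matching the `src/cogkit/` paths in this commit) and that the console command is also called `cogkit`.

```python
import importlib.util
import shutil


def installation_report(package="cogkit", command="cogkit"):
    """Report whether a package is importable and its CLI is on PATH.

    Hypothetical helper: it only inspects the environment and never
    imports or runs the package itself.
    """
    return {
        # find_spec returns None when the top-level package cannot be located
        "package_importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when no executable with that name is on PATH
        "cli_on_path": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    print(installation_report())
```

If `package_importable` is true but `cli_on_path` is false, the package was likely installed into an environment whose scripts directory is not on `PATH`.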

docs/03-Inference/01-CLI.md

Lines changed: 60 additions & 5 deletions
@@ -4,7 +4,7 @@
 <!-- TODO: check this doc -->
 # Command-Line Interface
 
-`cogkit` provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
+CogKit provides a powerful command-line interface (CLI) that allows you to perform various tasks without writing Python code. This guide covers the available commands and their usage.
 
 ## Overview
 
@@ -45,15 +45,70 @@ See `cogkit inference --help` for more information.
 <!-- TODO: add docs for launch server -->
 ## Launch Command
 
-The `launch` command will starts a API server:
+The `launch` command starts an API server for image and video generation. Before using this command, you need to install the API dependencies:
 
-<!-- FIXME: Add examples -->
 ```bash
-...
+pip install "cogkit[api]@git+https://github.com/THUDM/cogkit.git"
 ```
 
-Please refer to [API](./02-API.md#api-server) for details on how to interact with the API server using client interfaces.
+<!-- FIXME: correct url -->
+Before starting the server, make sure to configure the model paths that you want to serve. This step is necessary to specify which models will be available through the API server.
+
+To configure the model paths:
+
+1. Create a `.env` file in your working directory
+2. Refer to the [environment template]() and add needed environment variables to specify model paths. For example, to serve `CogView4-6B` as a service, you must specify `COGVIEW4_PATH` in your `.env` file:
+
+```bash
+# /your/workdir/.env
+
+COGVIEW4_PATH="THUDM/CogView4-6B" # or local path
+# other variables...
+```
+
+Then start the API server, for example:
+
+```bash
+cogkit launch
+```
 
 :::tip
 See `cogkit launch --help` for more information.
 :::
+
+
+### Client Interfaces
+
+The server API is OpenAI-compatible, which means you can use it with any OpenAI client library. Here's an example using the OpenAI Python client:
+
+```python
+import base64
+
+from io import BytesIO
+from PIL import Image
+
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="foo",
+    base_url="http://localhost:8000/v1" # Your server URL
+)
+
+# Generate an image from cogview-4
+response = client.images.generate(
+    model="cogview-4",
+    prompt="a beautiful sunset over mountains",
+    n=1,
+    size="1024x1024",
+)
+image_b64 = response.data[0].b64_json
+
+# Decode the base64 string
+image_data = base64.b64decode(image_b64)
+
+# Create an image from the decoded data
+image = Image.open(BytesIO(image_data))
+
+# Save the image
+image.save("output.png")
+```
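
As a hedged complement to the client example in this file, the sketch below decodes several `b64_json` payloads at once (for requests with `n` greater than 1) using only the standard library. The helper name `save_b64_images`, the `.png` suffix, and the assumption that the decoded bytes can be written to disk unchanged are illustrative and not confirmed by this commit; if the payload needs re-encoding, going through PIL as in the documented example is the safer route.

```python
import base64
from pathlib import Path


def save_b64_images(b64_payloads, out_dir, stem="output"):
    """Decode base64 image payloads and write one file per payload.

    Hypothetical helper: returns the written paths in input order.
    The ".png" suffix is an assumption about the server's output format.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, payload in enumerate(b64_payloads):
        # validate=True raises on characters outside the base64 alphabet
        data = base64.b64decode(payload, validate=True)
        target = out_path / f"{stem}_{index}.png"
        target.write_bytes(data)
        written.append(target)
    return written
```

With the `response` object from the documented example, a call could look like `save_b64_images([item.b64_json for item in response.data], "outputs")`.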

docs/03-Inference/02-API.md

Lines changed: 6 additions & 10 deletions
@@ -3,7 +3,7 @@
 
 # API
 
-`cogkit` provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
+CogKit provides a powerful inference API for generating images and videos using various AI models. This document covers both the Python API and API server.
 
 ## Python API
 
@@ -18,12 +18,15 @@ image = generate_image(
     model_id_or_path="THUDM/CogView4-6B",
     lora_model_id_or_path=None,
     transformer_path=None,
+    height=1024,
+    width=1024,
 )
 image.save("sunset.png")
 
-# Text-to-Video generation
+# Text/Image-to-Video generation
 video = generate_video(
     prompt="a cat playing with a ball",
+    image_file="path/to/image.png", # Needed for Image-to-Video task
     model_id_or_path="THUDM/CogVideoX1.5-5B",
     lora_model_id_or_path=None,
     transformer_path=None,
@@ -32,13 +35,6 @@ video = generate_video(
 )
 video.save("cat_video.mp4")
 ```
-<!-- TODO: add examples for i2v -->
 
 <!-- FIXME: correct url -->
-See function signatures in [generation.py](...) for more details.
-
-## API Server
-
-<!-- FIXME: add docs for the API server -->
-
-<!-- TODO: add examples -->
+See function signatures in [generation.py]() for more details.

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ dependencies = [
 torch = ["numpy", "torch", "torchvision"]
 api = [
     "fastapi[standard]~=0.115.11",
+    "openai~=1.67",
     "pydantic-settings~=2.8",
     "python-dotenv~=1.0",
 ]

src/cogkit/api/application.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ def get_application(settings: APISettings | None = None) -> FastAPI:
 
     @asynccontextmanager
     async def lifespan(_: FastAPI) -> AsyncIterator[RequestState]:
-        yield {"image_generation": ImageGenerationService(settings.cogview4_path)}
+        yield {"image_generation": ImageGenerationService(settings)}
 
     app = FastAPI(lifespan=lifespan)
 
2828

src/cogkit/api/models/images/generation_params.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 class ImageGenerationParams(RequestParams):
     prompt: str
-    model: Literal["cogview-4"] = "cogview-4"
+    model: str = "cogview-4"
     n: int = 1
     size: Literal[
         "1024x1024", "768x1344", "864x1152", "1344x768", "1152x864", "1440x720", "720x1440"
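
Because `model` is now a plain `str`, an unknown name is no longer rejected by the request schema and has to be caught later; the commit message mentions adding supported-model information to the model-not-found error. One way this could look is sketched below under stated assumptions: `ModelNotFoundError`, `resolve_model`, and the registry dict are hypothetical and do not come from the CogKit source.

```python
class ModelNotFoundError(LookupError):
    """Hypothetical error type; the real server may raise something else."""


def resolve_model(model, registry):
    """Return the configured path for `model`, or fail with a helpful message.

    `registry` maps public model names (e.g. "cogview-4") to model paths.
    """
    if model in registry:
        return registry[model]
    # List what is available so the client can correct the request
    supported = ", ".join(sorted(registry)) or "none configured"
    raise ModelNotFoundError(
        f"Model '{model}' not found. Supported models: {supported}"
    )
```

Keeping the lookup outside the schema means new models can be served by changing configuration only, at the cost of moving the failure from request validation to request handling.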
