
Commit 0945da1

Merge pull request #28 from liulanzheng/main

update doc

2 parents 10e1ff1 + 5fc757e

7 files changed (+239 -156 lines)

docs/_sidebar.md (+1 -2)

@@ -3,10 +3,9 @@
 - OSS Model Connector
 
 - [Introduction](/modelconnector/introduction.md)
-- [Installation](/modelconnector/installation.md)
-- [Configuration](/modelconnector/configuration.md)
 - [Python APIs](/modelconnector/python_apis.md)
 - [Inference Framworks](/modelconnector/framworks.md)
+- [LD_PRELOAD](/modelconnector/ld_preload.md)
 
 - OSS Torch Connector
 

docs/modelconnector/configuration.md (-127)

This file was deleted.

docs/modelconnector/framworks.md (+1 -1)

@@ -2,7 +2,7 @@
 
 ## Overview
 
-Mainstream AI inference frameworks, such as vllm and transformers, load models from a local directory. The number of files in the model directory is not large, comprising several small files and multiple larger model files. For example, the directory below shows the model directory for Qwen2.50-72B, including 37 large safetensors files and several small files.
+Mainstream AI inference frameworks, such as vllm and transformers, load models from a local directory. The number of files in the model directory is not large, comprising several small files and multiple larger model files. For example, the directory below shows the model directory for Qwen2.5-72B, including 37 large safetensors files and several small files.
 
 ```bash
 # ll -lh /root/Qwen2.5-72B
docs/modelconnector/installation.md (-22)

This file was deleted.

docs/modelconnector/introduction.md (+5 -3)

@@ -9,12 +9,14 @@ In current, memory of computing nodes for AI inference are generally large. The
 The primary function of the OSS Model Connector is to fully leverage local memory to accelerate the process of downloading models from OSS.
 In our testing environment, the download speed can exceed 15GB/s, approaching 20GB/s.
 
-The OSS Model Connector mainly offers two usage methods.
+The OSS Model Connector mainly offers 3 usage methods.
 
-The first method is using the Python interface, allowing users to open OSS objects and read their contents through list stream api.
+- The first method is using the Python interface, allowing users to open OSS objects and read their contents through list stream api.
 We also provide an interface for listing objects on OSS, as well as an implementation call 'fast list', which can complete the listing of a million objects within several seconds.
 
-The second method is utilizing the libraries for loading models in inference frameworks such as transformer or vllm. This method enables the integration of model file downloading and loading, optimizing the model deployment time.
+- The second method is utilizing the libraries for loading models in inference frameworks such as transformer or vllm. This method enables the integration of model file downloading and loading, optimizing the model deployment time.
+
+- The third method is to use LD_PRELOAD to address scenarios that the second method cannot handle, such as multi-process environments. The advantage of this approach is that it does not require modifying the code, configuration alone is sufficient.
 
 ## Features
 

docs/modelconnector/ld_preload.md (+81, new file)

# Loading Models via LD_PRELOAD

## Overview

In multi-process scenarios, the OSSModelConnector configuration initialized via the Python interface may be lost in Python sub-processes, causing OSS data to fail to load. For example, `vllm.entrypoints.openai.api_server`, where the main process is the API server and model inference happens in sub-processes; or multi-GPU scenarios, where different processes load models onto different GPUs.

In such cases, you can start the OSSModelConnector using the `LD_PRELOAD` method, passing configuration parameters via environment variables. Compared to initializing with Python, this `LD_PRELOAD` method generally does not require code modifications.

## Installation

Download the installation package `oss-connector-lib` from [Release](https://github.com/aliyun/oss-connector-for-ai-ml/releases).

For example, download `oss-connector-lib-1.0.0rc8` and install it.

rpm:

```shell
yum install -y https://github.com/aliyun/oss-connector-for-ai-ml/releases/download/ossmodelconnector%2Fv1.0.0rc8/oss-connector-lib-1.0.0rc8.x86_64.rpm
```

deb:

```shell
wget https://github.com/aliyun/oss-connector-for-ai-ml/releases/download/ossmodelconnector%2Fv1.0.0rc8/oss-connector-lib-1.0.0rc8.x86_64.deb
dpkg -i oss-connector-lib-1.0.0rc8.x86_64.deb
```

**After installation, check that `/usr/local/lib/libossc_preload.so` exists.**
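As a quick sanity check after installing (a minimal sketch; it only assumes the default library path mentioned above):

```shell
# Confirm the preload library was installed to the default location
ls -lh /usr/local/lib/libossc_preload.so
# Optionally confirm it is a shared object the dynamic linker can load
file /usr/local/lib/libossc_preload.so
```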
## Usage Method

### Configuration File

The configuration file path is `/etc/oss-connector/config.json`. The installation package **already includes** a default configuration file as follows:

```json
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "prefetch": {
        "vcpus": 16,
        "workers": 16
    }
}
```

The main performance-related parameters are:

- `prefetch.vcpus`: number of vCPUs (CPU cores) used for prefetching; the default is 16.
- `prefetch.workers`: number of coroutines per prefetch vCPU; the default is 16.
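If you need to tune these, editing the file in place is enough. A hedged sketch that lowers the prefetch concurrency (the values 8/8 are purely illustrative, not recommendations):

```shell
# Illustrative only: overwrite the default config with lower prefetch concurrency
cat > /etc/oss-connector/config.json <<'EOF'
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "prefetch": {
        "vcpus": 8,
        "workers": 8
    }
}
EOF
```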
### Configure Environment Variables

| Environment variable | Description |
| --- | --- |
| OSS_ACCESS_KEY_ID | OSS access key ID |
| OSS_ACCESS_KEY_SECRET | OSS access key secret |
| OSS_SESSION_TOKEN | Optional; STS token |
| OSS_ENDPOINT | Endpoint for OSS, e.g., `http://oss-cn-beijing-internal.aliyuncs.com`; the default scheme is `http` |
| OSS_PATH | OSS model directory, e.g., `oss://example-bucket/example-model-path/` |
| MODEL_DIR | Local model directory passed to vLLM or other inference frameworks. To avoid interference from dirty data, it is recommended to clear this directory first. Temporary data is downloaded there during use and can be deleted afterward. |
| LD_PRELOAD | `/usr/local/lib/libossc_preload.so` |
| **ENABLE_CONNECTOR** | `1`; **enables the connector, must be set for the main process** |
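Per the `MODEL_DIR` row above, it helps to start from an empty local directory. A minimal preparation step (the path is only an example):

```shell
# Start from a clean local model directory before launching
MODEL_DIR=/tmp/model
mkdir -p "$MODEL_DIR"
rm -rf "${MODEL_DIR:?}"/*   # clear any stale data left from previous runs
```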
### Start the Python Program

```shell
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${akid} OSS_ACCESS_KEY_SECRET=${aksecret} OSS_ENDPOINT=${endpoint} OSS_PATH=oss://${bucket}/${path}/ MODEL_DIR=/tmp/model python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
```

### Notes

1. `MODEL_DIR` must match the model directory given to the AI framework, e.g., vLLM's `--model`.

2. `ENABLE_CONNECTOR=1` must be set for the entrypoint process. `LD_PRELOAD` is recommended to be set for the entrypoint process, but it can also be set directly for the container.

3. When the OSSModelConnector is started via `LD_PRELOAD`, the additional memory used for caching is released with a delay, currently 120 seconds.

4. If starting with `nohup`, do not set the environment variables on the `nohup` command itself. Instead, wrap the environment variables and the startup command in a script and run `nohup` on that script.

5. For now, prefer this method in single-machine scenarios. In multi-machine setups, there might be repeated loading or other unknown issues.
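For note 4, a minimal wrapper-script sketch (the script name, credentials, endpoint, and OSS path are placeholders, not part of the package):

```shell
cat > run_model.sh <<'EOF'
#!/bin/bash
# Placeholder values; substitute real credentials, endpoint, and OSS path
export LD_PRELOAD=/usr/local/lib/libossc_preload.so
export ENABLE_CONNECTOR=1
export OSS_ACCESS_KEY_ID="your-access-key-id"
export OSS_ACCESS_KEY_SECRET="your-access-key-secret"
export OSS_ENDPOINT="http://oss-cn-beijing-internal.aliyuncs.com"
export OSS_PATH="oss://example-bucket/example-model-path/"
export MODEL_DIR=/tmp/model
exec python3 -m vllm.entrypoints.openai.api_server \
    --model "$MODEL_DIR" --trust-remote-code \
    --tensor-parallel-size 1 --disable-custom-all-reduce
EOF
chmod +x run_model.sh
nohup ./run_model.sh > run_model.log 2>&1 &
```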
