
Commit 0945da1

Merge pull request #28 from liulanzheng/main

update doc

2 parents 10e1ff1 + 5fc757e

7 files changed (+239 -156 lines)

docs/_sidebar.md (+1 -2)

@@ -3,10 +3,9 @@
 - OSS Model Connector
 
 - [Introduction](/modelconnector/introduction.md)
-- [Installation](/modelconnector/installation.md)
-- [Configuration](/modelconnector/configuration.md)
 - [Python APIs](/modelconnector/python_apis.md)
 - [Inference Framworks](/modelconnector/framworks.md)
+- [LD_PRELOAD](/modelconnector/ld_preload.md)
 
 - OSS Torch Connector
 

docs/modelconnector/configuration.md (-127)

This file was deleted.

docs/modelconnector/framworks.md (+1 -1)

@@ -2,7 +2,7 @@
 
 ## Overview
 
-Mainstream AI inference frameworks, such as vllm and transformers, load models from a local directory. The number of files in the model directory is not large, comprising several small files and multiple larger model files. For example, the directory below shows the model directory for Qwen2.50-72B, including 37 large safetensors files and several small files.
+Mainstream AI inference frameworks, such as vllm and transformers, load models from a local directory. The number of files in the model directory is not large, comprising several small files and multiple larger model files. For example, the directory below shows the model directory for Qwen2.5-72B, including 37 large safetensors files and several small files.
 
 ```bash
 # ll -lh /root/Qwen2.5-72B
docs/modelconnector/installation.md (-22)

This file was deleted.

docs/modelconnector/introduction.md (+5 -3)

@@ -9,12 +9,14 @@ In current, memory of computing nodes for AI inference are generally large. The
 The primary function of the OSS Model Connector is to fully leverage local memory to accelerate the process of downloading models from OSS.
 In our testing environment, the download speed can exceed 15GB/s, approaching 20GB/s.
 
-The OSS Model Connector mainly offers two usage methods.
+The OSS Model Connector mainly offers 3 usage methods.
 
-The first method is using the Python interface, allowing users to open OSS objects and read their contents through list stream api.
+- The first method is using the Python interface, allowing users to open OSS objects and read their contents through list stream api.
 We also provide an interface for listing objects on OSS, as well as an implementation call 'fast list', which can complete the listing of a million objects within several seconds.
 
-The second method is utilizing the libraries for loading models in inference frameworks such as transformer or vllm. This method enables the integration of model file downloading and loading, optimizing the model deployment time.
+- The second method is utilizing the libraries for loading models in inference frameworks such as transformer or vllm. This method enables the integration of model file downloading and loading, optimizing the model deployment time.
+
+- The third method is to use LD_PRELOAD to address scenarios that the second method cannot handle, such as multi-process environments. The advantage of this approach is that it does not require modifying the code, configuration alone is sufficient.
 
 ## Features
 

docs/modelconnector/ld_preload.md (+81, new file)

# Loading Models via LD_PRELOAD

## Overview

In multi-process scenarios, the OSSModelConnector configuration initialized via the Python interface may be lost in Python sub-processes, causing OSS data to fail to load. For example, `vllm.entrypoints.openai.api_server`, where the main process is the API server and model inference happens in sub-processes; or multi-GPU scenarios, where different processes load models onto different GPUs.

In such cases, you can start the OSSModelConnector using the `LD_PRELOAD` method, passing configuration parameters via environment variables. Compared to initializing with Python, this `LD_PRELOAD` method generally does not require code modifications.

## Installation

Download the installation package `oss-connector-lib` from [Release](https://github.com/aliyun/oss-connector-for-ai-ml/releases).

For example, download `oss-connector-lib-1.0.0rc8` and install it.

rpm:

```shell
yum install -y https://github.com/aliyun/oss-connector-for-ai-ml/releases/download/ossmodelconnector%2Fv1.0.0rc8/oss-connector-lib-1.0.0rc8.x86_64.rpm
```

deb:

```shell
wget https://github.com/aliyun/oss-connector-for-ai-ml/releases/download/ossmodelconnector%2Fv1.0.0rc8/oss-connector-lib-1.0.0rc8.x86_64.deb
dpkg -i oss-connector-lib-1.0.0rc8.x86_64.deb
```

**After installation, check that `/usr/local/lib/libossc_preload.so` exists.**
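As a quick sanity check after installing (a minimal sketch; it only assumes the default library path mentioned above):

```shell
# Confirm the preload library was installed to the default location
ls -lh /usr/local/lib/libossc_preload.so
# Optionally confirm it is a shared object the dynamic linker can load
file /usr/local/lib/libossc_preload.so
```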
## Usage Method

### Configuration File

The configuration file path is `/etc/oss-connector/config.json`. The installation package **already includes** a default configuration file as follows:

```json
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "prefetch": {
        "vcpus": 16,
        "workers": 16
    }
}
```

The main performance-related parameters are:

- `prefetch.vcpus`: number of vCPUs (CPU cores) used for prefetching; the default is 16.
- `prefetch.workers`: number of coroutines per prefetch vCPU; the default is 16.
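If you need to tune these, editing the file in place is enough. A hedged sketch that lowers the prefetch concurrency (the values 8/8 are purely illustrative, not recommendations):

```shell
# Illustrative only: overwrite the default config with lower prefetch concurrency
cat > /etc/oss-connector/config.json <<'EOF'
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "prefetch": {
        "vcpus": 8,
        "workers": 8
    }
}
EOF
```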
### Configure Environment Variables

| Environment variable | Description |
| --- | --- |
| OSS_ACCESS_KEY_ID | OSS access key ID |
| OSS_ACCESS_KEY_SECRET | OSS access key secret |
| OSS_SESSION_TOKEN | Optional; STS token |
| OSS_ENDPOINT | Endpoint for OSS, e.g., `http://oss-cn-beijing-internal.aliyuncs.com`; the default scheme is `http` |
| OSS_PATH | OSS model directory, e.g., `oss://example-bucket/example-model-path/` |
| MODEL_DIR | Local model directory passed to vLLM or other inference frameworks. To avoid interference from dirty data, it is recommended to clear this directory first. Temporary data is downloaded there during use and can be deleted afterward. |
| LD_PRELOAD | `/usr/local/lib/libossc_preload.so` |
| **ENABLE_CONNECTOR** | `1`; **enables the connector, must be set for the main process** |
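Per the `MODEL_DIR` row above, it helps to start from an empty local directory. A minimal preparation step (the path is only an example):

```shell
# Start from a clean local model directory before launching
MODEL_DIR=/tmp/model
mkdir -p "$MODEL_DIR"
rm -rf "${MODEL_DIR:?}"/*   # clear any stale data left from previous runs
```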
### Start the Python Program

```shell
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${akid} OSS_ACCESS_KEY_SECRET=${aksecret} OSS_ENDPOINT=${endpoint} OSS_PATH=oss://${bucket}/${path}/ MODEL_DIR=/tmp/model python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
```

### Notes

1. `MODEL_DIR` must match the model directory given to the AI framework, e.g., vLLM's `--model`.

2. `ENABLE_CONNECTOR=1` must be set for the entrypoint process. `LD_PRELOAD` is recommended to be set for the entrypoint process, but it can also be set directly for the container.

3. When the OSSModelConnector is started via `LD_PRELOAD`, the additional memory used for caching is released with a delay, currently 120 seconds.

4. If starting with `nohup`, do not set the environment variables on the `nohup` command itself. Instead, wrap the environment variables and the startup command in a script and run `nohup` on that script.

5. For now, prefer this method in single-machine scenarios. In multi-machine setups, there might be repeated loading or other unknown issues.
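For note 4, a minimal wrapper-script sketch (the script name, credentials, endpoint, and OSS path are placeholders, not part of the package):

```shell
cat > run_model.sh <<'EOF'
#!/bin/bash
# Placeholder values; substitute real credentials, endpoint, and OSS path
export LD_PRELOAD=/usr/local/lib/libossc_preload.so
export ENABLE_CONNECTOR=1
export OSS_ACCESS_KEY_ID="your-access-key-id"
export OSS_ACCESS_KEY_SECRET="your-access-key-secret"
export OSS_ENDPOINT="http://oss-cn-beijing-internal.aliyuncs.com"
export OSS_PATH="oss://example-bucket/example-model-path/"
export MODEL_DIR=/tmp/model
exec python3 -m vllm.entrypoints.openai.api_server \
    --model "$MODEL_DIR" --trust-remote-code \
    --tensor-parallel-size 1 --disable-custom-all-reduce
EOF
chmod +x run_model.sh
nohup ./run_model.sh > run_model.log 2>&1 &
```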
