docs/launcher.md: 56 additions & 49 deletions
@@ -320,33 +320,37 @@ Stop and delete a specific vLLM instance.

 **GET** `/v2/vllm/instances/{instance_id}/log`

-Retrieve stdout/stderr logs from a specific vLLM instance starting from a specific byte position.
+Retrieve stdout/stderr logs from a specific vLLM instance as raw bytes.

 **Path Parameters:**

 - `instance_id`: ID of the instance

-**Query Parameters:**
+**Request Headers:**

-- `start_byte` (optional): Byte position to start reading from (default: 0, minimum: 0). The largest valid value is the total number of bytes captured so far (i.e., the length of the log). Use this to continue reading from where you left off.
-- `max_bytes` (optional): Maximum bytes of log data to retrieve from start_byte (default: 1048576 (1 MB), range: 1024-10485760 (10 MB))
+- `Range` (optional): Byte range to retrieve, following [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110#name-range-requests). Supported formats:
+  - `Range: bytes=START-END`: retrieve bytes from START to END (both inclusive)
+  - `Range: bytes=START-`: retrieve bytes from START to end of log (up to 1 MB)
+  - Suffix ranges (`bytes=-N`) are **not** supported.

-**Response (200 OK):**
+**Response (200 OK), without Range header:**

-```json
-{
-"log": "INFO: Started server process\nINFO: Waiting for application startup\nINFO: Application startup complete\n"
-}
-```
+Returns the full log content (up to 1 MB) as `application/octet-stream`.

-**Response Fields:**
+**Response (206 Partial Content), with Range header:**

-- `log`: Log content as a single string. Since `start_byte` is a byte offset but the JSON response contains unicode characters, the client must encode the string back to UTF-8 to compute the correct byte length for the next `start_byte` (e.g., `start_byte + len(log.encode("utf-8"))` in Python). Using the character length directly will produce incorrect offsets if the log contains multi-byte characters.
+Returns the requested byte range as `application/octet-stream` with a `Content-Range` header:
+
+```
+Content-Range: bytes START-END/TOTAL
+Content-Type: application/octet-stream
+```

 **Error Responses:**

+- `400 Bad Request`: Malformed or unsupported Range header
 - `404 Not Found`: Instance not found
-- `416 Range Not Satisfiable`: The requested `start_byte` is beyond available log content. The response includes a `Content-Range: bytes */N` header (per [RFC 9110 §15.5.17](https://www.rfc-editor.org/rfc/rfc9110#status.416)) and a JSON body: `{"available_bytes": N}`, where `N` is the total number of bytes captured so far (i.e., the largest valid value of `start_byte`).
+- `416 Range Not Satisfiable`: The requested start position is beyond available log content. The response includes a `Content-Range: bytes */N` header (per [RFC 9110 §15.5.17](https://www.rfc-editor.org/rfc/rfc9110#status.416)) with an empty body, where `N` is the total number of bytes captured so far.
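
The `Range` formats listed in this hunk can be illustrated with a small parser. This is only a sketch of the documented rules, not the launcher's actual code: the name `parse_range` is hypothetical, and mapping `ValueError` to `400 Bad Request` is an assumption about how a server might wire it up.

```python
import re

# Accepts "bytes=START-END" and "bytes=START-" only.
_RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")


def parse_range(header):
    """Return (start, end) for a Range header value; end is None when open-ended.

    Raises ValueError for anything outside the two documented formats,
    which a server would presumably answer with 400 Bad Request.
    """
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None:
        # Covers suffix ranges ("bytes=-500"), other units, and multi-range lists.
        raise ValueError(f"malformed or unsupported Range header: {header!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        # RFC 9110 treats a last position below the first position as invalid.
        raise ValueError(f"range end precedes range start: {header!r}")
    return start, end
```

Both positions are inclusive, so `parse_range("bytes=0-1023")` describes 1024 bytes. Whether a start position is satisfiable (the `416` case) depends on the current log length and would be checked separately.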
-The log is treated as a flat byte stream. The `start_byte` parameter specifies the exact byte offset to begin reading from, and `max_bytes` limits how many bytes are returned:
+The log is treated as a flat byte stream. The `Range` header specifies which bytes to retrieve:
-To stream logs, use `start_byte + len(log.encode("utf-8"))` as the `start_byte` for the next request, since `start_byte` is a byte offset and the JSON response contains unicode characters.
+The `Content-Range` response header tells you exactly which bytes were returned and the current total log length (which may grow over time), e.g. `Content-Range: bytes 0-1048575/5242880`.
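
As a worked example of reading that header, the sketch below splits a `Content-Range` value into its three numbers and derives the `Range` value for the next request. The helper names `parse_content_range` and `next_range` are hypothetical and not part of the launcher.

```python
import re

_CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def parse_content_range(value):
    """Split 'bytes START-END/TOTAL' into (start, end, total) as integers."""
    match = _CONTENT_RANGE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    start, end, total = (int(group) for group in match.groups())
    return start, end, total


def next_range(value):
    """Return (range_header_value, bytes_remaining) for the follow-up request."""
    _start, end, total = parse_content_range(value)
    next_start = end + 1  # END is inclusive, so the next unread byte is END + 1
    return f"bytes={next_start}-", total - next_start


header, remaining = next_range("bytes 0-1048575/5242880")
print(header, remaining)  # bytes=1048576- 4194304
```

Because the total may grow between requests, the remaining count is only a snapshot taken at the time of the response.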

 ## Configuration

@@ -629,7 +631,7 @@ Represents a single vLLM instance with its process and configuration.
 - `start()`: Start the vLLM process
 - `stop(timeout=10)`: Stop the vLLM process gracefully (or force kill after timeout)
 - `get_status()`: Get detailed status information
-- `get_logs(start_byte=0, max_bytes=1048576)`: Retrieve logs from the instance starting from a byte position
+- `get_log_bytes(start=0, end=None)`: Retrieve log bytes from the instance (start and end are both inclusive); returns `(bytes, total_size)`
 - `stop_all_instances(timeout=10)`: Stop all running instances
 - `get_instance_status(instance_id)`: Get status of a specific instance
 - `get_all_instances_status()`: Get status of all instances
-- `get_instance_logs(instance_id, start_byte=0, max_bytes=1048576)`: Retrieve logs from a specific instance starting from a byte position
+- `get_instance_log_bytes(instance_id, start=0, end=None)`: Retrieve log bytes from a specific instance; returns `(bytes, total_size)`

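
To make the inclusive `start`/`end` semantics and the `(bytes, total_size)` return value concrete, here is a minimal sketch of a reader over a log file. The function `read_log_bytes` is a hypothetical stand-in, not the launcher's `get_log_bytes`; it leaves out any size cap and assumes an out-of-range `start` simply yields empty data.

```python
import os


def read_log_bytes(path, start=0, end=None):
    """Return (data, total_size) for the inclusive byte range [start, end]."""
    with open(path, "rb") as f:
        # Take the size from the open handle so data and total_size agree
        # even if the file keeps growing while it is being read.
        total_size = os.fstat(f.fileno()).st_size
        last = total_size - 1 if end is None else min(end, total_size - 1)
        length = max(0, last - start + 1)
        f.seek(start)
        data = f.read(length)
    return data, total_size
```

For a 10-byte file, `read_log_bytes(path, 2, 5)` returns 4 bytes (offsets 2, 3, 4, 5) together with a total of 10.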
 ## Best Practices

@@ -698,36 +700,41 @@ Be mindful of system resources:

 ### 5. Log Management

-The launcher captures stdout/stderr from each vLLM instance using a multiprocessing queue that feeds into a persistent in-memory byte buffer (`_log_buffer`) per instance:
+The launcher captures stdout/stderr from each vLLM instance by writing directly to a log file on disk:

-- **Architecture**: A `QueueWriter` in the child process sends messages to a bounded queue (`MAX_QUEUE_SIZE = 5000` messages, a Python constant defined in `launcher.py`). On read, the launcher drains the queue into the instance's `_log_buffer`, which accumulates all log bytes.
-- **Byte-Based Retrieval**: The `start_byte` parameter is a byte offset into the accumulated buffer. The `max_bytes` parameter limits how many bytes are returned per request.
-- **Queue Overflow**: When the queue is full, new messages are dropped (non-blocking put). Messages already drained into the buffer are preserved.
+- **Architecture**: The child process redirects stdout and stderr at the OS level using `os.dup2`, so all output (including from vLLM, uvicorn, and C extensions) is captured to a per-instance log file (`/tmp/launcher-<pid>-vllm-<instance_id>.log`). The file is opened with `O_APPEND` so concurrent writes from stdout and stderr are safe.
+- **Raw Bytes**: The log endpoint returns raw bytes as `application/octet-stream`, not JSON.
+- **Range Header**: Use the standard HTTP `Range: bytes=START-END` header to request specific byte ranges. Without a Range header, the full log (up to 1 MB) is returned.
+- **No Data Loss**: Since logs are written directly to disk, there is no bounded queue that could overflow and drop messages.
 - **Non-blocking**: Log capture doesn't slow down the vLLM process.
-- **Streaming Support**: Use `start_byte` parameter to efficiently stream logs without re-reading.
+- **Streaming Support**: Use the `Content-Range` response header to track position for efficient streaming.
+- **Cleanup**: Log files are automatically removed when an instance is stopped or deleted.
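
The OS-level redirection described in the Architecture bullet can be sketched as follows. This illustrates the `os.dup2` plus `O_APPEND` technique only and is not the launcher's code. The function names are hypothetical, and the save/restore step exists just so the sketch can be tried in one process. A child process would normally redirect once and never restore.

```python
import os
import sys


def redirect_output_to_file(log_path):
    """Point fds 1 and 2 at log_path; return duplicates of the original fds."""
    # Flush Python-level buffers so earlier output is not written to the new target.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = (os.dup(1), os.dup(2))
    # O_APPEND makes every write land at the current end of file, so stdout
    # and stderr can share the file without overwriting each other.
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(log_fd, 1)
        os.dup2(log_fd, 2)
    finally:
        os.close(log_fd)  # fds 1 and 2 keep the file open
    return saved


def restore_output(saved):
    """Undo redirect_output_to_file using the fds it returned."""
    for target, original in zip((1, 2), saved):
        os.dup2(original, target)
        os.close(original)
```

Because the redirection happens on the file descriptors rather than on `sys.stdout`, output written by C extensions or subprocesses that inherit fds 1 and 2 ends up in the same file.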

 **Best Practices:**

-- **Streaming Logs**: Use `start_byte` to efficiently stream logs. Since `start_byte` is a byte offset but the JSON response contains unicode characters, compute the byte length by encoding the string back to UTF-8:
+- **Streaming Logs**: Use the Range header to stream logs efficiently. The `Content-Range` response header tells you the byte range and total size:
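
The example that follows this bullet in the changed file is not visible in this view. As a stand-in, here is one way a client might poll for new bytes, written as a sketch under stated assumptions: `fetch_range` and `read_new_bytes` are hypothetical helper names, and treating `416` as "nothing new yet" is an assumption based on the error description in this change.

```python
import re
import urllib.error
import urllib.request


def fetch_range(url, start):
    """GET url with an open-ended Range header; return (status, content_range, body)."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers.get("Content-Range"), response.read()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx responses, including 416 when start is past the end.
        return err.code, err.headers.get("Content-Range"), b""


def read_new_bytes(fetch, start):
    """Collect everything available from `start`; return (data, next_start)."""
    chunks = []
    while True:
        status, content_range, body = fetch(start)
        if status == 416:
            break  # no bytes at or after `start` yet
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range or "")
        if match is None:
            raise RuntimeError(f"unexpected Content-Range: {content_range!r}")
        chunks.append(body)
        start = int(match.group(2)) + 1  # END is inclusive
        if start >= int(match.group(3)):
            break  # caught up with the current total
    return b"".join(chunks), start
```

A caller would keep the returned offset and pass it back in on the next poll, for example `data, pos = read_new_bytes(lambda s: fetch_range(url, s), pos)`, where `url` is the launcher's address followed by `/v2/vllm/instances/{instance_id}/log`.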