Commit e4dc7da

Launcher log improvements (#286)
* tmp file for logs
* Pure byte and range header
* Mike's comments
* Fix tests
* use dup2 to have vllm logs
* update doc

Signed-off-by: Diego-Castan <diego.castan@ibm.com>
1 parent 6b8f8e0 commit e4dc7da

3 files changed: +586 -506 lines changed

docs/launcher.md

Lines changed: 56 additions & 49 deletions
@@ -320,33 +320,37 @@ Stop and delete a specific vLLM instance.
320320

321321
**GET** `/v2/vllm/instances/{instance_id}/log`
322322

323-
Retrieve stdout/stderr logs from a specific vLLM instance starting from a specific byte position.
323+
Retrieve stdout/stderr logs from a specific vLLM instance as raw bytes.
324324

325325
**Path Parameters:**
326326

327327
- `instance_id`: ID of the instance
328328

329-
**Query Parameters:**
329+
**Request Headers:**
330330

331-
- `start_byte` (optional): Byte position to start reading from (default: 0, minimum: 0). The largest valid value is the total number of bytes captured so far (i.e., the length of the log). Use this to continue reading from where you left off.
332-
- `max_bytes` (optional): Maximum bytes of log data to retrieve from start_byte (default: 1048576 (1 MB), range: 1024-10485760 (10 MB))
331+
- `Range` (optional): Byte range to retrieve, following [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110#name-range-requests). Supported formats:
332+
- `Range: bytes=START-END` — retrieve bytes from START to END (both inclusive)
333+
- `Range: bytes=START-` — retrieve bytes from START to end of log (up to 1 MB)
334+
- Suffix ranges (`bytes=-N`) are **not** supported.
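
The two supported `Range` formats listed above can be told apart with a small parser. The following is a minimal sketch, not the launcher's actual code: `parse_range` is a hypothetical helper, and it assumes the value passed in is the header value only (for example `bytes=0-14`). Anything that is not a single `bytes=START-END` or `bytes=START-` range is rejected, which is the case the doc maps to `400 Bad Request`.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration; the launcher's real parser may differ.
_RANGE_RE = re.compile(r"bytes=([0-9]+)-([0-9]*)")


def parse_range(value: str) -> Tuple[int, Optional[int]]:
    """Parse a Range header value.

    Returns (start, end) where end is inclusive, or (start, None) for an
    open-ended range such as "bytes=15-".
    """
    match = _RANGE_RE.fullmatch(value.strip())
    if match is None:
        # Rejects suffix ranges (bytes=-N), multi-range lists, and other units.
        raise ValueError(f"malformed or unsupported Range header: {value!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError(f"range end precedes start: {value!r}")
    return start, end
```

A caller would translate `ValueError` into a 400 response and otherwise pass `(start, end)` on to whatever reads the log.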
333335

334-
**Response (200 OK):**
336+
**Response (200 OK) — without Range header:**
335337

336-
```json
337-
{
338-
"log": "INFO: Started server process\nINFO: Waiting for application startup\nINFO: Application startup complete\n"
339-
}
340-
```
338+
Returns the full log content (up to 1 MB) as `application/octet-stream`.
341339

342-
**Response Fields:**
340+
**Response (206 Partial Content) — with Range header:**
343341

344-
- `log`: Log content as a single string. Since `start_byte` is a byte offset but the JSON response contains unicode characters, the client must encode the string back to UTF-8 to compute the correct byte length for the next `start_byte` (e.g., `start_byte + len(log.encode("utf-8"))` in Python). Using the character length directly will produce incorrect offsets if the log contains multi-byte characters.
342+
Returns the requested byte range as `application/octet-stream` with a `Content-Range` header:
343+
344+
```
345+
Content-Range: bytes START-END/TOTAL
346+
Content-Type: application/octet-stream
347+
```
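
On the client side, the `Content-Range` header of a 206 response is what tells you where the next request should start. Below is a minimal sketch of reading it; `ContentRange` and `parse_content_range` are illustrative names, not part of the launcher, and the parser only handles the `bytes START-END/TOTAL` form shown above (the `bytes */N` form sent with a 416 is deliberately rejected).

```python
import re
from typing import NamedTuple


class ContentRange(NamedTuple):
    start: int  # first byte returned (inclusive)
    end: int    # last byte returned (inclusive)
    total: int  # total log length at the time of the response


_CONTENT_RANGE_RE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+)")


def parse_content_range(value: str) -> ContentRange:
    """Parse a 'bytes START-END/TOTAL' Content-Range header value."""
    match = _CONTENT_RANGE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    return ContentRange(*(int(group) for group in match.groups()))
```

Given a parsed value `cr`, the next request would start at `cr.end + 1`, and `cr.end + 1 < cr.total` indicates that more bytes were already available when the response was produced.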
345348

346349
**Error Responses:**
347350

351+
- `400 Bad Request`: Malformed or unsupported Range header
348352
- `404 Not Found`: Instance not found
349-
- `416 Range Not Satisfiable`: The requested `start_byte` is beyond available log content. The response includes a `Content-Range: bytes */N` header (per [RFC 9110 §15.5.17](https://www.rfc-editor.org/rfc/rfc9110#status.416)) and a JSON body: `{"available_bytes": N}`, where `N` is the total number of bytes captured so far (i.e., the largest valid value of `start_byte`).
353+
- `416 Range Not Satisfiable`: The requested start position is beyond available log content. The response includes a `Content-Range: bytes */N` header (per [RFC 9110 §15.5.17](https://www.rfc-editor.org/rfc/rfc9110#status.416)) with an empty body, where `N` is the total number of bytes captured so far.
350354

351355
---
352356

@@ -524,38 +528,36 @@ curl http://localhost:8001/v2/vllm/instances
524528
### Example 5: Retrieve Instance Logs
525529

526530
```bash
527-
# Get up to 1 MB of logs from the beginning (default)
531+
# Get up to 1 MB of logs from the beginning (no Range header → 200 OK)
528532
curl http://localhost:8001/v2/vllm/instances/abc123.../log
529533

530-
# Get up to 500 KB of logs from the beginning
531-
curl "http://localhost:8001/v2/vllm/instances/abc123.../log?max_bytes=512000"
532-
533-
# Continue reading from where you left off (streaming logs)
534-
# First request - get initial logs
535-
curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=0&max_bytes=1048576"
536-
# To continue, use start_byte + len(log.encode("utf-8")) as the next start_byte
537-
# (encode to UTF-8 first to get byte length, not character length)
534+
# Get the first 1 MB chunk (Range header → 206 Partial Content)
535+
curl -H "Range: bytes=0-1048575" \
536+
http://localhost:8001/v2/vllm/instances/abc123.../log
538537

539-
# Second request - get next chunk
540-
curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=1048576&max_bytes=1048576"
538+
# Second chunk — continue from byte 1048576
539+
curl -H "Range: bytes=1048576-2097151" \
540+
http://localhost:8001/v2/vllm/instances/abc123.../log
541541

542-
# Third request - continue from new position
543-
curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=2097152&max_bytes=1048576"
542+
# Open-ended range — from byte 2097152 to EOF (up to 1 MB)
543+
curl -H "Range: bytes=2097152-" \
544+
http://localhost:8001/v2/vllm/instances/abc123.../log
544545
```
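
The chunk boundaries in the curl commands above follow from the fact that both ends of a range are inclusive: a 1 MB chunk starting at byte 0 ends at byte 1048575, not 1048576. A short sketch of that arithmetic, using a hypothetical `chunk_ranges` helper that is not part of the launcher:

```python
from typing import Iterator


def chunk_ranges(total: int, chunk_size: int = 1024 * 1024) -> Iterator[str]:
    """Yield Range header values that cover bytes [0, total) in fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # HTTP byte ranges are inclusive, so the last byte is one before the next start.
        end = min(start + chunk_size, total) - 1
        yield f"bytes={start}-{end}"
```

For a log whose total length is already known (for example from a previous `Content-Range` header), each yielded value can be sent as the `Range` header of one request.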
545546

546-
**How `start_byte` Works:**
547+
**How the Range header works:**
547548

548-
The log is treated as a flat byte stream. The `start_byte` parameter specifies the exact byte offset to begin reading from, and `max_bytes` limits how many bytes are returned:
549+
The log is treated as a flat byte stream. The `Range` header specifies which bytes to retrieve:
549550

550551
```
551552
Example: 30 bytes of log content
552553
553-
start_byte=0 → Returns bytes [0, max_bytes)
554-
start_byte=15 → Returns bytes [15, 15 + max_bytes)
555-
start_byte=15, max_bytes=5 → Returns bytes [15, 20)
554+
No Range header → 200 OK, returns bytes [0, 30)
555+
Range: bytes=0-14 → 206, returns bytes [0, 15)
556+
Range: bytes=15-29 → 206, returns bytes [15, 30)
557+
Range: bytes=15- → 206, returns bytes [15, 30) (open-ended)
556558
```
557559

558-
To stream logs, use `start_byte + len(log.encode("utf-8"))` as the `start_byte` for the next request, since `start_byte` is a byte offset and the JSON response contains unicode characters.
560+
The `Content-Range` response header tells you exactly which bytes were returned and the current total log length (which may grow over time), e.g. `Content-Range: bytes 0-1048575/5242880`.
559561

560562
## Configuration
561563

@@ -629,7 +631,7 @@ Represents a single vLLM instance with its process and configuration.
629631
- `start()`: Start the vLLM process
630632
- `stop(timeout=10)`: Stop the vLLM process gracefully (or force kill after timeout)
631633
- `get_status()`: Get detailed status information
632-
- `get_logs(start_byte=0, max_bytes=1048576)`: Retrieve logs from the instance starting from a byte position
634+
- `get_log_bytes(start=0, end=None)`: Retrieve log bytes from the instance (start and end are both inclusive), returns `(bytes, total_size)`
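
To make the `(bytes, total_size)` contract concrete, here is a rough sketch of how an inclusive byte range could be read from a log file. It is not the code from this commit: `read_log_bytes` is a hypothetical stand-in, and applying the 1 MB cap only to open-ended reads is an assumption based on the endpoint description.

```python
import os
from typing import Optional, Tuple

MAX_CHUNK = 1024 * 1024  # assumed cap for open-ended reads (1 MB)


def read_log_bytes(path: str, start: int = 0, end: Optional[int] = None) -> Tuple[bytes, int]:
    """Return (data, total_size) for the inclusive byte range [start, end]."""
    total = os.path.getsize(path)
    if start < 0 or start > total:
        raise ValueError(f"start {start} is outside the log (size {total})")
    if end is None:
        # Open-ended: read to the end of the log, but never more than MAX_CHUNK.
        stop = min(total, start + MAX_CHUNK)
    else:
        # Inclusive end becomes an exclusive stop, clamped to the file size.
        stop = min(total, end + 1)
    with open(path, "rb") as log_file:
        log_file.seek(start)
        data = log_file.read(max(stop - start, 0))
    return data, total
```

With a 30-byte file, `read_log_bytes(path, 0, 14)` returns the first 15 bytes and a total of 30, matching the `[0, 15)` row in the Range examples.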
633635

634636
#### `VllmMultiProcessManager`
635637

@@ -642,7 +644,7 @@ Manages multiple VllmInstance objects.
642644
- `stop_all_instances(timeout=10)`: Stop all running instances
643645
- `get_instance_status(instance_id)`: Get status of a specific instance
644646
- `get_all_instances_status()`: Get status of all instances
645-
- `get_instance_logs(instance_id, start_byte=0, max_bytes=1048576)`: Retrieve logs from a specific instance starting from a byte position
647+
- `get_instance_log_bytes(instance_id, start=0, end=None)`: Retrieve log bytes from a specific instance, returns `(bytes, total_size)`
646648

647649
## Best Practices
648650

@@ -698,36 +700,41 @@ Be mindful of system resources:
698700

699701
### 5. Log Management
700702

701-
The launcher captures stdout/stderr from each vLLM instance using a multiprocessing queue that feeds into a persistent in-memory byte buffer (`_log_buffer`) per instance:
703+
The launcher captures stdout/stderr from each vLLM instance by writing directly to a log file on disk:
702704

703-
- **Architecture**: A `QueueWriter` in the child process sends messages to a bounded queue (`MAX_QUEUE_SIZE = 5000` messages, a Python constant defined in `launcher.py`). On read, the launcher drains the queue into the instance's `_log_buffer`, which accumulates all log bytes.
704-
- **Byte-Based Retrieval**: The `start_byte` parameter is a byte offset into the accumulated buffer. The `max_bytes` parameter limits how many bytes are returned per request.
705-
- **Queue Overflow**: When the queue is full, new messages are dropped (non-blocking put). Messages already drained into the buffer are preserved.
705+
- **Architecture**: The child process redirects stdout and stderr at the OS level using `os.dup2`, so all output — including from vLLM, uvicorn, and C extensions — is captured to a per-instance log file (`/tmp/launcher-<pid>-vllm-<instance_id>.log`). The file is opened with `O_APPEND` so concurrent writes from stdout and stderr are safe.
706+
- **Raw Bytes**: The log endpoint returns `application/octet-stream` — raw bytes, not JSON.
707+
- **Range Header**: Use the standard HTTP `Range: bytes=START-END` header to request specific byte ranges. Without a Range header, the full log (up to 1 MB) is returned.
708+
- **No Data Loss**: Since logs are written directly to disk, there is no bounded queue that could overflow and drop messages.
706709
- **Non-blocking**: Log capture doesn't slow down the vLLM process.
707-
- **Streaming Support**: Use `start_byte` parameter to efficiently stream logs without re-reading.
710+
- **Streaming Support**: Use the `Content-Range` response header to track position for efficient streaming.
711+
- **Cleanup**: Log files are automatically removed when an instance is stopped or deleted.
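
The OS-level redirection described in the Architecture bullet can be sketched as follows. This is an illustration of the `os.dup2` technique only, with a hypothetical function name; the launcher's own code in this commit may be structured differently.

```python
import os
import sys


def redirect_stdio_to_file(path: str) -> None:
    """Point file descriptors 1 and 2 at an append-only log file."""
    # Flush Python-level buffers so earlier output is not written to the new target.
    sys.stdout.flush()
    sys.stderr.flush()
    # O_APPEND makes every write land at the current end of the file, so
    # stdout and stderr can share the file without overwriting each other.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(fd, 1)  # stdout
    os.dup2(fd, 2)  # stderr
    if fd > 2:
        os.close(fd)  # the duplicated descriptors keep the file open
```

Because the redirection happens at the file-descriptor level rather than by reassigning `sys.stdout`, output written by native code or by subprocesses that inherit descriptors 1 and 2 ends up in the same file.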
708712

709713
**Best Practices:**
710714

711-
- **Streaming Logs**: Use `start_byte` to efficiently stream logs. Since `start_byte` is a byte offset but the JSON response contains unicode characters, compute the byte length by encoding the string back to UTF-8:
715+
- **Streaming Logs**: Use the Range header to stream logs efficiently. The `Content-Range` response header tells you the byte range and total size:
712716

713717
```python
714718
# Python example
715719
import requests
716720

717-
start_byte = 0
721+
start = 0
718722
while True:
719-
resp = requests.get(f"http://localhost:8001/v2/vllm/instances/id/log?start_byte={start_byte}")
720-
log = resp.json()["log"]
721-
if not log:
723+
resp = requests.get(
724+
f"http://localhost:8001/v2/vllm/instances/id/log",
725+
headers={"Range": f"bytes={start}-"},
726+
)
727+
if resp.status_code == 416:
728+
break # No new content
729+
data = resp.content
730+
if not data:
722731
break
723-
start_byte += len(log.encode("utf-8"))
732+
start += len(data)
724733
```
725734

726-
- **Polling**: Track `start_byte + len(log.encode("utf-8"))` between requests to only fetch new log content
727-
- **Memory Efficiency**: Use `max_bytes` parameter to limit response size (default: 1 MB, max: 10 MB)
728-
- **Data Loss**: Logs are lost when an instance is deleted
735+
- **Polling**: Track `start + len(response.content)` between requests to only fetch new content
736+
- **Data Loss**: Logs are lost when an instance is deleted (the log file is removed)
729737
- **Production**: Consider external logging solutions for long-term storage and analysis
730-
- **Byte Tracking**: The `start_byte` position is relative to all logs ever captured, not just current queue content
731738

732739
### 6. Testing
733740
