Commit c47c25b
[FEATURE] - Fetch stdout in launcher (#242)
* add Queue-based stdout capture
* Prevents memory overload from excessive logs
* Fix and add tests for launcher logs
* Update documentation
* Mike comment - Queue with bytes instead of lines
* Documentation
* Fix issue with queue
* start_byte implementation
* converted log retrieval from line-based array to byte-based string format
* Address PR review: simplify log API response and make byte-oriented
  - Remove redundant fields from log endpoint response (total_bytes, next_byte, instance_id, start_byte); clients can derive these from the request and response log content
  - Return 416 instead of 500 when start_byte is beyond available content, with LogRangeNotAvailable exception
  - Rewrite get_logs_from_queue to be truly byte-oriented: concatenate all messages into a flat byte stream and slice, instead of message-boundary-based skipping
  - Update docs and tests to match simplified API
* Fix problem terminal signal vllm cpu
* fix error with launcher. Avoid EngineCore exiting
* Mike's comments
* More Mike's comments

Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8988875 commit c47c25b

File tree

3 files changed (+727, -14 lines)

docs/launcher.md

Lines changed: 109 additions & 2 deletions
@@ -28,6 +28,7 @@ The launcher preloads vLLM’s Python modules to accelerate the initialization o
 - **Environment Variable Support**: Set custom environment variables per instance
 - **Graceful Shutdown**: Proper termination with configurable timeout and force-kill fallback
 - **Status Monitoring**: Query status of individual instances or all instances at once
+- **Log Capture**: Retrieve stdout/stderr logs from running instances via REST API
 - **Health Checks**: Built-in health endpoint for monitoring service availability
 
 > [!NOTE]
@@ -130,7 +131,7 @@ make build-and-push-launcher CONTAINER_IMG_REG=$CONTAINER_IMG_REG
 ### 1. Start the Launcher Service
 
 ```bash
-python vllm_launcher.py
+python launcher.py
 ```
 
 The service will start on `http://0.0.0.0:8001`
@@ -209,6 +210,7 @@ Get service information and available endpoints.
         "delete_all_instances": "DELETE /v2/vllm/instances",
         "get_instance_status": "GET /v2/vllm/instances/{instance_id}",
         "get_all_instances": "GET /v2/vllm/instances",
+        "get_instance_logs": "GET /v2/vllm/instances/{instance_id}/log"
     }
 }
 ```
@@ -314,6 +316,40 @@ Stop and delete a specific vLLM instance.
 
 ---
 
+#### Get Instance Logs
+
+**GET** `/v2/vllm/instances/{instance_id}/log`
+
+Retrieve stdout/stderr logs from a specific vLLM instance, starting from a given byte position.
+
+**Path Parameters:**
+
+- `instance_id`: ID of the instance
+
+**Query Parameters:**
+
+- `start_byte` (optional): Byte position to start reading from (default: 0, minimum: 0). The largest valid value is the total number of bytes captured so far (i.e., the length of the log). Use this to continue reading from where you left off.
+- `max_bytes` (optional): Maximum bytes of log data to retrieve from `start_byte` (default: 1048576 (1 MB), range: 1024-10485760 (10 MB))
+
+**Response (200 OK):**
+
+```json
+{
+  "log": "INFO: Started server process\nINFO: Waiting for application startup\nINFO: Application startup complete\n"
+}
+```
+
+**Response Fields:**
+
+- `log`: Log content as a single string. Since `start_byte` is a byte offset but the JSON response contains unicode characters, the client must encode the string back to UTF-8 to compute the correct byte length for the next `start_byte` (e.g., `start_byte + len(log.encode("utf-8"))` in Python). Using the character length directly will produce incorrect offsets if the log contains multi-byte characters.
+
+**Error Responses:**
+
+- `404 Not Found`: Instance not found
+- `416 Range Not Satisfiable`: The requested `start_byte` is beyond the available log content. The response includes a `Content-Range: bytes */N` header (per [RFC 9110 §15.5.17](https://www.rfc-editor.org/rfc/rfc9110#status.416)) and a JSON body `{"available_bytes": N}`, where `N` is the total number of bytes captured so far (i.e., the largest valid value of `start_byte`).
+
+---
+
 #### Delete All Instances
 
 **DELETE** `/v2/vllm/instances`
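As a note on the offset arithmetic the response fields describe: the next `start_byte` must come from the UTF-8 byte length of the returned string, not its character count. A minimal sketch of the client-side bookkeeping (the helper name and sample log line are illustrative, not part of the launcher API):

```python
# Sketch of client-side offset tracking for the /log endpoint.
# start_byte is a *byte* offset into the server's UTF-8 log buffer,
# so the next offset must use the UTF-8 byte length of the returned
# string, not len(log) (the character count).

def next_start_byte(start_byte: int, log: str) -> int:
    """Return the start_byte to use on the following request."""
    return start_byte + len(log.encode("utf-8"))

# A log chunk containing a multi-byte character ("µ" is 2 bytes in UTF-8):
chunk = "INFO: latency 5µs\n"
print(len(chunk))                 # 18 characters
print(next_start_byte(0, chunk))  # 19 bytes; the character count would be off by one
```

If the log contained only ASCII, the two lengths would coincide, which is why this bug tends to surface only once a model emits non-ASCII output.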
@@ -485,6 +521,42 @@ curl -X POST http://localhost:8001/v2/vllm/instances \
 curl http://localhost:8001/v2/vllm/instances
 ```
 
+### Example 5: Retrieve Instance Logs
+
+```bash
+# Get up to 1 MB of logs from the beginning (default)
+curl http://localhost:8001/v2/vllm/instances/abc123.../log
+
+# Get up to 500 KB of logs from the beginning
+curl "http://localhost:8001/v2/vllm/instances/abc123.../log?max_bytes=512000"
+
+# Continue reading from where you left off (streaming logs)
+# First request - get initial logs
+curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=0&max_bytes=1048576"
+# To continue, use start_byte + len(log.encode("utf-8")) as the next start_byte
+# (encode to UTF-8 first to get byte length, not character length)
+
+# Second request - get next chunk
+curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=1048576&max_bytes=1048576"
+
+# Third request - continue from new position
+curl "http://localhost:8001/v2/vllm/instances/abc123.../log?start_byte=2097152&max_bytes=1048576"
+```
+
+**How `start_byte` Works:**
+
+The log is treated as a flat byte stream. The `start_byte` parameter specifies the exact byte offset to begin reading from, and `max_bytes` limits how many bytes are returned:
+
+```
+Example: 30 bytes of log content
+
+start_byte=0               → Returns bytes [0, max_bytes)
+start_byte=15              → Returns bytes [15, 15 + max_bytes)
+start_byte=15, max_bytes=5 → Returns bytes [15, 20)
+```
+
+To stream logs, use `start_byte + len(log.encode("utf-8"))` as the `start_byte` for the next request, since `start_byte` is a byte offset and the JSON response contains unicode characters.
+
 ## Configuration
 
 ### vLLM Options
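The byte-window semantics in the diagram above amount to plain byte slicing. An illustrative sketch, not the launcher's actual implementation (the 30-byte buffer content is made up):

```python
# The accumulated log is a flat byte stream; a request conceptually
# returns buffer[start_byte : start_byte + max_bytes].
buffer = b"0123456789" * 3  # 30 bytes of example log content

def read_logs(start_byte: int, max_bytes: int) -> bytes:
    if start_byte > len(buffer):
        # The real API answers 416 with {"available_bytes": N} here.
        raise ValueError(f"available_bytes={len(buffer)}")
    return buffer[start_byte:start_byte + max_bytes]

print(read_logs(15, 5))    # b'56789' -- bytes [15, 20)
print(read_logs(15, 100))  # b'567890123456789' -- capped at the 30-byte end
```

Note that `start_byte` equal to the buffer length is still valid (it returns an empty result), matching the statement that the largest valid value is the total number of bytes captured so far.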
@@ -557,6 +629,7 @@ Represents a single vLLM instance with its process and configuration.
 - `start()`: Start the vLLM process
 - `stop(timeout=10)`: Stop the vLLM process gracefully (or force kill after timeout)
 - `get_status()`: Get detailed status information
+- `get_logs(start_byte=0, max_bytes=1048576)`: Retrieve logs from the instance starting from a byte position
 
 #### `VllmMultiProcessManager`
 
@@ -569,6 +642,7 @@ Manages multiple VllmInstance objects.
 - `stop_all_instances(timeout=10)`: Stop all running instances
 - `get_instance_status(instance_id)`: Get status of a specific instance
 - `get_all_instances_status()`: Get status of all instances
+- `get_instance_logs(instance_id, start_byte=0, max_bytes=1048576)`: Retrieve logs from a specific instance starting from a byte position
 
 ## Best Practices
 
@@ -622,7 +696,40 @@ Be mindful of system resources:
 - **CPU**: vLLM uses CPU for pre/post-processing
 - **Disk**: Models are cached in the container's filesystem
 
-### 5. Testing
+### 5. Log Management
+
+The launcher captures stdout/stderr from each vLLM instance using a multiprocessing queue that feeds into a persistent in-memory byte buffer (`_log_buffer`) per instance:
+
+- **Architecture**: A `QueueWriter` in the child process sends messages to a bounded queue (`MAX_QUEUE_SIZE = 5000` messages, a Python constant defined in `launcher.py`). On read, the launcher drains the queue into the instance's `_log_buffer`, which accumulates all log bytes.
+- **Byte-Based Retrieval**: The `start_byte` parameter is a byte offset into the accumulated buffer. The `max_bytes` parameter limits how many bytes are returned per request.
+- **Queue Overflow**: When the queue is full, new messages are dropped (non-blocking put). Messages already drained into the buffer are preserved.
+- **Non-blocking**: Log capture doesn't slow down the vLLM process.
+- **Streaming Support**: Use the `start_byte` parameter to efficiently stream logs without re-reading.
+
+**Best Practices:**
+
+- **Streaming Logs**: Use `start_byte` to efficiently stream logs. Since `start_byte` is a byte offset but the JSON response contains unicode characters, compute the byte length by encoding the string back to UTF-8:
+
+```python
+# Python example
+import requests
+
+start_byte = 0
+while True:
+    resp = requests.get(f"http://localhost:8001/v2/vllm/instances/id/log?start_byte={start_byte}")
+    log = resp.json()["log"]
+    if not log:
+        break
+    start_byte += len(log.encode("utf-8"))
+```
+
+- **Polling**: Track `start_byte + len(log.encode("utf-8"))` between requests to fetch only new log content
+- **Memory Efficiency**: Use the `max_bytes` parameter to limit response size (default: 1 MB, max: 10 MB)
+- **Data Loss**: Logs are lost when an instance is deleted
+- **Production**: Consider external logging solutions for long-term storage and analysis
+- **Byte Tracking**: The `start_byte` position is relative to all logs ever captured, not just current queue content
+
+### 6. Testing
 
 Test with small models first:
 