Commit 3fe0e1d

Resolve merge conflicts and simplify README (#796)

Remove the detailed dmrlet documentation section that was added during merge conflict resolution. This includes the feature comparison table, extensive usage examples, supported backends list, and architecture diagram. The README now contains only a brief introduction to dmrlet with a minimal usage example to keep the documentation concise and focused on the core docker-model-runner project.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>

1 parent 93bf790 · commit 3fe0e1d

File tree

1 file changed: +1 −91 lines changed

README.md

Lines changed: 1 addition & 91 deletions
@@ -402,21 +402,10 @@ in the form of [a Helm chart and static YAML](charts/docker-model-runner/README.
 If you are interested in a specific Kubernetes use-case, please start a
 discussion on the issue tracker.
 
-<<<<<<< Updated upstream
-=======
 ## dmrlet: Container Orchestrator for AI Inference
 
 dmrlet is a purpose-built container orchestrator for AI inference workloads. Unlike Kubernetes, it focuses exclusively on running stateless inference containers with zero configuration overhead. Multi-GPU mapping "just works" without YAML, device plugins, or node selectors.
 
-### Key Features
-
-| Feature | Kubernetes | dmrlet |
-|---------|------------|--------|
-| Multi-GPU setup | Device plugins + node selectors + resource limits YAML | `dmrlet serve llama3 --gpus all` |
-| Config overhead | 50+ lines of YAML minimum | Zero YAML, CLI-only |
-| Time to first inference | Minutes (pod scheduling, image pull) | Seconds (model already local) |
-| Model management | External (mount PVCs, manage yourself) | Integrated with Docker Model Runner store |
-
 ### Building dmrlet
 
 ```bash
@@ -429,91 +418,12 @@ go build -o dmrlet ./cmd/dmrlet
 
 ### Usage
 
-**Start the daemon:**
-```bash
-# Start in foreground
-dmrlet daemon
-
-# With custom socket path
-dmrlet daemon --socket /tmp/dmrlet.sock
-```
-
 **Serve a model:**
 ```bash
 # Auto-detect backend and GPUs
-dmrlet serve llama3.2
-
-# Specify backend
-dmrlet serve llama3.2 --backend vllm
-
-# Specify GPU allocation
-dmrlet serve llama3.2 --gpus 0,1
-dmrlet serve llama3.2 --gpus all
-
-# Multiple replicas
-dmrlet serve llama3.2 --replicas 2
-
-# Backend-specific options
-dmrlet serve llama3.2 --ctx-size 4096   # llama.cpp context size
-dmrlet serve llama3.2 --gpu-memory 0.8  # vLLM GPU memory utilization
-```
-
-**List running models:**
-```bash
-dmrlet ps
-# MODEL     BACKEND    REPLICAS  GPUS       ENDPOINTS        STATUS
-# llama3.2  llama.cpp  1         [0,1,2,3]  localhost:30000  healthy
-```
-
-**View logs:**
-```bash
-dmrlet logs llama3.2     # Last 100 lines
-dmrlet logs llama3.2 -f  # Follow logs
-```
-
-**Scale replicas:**
-```bash
-dmrlet scale llama3.2 4  # Scale to 4 replicas
-```
-
-**Stop a model:**
-```bash
-dmrlet stop llama3.2
-dmrlet stop --all  # Stop all models
-```
-
-**Check status:**
-```bash
-dmrlet status
-# DAEMON: running
-# SOCKET: /var/run/dmrlet.sock
-#
-# GPUs:
-#   GPU 0: NVIDIA A100 80GB  81920MB (in use: llama3.2)
-#   GPU 1: NVIDIA A100 80GB  81920MB (available)
-#
-# MODELS: 1 running
-```
-
-### Supported Backends
-
-- **llama.cpp** - Default backend for GGUF models
-- **vLLM** - High-throughput serving for safetensors models
-- **SGLang** - Fast serving with RadixAttention
-
-### Architecture
-
-```
-dmrlet daemon
-├── GPU Manager        - Auto-detect and allocate GPUs
-├── Container Manager  - Docker-based container lifecycle
-├── Service Registry   - Endpoint discovery with load balancing
-├── Health Monitor     - Auto-restart unhealthy containers
-├── Auto-scaler        - Scale based on QPS/latency/GPU utilization
-└── Log Aggregator     - Centralized log collection
+dmrlet serve gemma3
 ```
 
->>>>>>> Stashed changes
 ## Community
 
 For general questions and discussion, please use [Docker Model Runner's Slack channel](https://dockercommunity.slack.com/archives/C09H9P5E57B).
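For reference, the end state this commit leaves in the README is a two-step workflow. A minimal transcript, assuming a checkout of the docker-model-runner repository and a working Go toolchain (both commands are taken verbatim from the retained README sections; output is not shown because it depends on the local GPU setup):

```shell
# Build the dmrlet binary from the repository root.
go build -o dmrlet ./cmd/dmrlet

# Serve a model; backend and GPU selection are auto-detected by dmrlet.
./dmrlet serve gemma3
```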
