Commit c8ab431

feat: device-apiserver design doc
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
1 parent 412549b commit c8ab431

1 file changed, +98 -0 lines changed

docs/design/device-apiserver.md

# Technical Specification: device-apiserver

## 1. Overview

**Synopsis**: The Device API server validates and configures data for hardware API objects, including GPUs. It serves gRPC operations and provides the frontend to a node's shared hardware resource state, through which all other local hardware-aware components interact.

## 2. Architecture

The server implements the Kubernetes API provider pattern but is optimized for a small, node-local footprint.

### 2.1 Storage Stack

To avoid the overhead of a full etcd cluster, the data layer uses:

- **Kine**: An etcd shim that translates etcd v3 API calls into SQL queries.
- **SQLite**: The database engine.
  - **Default**: In-memory, for ephemeral runtime state.
  - **Optional**: Persistent file-based storage, for state that must survive restarts.
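
To make the two SQLite modes concrete, here is a minimal Go sketch that selects a Kine datastore endpoint for each. The `StorageMode` type and `kineEndpoint` helper are hypothetical, and the DSN strings (the `sqlite://` scheme, SQLite's `file::memory:?cache=shared` URI, and the driver's `_journal=WAL` parameter) are assumptions for illustration rather than the server's actual configuration.

```go
package main

import "fmt"

// StorageMode selects one of the two SQLite configurations (hypothetical type).
type StorageMode int

const (
	InMemory StorageMode = iota // default: ephemeral runtime state
	OnDisk                      // optional: state that survives restarts
)

// kineEndpoint returns an illustrative Kine datastore endpoint for mode.
// ASSUMPTION: the "sqlite://" scheme and the DSN parameters below are examples;
// verify them against the Kine and SQLite driver versions actually in use.
func kineEndpoint(mode StorageMode, path string) (string, error) {
	switch mode {
	case InMemory:
		// Shared cache lets every connection in the process see the same in-memory database.
		return "sqlite://file::memory:?cache=shared", nil
	case OnDisk:
		if path == "" {
			return "", fmt.Errorf("on-disk mode requires a database path")
		}
		// WAL journaling: concurrent readers alongside a single writer.
		return "sqlite://" + path + "?_journal=WAL", nil
	default:
		return "", fmt.Errorf("unknown storage mode %d", mode)
	}
}

func main() {
	for _, mode := range []StorageMode{InMemory, OnDisk} {
		ep, err := kineEndpoint(mode, "/var/lib/device-apiserver/state.db")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ep)
	}
}
```

Keeping the mode switch in one place means the rest of the server only ever sees an opaque endpoint string.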

### 2.2 State Semantics

- **ResourceVersion (RV)**: Managed by Kine/SQLite. Increments on every write and provides the basis for optimistic concurrency control.
- **Generation**: Managed by the server. Increments only when `.Spec` is modified, signaling a change in desired state.
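
The Generation rule can be sketched in a few lines of Go. The trimmed-down `Gpu`/`GpuSpec` types and the `prepareForUpdate` helper are hypothetical; ResourceVersion is deliberately left alone because the storage layer assigns it.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, trimmed-down stand-ins for the real API types.
type GpuSpec struct {
	UUID     string
	Disabled bool
}

type Gpu struct {
	Name            string
	Generation      int64
	ResourceVersion string // assigned by Kine/SQLite on write; not touched here
	Spec            GpuSpec
	Status          map[string]string
}

// prepareForUpdate carries the stored Generation over to the incoming object
// and bumps it only when the desired state (.Spec) differs. reflect.DeepEqual
// keeps the comparison valid once Spec gains slices or maps.
func prepareForUpdate(incoming, current *Gpu) {
	incoming.Generation = current.Generation
	if !reflect.DeepEqual(incoming.Spec, current.Spec) {
		incoming.Generation++
	}
}

func main() {
	current := &Gpu{Name: "gpu-0", Generation: 1, Spec: GpuSpec{UUID: "GPU-0000"}}

	statusOnly := *current
	statusOnly.Status = map[string]string{"health": "ok"}
	prepareForUpdate(&statusOnly, current)
	fmt.Println(statusOnly.Generation) // 1

	specChange := *current
	specChange.Spec.Disabled = true
	prepareForUpdate(&specChange, current)
	fmt.Println(specChange.Generation) // 2
}
```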

## 3. gRPC Interface Definition

The server exposes the standard CRUD + UpdateStatus + Patch + Watch interface for hardware resources.

| Method | Target | Scope |
| :--- | :--- | :--- |
| **CreateGpu** | Full Object | Spec/Metadata |
| **UpdateGpu** | **Spec Only** | Spec/Metadata |
| **PatchGpu** | Partial Object | Spec/Metadata |
| **UpdateGpuStatus** | **Status Only** | Status |
| **GetGpu** | Read-only | Single Resource |
| **ListGpus** | Read-only | All Resources |
| **WatchGpus** | Stream Events | All Resources |
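
The split between the **Spec Only** and **Status Only** targets can be sketched as two merge functions. The types and the `mergeForUpdate`/`mergeForUpdateStatus` names are hypothetical, and the sketch assumes the Kubernetes status-subresource convention in which each method silently keeps the other half of the stored object; the ResourceVersion check is omitted.

```go
package main

import "fmt"

// Hypothetical trimmed-down types; field names are illustrative only.
type GpuSpec struct{ Disabled bool }
type GpuStatus struct{ Health string }

type Gpu struct {
	Name   string
	Spec   GpuSpec
	Status GpuStatus
}

// mergeForUpdate models UpdateGpu: Spec (and metadata) come from the
// request, while the stored Status is preserved.
func mergeForUpdate(incoming, current Gpu) Gpu {
	out := incoming
	out.Status = current.Status
	return out
}

// mergeForUpdateStatus models UpdateGpuStatus: only Status comes from the
// request; Spec and metadata come from storage.
func mergeForUpdateStatus(incoming, current Gpu) Gpu {
	out := current
	out.Status = incoming.Status
	return out
}

func main() {
	stored := Gpu{Name: "gpu-0", Spec: GpuSpec{Disabled: false}, Status: GpuStatus{Health: "Healthy"}}
	req := Gpu{Name: "gpu-0", Spec: GpuSpec{Disabled: true}, Status: GpuStatus{Health: "Unhealthy"}}

	fmt.Printf("%+v\n", mergeForUpdate(req, stored))       // spec changes, status kept
	fmt.Printf("%+v\n", mergeForUpdateStatus(req, stored)) // status changes, spec kept
}
```

Separating the two paths lets a controller own `.Spec` while the device plugin owns `.Status` without either overwriting the other.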

## 4. Resource Schema

All API objects follow the Kubernetes Resource Model (KRM), with the following exceptions.

### 4.1 Metadata

- `ObjectMeta`: A subset of `k8s.io/apimachinery/pkg/apis/meta/v1.ObjectMeta`:
  - `Name`
  - `ResourceVersion`
  - `Namespace`
  - `UID`
  - `Generation`
  - `CreationTimestamp`
- `ListMeta`: A subset of `k8s.io/apimachinery/pkg/apis/meta/v1.ListMeta`:
  - `ResourceVersion`
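
A minimal Go rendering of these metadata subsets might look like the following. It is a sketch only: `metav1.Time` and `types.UID` are replaced with stdlib types so the block has no dependencies, and `GpuSpec` is a hypothetical placeholder. Embedding follows the Kubernetes convention, so fields such as `Name` are promoted onto the object.

```go
package main

import (
	"fmt"
	"time"
)

// ObjectMeta keeps only the fields listed above from metav1.ObjectMeta.
type ObjectMeta struct {
	Name              string
	ResourceVersion   string
	Namespace         string
	UID               string
	Generation        int64
	CreationTimestamp time.Time
}

// ListMeta keeps only ResourceVersion from metav1.ListMeta.
type ListMeta struct {
	ResourceVersion string
}

// GpuSpec is a hypothetical placeholder for the real spec.
type GpuSpec struct {
	Disabled bool
}

// Gpu and GpuList embed their metadata the way Kubernetes types do.
type Gpu struct {
	ObjectMeta
	Spec GpuSpec
}

type GpuList struct {
	ListMeta
	Items []Gpu
}

func main() {
	list := GpuList{
		ListMeta: ListMeta{ResourceVersion: "42"},
		Items: []Gpu{{ObjectMeta: ObjectMeta{
			Name:              "gpu-0",
			Generation:        1,
			CreationTimestamp: time.Now(),
		}}},
	}
	fmt.Println(list.ResourceVersion, list.Items[0].Name) // 42 gpu-0
}
```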

### 4.2 Options

- `CreateOptions`: Not supported.
- `UpdateOptions`: Not supported.
- `DeleteOptions`: Not supported.
- `ListOptions`: A subset of `k8s.io/apimachinery/pkg/apis/meta/v1.ListOptions`:
  - `ResourceVersion`
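
One plausible use of `ListOptions.ResourceVersion` is resuming a watch without missing or repeating events. The Go sketch below assumes, following Kubernetes semantics rather than anything stated here, that resourceVersions are decimal Kine revisions and that a watch from RV *n* delivers only changes newer than *n*; the `Event` type and `eventsSince` helper are hypothetical, and treating an empty value as "replay everything" is a simplification of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// ListOptions keeps only ResourceVersion, as described above.
type ListOptions struct {
	ResourceVersion string
}

// Event is a hypothetical watch event.
type Event struct {
	Type            string // ADDED, MODIFIED or DELETED
	Name            string
	ResourceVersion int64
}

// eventsSince returns the events a watch opened with opts would replay.
// ASSUMPTION: RVs are decimal Kine revisions; only events strictly newer
// than the requested RV are returned; empty RV replays the whole history.
func eventsSince(history []Event, opts ListOptions) ([]Event, error) {
	var from int64
	if opts.ResourceVersion != "" {
		rv, err := strconv.ParseInt(opts.ResourceVersion, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid resourceVersion %q: %w", opts.ResourceVersion, err)
		}
		from = rv
	}
	var out []Event
	for _, e := range history {
		if e.ResourceVersion > from {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	history := []Event{
		{Type: "ADDED", Name: "gpu-0", ResourceVersion: 10},
		{Type: "MODIFIED", Name: "gpu-0", ResourceVersion: 11},
		{Type: "ADDED", Name: "gpu-1", ResourceVersion: 12},
	}
	// A client that last saw RV 11 resumes from there.
	evs, err := eventsSince(history, ListOptions{ResourceVersion: "11"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, e := range evs {
		fmt.Println(e.Type, e.Name, e.ResourceVersion) // ADDED gpu-1 12
	}
}
```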

## 5. Validation & Concurrency

- **Immutability**: The server rejects updates that attempt to change `metadata.name`, `metadata.uid`, or `metadata.namespace`.
- **Optimistic Concurrency**: If `incoming.ResourceVersion` does not match `current.ResourceVersion`, the server returns a `storage.Conflict` error, forcing the client to re-read the object and retry.
- **No-Op**: An update whose `incoming.Spec` matches `current.Spec` returns successfully without a database write.

## 6. Internal Mechanics

### 6.1 Bootstrapping & Persistence

The server's lifecycle is tied to the Kine-managed SQLite instance:

- **In-Memory**: The SQLite database exists purely in memory, so the `device-apiserver` starts with a blank slate. Another component (e.g., the device plugin) is responsible for re-discovering and re-registering the GPUs on every start.
- **On-Disk**: The SQLite database is stored in a single ordinary file on disk (e.g., `/var/lib/device-apiserver/state.db`). The `device-apiserver` starts from the last successfully persisted state.
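
Because the same registering component has to work in both modes, its startup loop should be idempotent. The Go sketch below (Go 1.20+ for `errors.Join`) assumes a hypothetical minimal client, `GpuCreator`, and a hypothetical `ErrAlreadyExists` sentinel; a real implementation might also reconcile the Spec of GPUs restored from disk rather than simply skipping them.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical sentinel for a Create on an existing name.
var ErrAlreadyExists = errors.New("already exists")

// GpuCreator is a hypothetical, minimal view of a device-apiserver client.
type GpuCreator interface {
	CreateGpu(name string) error
}

// registerDiscovered (re)registers every GPU found on the node. With in-memory
// storage every Create succeeds; with on-disk storage GPUs restored from the
// database return ErrAlreadyExists, which is treated as success.
func registerDiscovered(c GpuCreator, discovered []string) error {
	var errs []error
	for _, name := range discovered {
		if err := c.CreateGpu(name); err != nil && !errors.Is(err, ErrAlreadyExists) {
			errs = append(errs, fmt.Errorf("register %s: %w", name, err))
		}
	}
	return errors.Join(errs...) // nil when every registration succeeded
}

// fakeStore is an in-memory stand-in for the server, used only for the demo.
type fakeStore map[string]bool

func (f fakeStore) CreateGpu(name string) error {
	if f[name] {
		return ErrAlreadyExists
	}
	f[name] = true
	return nil
}

func main() {
	store := fakeStore{"gpu-0": true} // e.g. restored from state.db
	err := registerDiscovered(store, []string{"gpu-0", "gpu-1"})
	fmt.Println(err, len(store)) // <nil> 2
}
```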

### 6.2 API Discovery & Registration

The `device-apiserver` uses a decentralized registration pattern to manage its API surface. During startup, available APIs are automatically discovered and registered with both the storage backend and the gRPC server.
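
One common way to realize decentralized registration in Go is self-registration from `init()`, the same idiom `database/sql` drivers use. Whether the server works this way is an assumption; the `Server`, `APIGroup`, `Register`, and `installAll` names below are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Server is a hypothetical stand-in that records what each API installed.
type Server struct {
	Installed []string
}

// APIGroup is a hypothetical description of one API a package contributes.
type APIGroup struct {
	Name    string
	Install func(s *Server) // wires the API into storage and the gRPC server
}

var (
	registryMu sync.Mutex
	registry   = map[string]APIGroup{}
)

// Register is meant to be called from each API package's init(), so adding an
// API means importing its package; the server core does not change.
func Register(g APIGroup) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[g.Name]; dup {
		panic("duplicate API registration: " + g.Name)
	}
	registry[g.Name] = g
}

// installAll runs once at startup and installs every registered API in name
// order, so startup is deterministic regardless of map iteration order.
func installAll(s *Server) {
	registryMu.Lock()
	groups := make([]APIGroup, 0, len(registry))
	for _, g := range registry {
		groups = append(groups, g)
	}
	registryMu.Unlock()

	sort.Slice(groups, func(i, j int) bool { return groups[i].Name < groups[j].Name })
	for _, g := range groups {
		g.Install(s)
	}
}

// In a real layout this init() would live in the GPU API package.
func init() {
	Register(APIGroup{Name: "gpus", Install: func(s *Server) {
		s.Installed = append(s.Installed, "gpus")
	}})
}

func main() {
	srv := &Server{}
	installAll(srv)
	fmt.Println(srv.Installed) // [gpus]
}
```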

## 7. Reliability

- **Database Integrity**: SQLite's WAL (write-ahead logging) mode is enabled by default, allowing multiple concurrent readers alongside a single writer.

## 8. Observability

The server's observability stack is designed for production-grade monitoring.

### 8.1 Prometheus Metrics

- **Build Metadata (`device_apiserver_build_info`)**: A constant `Gauge` with labels for `version`, `revision`, `build_date`, `goversion`, `compiler`, and `platform`.
- **Service Availability (`device_apiserver_service_status`)**: A `GaugeVec` that tracks the serving state of internal sub-services (`1`: Serving / Ready, `2`: Not Serving / Storage Backend Disconnected).
- **gRPC Performance (`grpc_server_*`)**: Standard `Histogram`s and `Counter`s via the `grpcprom` provider.
- **Storage Backend (`kine_*`)**:
  - **`kine_sql_total`**: A `CounterVec` tracking the total number of SQL operations, labeled by `error_code`.
  - **`kine_sql_time_seconds`**: A `HistogramVec` providing the distribution of SQL execution times.
  - **`kine_compact_total`**: A `CounterVec` recording successful and failed history compactions.
  - **`kine_insert_errors_total`**: A `CounterVec` tracking retries due to unique constraint violations.
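
To show what a scrape of the two server-specific gauges could look like, here is a dependency-free Go sketch that renders them in the Prometheus text exposition format. The `service` label name, the label values, and the build-info value of `1` (the usual convention for info gauges) are placeholders, not the server's actual output; the `1`/`2` status values match the `grpc.health.v1` `SERVING`/`NOT_SERVING` enum numbers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Values for device_apiserver_service_status (grpc.health.v1: SERVING = 1, NOT_SERVING = 2).
const (
	Serving    = 1
	NotServing = 2
)

// serviceStatus maps storage-backend connectivity to the gauge value.
func serviceStatus(storageConnected bool) float64 {
	if storageConnected {
		return Serving
	}
	return NotServing
}

// Label values must escape backslash, double quote and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// sample renders one sample line in the text exposition format, with labels
// sorted for stable output.
func sample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println("# TYPE device_apiserver_build_info gauge")
	fmt.Println(sample("device_apiserver_build_info", map[string]string{
		"version": "dev", "revision": "unknown", "build_date": "unknown",
		"goversion": "unknown", "compiler": "gc", "platform": "linux/amd64",
	}, 1))
	fmt.Println("# TYPE device_apiserver_service_status gauge")
	fmt.Println(sample("device_apiserver_service_status",
		map[string]string{"service": "storage"}, serviceStatus(false)))
}
```

In practice these gauges would be registered through the Prometheus Go client; the hand-rolled renderer only illustrates the wire format and the status encoding.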

### 8.2 Admin & Reflection

- **gRPC Reflection**: Dynamic discovery of the API schema.
- **Health Checks**: The standard `grpc.health.v1` service, for liveness and readiness probes.
- **Channelz**: Low-level socket and connection-level statistics.

## 9. Security

// TODO

## 10. Performance

// TODO
