 - **Container caching** — Subsequent runs start in 2-4 minutes after initial build
 - **Built-in monitoring** — View job status and logs in Google Cloud Console
 - **Automatic cleanup** — Resources are released when jobs complete
+- **Transparent errors** — Remote exceptions are re-raised locally with the original traceback

 ## Installation

@@ -65,7 +68,7 @@ cd keras-remote
 pip install -e ".[cli]"
 ```

-This adds the `keras-remote up`, `keras-remote down`, `keras-remote status`, and `keras-remote config` commands for provisioning and tearing down cloud resources.
+This adds the `keras-remote up`, `keras-remote down`, `keras-remote status`, `keras-remote config`, and `keras-remote pool` commands for provisioning and managing cloud resources.

 ### Requirements

@@ -113,6 +116,19 @@ To view configuration:
 keras-remote config
 ```

+To manage accelerator node pools after initial setup:
+
+```bash
+# Add a node pool for a specific accelerator
+keras-remote pool add --accelerator=v6e-8
+
+# List current node pools
+keras-remote pool list
+
+# Remove a node pool by name
+keras-remote pool remove <pool-name>
+```
+
 ### 2. Set Environment Variables

 Add to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.):
 Keras Remote automatically detects and installs dependencies on the remote worker.

+> **Note:** JAX packages (`jax`, `jaxlib`, `libtpu`, `libtpu-nightly`) are automatically filtered from your `requirements.txt` to prevent overriding the accelerator-specific JAX installation. To keep a JAX line, append `# kr:keep` to it.
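The filtering rule in the note can be illustrated with a short sketch. This is not Keras Remote's actual implementation: the function name `filter_requirements` and the name-matching regex are assumptions made for illustration, while the package names and the `# kr:keep` marker come from the note itself.

```python
import re

# Package names the note lists as filtered (taken from the README text).
JAX_PACKAGES = {"jax", "jaxlib", "libtpu", "libtpu-nightly"}
# Marker that opts a line out of filtering (taken from the README text).
KEEP_MARKER = "# kr:keep"


def filter_requirements(lines):
    """Hypothetical helper: drop JAX-related lines unless marked to keep."""
    kept = []
    for line in lines:
        if KEEP_MARKER in line:
            # The user explicitly asked to keep this line.
            kept.append(line)
            continue
        # Extract the leading distribution name, e.g. "jax[tpu]>=0.4" -> "jax".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
        name = match.group(0).lower().replace("_", "-") if match else ""
        if name in JAX_PACKAGES:
            continue  # filtered: would override the accelerator-specific JAX
        kept.append(line)
    return kept


requirements = [
    "numpy>=1.26",
    "jax[tpu]>=0.4.30",
    "jaxlib==0.4.30  # kr:keep",
    "libtpu-nightly",
    "keras",
]
print(filter_requirements(requirements))
```

In this sketch, comment lines and blank lines pass through unchanged, and matching is done on the exact distribution name, so an unrelated package whose name merely starts with `jax` is not dropped. Whether the real filter behaves the same way in those edge cases is not stated in the README.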
+
 ### Prebuilt Container Images

 Skip container build time by using prebuilt images:
 For multi-GPU configurations on GKE, append the count: `a100x4`, `l4x2`, etc.
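The count-suffix convention can be sketched as a small parser. This is a hypothetical helper, not part of Keras Remote: the name `split_gpu_spec`, the regex, and the default count of 1 for a bare GPU name are assumptions for illustration only.

```python
import re

# Matches "<gpu-name>x<count>", e.g. "a100x4". The name part is matched
# lazily so the final "x<digits>" is always read as the count suffix.
_GPU_COUNT = re.compile(r"^(?P<name>[a-z][a-z0-9]*?)x(?P<count>[1-9][0-9]*)$")


def split_gpu_spec(spec):
    """Hypothetical helper: split 'a100x4' into ('a100', 4).

    A bare name such as 'l4' is assumed to mean a single GPU.
    Intended for GPU names only; TPU strings are returned unchanged.
    """
    normalized = spec.strip().lower()
    match = _GPU_COUNT.match(normalized)
    if match:
        return match.group("name"), int(match.group("count"))
    return normalized, 1


for spec in ("a100x4", "l4x2", "l4"):
    print(spec, "->", split_gpu_spec(spec))
```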

+### CPU
+
+Use `accelerator="cpu"` to run on a CPU-only node (no accelerator attached).
+
+### Multi-Host TPU (Pathways)
+
+Multi-host TPU configurations (those requiring more than one node, such as `v2-16`, `v3-32`, or `v5p-16`) automatically use the [Pathways](https://cloud.google.com/tpu/docs/pathways-overview) backend. You can also set the backend explicitly: