| `data/data.py` | `Data` class, content hashing, data ref serialization |
| `utils/packager.py` | `save_payload()` (cloudpickle), `zip_working_dir()`, Data ref extraction |
| `utils/storage.py` | GCS upload/download/cleanup for job artifacts and Data cache |
| `runner/remote_runner.py` | Runs inside container: resolve Data refs/volumes, execute, upload result |
| `cli/commands/pool.py` | Node pool add/remove/list commands |
| `cli/infra/post_deploy.py` | kubectl, LWS CRD, GPU driver setup after `stack.up()` |
| `cli/constants.py` | CLI defaults, paths, API list |
- **`JobContext`** (`backend/execution.py`): Mutable dataclass carrying all job state through the pipeline — inputs, generated IDs, artifact paths, image URI.
- **`BaseK8sBackend`** (`backend/execution.py`): Base class with `submit_job`, `wait_for_job`, `cleanup_job`. Subclassed by `GKEBackend` and `PathwaysBackend`.
- **`GpuConfig` / `TpuConfig`** (`core/accelerators.py`): Frozen dataclasses for accelerator metadata. Single source of truth used by the runtime, container builder, and CLI.
- **`Data`** (`data/data.py`): Wraps a local path or GCS URI. Passed as a function argument or via the `volumes` decorator parameter. Resolved to a plain filesystem path on the remote pod. Content-hashed for upload caching.
- **`InfraConfig` / `NodePoolConfig`** (`cli/config.py`): CLI provisioning configuration. `InfraConfig` holds the project, zone, cluster name, and a list of `NodePoolConfig` entries. `NodePoolConfig` pairs a unique pool name (e.g., `gpu-l4-a3f2`) with a `GpuConfig` or `TpuConfig`.
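The config shapes described above can be sketched as frozen dataclasses. This is an illustrative sketch only — the accelerator fields and default values are assumptions, not the actual `cli/config.py` or `core/accelerators.py` definitions:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class GpuConfig:
    # Illustrative accelerator metadata; real fields live in core/accelerators.py.
    gpu_type: str
    count: int

@dataclass(frozen=True)
class NodePoolConfig:
    # Pairs a unique pool name with an accelerator config.
    name: str               # e.g. "gpu-l4-a3f2"
    accelerator: GpuConfig  # or a TpuConfig

@dataclass(frozen=True)
class InfraConfig:
    project: str
    zone: str
    cluster_name: str
    node_pools: list = field(default_factory=list)

cfg = InfraConfig(
    project="my-project",
    zone="us-central1-a",
    cluster_name="training",
    node_pools=[NodePoolConfig("gpu-l4-a3f2", GpuConfig("l4", 1))],
)
```

`frozen=True` matches the document's note that accelerator configs are immutable single sources of truth.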
## Data API
The `Data` class (`keras_remote.Data`) declares data dependencies for remote functions. It accepts local file/directory paths or GCS URIs (`gs://...`).
### Two usage patterns
**Function arguments** — `Data` objects passed as args/kwargs are uploaded to GCS, serialized as data ref dicts in the payload, and resolved to local paths on the pod:
```python
@keras_remote.run(accelerator="v3-8")
def train(data_dir, config_path):
    ...  # data_dir and config_path are plain strings

train(Data("./dataset/"), Data("./config.json"))
```
**Volumes** — `Data` objects in the `volumes=` decorator parameter are downloaded to fixed mount paths before execution:

```python
@keras_remote.run(accelerator="v3-8", volumes=[Data("./dataset/")])
def train():
    files = os.listdir("/data")  # available at mount path
```
Both patterns can be combined. `Data` objects can also be nested inside lists, dicts, and other containers — they are recursively discovered and resolved.
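The recursive discovery described above can be sketched roughly as follows — with a stand-in `Data` class and a hypothetical `find_data_objects` helper, not the library's actual internals:

```python
class Data:
    # Stand-in for keras_remote.Data, reduced to the path it wraps.
    def __init__(self, path):
        self.path = path

def find_data_objects(obj, found=None):
    # Walk nested containers and collect every Data instance.
    if found is None:
        found = []
    if isinstance(obj, Data):
        found.append(obj)
    elif isinstance(obj, dict):
        for value in obj.values():
            find_data_objects(value, found)
    elif isinstance(obj, (list, tuple, set)):
        for item in obj:
            find_data_objects(item, found)
    return found

args = {"train": Data("./dataset/"), "extras": [Data("./config.json"), 42]}
found = find_data_objects(args)  # both Data objects, the 42 is ignored
```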
### Content-addressed caching
Local `Data` objects are content-hashed (SHA-256 over sorted file contents). Uploads go to `gs://{bucket}/{namespace}/data-cache/{hash}/`. A `.cache_marker` sentinel enables O(1) cache-hit checks. Identical data is uploaded only once.
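A minimal sketch of such content-addressed hashing, assuming (not verified against the implementation) that relative paths are mixed in alongside file bytes so renames change the hash:

```python
import hashlib
import os

def content_hash(root):
    # SHA-256 over file contents in sorted path order. The exact framing
    # (path mixing, traversal order) is an assumption, not the library's
    # actual scheme; it only illustrates deterministic content addressing.
    h = hashlib.sha256()
    if os.path.isfile(root):
        with open(root, "rb") as f:
            h.update(f.read())
    else:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                h.update(os.path.relpath(full, root).encode())
                with open(full, "rb") as f:
                    h.update(f.read())
    return h.hexdigest()
```

The resulting hex digest would supply the `{hash}` path segment, so byte-identical inputs always map to the same cache prefix.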
### Pipeline integration
During `_prepare_artifacts()`:
1. Upload `Data` from `volumes` and function args via `storage.upload_data()` (content-addressed)
2. Replace `Data` objects in args/kwargs with serializable `__data_ref__` dicts
3. Local `Data` paths inside the caller directory are auto-excluded from `context.zip`
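The `__data_ref__` replacement pass can be sketched as a recursive substitution. The `Data` stand-in, the `upload` callback, and the ref dict holding only a URI are hypothetical simplifications of the real payload format:

```python
class Data:
    # Stand-in for keras_remote.Data.
    def __init__(self, path):
        self.path = path

def replace_with_refs(obj, upload):
    # Swap Data objects for plain dicts that cloudpickle can carry;
    # `upload` returns the content-addressed GCS URI for a local path.
    if isinstance(obj, Data):
        return {"__data_ref__": upload(obj.path)}
    if isinstance(obj, dict):
        return {k: replace_with_refs(v, upload) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(replace_with_refs(v, upload) for v in obj)
    return obj
```

Because the output contains only builtins, the payload stays deserializable on a pod that resolves refs before the user code ever sees them.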
On the remote pod (`remote_runner.py`):
1. `resolve_volumes()` — download volume data to mount paths
2. `resolve_data_refs()` — recursively resolve `__data_ref__` dicts in args/kwargs to local paths
3. Single-file `Data` resolves to the file path; directory `Data` resolves to the directory path
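The remote-side resolution is the inverse pass: a sketch under the same assumptions, with `download` standing in for whatever GCS fetch `remote_runner.py` actually performs:

```python
def resolve_data_refs(obj, download):
    # Turn __data_ref__ dicts back into local filesystem paths,
    # recursing through containers; `download` fetches a GCS URI and
    # returns the local path it materialized.
    if isinstance(obj, dict):
        if "__data_ref__" in obj:
            return download(obj["__data_ref__"])
        return {k: resolve_data_refs(v, download) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(resolve_data_refs(v, download) for v in obj)
    return obj
```

After this pass the user function receives plain strings, matching the `data_dir and config_path are plain strings` comment in the earlier example.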