It's difficult to tell why this job failed. It would be great to get more logs and information for debugging:
% python examples/example_gke.py
2026-02-25 05:38:23.460649: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
============================================================
Keras Remote - GKE Examples
============================================================
--- Example 1: Simple Computation (CPU) ---
Running simple_computation(10, 20) on GKE...
Packaging function and context...
Payload serialized to /tmp/tmpg8kw64fo/payload.pkl
Context packaged to /tmp/tmpg8kw64fo/context.zip
No requirements.txt found
Building container image...
Using cached container: us-docker.pkg.dev/keras-team-gcp/keras-remote/base:cpu-22d410ef9d5d
View image: https://console.cloud.google.com/artifacts/docker/keras-team-gcp/us/keras-remote/base?project=keras-team-gcp
Uploading artifacts to Cloud Storage (job: job-f6bd5a77)...
Uploaded payload to gs://keras-team-gcp-keras-remote-jobs/job-f6bd5a77/payload.pkl
Uploaded context to gs://keras-team-gcp-keras-remote-jobs/job-f6bd5a77/context.zip
View artifacts: https://console.cloud.google.com/storage/browser/keras-team-gcp-keras-remote-jobs/job-f6bd5a77?project=keras-team-gcp
Submitting job to GKEBackend...
Submitted K8s job: keras-remote-job-f6bd5a77
View job with: kubectl get job keras-remote-job-f6bd5a77 -n default
View logs with: kubectl logs -l job-name=keras-remote-job-f6bd5a77 -n default
Job keras-remote-job-f6bd5a77 running...
Pod keras-remote-job-f6bd5a77-4rbwp logs:
Deleted K8s job: keras-remote-job-f6bd5a77
Downloading result...
Traceback (most recent call last):
File "~/jeffcarp/gh/keras-team/remote/examples/example_gke.py", line 165, in <module>
main()
~~~~^^
File "~/jeffcarp/gh/keras-team/remote/examples/example_gke.py", line 113, in main
result = simple_computation()
File "~/jeffcarp/gh/keras-team/remote/keras_remote/core/core.py", line 70, in wrapper
return _execute_on_gke(
func,
...<8 lines>...
env_vars,
)
File "~/jeffcarp/gh/keras-team/remote/keras_remote/core/core.py", line 127, in _execute_on_gke
return execute_remote(ctx, GKEBackend(cluster=cluster, namespace=namespace))
File "~/jeffcarp/gh/keras-team/remote/keras_remote/backend/execution.py", line 315, in execute_remote
raise job_error from None
File "~/jeffcarp/gh/keras-team/remote/keras_remote/backend/execution.py", line 300, in execute_remote
backend.wait_for_job(job, ctx)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "~/jeffcarp/gh/keras-team/remote/keras_remote/backend/execution.py", line 134, in wait_for_job
gke_client.wait_for_job(job, namespace=self.namespace)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/jeffcarp/gh/keras-team/remote/keras_remote/backend/gke_client.py", line 133, in wait_for_job
raise RuntimeError(f"GKE job {job_name} failed")
RuntimeError: GKE job keras-remote-job-f6bd5a77 failed
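In the output above, the "Pod ... logs:" line is followed by no log text, the job is deleted before the error is raised, and the final `RuntimeError` carries only the job name. One possible direction, sketched below, is to collect pod diagnostics before cleanup and put them in the exception message. This is a minimal sketch only: `PodFailure`, `JobFailedError`, and `format_job_failure` are hypothetical names that do not exist in keras_remote, and how the reason, exit code, and log lines are actually fetched from the cluster is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PodFailure:
    """Hypothetical record of diagnostics gathered before the job is deleted."""

    pod_name: str
    reason: Optional[str] = None      # container termination reason, if known
    exit_code: Optional[int] = None   # container exit code, if known
    log_lines: List[str] = field(default_factory=list)


def format_job_failure(job_name, namespace, failures, tail=20):
    """Build a multi-line failure message from the collected pod diagnostics."""
    lines = [f"GKE job {job_name} failed (namespace: {namespace})"]
    if not failures:
        lines.append("  No pod diagnostics were collected.")
    for failure in failures:
        reason = failure.reason or "unknown"
        code = failure.exit_code if failure.exit_code is not None else "unknown"
        lines.append(f"  Pod {failure.pod_name}: reason={reason}, exit_code={code}")
        # Guard against tail <= 0, since log_lines[-0:] would return every line.
        shown = failure.log_lines[-tail:] if tail > 0 else []
        if shown:
            lines.append(f"  Last {len(shown)} log line(s):")
            lines.extend(f"    {line}" for line in shown)
        else:
            lines.append("  (no log output captured)")
    return "\n".join(lines)


class JobFailedError(RuntimeError):
    """Hypothetical error type that keeps the diagnostics attached."""

    def __init__(self, job_name, namespace, failures, tail=20):
        self.job_name = job_name
        self.namespace = namespace
        self.failures = failures
        super().__init__(format_job_failure(job_name, namespace, failures, tail))
```

Because `JobFailedError` subclasses `RuntimeError`, existing `except RuntimeError` handlers would keep working, and the first line of the message keeps the current "GKE job ... failed" wording. Even when no logs are available, the message would say so explicitly rather than leaving the reader to guess.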