[Storage][Serve] Surface the error message when SkyServe failed on workdir storage #3951

Open
@cblmemo

On the latest master, a service with a workdir fails to sync its workdir due to an assertion error. Services without a workdir do not have this problem. This blocks a common use case of sky serve and also causes a number of smoke tests to fail. We should fix this.

$ sky serve up examples/serve/http_server/task.yaml
Service from YAML spec: examples/serve/http_server/task.yaml
Service Spec:
Readiness probe method:           GET /health
Readiness initial delay seconds:  20
Readiness probe timeout seconds:  15
Replica autoscaling policy:       Fixed 2 replicas
Spot Policy:                      No spot fallback policy

Each replica will use the following resources (estimated):
I 09-16 20:57:39 optimizer.py:719] == Optimizer ==
I 09-16 20:57:39 optimizer.py:730] Target: minimizing cost
I 09-16 20:57:39 optimizer.py:742] Estimated cost: $0.0 / hour
I 09-16 20:57:39 optimizer.py:742] 
I 09-16 20:57:39 optimizer.py:867] Considered resources (1 node):
I 09-16 20:57:39 optimizer.py:937] --------------------------------------------------------------------------------------------------------
I 09-16 20:57:39 optimizer.py:937]  CLOUD        INSTANCE             vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN   
I 09-16 20:57:39 optimizer.py:937] --------------------------------------------------------------------------------------------------------
I 09-16 20:57:39 optimizer.py:937]  Kubernetes   2CPU--2GB            2       2         -              kubernetes      0.00          ✔     
I 09-16 20:57:39 optimizer.py:937]  AWS          m6i.large            2       8         -              us-east-1       0.10                
I 09-16 20:57:39 optimizer.py:937]  Azure        Standard_D2s_v5      2       8         -              eastus          0.10                
I 09-16 20:57:39 optimizer.py:937]  GCP          n2-standard-2        2       8         -              us-central1-a   0.10                
I 09-16 20:57:39 optimizer.py:937]  RunPod       1x_RTXA4000_SECURE   6       16        RTXA4000:1     CA              0.34                
I 09-16 20:57:39 optimizer.py:937] --------------------------------------------------------------------------------------------------------
I 09-16 20:57:39 optimizer.py:937] 
Launching a new service 'sky-service-93ad'. Proceed? [Y/n]: 
I 09-16 20:57:42 controller_utils.py:600] Translating workdir to SkyPilot Storage...
I 09-16 20:57:42 controller_utils.py:625] Workdir 'examples/serve/http_server' will be synced to cloud storage 'skypilot-workdir-txia-5e091ebd'.
I 09-16 20:57:42 controller_utils.py:698] Uploading sources to cloud storage. See: sky storage ls
E 09-16 20:57:43 storage.py:902] Could not create StoreType.S3 store with name skypilot-workdir-txia-5e091ebd.
AssertionError: ('We only support one store type for now.', {})

Version & Commit info:

  • sky -c: e870839aeed16c118c0eb1f4889efc20006c27c4
