Update sky show-gpus #1752

Closed
wants to merge 5 commits into from

ewzeng (Collaborator) commented Mar 9, 2023

This PR updates sky show-gpus and fixes #1733.

Changes:

  • sky show-gpus [--cloud CLOUD] [--region REGION] shows all GPUs satisfying the query, but no detailed information (e.g., specs, pricing).
  • Using --all shows detailed information.
  • A10 and A10G are part of COMMON_GPU (@WoosukKwon, I forgot which ones you wanted to move out of COMMON_GPU; could you remind me?)
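A minimal Python sketch of the two display modes described above, assuming a hypothetical flattened catalog: the CatalogRow fields and the show_gpus function are illustrative names, not SkyPilot's actual API. The default path collapses matching rows to GPU names and available quantities; show_all=True keeps the detailed rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Union


@dataclass(frozen=True)
class CatalogRow:
    """Hypothetical flattened catalog entry (illustrative fields only)."""
    gpu: str
    qty: int
    cloud: str
    region: str
    instance_type: str
    hourly_price: float


def show_gpus(rows: List[CatalogRow],
              cloud: Optional[str] = None,
              region: Optional[str] = None,
              show_all: bool = False
              ) -> Union[Dict[str, List[int]], List[CatalogRow]]:
    """Hypothetical sketch: apply --cloud/--region, then summarize or detail."""
    matches = [
        r for r in rows
        if (cloud is None or r.cloud.lower() == cloud.lower())
        and (region is None or r.region == region)
    ]
    if show_all:
        # --all: keep every matching row (specs, pricing) for detailed printing.
        return sorted(matches, key=lambda r: (r.gpu, r.qty, r.hourly_price))
    # Default: only GPU names and the quantities each one is offered in.
    summary: Dict[str, Set[int]] = {}
    for r in matches:
        summary.setdefault(r.gpu, set()).add(r.qty)
    return {gpu: sorted(qtys) for gpu, qtys in sorted(summary.items())}
```

Filtering happens once, before either mode, so --cloud and --region narrow the name-only listing and the detailed listing in the same way.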

Tested:

  • sky show-gpus
  • sky show-gpus -a
  • sky show-gpus --cloud lambda
  • sky show-gpus --cloud lambda --region us-east-1
  • sky show-gpus --cloud lambda -a
  • sky show-gpus --cloud gcp
  • sky show-gpus --cloud gcp -a
  • sky show-gpus A100

infwinston (Member) left a comment

Thanks @ewzeng! Looks like after we add tpu-v4 to our catalog, we need to update the TPU list here:

'tpu-v3-512', 'tpu-v3-1024', 'tpu-v3-2048'

But the bigger issue is that the sky show-gpus output can get too long... I think we should consider hiding some TPU types or the OTHER_GPU section by default and showing a hint message instead. wdyt?

(sky-tmp) gcpuser@ray-dev-new-head-7cfd1b53-compute:~/tmp/skypilot$ sky show-gpus
COMMON_GPU  AVAILABLE_QUANTITIES
V100        1, 2, 4, 8
V100-32GB   8
A100        1, 2, 4, 8, 16
A100-80GB   1, 2, 4, 8
P100        1, 2, 4
K80         1, 2, 4, 8, 16
T4          1, 2, 4, 8
M60         1, 2, 4
A10         1, 2
A10G        1, 4, 8

GOOGLE_TPU   AVAILABLE_QUANTITIES
tpu-v2-8     1
tpu-v2-32    1
tpu-v2-128   1
tpu-v2-256   1
tpu-v2-512   1
tpu-v3-8     1
tpu-v3-32    1
tpu-v3-64    1
tpu-v3-128   1
tpu-v3-256   1
tpu-v3-512   1
tpu-v3-1024  1
tpu-v3-2048  1

OTHER_GPU        AVAILABLE_QUANTITIES
A6000            1, 2, 4
Gaudi HL-205     8
K520             1, 4
P4               1, 2, 4
P40              1, 2, 4
RTX6000          1
Radeon MI25      1
Radeon Pro V520  1, 2, 4
T4g              1, 2
tpu-v4-1024      1
tpu-v4-1152      1
tpu-v4-128       1
tpu-v4-1280      1
tpu-v4-1408      1
tpu-v4-1536      1
tpu-v4-16        1
tpu-v4-1664      1
tpu-v4-1792      1
tpu-v4-1920      1
tpu-v4-2048      1
tpu-v4-2176      1
tpu-v4-2304      1
tpu-v4-2432      1
tpu-v4-256       1
tpu-v4-2560      1
tpu-v4-2688      1
tpu-v4-2816      1
tpu-v4-2944      1
tpu-v4-3072      1
tpu-v4-32        1
tpu-v4-3200      1
tpu-v4-3328      1
tpu-v4-3456      1
tpu-v4-3584      1
tpu-v4-3712      1
tpu-v4-384       1
tpu-v4-3840      1
tpu-v4-3968      1
tpu-v4-512       1
tpu-v4-64        1
tpu-v4-640       1
tpu-v4-768       1
tpu-v4-8         1
tpu-v4-896       1
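One way to act on the hiding suggestion, sketched in Python under stated assumptions: which sections collapse by default (COLLAPSED_BY_DEFAULT), the hint wording, and the idea that --all would reveal them are all hypothetical choices, not behavior this PR implements.

```python
from typing import Dict, List, Tuple

# Hypothetical choice of sections to collapse unless --all is given.
COLLAPSED_BY_DEFAULT = ('GOOGLE_TPU', 'OTHER_GPU')


def render_sections(sections: Dict[str, Dict[str, List[int]]],
                    show_all: bool = False) -> str:
    """Print each section as NAME/AVAILABLE_QUANTITIES columns, hiding long ones."""
    lines: List[str] = []
    hidden: List[Tuple[str, int]] = []
    for title, gpus in sections.items():
        if not show_all and title in COLLAPSED_BY_DEFAULT:
            hidden.append((title, len(gpus)))
            continue
        # Pad the name column to the longest entry plus two spaces.
        width = max([len(title)] + [len(name) for name in gpus]) + 2
        lines.append(f'{title:<{width}}AVAILABLE_QUANTITIES')
        for name, qtys in gpus.items():
            lines.append(f'{name:<{width}}{", ".join(map(str, qtys))}')
        lines.append('')
    if hidden:
        counts = ', '.join(f'{title} ({n})' for title, n in hidden)
        # Hypothetical hint text; the flag that reveals hidden sections is an assumption.
        lines.append(f'Hidden sections: {counts}. Pass --all to show them.')
    return '\n'.join(lines)
```

With this, the default listing would end with a single hint line instead of the long GOOGLE_TPU and OTHER_GPU listings above, while the COMMON_GPU table keeps the same column layout.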

'P100',
'K80',
'T4',
'M60',
Member

I thought our consensus in the meeting was to remove M60 and P100?

Collaborator Author

Ah, yes. You are right.
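A sketch of how the grouping might look once M60 and P100 are dropped. The COMMON_GPU list below is assembled from the COMMON_GPU rows in the output above minus those two, and group_accelerators is a hypothetical helper, not the merged code. The prefix check for TPUs is one possible way to avoid the stale hard-coded TPU list noted earlier (tpu-v4 types landing under OTHER_GPU), and the numeric TPU sort avoids the string ordering visible there (tpu-v4-1024 before tpu-v4-128).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical COMMON_GPU list: the COMMON_GPU rows printed above, minus M60
# and P100 as agreed in review. Not the merged SkyPilot code.
COMMON_GPU = ['V100', 'V100-32GB', 'A100', 'A100-80GB', 'K80', 'T4', 'A10', 'A10G']


def _tpu_sort_key(name: str) -> Tuple[str, int]:
    # 'tpu-v4-128' -> ('v4', 128): sizes compare numerically, not as strings.
    _, version, size = name.split('-')
    return version, int(size)


def group_accelerators(names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: split accelerator names into the three sections."""
    groups: Dict[str, List[str]] = {'COMMON_GPU': [], 'GOOGLE_TPU': [], 'OTHER_GPU': []}
    for name in names:
        if name in COMMON_GPU:
            groups['COMMON_GPU'].append(name)
        elif name.startswith('tpu-'):
            # A prefix check also catches new generations such as tpu-v4-*.
            groups['GOOGLE_TPU'].append(name)
        else:
            groups['OTHER_GPU'].append(name)
    groups['COMMON_GPU'].sort(key=COMMON_GPU.index)  # keep curated order
    groups['GOOGLE_TPU'].sort(key=_tpu_sort_key)
    groups['OTHER_GPU'].sort()
    return groups
```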

infwinston (Member)

It's a little counterintuitive to me that sky show-gpus --all shows less information than sky show-gpus T4: the --all output below has no REGION column and no note about GCP (attachable) pricing. wdyt?

$ sky show-gpus --all
GPU  QTY  CLOUD  INSTANCE_TYPE          vCPUs  HOST_MEMORY  HOURLY_PRICE  HOURLY_SPOT_PRICE
T4   1    AWS    g4dn.xlarge            4      16GB         $ 0.526       $ 0.158
T4   1    AWS    g4dn.2xlarge           8      32GB         $ 0.752       $ 0.227
T4   1    AWS    g4dn.4xlarge           16     64GB         $ 1.204       $ 0.361
T4   1    AWS    g4dn.8xlarge           32     128GB        $ 2.176       $ 0.653
T4   1    AWS    g4dn.16xlarge          64     256GB        $ 4.352       $ 1.306
T4   4    AWS    g4dn.12xlarge          48     192GB        $ 3.912       $ 1.177
T4   8    AWS    g4dn.metal             96     384GB        $ 7.824       $ 2.347
T4   1    Azure  Standard_NC4as_T4_v3   4      28GB         $ 0.526       $ 0.064
T4   1    Azure  Standard_NC8as_T4_v3   8      56GB         $ 0.752       $ 0.092
T4   1    Azure  Standard_NC16as_T4_v3  16     110GB        $ 1.204       $ 0.147
T4   4    Azure  Standard_NC64as_T4_v3  64     440GB        $ 4.352       $ 0.531
T4   1    GCP    (attachable)           -      -            $ 0.350       $ 0.070
T4   2    GCP    (attachable)           -      -            $ 0.700       $ 0.140
T4   4    GCP    (attachable)           -      -            $ 1.400       $ 0.279

$ sky show-gpus T4
*NOTE*: for most GCP accelerators, INSTANCE_TYPE == (attachable) means the host VM's cost is not included.

GPU  QTY  CLOUD  INSTANCE_TYPE          vCPUs  HOST_MEMORY  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
T4   1    AWS    g4dn.xlarge            4      16GB         $ 0.526       $ 0.158            af-south-1
T4   1    AWS    g4dn.2xlarge           8      32GB         $ 0.752       $ 0.227            af-south-1
T4   1    AWS    g4dn.4xlarge           16     64GB         $ 1.204       $ 0.361            af-south-1
T4   1    AWS    g4dn.8xlarge           32     128GB        $ 2.176       $ 0.653            af-south-1
T4   1    AWS    g4dn.16xlarge          64     256GB        $ 4.352       $ 1.306            af-south-1
T4   4    AWS    g4dn.12xlarge          48     192GB        $ 3.912       $ 1.177            af-south-1
T4   8    AWS    g4dn.metal             96     384GB        $ 7.824       $ 2.347            af-south-1
T4   1    Azure  Standard_NC4as_T4_v3   4      28GB         $ 0.526       $ 0.064            eastus
T4   1    Azure  Standard_NC8as_T4_v3   8      56GB         $ 0.752       $ 0.092            eastus
T4   1    Azure  Standard_NC16as_T4_v3  16     110GB        $ 1.204       $ 0.147            eastus
T4   4    Azure  Standard_NC64as_T4_v3  64     440GB        $ 4.352       $ 0.531            eastus
T4   1    GCP    (attachable)           -      -            $ 0.350       $ 0.070            us-central1
T4   2    GCP    (attachable)           -      -            $ 0.700       $ 0.140            us-central1
T4   4    GCP    (attachable)           -      -            $ 1.400       $ 0.279            us-central1

GPU  QTY  CLOUD  INSTANCE_TYPE  vCPUs  HOST_MEMORY  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
T4g  1    AWS    g5g.xlarge     4      8GB          $ 0.420       $ 0.126            ap-northeast-1
T4g  1    AWS    g5g.2xlarge    8      16GB         $ 0.556       $ 0.167            ap-northeast-1
T4g  1    AWS    g5g.4xlarge    16     32GB         $ 0.828       $ 0.248            ap-northeast-1
T4g  1    AWS    g5g.8xlarge    32     64GB         $ 1.372       $ 0.412            ap-northeast-1
T4g  2    AWS    g5g.16xlarge   64     128GB        $ 2.744       $ 0.823            ap-northeast-1
T4g  2    AWS    g5g.metal      64     128GB        $ 2.744       $ 0.823            ap-northeast-1

ewzeng (Collaborator Author) commented Mar 10, 2023

Hmm, I agree that it's a bit counterintuitive. I kept it that way because that was the original behavior, but I am ok with changing it.
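If the two paths are unified, one option is to route both --all and sky show-gpus NAME through a single table formatter so they print the same columns, including REGION. A minimal sketch; the column names are copied from the sky show-gpus T4 output above, and format_table is a hypothetical helper.

```python
from typing import List, Sequence

# Column order copied from the `sky show-gpus T4` output above.
COLUMNS = ['GPU', 'QTY', 'CLOUD', 'INSTANCE_TYPE', 'vCPUs', 'HOST_MEMORY',
           'HOURLY_PRICE', 'HOURLY_SPOT_PRICE', 'REGION']


def format_table(rows: Sequence[Sequence[str]]) -> str:
    """Hypothetical shared formatter for both --all and `show-gpus <NAME>`.

    Each column is left-aligned to its widest cell with two spaces between
    columns; trailing whitespace is stripped from every line.
    """
    table: List[List[str]] = [COLUMNS] + [list(r) for r in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(COLUMNS))]
    return '\n'.join(
        '  '.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table)
```

Because both commands would share one formatter, a column added for one of them cannot silently go missing from the other.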

infwinston (Member)

Another issue: do we support all AWS regions globally? We are showing an Africa region, af-south-1.

ewzeng (Collaborator Author) commented Mar 10, 2023

I believe show-gpus shows the cheapest region, but if all regions have the same price, show-gpus selects by alphabetical order (which is why you get af-south-1).
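The selection rule described here amounts to taking the minimum over (price, region) pairs. A sketch, assuming a simple region-to-price mapping; pick_display_region is a hypothetical name and the prices below are made up.

```python
from typing import Dict, Tuple


def pick_display_region(price_by_region: Dict[str, float]) -> Tuple[str, float]:
    """Hypothetical: cheapest region wins; equal prices fall back to name order."""
    # Comparing (price, region) makes the region name the tie-breaker.
    region, price = min(price_by_region.items(), key=lambda kv: (kv[1], kv[0]))
    return region, price
```

With equal prices, e.g. {'us-east-1': 1.0, 'af-south-1': 1.0}, this returns ('af-south-1', 1.0), which is consistent with the af-south-1 rows in the AWS output above.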

infwinston (Member)

I see. I mean, can we actually sky launch a VM in af-south-1?

ewzeng (Collaborator Author) commented Mar 10, 2023

Oooo. I just tried, and got

(sky) zeng@lenovo:sky$ sky launch --cloud aws --region af-south-1
ValueError: Invalid region 'af-south-1'
Did you mean one of these: 'ap-south-1'?

So currently, no. Does the T4 catalog include the region af-south-1?
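One possible fix is to restrict the candidates to regions that sky launch accepts before picking the cheapest. In this sketch, enabled_regions stands in for whatever region list the launcher validates against (an assumption; this thread does not say where that list comes from), pick_launchable_region is a hypothetical name, and ties still go to the alphabetically first region.

```python
from typing import Dict, Iterable, Optional, Tuple


def pick_launchable_region(price_by_region: Dict[str, float],
                           enabled_regions: Iterable[str]
                           ) -> Optional[Tuple[str, float]]:
    """Hypothetical: cheapest region among those the launcher accepts."""
    enabled = set(enabled_regions)
    # (price, region) tuples: lowest price first, then alphabetical region.
    candidates = [(price, region) for region, price in price_by_region.items()
                  if region in enabled]
    if not candidates:
        return None  # nothing launchable; the caller could skip this row
    price, region = min(candidates)
    return region, price
```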

concretevitamin added this to the v0.3 milestone Apr 6, 2023

concretevitamin (Member)

Should we get this into 0.3? @ewzeng @infwinston

github-actions (Contributor)

This PR is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the Stale label Sep 23, 2023

github-actions bot (Contributor) commented Oct 3, 2023

This PR was closed because it has been stalled for 10 days with no activity.

github-actions bot closed this Oct 3, 2023

Successfully merging this pull request may close these issues.

[UI] sky show-gpus only shows common gpus
3 participants