-
Notifications
You must be signed in to change notification settings - Fork 616
[RunPod] Use zone to provision in a specific data center ID #5166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding this @Kovbo! Left a question below. : )
sky/provision/runpod/utils.py
Outdated
@@ -305,7 +305,7 @@ def launch(cluster_name: str, node_type: str, instance_type: str, region: str, | |||
'min_vcpu_count': 4 * gpu_quantity, | |||
'min_memory_in_gb': gpu_specs['memoryInGb'] * gpu_quantity, | |||
'gpu_count': gpu_quantity, | |||
'country_code': region, | |||
'data_center_id': region, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are changing this, do we need to update SkyPilot runpod catalog below as well to make the region points to data_center_id
?
https://github.com/skypilot-org/skypilot-catalog/blob/master/catalogs/v6/runpod/vms.csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we need to update the region column and list region codes (datacenter codes) instead of countries.
Will contact RunPod to be sure, but I think the price in all regions is the same, as well as the available GPUs.
Should I change it?
Is there a more efficient way to structure the list, rather than displaying 20+ rows with identical information for different regions? For example, could we consolidate them into a single row with the regions comma-separated like here?
Also, the price seems to be outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome if you could help update the catalog as well, so that we can get this PR in. Either consolidating them or not should be fine. : )
If you would like conslidate them, you can update the way runpod catalog is loaded here.
Actually, I am wondering whether the data center id
is more similar to the availability zone concept in other clouds? Should we keep the region code and add another zone
column for the data center id?
For the outdated price, yes, a automatic catalog fetcher should be implemented like the other clouds did, so that the pricing / offerings could be updated automatically: https://github.com/skypilot-org/skypilot/tree/master/sky/clouds/service_catalog/data_fetchers
The fetcher could be done later if it takes much more effort : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I can update the catalog.
This is a good point about the zone! I don't have a strong opinion, but it may be a good idea to keep the current country as a region to keep it backward compatible and speed up provisioning if no region/zone is specified.
I assume that setting a zone parameter will automatically pass it to config.provider_config['availability_zone']
, and we can access it in the run_instances function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! You may edit the following function to return an additional zone information in the dict with key 'availability_zone'
:
Lines 186 to 194 in a60fe41
return { | |
'instance_type': instance_type, | |
'custom_resources': custom_resources, | |
'region': region.name, | |
'image_id': image_id, | |
'use_spot': use_spot, | |
'bid_per_gpu': str(hourly_cost), | |
'docker_username_for_runpod': docker_username_for_runpod, | |
} |
And add the same key to the runpod specific configuration template:
skypilot/sky/templates/runpod-ray.yml.j2
Line 11 in a60fe41
region: "{{region}}" |
similar to the other cloud:
skypilot/sky/templates/aws-ray.yml.j2
Line 32 in a60fe41
availability_zone: {{zones}} |
After those, in the run_instances function, you will be able to find the config in the config.provider_config['availability_zone']
: )
RunPod has a
data_center_id
property that enables deployment in a specific region.Currently, SkyPilot’s
region
property uses RunPod’scountry_code
property, which only allows selecting the deployment country. An additionalzone
parameter to deploy in a specific data center.