Skip to content

Dynamic sched #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Dynamic sched #1369

wants to merge 11 commits into from

Conversation

diptanu
Copy link
Collaborator

@diptanu diptanu commented Apr 21, 2025

Dynamic Scheduling WIP

config.memory_bytes = resources.memory_bytes
config.disk_bytes = resources.disk_bytes
config.gpu_count = resources.gpu_count
config.gpu_count = resources.gpu.count if resources.HasField("gpu") else 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always 0 if no gpu is needed. This is a sensible default for this value. No need to change this.

disk_mb=0, # TODO: Implement for Linux and MacOS hosts
gpus=self._gpu_allocator.list_all(),
)
return self.free_resources(logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. We should return total resources here. I'll fix this.

memory_mb=0, # TODO: Implement for Linux and MacOS hosts
disk_mb=0, # TODO: Implement for Linux and MacOS hosts
gpus=self._gpu_allocator.list_free(),
cpu_count=int(cpu_count()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to implement free_resources for now, I'll leave a TODO here. Also this is not free resources.

gpus=self._gpu_allocator.list_free(),
cpu_count=int(cpu_count()),
memory_mb=int(virtual_memory().total / 1024 / 1024),
disk_mb=int(disk_usage("/").total / 1024 / 1024),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE filesystem dirs won't be necessarily located on filesystem mounted at /. I'll make the FS path a configuration parameter via CLI and pass it into HostResourcesProvider.

resources: FunctionExecutorResources = function_executor_description.resources
config.cpu_ms_per_sec = resources.cpu_ms_per_sec
resources: HostResources = function_executor_description.resources
config.cpu_ms_per_sec = resources.cpu_count * 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem here. cpu_count is integer while customers can set fractions of CPU in their function resources. It looks like this is why FE Resources were introduced. There's going to be even more difference of HostResources and FE resources in the future. So I think we should keep FE resources struct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants