
Implement task output caching in Indexify #1383


Merged

Merged 2 commits into main on May 7, 2025
Conversation

earhart
Contributor

@earhart earhart commented Apr 25, 2025

Context

The idea here is to cache function outputs and reuse them in subsequent function calls iff requested.

Fixes #1339

What

This PR allows the client to specify a caching key when defining graph functions; the key indicates to the server that it is allowed to cache the function applications. A new TaskCache component observes output ingestion (creating cache entries) and intercepts tasks between creation and executor allocation. On a cache hit, it skips the allocation and augments the scheduler update with the cached task outputs; when the state machine observes the augmented scheduler update, it resolves the task as though it had just completed, generating output ingestion events to release downstream tasks.
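As a rough sketch of that flow (hypothetical, simplified types — `TaskCache`, `CachedTaskOutput`, and `SchedulerUpdateRequest` here are stand-ins for the richer structs in `server/`, not the actual implementation):

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real Indexify types.
type TaskId = String;
type CacheKey = String;

#[derive(Clone, Debug)]
pub struct CachedTaskOutput {
    pub payload: Vec<u8>,
}

#[derive(Default)]
pub struct SchedulerUpdateRequest {
    pub cached_task_outputs: HashMap<TaskId, CachedTaskOutput>,
}

#[derive(Default)]
pub struct TaskCache {
    entries: HashMap<CacheKey, CachedTaskOutput>,
}

impl TaskCache {
    /// Called on output ingestion: record the output under the function's cache key.
    pub fn record(&mut self, key: CacheKey, output: CachedTaskOutput) {
        self.entries.insert(key, output);
    }

    /// Called between task creation and executor allocation. On a hit, skip
    /// allocation and augment the scheduler update with the cached output;
    /// on a miss, fall through to normal allocation.
    pub fn try_allocate(
        &self,
        task_id: &TaskId,
        key: &CacheKey,
        update: &mut SchedulerUpdateRequest,
    ) -> bool {
        match self.entries.get(key) {
            Some(output) => {
                update
                    .cached_task_outputs
                    .insert(task_id.clone(), output.clone());
                true
            }
            None => false,
        }
    }
}
```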

Testing

Currently, fairly basic: running a workload multiple times. This PR needs further tests before it lands.

Contribution Checklist

  • If a Python package was changed, please run make fmt in the package directory.
  • If the server was changed, please run make fmt in server/.
  • Make sure all PR Checks are passing.

@earhart earhart requested review from diptanu and eabatalov April 25, 2025 19:58
@earhart
Contributor Author

earhart commented Apr 25, 2025

This isn't quite ready to go in as-is, but if you'd like to give it an early review, the basic pieces are here. The cache needs to respect namespaces and report cache hits to the client; we may also want to add cache-clearing and monitoring APIs in this PR.

@earhart earhart force-pushed the earhart/caching branch 3 times, most recently from 5061d30 to 4a446e3 on April 29, 2025 at 00:12
@earhart earhart marked this pull request as ready for review April 29, 2025 01:22
Collaborator

@diptanu diptanu left a comment


Need more time to review. Leaving some comments here.

@earhart earhart force-pushed the earhart/caching branch 4 times, most recently from 0690799 to 902d389 on May 6, 2025 at 18:42
Collaborator

@diptanu diptanu left a comment


@earhart Looks great, have some small comments. Thanks for working on this very patiently!

if sched_update.cached_task_outputs.contains_key(&task.id) {
    let _ = self.task_event_tx
        .send(InvocationStateChangeEvent::TaskMatchedCache(
Collaborator

Does it need to be a different event? This is for notifying the clients about what the scheduler is doing. We could add a new attribute to TaskCreated - cached_output: bool

Contributor Author

I kinda like having it as a different event: this way, TaskCreated always gets matched up with a corresponding TaskAssigned and TaskCompleted. Also, this event name is very prominent in the text output from the cli, so it's very, very obvious to users what's going on.

#[derive(Debug, Clone, Default)]
pub struct SchedulerUpdateRequest {
    pub new_allocations: Vec<Allocation>,
    pub remove_allocations: Vec<Allocation>,
    pub updated_tasks: HashMap<TaskId, Task>,
    pub cached_task_outputs: HashMap<TaskId, CachedTaskOutput>,
Collaborator

Assuming we are also updating the TaskOutcome to completed?

Contributor Author

Yes -- that actually happens in task_cache.rs::try_allocate(), where we insert the task into cached_task_outputs.
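A minimal illustration of that idea — marking the task completed at the same moment the cached output is attached, so the state machine resolves it as though it had just run. All names here are hypothetical simplifications; the real logic lives in task_cache.rs::try_allocate():

```rust
use std::collections::HashMap;

// Hypothetical, simplified types for illustration only.
type TaskId = String;

#[derive(Clone, Debug, PartialEq)]
pub enum TaskOutcome {
    Unknown,
    Success,
}

#[derive(Clone)]
pub struct Task {
    pub id: TaskId,
    pub outcome: TaskOutcome,
}

#[derive(Clone)]
pub struct CachedTaskOutput {
    pub payload: Vec<u8>,
}

#[derive(Default)]
pub struct SchedulerUpdateRequest {
    pub updated_tasks: HashMap<TaskId, Task>,
    pub cached_task_outputs: HashMap<TaskId, CachedTaskOutput>,
}

/// On a cache hit, flip the task's outcome to Success and attach the cached
/// output, so downstream tasks are released without an executor allocation.
pub fn apply_cache_hit(
    task: &Task,
    output: CachedTaskOutput,
    update: &mut SchedulerUpdateRequest,
) {
    let mut completed = task.clone();
    completed.outcome = TaskOutcome::Success;
    update.updated_tasks.insert(task.id.clone(), completed);
    update.cached_task_outputs.insert(task.id.clone(), output);
}
```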

@earhart earhart force-pushed the earhart/caching branch from fec8be4 to 2067080 on May 7, 2025 at 20:38
@earhart earhart merged commit 8eb476f into main May 7, 2025
9 checks passed
@earhart earhart deleted the earhart/caching branch May 7, 2025 22:20
Successfully merging this pull request may close these issues: Cache Requests