-
Notifications
You must be signed in to change notification settings - Fork 130
Implement task output caching in Indexify #1383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This isn't quite ready to go in as-is, but if you'd like to give it an early review, the basic pieces are here. The cache needs to respect namespaces and report cache hits to the client; we may also want to add cache-clearing and monitoring APIs in this PR. |
5061d30
to
4a446e3
Compare
4a446e3
to
6619412
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need more time to review. Leaving some comments here.
0690799
to
902d389
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@earhart Looks great, have some small comments. Thanks for working on this very patiently!
if sched_update.cached_task_outputs.contains_key(&task.id) { | ||
let _ = | ||
self.task_event_tx | ||
.send(InvocationStateChangeEvent::TaskMatchedCache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be a different event? This is for notifying the clients about what the scheduler is doing. We could add a new attribute to TaskCreated - cached_output: bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda like having it as a different event: this way, TaskCreated always gets matched up with a corresponding TaskAssigned and TaskCompleted. Also, this event name is very prominent in the text output from the cli, so it's very, very obvious to users what's going on.
#[derive(Debug, Clone, Default)] | ||
pub struct SchedulerUpdateRequest { | ||
pub new_allocations: Vec<Allocation>, | ||
pub remove_allocations: Vec<Allocation>, | ||
pub updated_tasks: HashMap<TaskId, Task>, | ||
pub cached_task_outputs: HashMap<TaskId, CachedTaskOutput>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we are also updating the TaskOutcome to completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- that actually happens in task_cache.rs::try_allocate()
, where we insert the task into cached_task_outputs
.
Context
The idea here is to cache function outputs and reuse them in subsequent function calls iff requested.
Fixes #1339
What
This PR allows the client to specify a caching key when specifying graph functions; this indicates to the server that it's allowed to cache the function applications. A new TaskCache component observes output ingestion (creating cache entries), and intercepts tasks between creation and executor allocation; when it sees a cache hit, it skips the allocation, and augments the scheduler update with the cached task outputs; when the state machine observes the augmented scheduler update, it resolves the task as though it had just completed, generating output ingestion events to release downstream tasks.
Testing
Currently, fairly basic: running a workload multiple times. This PR needs further tests before it lands.
Contribution Checklist
make fmt
in the package directory.make fmt
inserver/
.