Skip to content

WIP: Add JobManager #4287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: series/3.x
Choose a base branch
from

Conversation

BalmungSan
Copy link
Contributor

Implements the JobManager idea proposed in #1345


Roadmap

  • Define the API.
  • Implement it.
  • Add tests.
  • Add docs.

* Creates and launches the given `Job` in the background. If another Job with the same id was
* already running, it will be cancelled before starting this one.
*/
def startJob(id: Id, job: Resource[F, JobManager.Job[F, S]]): F[Unit]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one had, I like the idea of users being able to use their own Ids.
On the other, I think most users would benefit from a default using automatically generated UUIDs.

Should we provide such default in some way?

Comment on lines +34 to +38
/**
* Gets the status of the `Job` associated with the given `id`. If `id` doesn't exists or the
* `Job` already finished then the returned value will be a `None`.
*/
def getJobStatus(id: Id): F[Option[S]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat dislike the idea that both bad id and already finished return in a None.
But, otherwise, the map will grow indefinitely.

Any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By fully controlling IDs, we could have startJob return a fresh ID, so "bad ID" would never happen. (But then we'd lose the ability of users to have their own IDs...)

Comment on lines +40 to +44
/**
* Signals cancellation of the `Job` associated with the given `id`, and waits for its
* completion.
*/
def cancelJob(id: Id): F[Unit]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we provide a cacelAndForget variant where we don't wait on cancellation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that be essentially cancelJob(...).start? If yes, I don't think we should add it (everyone can .start for themselves).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually yes, but also, since we already have a Dispatcher in place, it could be used to perform that rather than raw start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I think my point is: if it's somehow better (performance, safety, whatever) than just .start-ing, then yeah, maybe add it. If it's not better, then definitely not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, it is safer since its life cycle is attached to the Supervisor.
But I am not sure what canceling a cancel does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does "nothing", as a cancel is uncancelable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I guess as you said, it provides no value and users who don't want to wait on the cancellation can just start.

Comment on lines +52 to +63
trait Job[F[_], S] {

/**
* Starts the logic of this `Job`.
*/
def run: F[Unit]

/**
* Gets the status of this `Job`.
*/
def getStatus: F[S]
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users would then implement this trait for their own Jobs.

}
}

supervisor.supervise(runJob).void
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't wait for the Job to start before returning to the user.
But, that means there is a brief delay between starting the Job and users being able to query its status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's okay. I think the point of std is to solve tricky race conditions like this for users.

Copy link
Contributor Author

@BalmungSan BalmungSan Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.
What do you think would be the best way to solve that? A Deferred a CountdownLatch? Other thing?
Or maybe, simply run the runJob logic there rather that sending it to the Supervisor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm a little confused by why it's sent to the Supervisor... (I'm sure there is a reason, I just don't know it).

Copy link
Contributor Author

@BalmungSan BalmungSan Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is that if the same id was already found we cancel it, which is costly.
But, I think I could just send that to the Supervisor as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, looking at the code, I realized that we also need to run the job setup (Resource.acquire) before being able to run the job itself.
So that was also part of what was being done in the background, but it also means that the short delay could be bigger.

Thus, I decided to use a Deferred to wait until the job has been properly registered before returning. But that may take a while.
Another idea that I just had would be to have an Initializing status that could be used meanwhile. However, that would complicate the logic quite a bit.

cancel = fiber.cancel
).some
)
.flatMap(_.traverse_(_.cancel)) >>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the same id was already used, we cancel the previous Job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants