Purging queue in hard-reset mode does not cope with already-running tasks
#52
base: master
```diff
@@ -9,6 +9,8 @@ import (
 	pduration "github.com/golang/protobuf/ptypes/duration"
 
 	tasks "google.golang.org/genproto/googleapis/cloud/tasks/v2"
+	codes "google.golang.org/grpc/codes"
+	status "google.golang.org/grpc/status"
 )
 
 // Queue holds all internals for a task queue
@@ -236,7 +238,7 @@ func (queue *Queue) Purge() *sync.WaitGroup {
 }
 
 // Goes beyond `Purge` behaviour to synchronously delete all tasks and their name handles
-func (queue *Queue) HardReset(s *Server) {
+func (queue *Queue) HardReset(s *Server) error {
 	waitGroup := queue.Purge()
 	waitGroup.Wait()
 
@@ -245,22 +247,53 @@ func (queue *Queue) HardReset(s *Server) {
 	// - task.Delete() writes to a buffered `cancel` channel
 	// - task.Schedule() reads from that buffered channel in a separate goroutine
 	// - When that goroutine sees the task is cancelled, it sets the task value to nil in the tasks map
+	// - Additionally, if a task has already been dispatched then task.Delete() has no effect until after the current
+	//   execution, which depends entirely on the response time of the task's target.
 	//
 	// We need to be certain that we only remove the task from map *after* that completes, otherwise the task name will
-	// be reinserted with the nil value. At the moment the only easy way I can think of is to sleep for a very short
-	// period to allow the tasks' internal goroutines to fire first.
-	time.Sleep(10 * time.Millisecond)
+	// be reinserted with the nil value.
+	isReadyChannel := make(chan bool, 1)
+	tryDeleteTasks := func() {
+		queue.tsMux.Lock()
+		defer queue.tsMux.Unlock()
 
-	queue.tsMux.Lock()
-	defer queue.tsMux.Unlock()
-	for taskName, task := range queue.ts {
-		if task != nil {
-			// The naive "sleep till it deletes" approach described above is too naive...
-			panic("Expected task to be deleted by now!")
+		hasAnyPending := false
+		for taskName, task := range queue.ts {
+			if task == nil {
+				// Task has already been deleted / ran to completion - safe to remove
+				delete(queue.ts, taskName)
+				s.hardDeleteTask(taskName)
+			} else {
+				// Task is still running (or the `cancel` channel has not fired) - will need to wait and retry
+				hasAnyPending = true
+			}
 		}
+		isReadyChannel <- !hasAnyPending
+	}
 
-		delete(queue.ts, taskName)
-		s.hardDeleteTask(taskName)
+	// The timeout applies across all iterations of the for loop.
+	// It is intentionally relatively short, because the internal retry interval is rapid.
+	// If calling code expects some task requests to last longer, it should handle the DEADLINE_EXCEEDED error and retry
+	// on a schedule to suit the application.
+	timeout := time.After(3 * time.Second)
+
+	go tryDeleteTasks()
```
Owner
Why is there a need for a goroutine here? Since there is no asynchronous operation, and the intent is for the caller to wait anyway, I believe a simple loop with a
Contributor
Author
Probably just my inexperience with Go threading/async. I suspect you're right; I'll give it a go with a simpler loop.
```diff
+
+	for {
+		select {
+		case isReady := <-isReadyChannel:
+			if isReady {
+				// All tasks have been purged
+				return nil
+			} else {
+				// One or more tasks is not yet deleted, wait and retry.
+				time.Sleep(5 * time.Millisecond)
+				go tryDeleteTasks()
+			}
+		case <-timeout:
+			log.Println("HardReset timed out waiting for tasks to clear")
+			return status.Errorf(codes.DeadlineExceeded, "Timed out waiting for tasks to be purged")
```
Owner
I feel this should be a custom error, and let the handler worry about the gRPC response - what do you reckon?
Contributor
Author
Just to confirm, do you mean return a (non-gRPC) error from the queue method and then let the emulator func convert that to the
If you mean a custom gRPC response code, I'm not sure how to do that.
Owner
Ah sorry, yes I meant the former (non-gRPC from the queue method).
```diff
+		}
 	}
 }
 
```
Should this be made configurable to avoid the complexity of having to handle this - especially since this is non-standard / undocumented behaviour?
The only thing is, even if you want to wait longer, you probably don't want the emulator checking every 5ms for longer.
But it needs to be that fast at first, because with no tasks running I was finding updating the `cancel` channel etc. could take 0-10ms, so a longer retry interval would cause unnecessary delays.
I guess the better solution would be a basic exponential backoff. Could probably implement that easily enough with the simpler sync loop you suggested.
Then it would be fine to make this configurable - presumably as an extra CLI option?
Although it does then mean more parameters to validate: is it valid to set this option without enabling hard-reset mode, etc. So it makes the emulator/emulator interface a bit more complex...
Yeah I agree; tempted to say just keep it as you have it for now as it's non-core behaviour anyway. If there is a need I expect people will raise an issue and we can look at it then.