
Add Options to scheduler stress#276

Open
chaptersix wants to merge 3 commits into main from alex/sch-adj

Conversation

Contributor

@chaptersix chaptersix commented Dec 18, 2025

Rework the scheduler_stress scenario for flexible ad-hoc load testing.

Changes:

  • Add skip-deletion, skip-creation, and worker-count options for controlling load patterns without rebuilding
  • Refactor execution into worker pool with channel-based schedule dispatch (configurable concurrency via worker-count)
  • Add executeWithExistingSchedules path to re-run operations against previously created schedules (skip-creation=true)
  • Add retryWithBackoff with exponential backoff and jitter for all schedule operations (create, describe, update, delete)
  • Per-operation 30s timeouts to avoid hanging on unresponsive calls
  • Set TriggerImmediately on schedule creation
  • Debug logging for schedule deletion lifecycle

@chaptersix chaptersix marked this pull request as ready for review March 25, 2026 21:05
@chaptersix chaptersix requested review from a team as code owners March 25, 2026 21:05
@chaptersix chaptersix changed the title from "first v2 load test" to "Add Options to scheduler stress" Mar 25, 2026
wg.Go(func() {
	// Each worker keeps consuming from the channel until it's closed
	for config := range scheduleConfigChan {
		func() {

I don't think this needs to be in a lambda, given that it's already in the wg.Go(func() { ... }) scope

Contributor Author


I added the lambda so I could use defer, and maybe to avoid some variable scoping issues. I'll see if I can remove it.

}(sc.ScheduleID, start)
}

<-ticker.C

think you could replace all of the ticker business with <-time.After(s.config.OperationInterval) here (or, probably, select over that and ctx.Done)

Contributor Author


I didn't want the duration of the API call to impact the wait time between operations; there's a sizable latency difference between v1 and v2. Though this should have a select.

Comment on lines +278 to +279
defer close(scheduleIDChan)
listErrChan <- s.listSchedules(ctx, client, scheduleIDChan, logger)

IMO, the iterator should be responsible for closing the channel, instead of the caller (and if the caller cancels, the iterator should stop producing).

Contributor Author


makes sense

Comment on lines +295 to +307
if !s.config.SkipDeletion {
	defer func(id string, startTime time.Time) {
		dur := time.Until(startTime.Add(s.config.WaitTimeBeforeCleanup))
		select {
		case <-time.After(dur):
			if err := s.deleteSchedule(ctx, client, id, logger); err != nil {
				logger.Errorw("Failed to delete schedule", "scheduleID", id, "error", err)
			}
		case <-ctx.Done():
			logger.Infow("Context canceled")
		}
	}(scheduleID, start)
}

I think this can be factored out to a helper.

if errors.As(err, &notFoundErr) {
	// Return nil if schedule is not found (already deleted or never existed)
	logger.Debug("Schedule not found during describe operation", "scheduleID", scheduleID)

// retryConfig defines retry behavior for schedule operations

stale comment? I don't see a retryConfig

)

// retryWithBackoff executes an operation with exponential backoff and jitter on context deadline errors
func retryWithBackoff(ctx context.Context, operation string, logger *zap.SugaredLogger, fn func() error) error {

Why not use the SDK's built in retry mechanism for this?

Contributor Author


I was getting a bunch of context cancellations and I don't have control over the SDK config. I don't remember if the SDK was actually retrying or not.
