Conversation

Jeremydupras
Contributor

Description

Creates a history index that records when a job acquires a lock to execute. When the job completes, the record is updated with a completion time. The history service is built into the lock service and lives entirely within jobScheduler.

Logic path
-> The extension plugin tries to acquire a lock.
-> If the lock is acquired, a new record is added to the index. If not, nothing is added.
-> Status codes within the record show whether the job finished execution.
-> Upon completion, the extension plugin releases the lock and the record is updated to show that the job completed.
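
The logic path above can be sketched with a small in-memory model. This is only an illustration: `LockHistorySketch`, `HistoryRecord`, and the `STATUS_RUNNING` value are hypothetical and are not taken from the plugin's code. The only value drawn from this PR is `completion_status` 0 for a completed job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the lock index and the history index.
// None of these names come from the plugin's code.
class LockHistorySketch {
    static final int STATUS_COMPLETED = 0; // matches completion_status 0 in the PR's example
    static final int STATUS_RUNNING = 1;   // assumed value for a job that has not finished

    static final class HistoryRecord {
        final String jobIndexName;
        final String jobId;
        final long startTime;
        long endTime = -1;            // unset until the lock is released
        int status = STATUS_RUNNING;

        HistoryRecord(String jobIndexName, String jobId, long startTime) {
            this.jobIndexName = jobIndexName;
            this.jobId = jobId;
            this.startTime = startTime;
        }
    }

    private final Map<String, HistoryRecord> heldLocks = new HashMap<>();
    private final List<HistoryRecord> history = new ArrayList<>();

    // Returns true and appends a history record only when the lock is free.
    boolean acquireLock(String jobIndexName, String jobId, long nowEpochSeconds) {
        String lockKey = jobIndexName + "-" + jobId;
        if (heldLocks.containsKey(lockKey)) {
            return false; // lock not acquired, so nothing is added to history
        }
        HistoryRecord record = new HistoryRecord(jobIndexName, jobId, nowEpochSeconds);
        heldLocks.put(lockKey, record);
        history.add(record);
        return true;
    }

    // Releases the lock and marks the matching history record as completed.
    boolean releaseLock(String jobIndexName, String jobId, long nowEpochSeconds) {
        HistoryRecord record = heldLocks.remove(jobIndexName + "-" + jobId);
        if (record == null) {
            return false; // no lock was held for this job
        }
        record.endTime = nowEpochSeconds;
        record.status = STATUS_COMPLETED;
        return true;
    }

    List<HistoryRecord> history() {
        return history;
    }

    public static void main(String[] args) {
        LockHistorySketch sketch = new LockHistorySketch();
        sketch.acquireLock(".scheduler_sample_extension", "jobid1", 1755118872L);
        sketch.releaseLock(".scheduler_sample_extension", "jobid1", 1755118872L);
        System.out.println("history records: " + sketch.history().size());
    }
}
```

A second `acquireLock` call for the same job while the lock is held returns `false` and leaves the history untouched, which mirrors the "if not, nothing is added" step.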

Example of the index contents

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".opendistro-job-scheduler-history",
        "_id" : ".scheduler_sample_extension-jobid1-1755118872",
        "_score" : 1.0,
        "_source" : {
          "job_index_name" : ".scheduler_sample_extension",
          "job_id" : "jobid1",
          "start_time" : 1755118872,
          "completion_status" : 0,
          "end_time" : 1755118872
        }
      },
      {
        "_index" : ".opendistro-job-scheduler-history",
        "_id" : ".scheduler_sample_extension-jobid1-1755118932",
        "_score" : 1.0,
        "_source" : {
          "job_index_name" : ".scheduler_sample_extension",
          "job_id" : "jobid1",
          "start_time" : 1755118932,
          "completion_status" : 0,
          "end_time" : 1755118932
        }
      }
    ]
  }
}
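
Judging only from the two sample hits above, the document `_id` appears to be the job index name, the job ID, and the start time joined with hyphens. The sketch below builds an ID and a `_source` map on that assumption. `HistoryDocSketch` and its methods are hypothetical helpers, and the ID format is inferred from the example rather than confirmed by the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the ID format is inferred from the sample hits, not from the plugin source.
class HistoryDocSketch {
    // Assumed format: "<job_index_name>-<job_id>-<start_time>"
    static String documentId(String jobIndexName, String jobId, long startTimeEpochSeconds) {
        return jobIndexName + "-" + jobId + "-" + startTimeEpochSeconds;
    }

    // Builds a map with the same field names as the _source objects in the example.
    static Map<String, Object> source(String jobIndexName, String jobId,
                                      long startTime, int completionStatus, long endTime) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("job_index_name", jobIndexName);
        doc.put("job_id", jobId);
        doc.put("start_time", startTime);
        doc.put("completion_status", completionStatus);
        doc.put("end_time", endTime);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(documentId(".scheduler_sample_extension", "jobid1", 1755118872L));
        System.out.println(source(".scheduler_sample_extension", "jobid1", 1755118872L, 0, 1755118872L));
    }
}
```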

Related Issues

Resolves #808

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Jeremy Dupras added 9 commits August 7, 2025 11:47
Signed-off-by: Jeremy Dupras <[email protected]>
Jeremy Dupras added 2 commits August 13, 2025 14:20
Signed-off-by: Jeremy Dupras <[email protected]>

codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.99%. Comparing base (fedf867) to head (91c6e69).
⚠️ Report is 2 commits behind head on main.

❌ Your project status has failed because the head coverage (69.99%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
+ Coverage   69.80%   69.99%   +0.18%     
==========================================
  Files          38       38              
  Lines        1729     1733       +4     
  Branches      156      156              
==========================================
+ Hits         1207     1213       +6     
+ Misses        431      430       -1     
+ Partials       91       90       -1     


Signed-off-by: Jeremy Dupras <[email protected]>
Member

@DarshitChanpura DarshitChanpura left a comment

Thank you @Jeremydupras. The PR looks close. I left a few comments and will take a second pass once they are all addressed.

}, exception -> fail("Exception during update: " + exception.getMessage())));
}, exception -> fail("Exception during initial record: " + exception.getMessage())));

latch.await(15L, TimeUnit.SECONDS);
Member

Why are we awaiting in tests, and with arbitrary timeout values at that?

Contributor Author

The tests are based directly on the lock service integration tests. The await gives the async operations time to complete.

Member

We should not await in tests. You handle assertions in the listener block, so that should cover the case.

Contributor Author

Removed the await.

Contributor Author

The historyService methods that write to an index are asynchronous. The await ensures that the method has finished before the result is evaluated.
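
The trade-off discussed in this thread can be illustrated with a minimal sketch. When an assertion lives in a callback that runs on another thread, the test method can return before the callback fires, so a latch with a checked timeout is one way to make the test wait. `AsyncAwaitSketch` and `recordAsync` are hypothetical stand-ins for the asynchronous history write, and the executor replaces the real client.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in for an asynchronous index write and a test that waits for it.
class AsyncAwaitSketch {
    // Invokes the callback on a pool thread, like a listener-based client call would.
    static void recordAsync(ExecutorService pool, String jobId, Consumer<String> onResponse) {
        pool.submit(() -> onResponse.accept("recorded:" + jobId));
    }

    // Waits for the callback and fails loudly if it never runs within the timeout.
    static String recordAndWait(String jobId, long timeoutSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        try {
            recordAsync(pool, jobId, response -> {
                result.set(response);
                latch.countDown(); // signal that the callback has completed
            });
            // Checking the return value turns a silent timeout into a failure.
            if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callback did not run within " + timeoutSeconds + "s");
            }
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(recordAndWait("jobid1", 5));
    }
}
```

The timeout is an upper bound rather than a fixed delay: `await` returns as soon as the callback counts the latch down.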

Jeremy Dupras and others added 4 commits August 14, 2025 11:46
Jeremy Dupras added 3 commits August 14, 2025 16:02
Signed-off-by: Jeremy Dupras <[email protected]>
private final Instant startTime;
private final Instant endTime;
private final int status;
private final long seqNo;
Member

Do we need to keep track of seqNo and primaryTerm? I'd prefer to keep references only to the new logic being introduced and not track these metadata fields.

Contributor Author

The current design adds an entry to the history index when the lock is created and updates that entry when the lock is released. If the sequence number and primary term are not tracked, the history service should be changed to append-only.

Member

I don't think it's necessary to track these, but I suppose it doesn't hurt either.

FYI I see that the usage is here in the LockService.

UpdateRequest updateRequest = new UpdateRequest().index(LOCK_INDEX_NAME)
                .id(updateLock.getLockId())
                .setIfSeqNo(updateLock.getSeqNo())
                .setIfPrimaryTerm(updateLock.getPrimaryTerm())
                .setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE)
                .doc(updateLock.toXContent(XContentFactory.jsonBuilder(), ToXContent.EMPTY_PARAMS))
                .fetchSource(true);

According to the javadoc, this is the purpose:

/**
   * only perform this update request if the document's modification was assigned the given
   * sequence number. Must be used in combination with {@link #setIfPrimaryTerm(long)}
   *
   * If the document last modification was assigned a different sequence number a
   * {@link org.opensearch.index.engine.VersionConflictEngineException} will be thrown.
   */
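
The conditional-update behavior the javadoc describes can be illustrated with a small in-memory simulation. This is not the OpenSearch API: `SeqNoSketch`, `Doc`, and `updateIf` are hypothetical, and `IllegalStateException` stands in for the `VersionConflictEngineException` named above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of optimistic concurrency control; not the OpenSearch client.
class SeqNoSketch {
    static final class Doc {
        final String body;
        final long seqNo;
        final long primaryTerm;

        Doc(String body, long seqNo, long primaryTerm) {
            this.body = body;
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextSeqNo = 0;
    private final long primaryTerm = 1; // fixed here; a real cluster may change it

    // Every write is assigned a fresh sequence number.
    Doc index(String id, String body) {
        Doc doc = new Doc(body, nextSeqNo++, primaryTerm);
        docs.put(id, doc);
        return doc;
    }

    // Applies the update only if the stored document still has the expected seqNo and primaryTerm.
    Doc updateIf(String id, long ifSeqNo, long ifPrimaryTerm, String body) {
        Doc current = docs.get(id);
        if (current == null || current.seqNo != ifSeqNo || current.primaryTerm != ifPrimaryTerm) {
            // Stand-in for the version conflict exception described in the javadoc.
            throw new IllegalStateException("version conflict for " + id);
        }
        return index(id, body);
    }

    public static void main(String[] args) {
        SeqNoSketch store = new SeqNoSketch();
        Doc created = store.index("history-1", "running");
        Doc updated = store.updateIf("history-1", created.seqNo, created.primaryTerm, "completed");
        System.out.println(updated.body + " at seqNo " + updated.seqNo);
    }
}
```

A writer holding a stale `seqNo` is rejected instead of silently overwriting a newer version, which is why an update-in-place design carries these two fields on the record.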

@cwperks
Member

cwperks commented Aug 15, 2025

Thank you for the PR, @Jeremydupras! This looks like a nice feature to have, and it makes sense to introduce it behind a feature flag. I left a couple of comments that need to be addressed, but the PR mostly looks good.

Member

@DarshitChanpura DarshitChanpura left a comment

Left a few small comments; looks good otherwise.

ActionListener.wrap(success -> {}, listener::onFailure)
);
}
updateLock(lockToRelease, ActionListener.wrap(releasedLock -> listener.onResponse(releasedLock != null), listener::onFailure));
Member

Is the call on this line independent of the one on line 341?

Asking because both are executed asynchronously.

Contributor Author

Yes. Line 341 updates the history index with an end time and a success status code. Line 350 releases the job's lock, which updates the lock index to "released = true".
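
As a rough illustration of two independent asynchronous writes, the sketch below starts both without either waiting on the other and then collects the results. `IndependentCallsSketch` is hypothetical and uses `CompletableFuture` in place of the plugin's listener-based calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of two independent async writes; not the plugin's listener-based code.
class IndependentCallsSketch {
    static List<String> release(String jobId) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        // Stand-in for the history index update (end time and success status).
        CompletableFuture<Void> historyUpdate =
            CompletableFuture.runAsync(() -> events.add("history:" + jobId + ":completed"));

        // Stand-in for the lock index update (released = true).
        CompletableFuture<Void> lockRelease =
            CompletableFuture.runAsync(() -> events.add("lock:" + jobId + ":released"));

        // Neither write depends on the other; wait for both only to inspect the outcome.
        CompletableFuture.allOf(historyUpdate, lockRelease).join();
        return events;
    }

    public static void main(String[] args) {
        System.out.println(release("jobid1"));
    }
}
```

Because the two writes are independent, the order of the recorded events is not guaranteed.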

Member

@DarshitChanpura DarshitChanpura left a comment

Nice work @Jeremydupras ! LGTM 🎊

@cwperks cwperks merged commit 1a6dbf4 into opensearch-project:main Aug 21, 2025
15 of 16 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In Review to ✅ Done in Engineering Effectiveness Board Aug 21, 2025

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[FEATURE] Job execution History index

3 participants