OPENNLP-124: Maxent/Perceptron training should report progress back via an API #758

NishantShri4 · 2025-03-24T07:01:35Z

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?

For code changes:

Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
Have you written or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

NishantShri4 · 2025-03-24T07:27:15Z

Hello @mawiesne / @kottmann. Good day!

Is this item waiting to be picked up? https://issues.apache.org/jira/browse/OPENNLP-124
Here is my draft PR.
This PR is only meant to explain my current understanding of the Jira issue.
It needs further improvements (refactoring, integration with the other model trainers, unit tests, etc.).
I am happy to be part of the discussion and to pick it up for implementation if the team approves.
Please let me know your views.

Attached is the output of a couple of existing tests (Perceptron Trainer), run with the console-based TrainingProgressMonitor integrated.
PerceptroTrainerUnitTests-LogOutput.txt

mawiesne · 2025-03-24T07:33:27Z

FYI: @jzonthemtn + @rzo1

rzo1

Thanks for the draft. I added some thoughts / comments.

opennlp-tools/src/main/java/opennlp/tools/monitoring/ConsoleTrainingProgressMonitor.java

...lp-tools/src/main/java/opennlp/tools/monitoring/PrevNIterationAccuracyLessThanTolerance.java

opennlp-tools/src/main/java/opennlp/tools/monitoring/StopCriteria.java

opennlp-tools/src/main/java/opennlp/tools/monitoring/TrainingProgressMonitor.java

opennlp-tools/src/main/java/opennlp/tools/ml/perceptron/PerceptronTrainer.java

NishantShri4 · 2025-04-13T23:17:56Z

Hi @rzo1, @mawiesne,

Hope you are well. I have tried to address the review comments. Could you please review once more and point me towards the intended solution? Many thanks in advance.

Some queries and ToDos:

The Jira description gives the following prototype of the finishedTraining() method:

finishedTraining(int iterations, int numberCorrectEvents, int totalEvents, StopCriteria stopCriteria);

Could you please clarify what the numberCorrectEvents and totalEvents parameters are used for? My current implementation does not use them; I found that stopCriteria is sufficient. Please take a look.
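To make the question concrete, here is a minimal, self-contained sketch of one possible split of responsibilities. It is not the code in this PR and not the API proposed in the Jira issue: the interface shapes, `DeltaUnderTolerance`, `RecordingMonitor` and the toy `train` loop are all hypothetical names introduced for illustration. The assumption it demonstrates is that per-iteration counts could be reported through a `finishedIteration` callback, so that `finishedTraining` would only need the iteration count and the stop criteria.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the names are illustrative, not the OpenNLP API. */
public class ProgressMonitorSketch {

  /** Describes why training stopped: a predicate over the latest accuracy change. */
  interface StopCriteria {
    boolean test(double accuracyDelta);
    String message();
  }

  /** Callbacks a trainer could invoke while it runs. */
  interface TrainingProgressMonitor {
    void finishedIteration(int iteration, int numberCorrectEvents, int totalEvents);
    void finishedTraining(int iterations, StopCriteria stopCriteria);
  }

  /** Stops once accuracy changes by less than a tolerance between iterations. */
  static final class DeltaUnderTolerance implements StopCriteria {
    private final double tolerance;
    DeltaUnderTolerance(double tolerance) { this.tolerance = tolerance; }
    @Override public boolean test(double accuracyDelta) {
      return Math.abs(accuracyDelta) < tolerance;
    }
    @Override public String message() {
      return "accuracy change under tolerance " + tolerance;
    }
  }

  /** Collects progress lines in memory instead of printing them. */
  static final class RecordingMonitor implements TrainingProgressMonitor {
    final List<String> lines = new ArrayList<>();
    @Override public void finishedIteration(int iteration, int correct, int total) {
      lines.add(iteration + ": " + correct + "/" + total);
    }
    @Override public void finishedTraining(int iterations, StopCriteria stopCriteria) {
      String reason = stopCriteria == null ? "iteration limit reached" : stopCriteria.message();
      lines.add("stopped after " + iterations + " iterations: " + reason);
    }
  }

  /** Toy loop: replays per-iteration correct counts and reports them to the monitor. */
  static int train(int[] correctPerIteration, int totalEvents,
                   StopCriteria criteria, TrainingProgressMonitor monitor) {
    double previous = 0.0;
    for (int i = 0; i < correctPerIteration.length; i++) {
      double accuracy = (double) correctPerIteration[i] / totalEvents;
      monitor.finishedIteration(i + 1, correctPerIteration[i], totalEvents);
      // The first iteration has no predecessor to compare against.
      if (i > 0 && criteria.test(accuracy - previous)) {
        monitor.finishedTraining(i + 1, criteria);
        return i + 1;
      }
      previous = accuracy;
    }
    monitor.finishedTraining(correctPerIteration.length, null);
    return correctPerIteration.length;
  }

  public static void main(String[] args) {
    RecordingMonitor monitor = new RecordingMonitor();
    train(new int[] {60, 80, 90, 90, 95}, 100, new DeltaUnderTolerance(0.01), monitor);
    monitor.lines.forEach(System.out::println);
  }
}
```

In this sketch the counts are already delivered once per iteration, so repeating them in `finishedTraining` would add nothing; whether that matches the intent of the Jira prototype is exactly the open question for the reviewers.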

ToDo: Update the documentation to include a description of TrainingProgressMonitor and StopCriteria.

opennlp-tools/src/main/java/opennlp/tools/monitoring/StopCriteria.java

opennlp-tools/src/main/java/opennlp/tools/monitoring/TrainingProgressMonitor.java

opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java

NishantShri4 · 2025-04-16T17:36:00Z

Hi Reviewers - All checks are green now, and this PR is available for review. If possible, please take a look.
One question: this PR has 10 commits. Should I squash them into a single commit before review?

mawiesne · 2025-04-16T18:34:25Z

Thx @NishantShri4 for moving this topic forward! Could you squash those commits and force push the resulting single commit? Once available, we'll have a detailed look and provide feedback.

NishantShri4 · 2025-04-16T23:00:16Z

Thanks @mawiesne. This is done (rebase, squash and force push).

mawiesne

Thx @NishantShri4 for providing this substantial contribution. I've left feedback by comments to further improve it. Once addressed, I'll re-check and potentially, @rzo1 can add his final thoughts/checks then.

opennlp-tools/pom.xml

opennlp-tools/src/main/java/opennlp/tools/commons/Trainer.java

opennlp-tools/src/main/java/opennlp/tools/ml/AbstractTrainer.java

opennlp-tools/src/main/java/opennlp/tools/ml/TrainerFactory.java

opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java

opennlp-tools/src/test/java/opennlp/tools/monitoring/IterDeltaAccuracyUnderToleranceTest.java

opennlp-tools/src/test/java/opennlp/tools/monitoring/LogLikelihoodThresholdBreachedTest.java

pom.xml

rzo1

LGTM.

We are currently discussing whether an ICLA is required due to the size of this contribution. Stay tuned 🙂

NishantShri4 · 2025-04-23T19:16:36Z

Thanks very much @mawiesne for the detailed review earlier; it was very useful. I have pushed changes to address the review comments, and I can see two approvals now. Thanks to the approvers for their time.

rzo1 · 2025-04-24T17:24:12Z

@NishantShri4 can you fill out an ICLA for your contribution, please?

Details can be found here https://www.apache.org/licenses/contributor-agreements.html

If you sign it, please add "OpenNLP" in the section "notify project". You don't need to add an Apache ID. Thanks!

NishantShri4 · 2025-04-24T22:37:22Z

Thanks @rzo1. This is done (the signed ICLA has been sent to [email protected]).

rzo1 · 2025-04-25T11:33:41Z

Thanks again (and of course, we truly appreciate your contribution)! We’ll go ahead and merge this PR once we get confirmation from the secretary.

NishantShri4 marked this pull request as draft March 24, 2025 07:28

rzo1 reviewed Mar 24, 2025

NishantShri4 commented Apr 14, 2025

opennlp-tools/src/main/java/opennlp/tools/monitoring/StopCriteria.java (outdated)

NishantShri4 commented Apr 15, 2025

opennlp-tools/src/main/java/opennlp/tools/monitoring/TrainingProgressMonitor.java (outdated)

NishantShri4 commented Apr 15, 2025

opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java (outdated)

NishantShri4 marked this pull request as ready for review April 16, 2025 08:30

mawiesne changed the title ~~OPENNLP-124 Maxent/Perceptron training should report progess back via an API~~ OPENNLP-124: Maxent/Perceptron training should report progess back via an API Apr 16, 2025

NishantShri4 force-pushed the main branch from 9f8621e to d02cbaa Compare April 16, 2025 22:41

OPENNLP-124 : Maxent/Perceptron training should report progess back via an API

c351fcd

NishantShri4 force-pushed the main branch from 7042b4a to c351fcd Compare April 16, 2025 22:54

Merge remote-tracking branch 'upstream/main'

f21d867

mawiesne requested changes Apr 21, 2025

OPENNLP-124 : Fixed Review Comments.

4d6cc49

mawiesne approved these changes Apr 23, 2025

mawiesne requested a review from jzonthemtn April 23, 2025 18:44

mawiesne assigned NishantShri4 Apr 23, 2025

mawiesne added the java Pull requests that update Java code label Apr 23, 2025

rzo1 approved these changes Apr 23, 2025

OPENNLP-124 : Updated javadoc for the new Trainer.init method.

a794f38

mawiesne changed the title ~~OPENNLP-124: Maxent/Perceptron training should report progess back via an API~~ OPENNLP-124: Maxent/Perceptron training should report progress back via an API Apr 25, 2025

mawiesne merged commit 28e2de6 into apache:main Apr 25, 2025
10 checks passed
