Skip to content

OPENNLP-124: Maxent/Perceptron training should report progress back via an API #758

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 25, 2025

Conversation

NishantShri4
Copy link
Contributor

@NishantShri4 NishantShri4 commented Mar 24, 2025

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

@NishantShri4
Copy link
Contributor Author

Hello @mawiesne / @kottmann . Good day!

Is this item waiting to be picked up? https://issues.apache.org/jira/browse/OPENNLP-124.
Here is my draft PR.
This PR is just to explain my current understanding of the Jira .
This needs further improvements (refactoring, integration with other model trainers, unit tests etc.).
I am happy to be part of the discussion and pick it up for implementation if the team approves.
Pls. let me know your views.

Attached the output of a couple of existing Tests (Perceptron Trainer) based on the integration with Console based TrainingProgressMonitor.
PerceptroTrainerUnitTests-LogOutput.txt

@NishantShri4 NishantShri4 marked this pull request as draft March 24, 2025 07:28
@mawiesne
Copy link
Contributor

FYI: @jzonthemtn + @rzo1

Copy link
Contributor

@rzo1 rzo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the draft. I added some thoughts / comments.

@NishantShri4
Copy link
Contributor Author

NishantShri4 commented Apr 13, 2025

hi @rzo1, @mawiesne,

Hope you are well. I have tried to fix the review comments. Would you pls. be able to review once more and direct towards the intended solution? Many thanks in advance.

Some queries and ToDos:

  1. The jira description gives following prototype of finishedTraining() method.
finishedTraining(int iterations, int numberCorrectEvents, int totalEvents, StopCriteria stopCriteria);
}

Please can you clarify, what is the use of numberCorrectEvents and totalEvents parameters? In my current implementation, I have not used them, instead I found stopCriteria is sufficient. Pls. take a look.

  1. ToDo : Update the documentation to include description of TPM and StopCriteria.

@NishantShri4 NishantShri4 marked this pull request as ready for review April 16, 2025 08:30
@NishantShri4
Copy link
Contributor Author

Hi Reviewers - All checks are green now. This is available for review. If possible, pls. take a look.
One query - This PR has 10 commits. Before presenting it for review, Is it needed to club them into a single commit?

@mawiesne
Copy link
Contributor

Hi Reviewers - All checks are green now. This is available for review. If possible, pls. take a look.

One query - This PR has 10 commits. Before presenting it for review, Is it needed to club them into a single commit?

Thx @NishantShri4 for moving this topic forward! Could you squash those commits and force push the resulting single commit? Once available, we'll have a detailed look and provide feedback.

@mawiesne mawiesne changed the title OPENNLP-124 Maxent/Perceptron training should report progess back via an API OPENNLP-124: Maxent/Perceptron training should report progess back via an API Apr 16, 2025
@NishantShri4
Copy link
Contributor Author

Thanks @mawiesne. This is done (rebase, squash and force push).

Copy link
Contributor

@mawiesne mawiesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @NishantShri4 for providing this substantial contribution. I've left feedback by comments to further improve it. Once addressed, I'll re-check and potentially, @rzo1 can add his final thoughts/checks then.

@mawiesne mawiesne requested a review from jzonthemtn April 23, 2025 18:44
@mawiesne mawiesne added the java Pull requests that update Java code label Apr 23, 2025
Copy link
Contributor

@rzo1 rzo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

We are currently discussing, if an ICLA is required due to the size of this contribution. Stay tuned 🙂

@NishantShri4
Copy link
Contributor Author

Thx @NishantShri4 for providing this substantial contribution. I've left feedback by comments to further improve it. Once addressed, I'll re-check and potentially, @rzo1 can add his final thoughts/checks then.

Thanks very much @mawiesne for the detailed review earlier. Very useful. I have pushed changes earlier to answer/fix the review comments. I could see two approvals available now. Thanks to the approvers for their time.

@rzo1
Copy link
Contributor

rzo1 commented Apr 24, 2025

@NishantShri4 can you fill an ICLA for your contribution please?

Details can be found here https://www.apache.org/licenses/contributor-agreements.html

If you sign it, please add "OpenNLP" in the section "notify project". You don't need to add an Apache ID. Thanks!

@NishantShri4
Copy link
Contributor Author

@NishantShri4 can you fill an ICLA for your contribution please?

Details can be found here https://www.apache.org/licenses/contributor-agreements.html

If you sign it, please add "OpenNLP" in the section "notify project". You don't need to add an Apache ID. Thanks!

Thanks @rzo1. This is done (signed ICLA is sent to :[email protected]).

@rzo1
Copy link
Contributor

rzo1 commented Apr 25, 2025

Thanks again (and of course, we truly appreciate your contribution)! We’ll go ahead and merge this PR once we get confirmation from the secretary.

@mawiesne mawiesne changed the title OPENNLP-124: Maxent/Perceptron training should report progess back via an API OPENNLP-124: Maxent/Perceptron training should report progress back via an API Apr 25, 2025
@mawiesne mawiesne merged commit 28e2de6 into apache:main Apr 25, 2025
10 checks passed
NishantShri4 added a commit to NishantShri4/opennlp that referenced this pull request Jun 14, 2025
commit 52955e9
Author: Nishant Shrivastava <[email protected]>
Date:   Sat Jun 14 18:50:09 2025 +0100

    OPENNLP-1745: SentenceDetector - Add Junit test for useTokenEnd = false

commit fe59eb9
Merge: 67ac7b2 05f69a4
Author: Nishant Shrivastava <[email protected]>
Date:   Sat Jun 14 07:29:36 2025 +0100

    Merge remote-tracking branch 'origin/main'

commit 05f69a4
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 9 11:49:38 2025 +0200

    OPENNLP-1724: Update JUnit to 5.13.1 (apache#790)

    Bumps `junit.version` from 5.13.0 to 5.13.1.

    Updates `org.junit.jupiter:junit-jupiter-api` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    Updates `org.junit.jupiter:junit-jupiter-engine` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    Updates `org.junit.jupiter:junit-jupiter-params` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    ---
    updated-dependencies:
    - dependency-name: org.junit.jupiter:junit-jupiter-api
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: org.junit.jupiter:junit-jupiter-engine
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: org.junit.jupiter:junit-jupiter-params
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 32f4ef7
Author: Richard Zowalla <[email protected]>
Date:   Sat Jun 7 21:21:36 2025 +0200

    Disable merge request requirement for opennlp-2.x (apache#789)

commit 8abfe0d
Author: Richard Zowalla <[email protected]>
Date:   Sat Jun 7 20:45:08 2025 +0200

    Remove code review requirement for 2.x branch to allow cherry picking already reviewed commits. (apache#788)

commit 89e4260
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 2 09:17:43 2025 +0200

    OPENNLP-1724: Update JUnit to 5.13.0 (apache#787)

    Bumps `junit.version` from 5.12.2 to 5.13.0.

    Updates `org.junit.jupiter:junit-jupiter-api` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    Updates `org.junit.jupiter:junit-jupiter-engine` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    Updates `org.junit.jupiter:junit-jupiter-params` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    ---
    updated-dependencies:
    - dependency-name: org.junit.jupiter:junit-jupiter-api
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: org.junit.jupiter:junit-jupiter-engine
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: org.junit.jupiter:junit-jupiter-params
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 2c8e58b
Author: Martin Wiesner <[email protected]>
Date:   Sat May 24 20:59:20 2025 +0200

    OPENNLP-1708: Raise OpenNLP version to 3.x on main branch (apache#785)

    * OPENNLP-1708: Raise OpenNLP version to 3.x on main branch
    - adjusts all pom.xml files towards 3.0.0-SNAPSHOT
    - adjusts upper major model version to 3.x
    - adds static method Version#between for simpler version range checks in BaseModel
    - adds 'opennlp-2.x' branch to protected branches in .asf.yml
    - updates README.md with infos on 'Branches and Merging Strategy'
    - cures a typo
    - adds external link to the ONNX website

commit 0db3c10
Author: Richard Zowalla <[email protected]>
Date:   Tue May 20 21:35:27 2025 +0200

    OPENNLP-1545 - Close ZipInputStream in BaseModel (apache#784)

commit 2ed9949
Author: Martin Wiesner <[email protected]>
Date:   Tue May 20 16:26:59 2025 +0200

    OPENNLP-1734: Adjust GH CI config to build with Java 25-ea (apache#781)

commit 5eec98c
Author: NishantShri4 <[email protected]>
Date:   Thu May 15 09:25:06 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool (apache#778)

    * OPENNLP-1731: Add Junits for NGramLanguageModelTool

    * OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

    * OPENNLP-1731: Add Junits for NGramLanguageModelTool

    * OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

    * OPENNLP-1731: Fixed a Generic RawType warning.

    * OPENNLP-1731: Rebased against upstream.

    * OPENNLP-1731: Rebased against upstream.

    * OPENNLP-1731: Rebased against upstream (removed extra new line).

    * OPENNLP-1731: Removed an extra newline.

commit 67ac7b2
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:23:00 2025 +0100

    OPENNLP-1731: Removed an extra newline.

commit 0d95dd9
Merge: 35de220 2580a20
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:20:59 2025 +0100

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	opennlp-tools/src/test/java/opennlp/tools/AbstractLoggerTest.java

commit 35de220
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:19:54 2025 +0100

    OPENNLP-1731: Rebased against upstream (removed extra new line).

commit e09f2ad
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:18:29 2025 +0100

    OPENNLP-1731: Rebased against upstream.

commit 6d84e2f
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:16:21 2025 +0100

    OPENNLP-1731: Rebased against upstream.

commit 2580a20
Merge: 0a20ef5 46d2d78
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:09:17 2025 +0100

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java

commit 0a20ef5
Author: Nishant Shrivastava <[email protected]>
Date:   Mon May 12 20:06:51 2025 +0100

    OPENNLP-1731: Fixed a Generic RawType warning.

commit cfa425f
Author: Nishant Shrivastava <[email protected]>
Date:   Sun May 11 17:32:59 2025 +0100

    OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

commit a7eb44a
Author: Nishant Shrivastava <[email protected]>
Date:   Sat May 10 23:24:16 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool

commit f7be29d
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Mon May 12 20:35:53 2025 +0200

    Minor: Regenerated NOTICE File for 21a2a2a (apache#783)

    Signed-off-by: GitHub <[email protected]>
    Co-authored-by: mawiesne <[email protected]>

commit 21a2a2a
Author: Martin Wiesner <[email protected]>
Date:   Mon May 12 20:34:19 2025 +0200

    OPENNLP-1733: Remove implements Serializable from LanguageDetector (apache#780)

commit 7c72cb0
Author: Martin Wiesner <[email protected]>
Date:   Mon May 12 20:32:46 2025 +0200

    OPENNLP-1732: Eliminate use of raw types for StopCriteria (apache#779)

commit e4f5ce2
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon May 12 20:32:10 2025 +0200

    OPENNLP-1730: Update ONNX runtime to 1.22.0 (apache#782)

    Bumps `onnxruntime.version` from 1.21.1 to 1.22.0.

    Updates `com.microsoft.onnxruntime:onnxruntime` from 1.21.1 to 1.22.0
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.1...v1.22.0)

    Updates `com.microsoft.onnxruntime:onnxruntime_gpu` from 1.21.1 to 1.22.0
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.1...v1.22.0)

    ---
    updated-dependencies:
    - dependency-name: com.microsoft.onnxruntime:onnxruntime
      dependency-version: 1.22.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: com.microsoft.onnxruntime:onnxruntime_gpu
      dependency-version: 1.22.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 46d2d78
Author: Nishant Shrivastava <[email protected]>
Date:   Sun May 11 17:32:59 2025 +0100

    OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

commit 01a4695
Author: Nishant Shrivastava <[email protected]>
Date:   Sat May 10 23:24:16 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool

commit 1675317
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Wed Apr 30 18:16:50 2025 +0200

    Minor: Regenerated NOTICE File for 95cd7c8 (apache#776)

    Signed-off-by: GitHub <[email protected]>
    Co-authored-by: mawiesne <[email protected]>

commit 95cd7c8
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Apr 30 18:15:51 2025 +0200

    OPENNLP-1730: Update ONNX runtime to 1.21.1 (apache#774)

    Bumps `onnxruntime.version` from 1.21.0 to 1.21.1.

    Updates `com.microsoft.onnxruntime:onnxruntime` from 1.21.0 to 1.21.1
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.0...v1.21.1)

    Updates `com.microsoft.onnxruntime:onnxruntime_gpu` from 1.21.0 to 1.21.1
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.0...v1.21.1)

    ---
    updated-dependencies:
    - dependency-name: com.microsoft.onnxruntime:onnxruntime
      dependency-version: 1.21.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: com.microsoft.onnxruntime:onnxruntime_gpu
      dependency-version: 1.21.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 7c85b94
Author: Martin Wiesner <[email protected]>
Date:   Wed Apr 30 18:12:19 2025 +0200

    OPENNLP-1729: Provide easier loading of Models for given model lang and type (apache#775)

    - extracts ModelType from DownloadUtil
    - adds new methods to ClassPathModelLoader to obtain actual model instances easily
    - adds ClassPathModelProvider interface
    - adds DefaultClassPathModelProvider which combines existing classes to achieve easier access to model objects via classpath loading
    - adds JUnit tests for the new classes
    - adds and improves JavaDoc

commit 28e2de6
Author: NishantShri4 <[email protected]>
Date:   Fri Apr 25 18:59:14 2025 +0100

    OPENNLP-124: Maxent/Perceptron training should report progress back via an API (apache#758)

    * OPENNLP-124 : Maxent/Perceptron training should report progress back via an API
    * OPENNLP-124 : Fixed Review Comments
    * OPENNLP-124 : Updated javadoc for the new Trainer.init method

commit 2720a1b
Author: Martin Wiesner <[email protected]>
Date:   Fri Apr 25 17:32:20 2025 +0200

    OPENNLP-1728: Improve JavaDoc of opennlp.tools.models package (apache#772)

commit e1843dc
Author: Martin Wiesner <[email protected]>
Date:   Wed Apr 23 21:42:29 2025 +0200

    OPENNLP-1727: Correct example snippet for loading a model from the classpath (apache#771)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
java Pull requests that update Java code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants