The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
This toolkit is written completely in Java and provides support for common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, coreference resolution, language detection and more!
These tasks are usually required to build more advanced text processing services.
The goal of the OpenNLP project is to be a mature toolkit for the above-mentioned tasks.
An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.
Presently, OpenNLP includes common classifiers such as Maximum Entropy, Perceptron and Naive Bayes.
OpenNLP can be used programmatically through its Java API or from a terminal through its CLI. The OpenNLP API can easily be plugged into distributed streaming data pipelines such as Apache Flink, Apache NiFi, and Apache Spark.
For additional information, visit the OpenNLP Home Page.
You can use OpenNLP with any language; demo models are provided here.
The models are fully compatible with the latest release and can be used for testing or getting started.
Note
Please train your own models for all other use cases.
Documentation, including JavaDocs, code usage, and command-line interface examples, is available here.
For recent news, updates and topics, you can:
- join the regular mailing lists,
- follow the project's social media channel, or
- join the channel (available to people with an @apache.org email address or upon invitation).
Please also check the community's OpenNLP questions and answers.
Currently, the library consists of the following modules:
- opennlp-api: The public API defining core Apache OpenNLP interfaces and abstractions.
- opennlp-runtime: The core classes shared across Apache OpenNLP components.
- opennlp-ml-commons: Common utilities and shared functionality for ML implementations.
- opennlp-ml-maxent: Maximum Entropy (MaxEnt) machine learning implementation.
- opennlp-ml-bayes: Naive Bayes machine learning implementation.
- opennlp-ml-perceptron: Perceptron-based machine learning implementation.
- opennlp-dl: Apache OpenNLP adapter for ONNX models using the onnxruntime dependency.
- opennlp-dl-gpu: Replaces onnxruntime with the onnxruntime_gpu dependency to support GPU acceleration.
- opennlp-models: Classes for working with Apache OpenNLP model artifacts.
- opennlp-formats: Support for reading and writing various NLP training and data formats.
- opennlp-cli: The command-line tools for training, evaluating, and running models.
- opennlp-tools: The full end-user toolkit with all core components and utilities in its executable form.
- opennlp-morfologik: Extension module providing Morfologik-based dictionary and stemming support.
- opennlp-uima: Extension module providing a set of Apache UIMA annotators.
- opennlp-sandbox: Other projects in progress reside in the sandbox.
You can import the core toolkit directly from Maven or Gradle:
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-runtime</artifactId>
    <version>${opennlp.version}</version>
</dependency>
<!-- if model support is needed -->
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-models</artifactId>
    <version>${opennlp.version}</version>
</dependency>
Note: opennlp-runtime ships with the MaxEnt ML implementation by default. If you need other ML implementations, please add the corresponding dependencies as well.
implementation group: "org.apache.opennlp", name: "opennlp-runtime", version: "${opennlp.version}"
implementation group: "org.apache.opennlp", name: "opennlp-models", version: "${opennlp.version}"
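As an illustration of the note above, a project that also wants the Perceptron implementation could add the opennlp-ml-perceptron module from the module list. This is only a sketch: it assumes the module is published under the same groupId and uses the same version property as opennlp-runtime.

```xml
<!-- Assumption: same groupId and version property as opennlp-runtime -->
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-ml-perceptron</artifactId>
    <version>${opennlp.version}</version>
</dependency>
```

The Naive Bayes implementation (opennlp-ml-bayes) would presumably be added the same way.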
For more details, please check our documentation.
The 3.x release line of Apache OpenNLP introduces no known breaking changes but modularizes the project for better usage as a library and to support future extensibility. The core API remains stable and compatible with 2.x, but the project structure has been reorganized into multiple modules.
That means you can continue to use the previous opennlp-tools artifact as a dependency. However, we strongly recommend switching to the new modular structure
and importing only the components you need, which results in a smaller dependency footprint.
Only opennlp-runtime needs to be added as a dependency, and you can add additional modules (e.g. opennlp-ml-maxent, opennlp-models, etc.) as required by your project.
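For projects that stay on the previous artifact for now, the declaration might look like the following sketch. It assumes opennlp-tools is published under the org.apache.opennlp groupId and that an opennlp.version property is defined in your build, as in the Maven snippets above.

```xml
<!-- Legacy, non-modular dependency: pulls in the full toolkit -->
<!-- Assumption: opennlp.version is defined in the <properties> section of your POM -->
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>${opennlp.version}</version>
</dependency>
```

Moving to the modular layout later would then amount to replacing this entry with opennlp-runtime plus any additional modules your project requires.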
For users of the traditional CLI toolkit, nothing changes with the 3.x release line. CLI usage remains stable as described in the project's dev manual.
The Apache OpenNLP team is planning to change the package namespace from opennlp to org.apache.opennlp in a future release (potentially 4.x).
This change will be made to align with standard Java package naming conventions and to avoid potential conflicts with other libraries.
In addition, the Apache OpenNLP team is considering raising the minimum Java version to JDK 21+ in a future release (potentially 4.x) to take advantage of the latest language features and improvements.
To support ongoing development and stable maintenance of Apache OpenNLP, the project follows a dual-branch model:
- main: Development branch for version 3.0 and beyond. All feature development and 3.x releases occur here.
- opennlp-2.x: Maintains the stable 2.x release line. This branch will receive selective updates and patch releases.
- Feature development
  - New features targeting versions 3.0+ are developed on feature branches off main and merged into main.
- Bug fixes and dependency updates
  - Relevant fixes or dependency updates from main may be cherry-picked into opennlp-2.x as needed.
- Releases
  - 3.x releases are made from the main branch.
  - 2.x releases are made from the opennlp-2.x branch.
- Release tags
  - Release tags are applied directly to the appropriate version branch (main for 3.x or opennlp-2.x for 2.x).
  - The presence of a version branch does not affect the tagging or visibility of releases.
At least JDK 17 and Maven 3.3.9 are required to build the library.
After cloning the repository, change into the destination directory and run:
mvn install
- Building and integrating Snowball Stemmer for OpenNLP.
The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.
If you would like to get involved, please follow the instructions here.