LICENSE-OVERVIEW.md

Overview

This repository (nltk_data) is licensed as a whole under the Apache License 2.0. However, each data package included in this repository is subject to its own license, which may differ substantially from the repository-wide license. A package may be covered by an open license (MIT, Creative Commons, etc.), a public domain dedication, or custom or restrictive terms (such as "non-commercial use only" or "distributed with permission"), or it may lack explicit license terms entirely.

Important:
You must consult the specific license for each dataset before use, especially for commercial or redistributive purposes.
See DATASET-LICENSES.md for a grouped summary of package licenses.

Maintainers are not legal professionals and cannot answer legal questions or provide legal advice.
If you have any doubts or require legal interpretation, consult a qualified legal professional.

Special Notes

  • Unclarified, Ambiguous, or Missing Licenses
    Some data packages have ambiguous, missing, or unclarified licenses (most notably the Punkt Tokenizer Models). Despite long-standing community efforts (see nltk_data issue #241 and related issues), clarification has not always been possible.
    These packages are grouped and flagged in DATASET-LICENSES.md with explicit warnings.
    If you have legal questions or concerns about using any package with an unclear or ambiguous license, consult a qualified lawyer. Do not rely on assumptions, community answers, or advice from maintainers.

  • This Documentation is Not Legal Advice
    The information in these files is provided for convenience and transparency, and does not constitute legal advice.
    You are responsible for ensuring your own legal compliance when using, modifying, or redistributing any content from this repository.

Data Package Licenses

Each data package may have its own license, as detailed in DATASET-LICENSES.md. These may include (but are not limited to):

  • Open licenses (MIT, GPL, various Creative Commons licenses, etc.)
  • Public domain dedication
  • Custom or restrictive terms ("distributed with permission", "non-commercial use only", "see website", etc.)
  • Citation requests (note: a citation request does not constitute a license)
  • No license or ambiguous terms

If a license is unclear, missing, or does not suit your intended use, do not assume that commercial use or public redistribution is allowed.

Your Responsibilities

  • Check the Dataset License:
    Before using, modifying, or redistributing any data package, check the relevant license entry in DATASET-LICENSES.md and, if necessary, consult the original data source for updated terms.

  • When in Doubt:
    If the license is missing, ambiguous, or unclear, or if you are unsure about your intended use, seek advice from a qualified legal professional.

Keeping This Documentation Up to Date

If you add, update, or remove datasets, please also update DATASET-LICENSES.md and this overview file to ensure continued transparency for all users.

Apache License 2.0

See the LICENSE file for the full text of the repository-wide license.