Commit eec4ee3

Merge pull request nltk#3526 from nltk/update-contributing
Update CONTRIBUTING.md
2 parents f76dc7e + 5690e98 commit eec4ee3

1 file changed (+119, -40 lines)

CONTRIBUTING.md

Lines changed: 119 additions & 40 deletions
@@ -31,7 +31,7 @@ important are:
 - [nltk/nltk_book](https://github.com/nltk/nltk_book), source code for the NLTK
 Book.

-## Development priorities (old)
+## Development priorities

 NLTK consists of the functionality that the Python/NLP community is motivated to contribute.
 Some priority areas for development are listed in the [NLTK Wiki](https://github.com/nltk/nltk/wiki#development).
@@ -57,8 +57,25 @@ repository [nltk/nltk](https://github.com/nltk/nltk/):
 - Clone your forked repository locally
 (`git clone https://github.com/<your-github-username>/nltk.git`);
 - Run `cd nltk` to get to the root directory of the `nltk` code base;
-- Install the dependencies (`pip install -r pip-req.txt`);
-- Install the [pre-commit](https://pre-commit.com) hooks: (`pre-commit install`)
+- Create and activate a virtual environment:
+```bash
+python -m venv venv
+source venv/bin/activate # On Windows: venv\Scripts\activate
+```
+- Install NLTK in editable mode with dependencies:
+```bash
+pip install -e .
+pip install -r pip-req.txt
+```
+- Install the pre-commit hooks:
+```bash
+pip install pre-commit
+pre-commit install
+```
+- Install the code formatters and linter used by the pre-commit hooks:
+```bash
+pip install black isort ruff pyupgrade
+```
 - Download the datasets for running tests
 (`python -m nltk.downloader all`);
 - Create a remote link from your local repository to the
@@ -67,6 +84,31 @@ repository [nltk/nltk](https://github.com/nltk/nltk/):
 you will need to use this `upstream` link when updating your local repository
 with all the latest contributions.

+### Pre-commit hooks
+
+NLTK uses [pre-commit](https://pre-commit.com) to run code quality checks
+before each commit. The hooks are configured in
+[`.pre-commit-config.yaml`](https://github.com/nltk/nltk/blob/develop/.pre-commit-config.yaml)
+and include:
+
+- [pre-commit-hooks](https://github.com/pre-commit/pre-commit-hooks) -- trailing whitespace, end-of-file fixer, YAML check
+- [pyupgrade](https://github.com/asottile/pyupgrade) -- upgrade syntax to Python 3.10+
+- [black](https://github.com/psf/black) -- code formatting
+- [isort](https://github.com/pycqa/isort) -- import sorting
+- [ruff](https://github.com/astral-sh/ruff-pre-commit) -- fast Python linter with auto-fix
+
+You can run all hooks manually with:
+```bash
+pre-commit run --all-files
+```
+
+Or run the tools individually:
+```bash
+isort nltk/path/to/file.py
+black nltk/path/to/file.py
+ruff check nltk/path/to/file.py
+```
+
 ### GitHub Pull requests

 We use [gitflow](https://nvie.com/posts/a-successful-git-branching-model/) to manage our branches.
@@ -82,7 +124,7 @@ Summary of our git branching model:
 - Do many small commits on that branch locally (`git add files-changed`,
 `git commit -m "Add some change"`);
 - Run the tests to make sure nothing breaks
-(`tox -e py313` if you are on Python 3.13);
+(`pytest nltk/test` or `tox -e py313` if you are on Python 3.13);
 - Add your name to the `AUTHORS.md` file as a contributor;
 - Push to your fork on GitHub (with the name as your local branch:
 `git push origin branch-name`);
@@ -146,67 +188,104 @@ the desired feature.

 You can use `pytest` to run your tests, no matter which type of test it is:

-```
+```bash
 cd nltk/test
-pytest util.doctest # doctest
+pytest util.doctest # doctest
 pytest unit/translate/test_nist.py # unittest
-pytest # all tests
+pytest # all tests
 ```

+If your PR only touches a single module, you can run just the relevant test
+file directly with `python -m unittest` without needing pytest:

-## Continuous Integration
+```bash
+# Run a specific test file
+python -m unittest nltk.test.unit.test_tokenize
+
+# Run a specific test class
+python -m unittest nltk.test.unit.test_tokenize.TestTreebankWordDetokenizer
+
+# Run a specific test method
+python -m unittest nltk.test.unit.test_tokenize.TestTreebankWordDetokenizer.test_contractions
+```
+
+If your PR touches a module that has doctests (inline `>>>` examples in
+docstrings), you can run just those doctests with `python -m doctest`:
+
+```bash
+# Run doctests for a single module
+python -m doctest nltk/metrics/distance.py
+
+# Run with verbose output to see each test
+python -m doctest -v nltk/metrics/distance.py

-**Deprecated:** NLTK uses [Cloudbees](https://nltk.ci.cloudbees.com/) for continuous integration.
+# Run a specific doctest file from the test suite
+python -m doctest nltk/test/tokenize.doctest
+```
+
+These are faster than running the full test suite and useful for quick
+iteration during development.

-**Deprecated:** NLTK uses [Travis](https://travis-ci.org/nltk/nltk/) for continuous integration.

-NLTK uses [GitHub Actions](https://github.com/nltk/nltk/actions) for continuous integration. See [here](https://docs.github.com/en/actions) for GitHub's documentation.
+## Continuous Integration
+
+NLTK uses [GitHub Actions](https://github.com/nltk/nltk/actions) for continuous integration.
+See [here](https://docs.github.com/en/actions) for GitHub's documentation.

-The [`.github/workflows/ci.yaml`](https://github.com/nltk/nltk/blob/develop/.github/workflows/ci.yaml) file configures the CI:
+The [`.github/workflows/ci.yml`](https://github.com/nltk/nltk/blob/develop/.github/workflows/ci.yml) file configures the CI:

 - `on:` section
-- ensures that this CI is run on code pushes, pull request, or through the GitHub website via a button.
+- ensures that this CI is run on code pushes, pull requests, or through the GitHub website via `workflow_dispatch`.

-- The `cache_nltk_data` job
+- The `pre-commit` job
 - performs these steps:
 - Downloads the `nltk` source code.
-- Load `nltk_data` via cache.
-- Otherwise, download all the data packages through `nltk.download('all')`.
-
-- The `test` job
-- tests against supported Python versions (`3.10`, `3.11`, `3.12`, `3.13`, `3.14`).
-- tests on `ubuntu-latest` and `macos-latest`.
-- relies on the `cache_nltk_data` job to ensure that `nltk_data` is available.
-- performs these steps:
-- Downloads the `nltk` source code.
-- Set up Python using whatever version is being checked in the current execution.
-- Load module dependencies via cache.
-- Otherwise, install dependencies via `pip install -U -r requirements-ci.txt`.
-- Load cached `nltk_data` loaded via `cache_nltk_data`.
-- Run `pytest --numprocesses auto -rsx nltk/test`.
+- Runs pre-commit on all files in the repository (black, isort, ruff, pyupgrade).
+- Fails if any hooks performed a change.

-- The `pre-commit` job
+- The `minimal_download_test` job
+- verifies that `nltk.download()` works on all platforms (ubuntu, macos, windows).
+
+- The `test` job
+- tests against supported Python versions (`3.10`, `3.11`, `3.12`, `3.13`, `3.14`).
+- tests on `ubuntu-latest`, `macos-latest`, and `windows-latest`.
 - performs these steps:
 - Downloads the `nltk` source code.
-- Runs pre-commit on all files in the repository. (Similar to `pre-commit run --all-files`)
-- Fails if any hooks performed a change.
+- Sets up Python using whatever version is being checked in the current execution.
+- Installs dependencies via `pip install -r pip-req.txt`.
+- Downloads `nltk_data`.
+- Runs `pytest --numprocesses auto -rsx --doctest-modules nltk`.
+
+#### To run tests locally

-#### To test with `tox` locally
+Using pytest directly:

-First setup a new virtual environment, see https://docs.python-guide.org/dev/virtualenvs/
-Then run `tox -e py313`.
+```bash
+# Run all tests
+pytest nltk/test

-For example, using `pipenv`:
+# Run a specific test file
+pytest nltk/test/unit/test_tokenize.py

+# Run tests in parallel
+pip install pytest-xdist
+pytest --numprocesses auto nltk/test
 ```
-git clone https://github.com/nltk/nltk.git
-cd nltk
-pipenv install -r pip-req.txt
-pipenv install tox
-tox -e py313
+
+Using tox (to test against a specific Python version):
+
+```bash
+pip install tox
+tox -e py313 # for Python 3.13
 ```


+## Supported Python Versions
+
+NLTK supports Python `3.10`, `3.11`, `3.12`, `3.13`, and `3.14`.
+See `python_requires` in [setup.py](https://github.com/nltk/nltk/blob/develop/setup.py).
+
+
 # Discussion

 We have three mail lists on Google Groups:

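
The `python -m unittest` commands added in this commit choose what to run by dotted name: a module, a `module.Class`, or a `module.Class.method`. The following is a minimal, standard-library-only sketch of that narrowing. It writes a throwaway module into a temporary directory and invokes the same CLI at each level. The names `test_demo`, `TestExample`, `TestOther`, and `run_unittest` are hypothetical stand-ins, not NLTK code; in the real repository the dotted path would look like `nltk.test.unit.test_tokenize...`.

```python
"""Sketch: how `python -m unittest` narrows a run by dotted name.

Writes a throwaway test module into a temporary directory and runs it at
module, class, and method granularity. All test names are hypothetical.
"""
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a file such as nltk/test/unit/test_tokenize.py.
TEST_MODULE = textwrap.dedent(
    """
    import unittest


    class TestExample(unittest.TestCase):
        def test_split(self):
            self.assertEqual("do n't".split(), ["do", "n't"])

        def test_join(self):
            self.assertEqual(" ".join(["do", "n't"]), "do n't")


    class TestOther(unittest.TestCase):
        def test_upper(self):
            self.assertEqual("nltk".upper(), "NLTK")
    """
)


def run_unittest(target: str) -> subprocess.CompletedProcess:
    """Run `python -m unittest <target>` against the throwaway module."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "test_demo.py").write_text(TEST_MODULE)
        # `python -m` puts the working directory on sys.path,
        # so `test_demo` is importable when cwd is `tmp`.
        return subprocess.run(
            [sys.executable, "-m", "unittest", target],
            cwd=tmp,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    for target in (
        "test_demo",  # whole module: 3 tests
        "test_demo.TestExample",  # one class: 2 tests
        "test_demo.TestExample.test_split",  # one method: 1 test
    ):
        result = run_unittest(target)
        # unittest writes its "Ran N test(s) in ..." summary to stderr.
        summary = next(
            line for line in result.stderr.splitlines() if line.startswith("Ran ")
        )
        print(f"{target}: {summary}")
```

Each longer dotted path selects a strict subset of the previous one. This narrowing is what makes the per-class and per-method commands quicker to iterate on than a full `pytest nltk/test` run.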
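
The `python -m doctest` commands added in this commit collect the `>>>` examples in a module's docstrings, run each one, and compare its printed result with the expected output written below it. The following standard-library sketch shows that mechanism on a single function, using `doctest.DocTestFinder` and `doctest.DocTestRunner`. The `levenshtein` and `check_doctests` functions are hypothetical illustrations and are not the code in `nltk/metrics/distance.py`.

```python
import doctest


def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between ``a`` and ``b``.

    >>> levenshtein("kitten", "sitting")
    3
    >>> levenshtein("", "abc")
    3
    >>> levenshtein("nltk", "nltk")
    0
    """
    # Row-by-row dynamic programming: at the start of iteration i,
    # prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # delete ca
                    cur[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        prev = cur
    return prev[-1]


def check_doctests(obj) -> doctest.TestResults:
    """Find and run the >>> examples in obj's docstring; return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)
    # With verbose=False, summarize() prints a report only if something failed.
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    results = check_doctests(levenshtein)
    print(f"{results.attempted} examples run, {results.failed} failed")
```

For a whole module, `doctest.testmod()` is the standard-library shortcut that performs the same find-and-run over every docstring. A silent run means every example matched its expected output, which is also how `python -m doctest` behaves without `-v`.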