sparv-sbx-corpus-statistics

A Sparv plugin to collect statistics about a corpus.

Install

First, install Sparv as suggested,

with pipx:

pipx install sparv

or, with uv-pipx:

uvpipx install sparv

Then install sparv-sbx-corpus-statistics with,

the suggested method:

sparv plugins install sparv-sbx-corpus-statistics

or, if you used pipx above:

pipx inject sparv sparv-sbx-corpus-statistics

or, if you used uv-pipx above:

uvpipx install sparv-sbx-corpus-statistics --inject sparv

Usage

To use this plugin, add sbx_corpus_statistics:stat_highlights under export.default in your config.yaml:

export:
  default:
    - xml_export:pretty
    - sbx_corpus_statistics:stat_highlights
    # - more exports
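To confirm that the exporter is actually listed where Sparv expects it, the snippet below is a minimal sketch that reads the text of a config.yaml and returns the entries under export.default. The names `default_exports` and `EXPORTER` are hypothetical helpers introduced for illustration, not part of Sparv or this plugin, and the parser is deliberately naive: it only handles the simple block layout shown above. A real project would more likely use a YAML library.

```python
from typing import List

# The exporter name from the usage instructions above.
EXPORTER = "sbx_corpus_statistics:stat_highlights"


def default_exports(config_text: str) -> List[str]:
    """Return the items listed under export.default.

    Hypothetical helper: handles only the plain block layout shown in
    this README, not the full YAML syntax.
    """
    exports: List[str] = []
    in_export = False
    in_default = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue  # skip blank and comment-only lines
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        if indent == 0:
            # A new top-level key starts; only "export:" is of interest.
            in_export = stripped == "export:"
            in_default = False
        elif in_export and stripped == "default:":
            in_default = True
        elif in_export and in_default and stripped.startswith("- "):
            exports.append(stripped[2:].strip())
        elif in_export:
            # Any other key inside "export:" ends the "default:" list.
            in_default = False
    return exports


if __name__ == "__main__":
    with open("config.yaml", encoding="utf-8") as fh:
        found = EXPORTER in default_exports(fh.read())
    print("exporter configured" if found else "exporter missing")
```

Run it from the corpus directory that contains config.yaml.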

Minimum Supported Python Version Policy

The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions; for example, going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to upgrade their Python version can use an older minor version instead. Below is a list of sparv-sbx-corpus-statistics versions and their Minimum Supported Python Version:

  • v0.1: Python 3.11.

Note, however, that sparv-sbx-corpus-statistics also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.
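As a quick way to check an environment against the list above, here is a minimal sketch. The table `MSPV` and the function `python_is_supported` are hypothetical names introduced for this example; the only data they encode is the v0.1 entry from the list.

```python
import sys
from typing import Optional, Tuple

# Minimum Supported Python Version per plugin minor version,
# copied from the list above (hypothetical table name).
MSPV = {"0.1": (3, 11)}


def python_is_supported(
    plugin_minor: str, version: Optional[Tuple[int, int]] = None
) -> bool:
    """Return True if `version` (default: the running interpreter)
    meets the minimum for the given plugin minor version."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MSPV[plugin_minor]


if __name__ == "__main__":
    ok = python_is_supported("0.1")
    print("supported" if ok else "Python too old for v0.1")
```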

Changelog

This project keeps a changelog.

License

This repository is licensed under the MIT license.

Development

Development prerequisites

To start developing on this repository:

  • Clone the repo (in one of the ways below):
    • git clone git@github.com:spraakbanken/sparv-sbx-corpus-statistics.git
    • git clone https://github.com/spraakbanken/sparv-sbx-corpus-statistics.git
  • Set up the environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.
  • Test the examples with:
    • make test-example-small-txt

This repo uses conventional commits.

Release a new version

[!NOTE] Requirements:

  • bump-my-version for make bumpversion; install with uv tool install bump-my-version.
  • git-cliff for make prepare-release.
  • sparv-sbx-metadata for make generate-metadata; installed automatically.

  • Prepare the CHANGELOG: make prepare-release.
  • Edit CHANGELOG.md to your liking. Keep the header [unreleased].
  • Add to git: git add --update
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump version (depends on bump-my-version):
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags (assuming the remote is named origin) or make publish
  • Add metadata for Språkbanken's resource
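For readers unsure what the three parts do, the sketch below illustrates the conventional major/minor/patch behaviour, where bumping a part resets the parts to its right. It is an illustration only: `bump` is a hypothetical function, it assumes a plain MAJOR.MINOR.PATCH version string, and the actual result of make bumpversion depends on this repository's bump-my-version configuration.

```python
def bump(version: str, part: str = "patch") -> str:
    """Return `version` with `part` incremented and lower parts reset.

    Hypothetical illustration; assumes a plain MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    for part in ("major", "minor", "patch"):
        print(part, bump("0.1.0", part))
```

The default of "patch" mirrors the list above, where make bumpversion without a part is listed as a patch bump.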
