feat: ✨ post on publishing check-datapackage
#184
Open: lwjohnst86 wants to merge 4 commits into main from feat/post-about-publishing-check-datapackage
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,147 @@ | ||||||
| --- | ||||||
| title: "First published release of `check-datapackage`!" | ||||||
| description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification." | ||||||
| author: | ||||||
| - Luke W. Johnston | ||||||
| date: "2025-12-08" | ||||||
| categories: | ||||||
| - packaging | ||||||
| - publishing | ||||||
| - programming | ||||||
| --- | ||||||
|
|
||||||
| On November 27th, 2025, we published our second Python package to | ||||||
| [PyPI](https://pypi.org/project/check-datapackage). This package forms | ||||||
| the basis for ensuring that any metadata created or edited for a [Data | ||||||
| Package](https://decisions.seedcase-project.org/why-frictionless-data/) | ||||||
| is correct and compliant with the [Data Package | ||||||
| standard](https://datapackage.org). Since we are and will be working | ||||||
| with and managing many Data Packages over the coming years, this is an | ||||||
| important tool for us to have! Generally, this will be a helpful tool | ||||||
| for anyone working with and managing Data Packages. | ||||||
|
|
||||||
| ## What's `check-datapackage`? | ||||||
|
|
||||||
| As with all our packages and software tools, we have a dedicated website | ||||||
| for | ||||||
| [`check-datapackage`](https://check-datapackage.seedcase-project.org). | ||||||
| So, rather than repeat what is already on that website, this post gives | ||||||
| a very quick overview of what this package does and why you might want | ||||||
| to use it. It can be summarised by its tagline: | ||||||
|
|
||||||
| > Ensure the compliance of your Data Package metadata | ||||||
|
|
||||||
| The "only" thing `check-datapackage` does is to check the content of a | ||||||
| `datapackage.json` file against the standard. Nothing fancy. But we | ||||||
| designed it to be configurable, so that if you have specific needs for | ||||||
| your Data Package, you can adjust the checks accordingly. You can both | ||||||
| add checks on top of the standard and ignore certain checks from | ||||||
| the standard. For example, if you want to ensure that certain fields | ||||||
| that aren't required by the standard are always present in the metadata, | ||||||
| you can set up the checks to enforce that. | ||||||
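To make the idea of an extra check concrete, here is a minimal plain-Python sketch of what "these fields must always be present" means. It is only an illustration: `missing_fields()` is a hypothetical helper written for this post, not part of `check-datapackage`, and the package's own configuration for this is described in its guide.

```python
def missing_fields(properties: dict, expected: list[str]) -> list[str]:
    """Return the names in `expected` that are absent from `properties`."""
    return [name for name in expected if name not in properties]


# Hypothetical project rule: `description` and `licenses` must always be
# present, even though the standard itself doesn't require them.
properties = {"name": "woolly-dormice", "resources": []}
print(missing_fields(properties, ["name", "description", "licenses"]))
# → ['description', 'licenses']
```
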
|
|
||||||
| For now, `check-datapackage` is only a few Python functions and classes | ||||||
| that you can use within your own Python scripts. But in the future, we | ||||||
| plan to develop a command-line interface (CLI) so that you can use it | ||||||
| directly from your terminal without needing to write any code. Together | ||||||
| with support for a config file, we hope to incorporate `check-datapackage` | ||||||
| into typical build tools and automated check workflows. | ||||||
|
|
||||||
| ## Why use it? | ||||||
|
|
||||||
| We wanted this package to be incredibly simple and focused. If you | ||||||
| install or use it, you know exactly what it does. It also doesn't | ||||||
| include extra dependencies or features that you might not need. We | ||||||
| wanted it lightweight and easy to use. | ||||||
|
|
||||||
| While there are a few tools that provide some type of checks of Data | ||||||
| Packages, such as the | ||||||
| [frictionless-py](https://pypi.org/project/frictionless/) package, we | ||||||
| didn't want all the extras that came with these packages. Nor are these | ||||||
| tools easy to configure for our needs. In this regard, there were no | ||||||
| tools available that fit our needs. So, we built our own package that | ||||||
| does exactly what we need. Hopefully, it will be useful for other people | ||||||
| too! | ||||||
|
|
||||||
| Eventually, when we develop `check-datapackage` as a CLI, you could | ||||||
| include it as a [pre-commit hook](https://pre-commit.com/) or part of | ||||||
| your [continuous | ||||||
| integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration) | ||||||
| workflow so that every time you make changes to your Data Package | ||||||
| metadata, it is automatically checked for compliance. That way, you will | ||||||
| always know that your Data Package metadata lives up to the standard and | ||||||
| your configuration. | ||||||
|
|
||||||
| ### Example use | ||||||
|
|
||||||
| We have a detailed | ||||||
| [guide](https://check-datapackage.seedcase-project.org/docs/guide/) on | ||||||
| how to use `check-datapackage`. But I'll briefly show how you might use | ||||||
| `check-datapackage`. The main function of the package is `check()`, | ||||||
| which takes as input the properties of a Data Package (i.e., the | ||||||
| contents of the `datapackage.json` file) as a Python dictionary and | ||||||
| checks it against the standard. | ||||||
|
|
||||||
| ``` python | ||||||
| import check_datapackage as cdp | ||||||
|
|
||||||
| # Normally you'd read in the `datapackage.json` file, but we'll | ||||||
| # show the actual contents here as a Python dict. You can use | ||||||
| # the `read_json()` helper function to read in `datapackage.json`. | ||||||
| properties = { | ||||||
| "name": "woolly-dormice", | ||||||
| "id": "123-abc-123", | ||||||
| "resources": [{ | ||||||
| "name": "woolly-dormice-2015", | ||||||
| "path": "data.csv", | ||||||
| "schema": {"fields": [{ | ||||||
| "name": "eye-colour", | ||||||
| "type": "string", | ||||||
| }]}, | ||||||
| }], | ||||||
| } | ||||||
|
|
||||||
| cdp.check(properties) | ||||||
| ``` | ||||||
|
|
||||||
| At a minimum, a Data Package needs to have a `resources` property. So in | ||||||
| this case, there are no issues with the Data Package. But if you were to | ||||||
| remove the `resources` property, which is required, and run the check | ||||||
| again, there would be an issue: | ||||||
|
|
||||||
| ``` python | ||||||
| del properties["resources"] | ||||||
| cdp.check(properties) | ||||||
| ``` | ||||||
|
|
||||||
| If you want any issues found to be treated as an error, set the | ||||||
| parameter `error` to `True`: | ||||||
|
|
||||||
| ``` python | ||||||
| cdp.check(properties, error=True) | ||||||
| ``` | ||||||
|
|
||||||
| If you want to exclude certain checks, you can do that by using the | ||||||
| `Config` and `Exclusion` classes. For example, if you want to exclude | ||||||
| all required checks, you can define the exclusion, add it to the | ||||||
| configuration, and pass it to the check function like so: | ||||||
|
|
||||||
| ``` python | ||||||
| exclusion_required = cdp.Exclusion(type="required") | ||||||
| config = cdp.Config(exclusions=[exclusion_required]) | ||||||
| cdp.check(properties, config=config) | ||||||
| ``` | ||||||
|
|
||||||
| If you want the issues listed in a more human-friendly way, you can use | ||||||
| the `explain()` function that takes the list of issues returned by | ||||||
| `check()` and formats them nicely: | ||||||
|
|
||||||
| ``` python | ||||||
| issues = cdp.check(properties) | ||||||
| cdp.explain(issues) | ||||||
| ``` | ||||||
|
|
||||||
| There are many other checks you can configure with `check-datapackage`, so | ||||||
| be sure to check out the | ||||||
| [website](https://check-datapackage.seedcase-project.org) for more | ||||||
| information! | ||||||