146 changes: 146 additions & 0 deletions posts/published-check-datapackage/index.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
---
title: "First published release of `check-datapackage`!"
description: "We've published our second Python package. :tada: :grin: This package checks that a Data Package is compliant with its specification."
author:
- Luke W. Johnston
date: "2025-12-08"
categories:
- packaging
- publishing
- programming
---

On November 27th, 2025, we published our second Python package to
[PyPI](https://pypi.org/project/check-datapackage). This package forms
the basis for ensuring that any metadata created or edited for a [Data
Package](https://decisions.seedcase-project.org/why-frictionless-data/)
is correct and compliant with the [Data Package
standard](https://datapackage.org). Since we are and will be working
with and managing many Data Packages over the coming years, this is an
important tool for us to have! Generally, this will be a helpful tool
for anyone working with and managing Data Packages.

## What's `check-datapackage`?

As with all our packages and software tools, we have a dedicated website
for
[`check-datapackage`](https://check-datapackage.seedcase-project.org).
So, rather than repeat what is already on that website, this post gives
a very quick overview of what this package does and why you might want
to use it. It can be summarised by its tagline:

> Ensure the compliance of your Data Package metadata

The "only" thing `check-datapackage` does is to check the content of a
`datapackage.json` file against the Data Package standard. Nothing fancy. But we
designed it to be configurable, so that if you have specific needs for
your Data Package, you can adjust the checks accordingly. You can both
add checks on top of the standard and ignore certain checks from the
standard. For example, if you want to ensure that certain fields
that aren't required by the standard are always present in the metadata,
you can set up the checks to enforce that.

For now, `check-datapackage` is only a few Python functions and classes
that you can use within your own Python scripts. But in the future, we
plan to develop a command-line interface (CLI) so that you can use it
directly from your terminal without needing to write any code. Once it
also supports a config file, we hope to incorporate `check-datapackage`
into typical build tools and automated check workflows.

## Why use it?

We wanted this package to be simple, focused, lightweight, and easy to
use, so it doesn't include extra dependencies or features that you might
not need.

While there are a few tools that provide some type of checks of Data
Packages, such as the
[frictionless-py](https://pypi.org/project/frictionless/) package, we
didn't want all the extras that come with these packages. Nor are these
tools easy to configure. In short, no available tool fit our needs, so
we built our own package that does exactly what we need. Hopefully, it
will be useful for other people too!

Eventually, when we develop `check-datapackage` as a CLI, you could
include it as a [pre-commit hook](https://pre-commit.com/) or part of
your [continuous
integration](https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration)
workflow so that every time you make changes to your Data Package
metadata, it is automatically checked for compliance. That way, you will
always know that your Data Package metadata lives up to the standard and
your configuration.

### Example use

We have a detailed
[guide](https://check-datapackage.seedcase-project.org/docs/guide/) on
how to use `check-datapackage`, so this is only a brief example. The
main function of the package is `check()`,
which takes as input the properties of a Data Package (i.e., the
contents of the `datapackage.json` file) as a Python dictionary and
checks it against the standard.
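
Since `check()` works on a plain dictionary, one way to get that
dictionary from a file is Python's built-in `json` module. The snippet
below is a minimal sketch of that step, not the package's own
`read_json()` helper: `load_properties()` is a hypothetical name used
only for illustration, and it assumes the file holds a single JSON
object.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_properties(path: str | Path = "datapackage.json") -> dict[str, Any]:
    """Read a `datapackage.json` file into a plain Python dict.

    Hypothetical helper for illustration; not part of check-datapackage.
    """
    with Path(path).open(encoding="utf-8") as file:
        properties = json.load(file)
    # A Data Package descriptor is a JSON object, so anything else
    # (a list, a string, ...) is rejected early.
    if not isinstance(properties, dict):
        raise TypeError(
            f"Expected a JSON object in {path}, got {type(properties).__name__}"
        )
    return properties
```

The returned dictionary has the same shape as the `properties` variable
used in the rest of this section.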

``` python
import check_datapackage as cdp

# Normally you'd read in the `datapackage.json` file, but we'll
# show the actual contents here as a Python dict. You can use
# the `read_json()` helper function to read in `datapackage.json`
properties = {
    "name": "woolly-dormice",
    "id": "123-abc-123",
    "resources": [{
        "name": "woolly-dormice-2015",
        "path": "data.csv",
        "schema": {"fields": [{
            "name": "eye-colour",
            "type": "string",
        }]},
    }],
}

cdp.check(properties)
```

At a minimum, a Data Package needs to have a `resources` property. So in
this case, there are no issues with the Data Package. But if you were to
remove the `resources` property, which is required, and run the check
again, there would be an issue:

``` python
del properties["resources"]
cdp.check(properties)
```

If you want any issues that are found to raise an error, set the `error`
parameter to `True`:

``` python
cdp.check(properties, error=True)
```
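
Raising an error is what makes the check usable in automation before a
CLI exists, because a script can turn the outcome into an exit code. The
sketch below shows one way to do that. `run_check()` is a hypothetical
helper, not part of the package, and the checker is passed in as a
function so the wrapper itself only depends on the standard library.
The exact exception type raised with `error=True` isn't covered in this
post, so the sketch assumes a regular Python exception and catches
`Exception` broadly.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Any, Callable


def run_check(path: str | Path, check: Callable[[dict[str, Any]], Any]) -> int:
    """Return 0 if `check` accepts the file at `path`, otherwise 1.

    Hypothetical wrapper for illustration; `check` is any function that
    raises an exception when the properties have issues.
    """
    properties = json.loads(Path(path).read_text(encoding="utf-8"))
    try:
        check(properties)
    except Exception as error:  # assumption: issues surface as an exception
        print(f"{path}: {error}", file=sys.stderr)
        return 1
    return 0
```

In a script, you could pass `lambda p: cdp.check(p, error=True)` as the
checker and hand the result to `sys.exit()`, so a CI job fails whenever
the metadata has issues.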

If you want to exclude certain checks, you can do that by using the
`Config` and `Exclusion` classes. For example, if you want to exclude
all required checks, you can define the exclusion, add it to the
configuration, and pass it to the check function like so:

``` python
exclusion_required = cdp.Exclusion(type="required")
config = cdp.Config(exclusions=[exclusion_required])
cdp.check(properties, config=config)
```

If you want the issues listed in a more human-friendly way, you can use
the `explain()` function that takes the list of issues returned by
`check()` and formats them nicely:

``` python
issues = cdp.check(properties)
cdp.explain(issues)
```

There are many other checks you can configure with `check-datapackage`, so
be sure to check out the
[website](https://check-datapackage.seedcase-project.org) for more
information!