@khusmann khusmann commented Feb 5, 2025

This is a proposal for a recipe for "category tables", which allows categorical variables to define their categories via table resources. As indicated in the recipe, this approach has a number of advantages:

  • In case of a large number of categories it is often easier to maintain these in files, such as CSV files. This also keeps the datapackage.json file compact and readable for humans.

  • The data set in the category table resource can store additional information besides the 'value' and 'label'. For example, the categories could have descriptions or the categories could form a hierarchy.

  • It is also possible to store additional metadata in the category table resource. For example, it is possible to indicate the source, license, version and owner of the data resource. This is important for many 'official' category lists, where there can be many similar versions maintained by different organisations.

  • When different fields use the same categories, they can all refer to the same category table resource. First, this allows the categories to be reused. Second, by referring to the same data resource, the field descriptors can indicate that the categories are comparable between fields.

  • It is possible to refer to category table resources in other data packages. This makes it possible, for example, to create centrally maintained repositories of categories.
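
To make the points above concrete, here is a minimal Python sketch of how a consumer might resolve labels from a shared category table. Everything in it is an assumption for illustration, not the recipe's normative syntax: the `categories: {"resource": ...}` property layout, the `value` / `label` / `description` column names, the resource names, and the helper functions are all hypothetical, and the CSV is inlined as a string so the example is self-contained.

```python
import csv
import io

# Hypothetical category table, as it might be kept in a CSV file.
# The extra "description" column shows that a table can carry more
# than just a value and a label.
FRUIT_CODES_CSV = """value,label,description
1,apple,Pome fruit
2,banana,Tropical fruit
3,cherry,Stone fruit
"""

# Hypothetical descriptor fragment. The property names used here are
# assumptions for illustration and are not quoted from the recipe.
PACKAGE = {
    "resources": [
        {"name": "fruit-codes", "data": FRUIT_CODES_CSV},
        {
            "name": "survey",
            "schema": {
                "fields": [
                    {
                        "name": "favourite_fruit",
                        "type": "integer",
                        "categories": {"resource": "fruit-codes"},
                    },
                    {
                        "name": "least_liked_fruit",
                        "type": "integer",
                        "categories": {"resource": "fruit-codes"},
                    },
                ]
            },
        },
    ]
}


def load_category_table(package, resource_name):
    """Return a {value: label} mapping read from the named category table."""
    resource = next(r for r in package["resources"] if r["name"] == resource_name)
    rows = csv.DictReader(io.StringIO(resource["data"]))
    # CSV cells are strings, so the mapping is keyed by the string form of the value.
    return {row["value"]: row["label"] for row in rows}


def label_for(package, field, raw_value):
    """Look up the label of raw_value in the table the field refers to."""
    table = load_category_table(package, field["categories"]["resource"])
    return table[str(raw_value)]


def share_categories(field_a, field_b):
    """Two fields are comparable here if they refer to the same category table."""
    return field_a["categories"]["resource"] == field_b["categories"]["resource"]
```

In this sketch both survey fields point at the same `fruit-codes` resource, so `label_for` resolves their values against one table and `share_categories` reports them as comparable, while `datapackage.json` itself stays free of the category list.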

It was first proposed / discussed here: #888

The current PR was workshopped in great detail by @djvanderlaan, @fomcl, @pschumm and me, and we plan to have a live discussion / Q&A at our next community meeting (2025-02-06). In the meantime, we look forward to everyone's thoughts and feedback.

@roll roll added the docs label Feb 10, 2025
@roll roll added this to datapackage Mar 24, 2025

roll commented Jul 7, 2025

Amazing work! Sorry for the late review.

@roll roll merged commit 3b9efb7 into frictionlessdata:next Jul 7, 2025
@github-project-automation github-project-automation bot moved this to ✅ Done in datapackage Jul 7, 2025
roll added a commit that referenced this pull request Jul 7, 2025
@roll roll removed this from datapackage Jul 7, 2025
@roll roll added this to datapackage Jul 14, 2025