
Discrepancies between ddf_utils.create_datapackage (Python) and validate-ddf -i (Node) #548

@lapidus


We are using both Python (https://github.com/semio/ddf_utils) and JavaScript tooling to generate the datapackage.json with its ddfSchema property.

When run on the very same dataset, the Python-based generator produces a datapackage.json file that is 50% larger.

It would be interesting to hear your thoughts (@buchslava, @semio) about harmonising the two libraries. So far we have identified four differences in the output:

1. Resource.name is encoded differently:

validate-ddf
"path": "ddf--entities--jurisdiction.csv",
"name": "jurisdiction"

ddf_utils
"path": "ddf--entities--jurisdiction.csv",
"name": "ddf--entities--jurisdiction"
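
To make the difference concrete, here is a minimal Python sketch of two naming rules that reproduce the outputs above. Both function names are hypothetical, and the "last segment" rule is only a guess that happens to match this single example; it is not taken from the validate-ddf source.

```python
from pathlib import PurePosixPath


def name_from_stem(path: str) -> str:
    """File name without extension (reproduces the ddf_utils output above)."""
    return PurePosixPath(path).stem


def name_from_last_segment(path: str) -> str:
    """Last '--'-separated segment of the stem.

    Assumption: this only reproduces the validate-ddf example above;
    the real rule may differ for other file types.
    """
    return PurePosixPath(path).stem.split("--")[-1]


path = "ddf--entities--jurisdiction.csv"
print(name_from_stem(path))          # ddf--entities--jurisdiction
print(name_from_last_segment(path))  # jurisdiction
```

If the two tools are harmonised, agreeing on one such rule would be enough to make the resource names identical.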

2. The default datapackage.json properties differ

The JavaScript version typically adds more placeholders (such as title, license, author, version), whereas ddf_utils generates only the bare minimum (name).

3. Python ddf_utils does not seem to support multiple measures in one file, e.g.:

ddf--datapoints--measure--measure--by--country--year.csv
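
For reference, a file name of this shape can be split into its measures and key dimensions as sketched below. The helper `parse_datapoints_filename` is hypothetical and not part of ddf_utils or validate-ddf; it assumes the name follows the pattern `ddf--datapoints--<measures>--by--<keys>.csv` with `--` as the separator, as in the example above.

```python
from typing import List, Tuple


def parse_datapoints_filename(filename: str) -> Tuple[List[str], List[str]]:
    """Split a datapoints file name into (measures, key dimensions).

    Assumption: the name looks like
    ddf--datapoints--<m1>--<m2>--by--<k1>--<k2>.csv
    """
    stem = filename[:-4] if filename.endswith(".csv") else filename
    prefix = "ddf--datapoints--"
    if not stem.startswith(prefix) or "--by--" not in stem:
        raise ValueError(f"not a datapoints file name: {filename!r}")
    measures_part, keys_part = stem[len(prefix):].split("--by--", 1)
    return measures_part.split("--"), keys_part.split("--")


measures, keys = parse_datapoints_filename(
    "ddf--datapoints--measure--measure--by--country--year.csv"
)
print(measures)  # ['measure', 'measure']
print(keys)      # ['country', 'year']
```

A generator that only expects a single measure before `--by--` would misread such a name, which may be what happens here.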

4. Different files are excluded

The Python tools seem to do a better job of excluding non-DDF files from the datapackage.
With validate-ddf -i, .DS_Store and .ipynb files were accidentally included in the datapackage.json file, whereas ddf_utils skipped them.
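
One way to get consistent behaviour would be an allow-list on file names rather than a block-list. The sketch below is only an illustration of that idea, not how either tool currently filters; the function name `is_ddf_resource` and the pattern `ddf--*.csv` are assumptions based on the file names shown in this issue.

```python
import fnmatch


def is_ddf_resource(filename: str) -> bool:
    """Keep only visible CSV files that follow the ddf-- naming pattern.

    Assumption: every resource file is named ddf--*.csv.
    """
    if filename.startswith("."):
        return False  # hidden files such as .DS_Store
    return fnmatch.fnmatchcase(filename, "ddf--*.csv")


candidates = [
    ".DS_Store",
    "analysis.ipynb",
    "ddf--concepts.csv",
    "ddf--entities--jurisdiction.csv",
]
print([f for f in candidates if is_ddf_resource(f)])
# ['ddf--concepts.csv', 'ddf--entities--jurisdiction.csv']
```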

Thanks for any pointers and ideas!
