We are using both Python (https://github.com/semio/ddf_utils) and JavaScript tooling to generate the datapackage.json with its ddfSchema property.
When run on the very same dataset, the Python-based generator produces a datapackage.json file that is 50% larger.
It would be interesting to hear your thoughts (@buchslava, @semio) about harmonising the two libraries. So far we have identified four differences in the output:
1. Resource.name is encoded differently:
validate-ddf:
    "path": "ddf--entities--jurisdiction.csv",
    "name": "jurisdiction"
ddf_utils:
    "path": "ddf--entities--jurisdiction.csv",
    "name": "ddf--entities--jurisdiction"
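To make the naming difference easier to spot across a whole dataset, here is a minimal Python sketch that compares two `resources` lists by `path` and reports where `name` differs. The helper `diff_resource_names` and the inline sample lists are hypothetical; only the `path` and `name` keys and their values are taken from the two outputs quoted above.

```python
def diff_resource_names(resources_a, resources_b):
    """Return {path: (name_a, name_b)} for shared paths whose names differ."""
    names_a = {r["path"]: r["name"] for r in resources_a}
    names_b = {r["path"]: r["name"] for r in resources_b}
    shared_paths = names_a.keys() & names_b.keys()
    return {
        path: (names_a[path], names_b[path])
        for path in sorted(shared_paths)
        if names_a[path] != names_b[path]
    }


# Sample entries copied from the two generators' outputs (hypothetical variable names).
validate_ddf_resources = [
    {"path": "ddf--entities--jurisdiction.csv", "name": "jurisdiction"},
]
ddf_utils_resources = [
    {"path": "ddf--entities--jurisdiction.csv", "name": "ddf--entities--jurisdiction"},
]

name_diff = diff_resource_names(validate_ddf_resources, ddf_utils_resources)
print(name_diff)
```

In practice the two lists would come from the `resources` array of each generated datapackage.json, loaded with `json.load`.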
2. The default datapackage.json properties differ
The JavaScript version typically adds more placeholders (such as title, license, author, version), whereas ddf_utils generates only the bare minimum (name).
3. Python ddf_utils does not seem to work with multiple measures in one file, for example:
ddf--datapoints--measure--measure--by--country--year.csv
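For reference, this is how we read such a file name: everything between `ddf--datapoints--` and `--by--` is a list of measures, and everything after `--by--` is a list of dimensions. The Python sketch below only illustrates that reading. The function `parse_datapoints_filename` is hypothetical and is not part of ddf_utils or validate-ddf, and it assumes the simple pattern of the example above.

```python
def parse_datapoints_filename(filename):
    """Split a datapoints file name into (measures, dimensions).

    Assumes the pattern ddf--datapoints--<m1>--<m2>--by--<d1>--<d2>.csv.
    """
    prefix, suffix = "ddf--datapoints--", ".csv"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"not a datapoints file name: {filename}")
    body = filename[len(prefix):-len(suffix)]
    measures_part, separator, dimensions_part = body.partition("--by--")
    if not separator:
        raise ValueError(f"missing '--by--' in: {filename}")
    return measures_part.split("--"), dimensions_part.split("--")


measures, dimensions = parse_datapoints_filename(
    "ddf--datapoints--measure--measure--by--country--year.csv"
)
print(measures, dimensions)
```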
4. Different files are excluded
The Python tools seem to do a better job of excluding unrelated files from ddf creation.
With validate-ddf -i, .DS_Store and .ipynb files were accidentally included in the datapackage.json file, whereas ddf_utils skipped them.
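As a starting point for a shared exclusion rule, here is a minimal Python sketch of the kind of filter we have in mind. It is an assumption on our side, not the actual logic of either tool: it skips anything inside a hidden file or directory and keeps only `.csv` files whose name starts with `ddf--`, as in the examples above. The function name `is_ddf_data_file` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_ddf_data_file(path):
    """Return True for non-hidden files named ddf--*.csv (assumed convention)."""
    p = PurePosixPath(path)
    # Skip hidden files and anything inside a hidden directory.
    if any(part.startswith(".") for part in p.parts):
        return False
    return p.name.startswith("ddf--") and p.suffix == ".csv"


# Hypothetical directory listing, relative to the dataset root.
candidates = [
    "ddf--entities--jurisdiction.csv",
    ".DS_Store",
    "notebooks/explore.ipynb",
    ".ipynb_checkpoints/ddf--entities--jurisdiction-checkpoint.csv",
]
kept = [path for path in candidates if is_ddf_data_file(path)]
print(kept)
```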
Thanks for any pointers and ideas!