issues with miners and saving metadata
crates.io
- the index is organized into directories keyed by a prefix of the package name, and each package's data is stored as json lines (one entry per version)
- how should we store the upstream metadata in the aboutcode-data repos? should we store the json line entry or recreate the directory structure of the index?
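To make the two storage options above concrete, here is a minimal sketch of how an index path can be derived from a crate name and how a single index file breaks down into json line entries. The prefix rules follow the documented crates.io index layout; the function names `crate_index_path` and `parse_index_file` are hypothetical and are not part of the existing miner code.

```python
import json


def crate_index_path(name: str) -> str:
    """Return the relative path of a crate's file inside the index.

    Layout: 1-char names go under "1/", 2-char names under "2/",
    3-char names under "3/<first char>/", and longer names under
    "<first two chars>/<next two chars>/".
    """
    if not name:
        raise ValueError("crate name must not be empty")
    lowered = name.lower()  # index file paths are lowercase
    if len(lowered) == 1:
        return f"1/{lowered}"
    if len(lowered) == 2:
        return f"2/{lowered}"
    if len(lowered) == 3:
        return f"3/{lowered[0]}/{lowered}"
    return f"{lowered[:2]}/{lowered[2:4]}/{lowered}"


def parse_index_file(text: str) -> list[dict]:
    """Parse one index file: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Storing the json line entries as-is would only need `parse_index_file`; recreating the directory structure would also need something like `crate_index_path` to decide where each file goes.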
maven
- not certain whether the parser we use for the maven index gives us all the fields of the pom data
- updating the web crawler should be easy enough as it gets the pom files directly
npm
- the code only looks up the index for the package name and versions and reports those; it does not fetch the package data itself. the code would need an option to get and store the package data
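As a rough sketch of what "get and store the package data" could look like: an npm registry package document (a "packument") carries a `versions` mapping from version string to that version's manifest, so it can be split into one record per version. The function name `per_version_records` and the `npm/<name>/<version>.json` layout are assumptions for illustration only, not the existing miner code or an agreed storage layout.

```python
import json


def per_version_records(packument: dict) -> dict[str, str]:
    """Split an npm packument into one JSON document per version.

    Returns a mapping of hypothetical output path to serialized manifest.
    """
    name = packument["name"]
    records = {}
    for version, manifest in packument.get("versions", {}).items():
        # Hypothetical layout; the real layout is still an open question.
        path = f"npm/{name}/{version}.json"
        records[path] = json.dumps(manifest, indent=2, sort_keys=True)
    return records
```

The caller would still need to fetch the packument and write the records to disk; this only shows the per-version split.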
nuget
- all package data is stored in a single index file. do we save individual index files for each package version we get?
pypi
- update code to dump a json file for each individual package
alpine
- should we save the info of each package individually or just save the apkindex for all packages?
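For the "individually" option, a minimal sketch of splitting an APKINDEX text into per-package records may help. It assumes the usual APKINDEX layout: records separated by blank lines, each line a single-letter key, a colon, and a value (for example `P:` for the package name and `V:` for the version). The function name `parse_apkindex` is hypothetical.

```python
def parse_apkindex(text: str) -> list[dict[str, str]]:
    """Split APKINDEX text into one dict of key -> value per package."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current package record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only; values such as URLs contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key] = value
    if current:
        records.append(current)
    return records
```

Each returned dict could then be saved on its own, while saving the whole apkindex would skip this step entirely.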
composer
- update get_composer_purl to save package data alongside purl