@benjelloun cc. @wumpus
Sharing a draft zip file as a follow-up to #961:
CCF_crawl_croissants_and_provenance_mockup.zip
The zip file includes:
- 117 croissant drafts, one for each of our crawls.
- 1 mockup example for provenance citation to our crawls
  - This kind of hierarchy doesn't exist in our crawls, so we won't actually have this file in CCF; it is a mockup for datasets referring to CCF.
We would like feedback especially on:
- How we are using provenance
  - e.g., I haven't used ids because these are not referred to within the same croissant
  - but is it valid/important to use ids to refer to other croissants?
- How we use "distribution" with FileObjects and FileSets
  - They include a bunch of FileObjects that act as manifest files, including paths to files included in the related FileSet
    - e.g., warc.paths.gz FileObject pointing to
  - And one additional FileObject example that keeps the data itself, so just a FileObject
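To make the manifest question concrete, here is a minimal sketch of the pattern described above: one FileObject for a paths manifest and one FileSet for the files it lists. This is not copied from the drafts in the zip. The `build_distribution` helper, the `@id` values, the `example.org` URL, the `includes` pattern, the encoding formats, and in particular the use of `containedIn` to link the FileSet to the manifest are all assumptions made for illustration.

```python
import json


def build_distribution(crawl_id: str, base_url: str) -> list[dict]:
    """Return a manifest FileObject plus the FileSet it describes.

    Hypothetical helper: the field values below are illustrative only.
    """
    manifest_id = f"{crawl_id}-warc-paths"

    # FileObject acting as a manifest: it lists the paths of the WARC files.
    manifest = {
        "@type": "cr:FileObject",
        "@id": manifest_id,
        "name": "warc.paths.gz",
        "contentUrl": f"{base_url}/{crawl_id}/warc.paths.gz",
        "encodingFormat": "application/gzip",
    }

    # FileSet for the files listed in the manifest.
    # Assumption: the link to the manifest is expressed with containedIn.
    file_set = {
        "@type": "cr:FileSet",
        "@id": f"{crawl_id}-warc-files",
        "name": "WARC files",
        "containedIn": {"@id": manifest_id},
        "encodingFormat": "application/warc",
        "includes": "*.warc.gz",
    }
    return [manifest, file_set]


if __name__ == "__main__":
    # Placeholder crawl id and host, not real CCF values.
    distribution = build_distribution("CRAWL-ID-EXAMPLE", "https://example.org")
    print(json.dumps({"distribution": distribution}, indent=2))
```

Whether `containedIn` is the right property for a manifest (as opposed to an archive) is exactly the kind of thing we are unsure about.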
Please let us know if anything looks awry!
Changes since #961:
- New FileObject added: {crawl_id}.domains-top-1000 (crawls > 2012)
- Switched to using MAJOR.MINOR.PATCH also for the build version: 1.0.0+1.0.0
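
For anyone consuming the version string, a small sketch of how the MAJOR.MINOR.PATCH+MAJOR.MINOR.PATCH form could be split into its two parts. The `split_version` helper and the regular expression are hypothetical and only cover the exact shape shown above, not the full Semantic Versioning grammar.

```python
import re

# Matches "MAJOR.MINOR.PATCH+MAJOR.MINOR.PATCH" with numeric parts only.
VERSION_WITH_BUILD = re.compile(r"^(\d+)\.(\d+)\.(\d+)\+(\d+)\.(\d+)\.(\d+)$")


def split_version(version: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Split a version string into (version, build) integer triples."""
    match = VERSION_WITH_BUILD.match(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


if __name__ == "__main__":
    print(split_version("1.0.0+1.0.0"))  # ((1, 0, 0), (1, 0, 0))
```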