Let's think about how to redesign the API of run_recipe. This is still very much a work in progress ;)
The goal is a user-friendly API that allows you to:
- use a git repo as a dataset, local or remote
- specify a branch/commit for such a git repo
- use the remote head of a branch through the local repo, to minimize network traffic: either fetch & check out to another folder, or pull if it's fine to change the local repo
- build from a local, private build environment that's safe from interference in both directions: another process changing files during the build (e.g. a git checkout), or not wanting to pull/checkout a shared local git repo just to get the files you need for a commit
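One way the "fetch, then export to another folder" idea could look is sketched below. It is only an illustration, not the implementation: the function names (`fetch_command`, `archive_command`, `export_remote_head`) are hypothetical, and it assumes the remote is called `origin` and that `git` is on the PATH. It uses `git fetch` to update the remote-tracking branch and `git archive` to write a snapshot, so the shared repo's working tree and checked-out branch are never touched.

```python
import subprocess
import tarfile
import tempfile
from pathlib import Path


def fetch_command(repo_dir, remote="origin"):
    """argv that updates remote-tracking refs without touching the work tree."""
    return ["git", "-C", str(repo_dir), "fetch", remote]


def archive_command(repo_dir, branch, tar_path, remote="origin"):
    """argv that writes a tar snapshot of <remote>/<branch> to tar_path."""
    return [
        "git", "-C", str(repo_dir), "archive", "--format=tar",
        f"--output={tar_path}", f"{remote}/{branch}",
    ]


def export_remote_head(repo_dir, branch, private_path, remote="origin"):
    """Copy the remote head of `branch` into `private_path` (hypothetical helper).

    The local repo's checked-out files are left alone; only its object
    database and remote-tracking refs are updated by the fetch.
    """
    private_path = Path(private_path)
    private_path.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = Path(tmp) / "snapshot.tar"
        subprocess.run(fetch_command(repo_dir, remote), check=True)
        subprocess.run(archive_command(repo_dir, branch, tar_path, remote), check=True)
        # Unpack the snapshot into the private build folder.
        with tarfile.open(tar_path) as tar:
            tar.extractall(private_path)
    return private_path
```

Because the export is a plain copy of files, a later `git checkout` in the shared repo cannot change what the build sees.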
ddf run_recipe -o <output_path> --ddf_dir <dataset_path> -i <recipe_path> --show-tree -d -p --private --private_path <private_path> --remote-head
Builds a dataset using a recipe and other base datasets.
-i <recipe_path> defaults to ./etl/recipes/etl.yml or ./etl/recipes/etl.yaml
-o <output_path> defaults to ./
--ddf_dir <dataset_path> defaults to ./../../
--private [<private_path>] use a private build environment; copy any base datasets to the private folder first
--private_path defaults to ./etl/source/ maybe?
--remote-head use the head of the origin remote branch for any local git repo dataset: git pull, or, when --private is set, git fetch && export to a different folder
This allows being in a dataset's root and just running ddf run_recipe to build the dataset
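As a rough sketch of how these options and defaults might be wired up, here is an `argparse` version. It is an assumption-laden illustration rather than the actual ddf_utils CLI: `build_parser` and `resolve_recipe` are hypothetical names, `--private` is modelled as taking an optional path that falls back to the tentative `./etl/source/` default, and the `--show-tree`, `-d` and `-p` flags are left out because the proposal does not describe them.

```python
import argparse
from pathlib import Path

# Candidate recipe locations, tried in order when -i is not given.
DEFAULT_RECIPES = ("./etl/recipes/etl.yml", "./etl/recipes/etl.yaml")


def build_parser():
    """Build a parser mirroring the proposed run_recipe options (sketch only)."""
    parser = argparse.ArgumentParser(prog="ddf run_recipe")
    parser.add_argument("-i", dest="recipe", default=None, metavar="recipe_path")
    parser.add_argument("-o", dest="outdir", default="./", metavar="output_path")
    parser.add_argument("--ddf_dir", default="./../../", metavar="dataset_path")
    # `--private` alone uses the tentative default folder; `--private X` uses X.
    parser.add_argument("--private", nargs="?", const="./etl/source/",
                        default=None, metavar="private_path")
    parser.add_argument("--remote-head", action="store_true")
    return parser


def resolve_recipe(recipe, root="."):
    """Return the given recipe path, or the first default that exists, else None."""
    if recipe is not None:
        return Path(recipe)
    for candidate in DEFAULT_RECIPES:
        path = Path(root) / candidate
        if path.exists():
            return path
    return None
```

With this shape, running the command with no arguments from a dataset's root resolves every path from the defaults, and `args.private is None` signals a non-private build.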
Base dataset specifier format, used in recipes. Based on the syntax pip install accepts.
We don't have an index yet, so the options are either a local path or a git url. If either is relative, it's relative to <dataset_path>
paths:
./open-numbers/ddf--gapminder--wdi
justonefolder
urls (read the resource list from ./datapackage.json, then download those resources; this does not allow downloading assets, the etl folder, etc.):
https://raw.githubusercontent.com/open-numbers/ddf--gapminder--co2_emission/autogenerated/
http://example.com/dataset/
git urls (git+ allows extending functionality to other applications later. git protocols)
git+ssh://github.com/open-numbers/ddf--gapminder--wdi
git@github.com:open-numbers/ddf--gapminder--wdi (scp-style syntax for ssh)
git+https://github.com/open-numbers/ddf--gapminder--wdi
git+file:///home/user/git/open-numbers/ddf--gapminder--wdi
git+open-numbers/ddf--gapminder--wdi
Adding a branch name, a commit hash, a tag name or a git ref is possible like so:
git+https://github.com/open-numbers/ddf--gapminder--wdi@master
git+https://github.com/open-numbers/ddf--gapminder--wdi@<tag>
git+https://github.com/open-numbers/ddf--gapminder--wdi@da39a3ee5e6b4b0d3255bfef95601890afd80709
git+https://github.com/open-numbers/ddf--gapminder--wdi@refs/pull/123/head
git+open-numbers/ddf--gapminder--wdi@master
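To make the specifier format concrete, here is a minimal parsing sketch. The names `BaseDataset` and `parse_specifier` are hypothetical, and the handling is an assumption about how the format could be read: a `git+` prefix marks a git dataset, an optional `@<ref>` is split off the path part only (so `user@host` in an ssh url is left alone), plain `http(s)://` strings are treated as urls, and everything else is a local path. Relative locations are resolved against `<dataset_path>`. The scp-style ssh form without a `git+` prefix is not handled here.

```python
import posixpath
from typing import NamedTuple, Optional


class BaseDataset(NamedTuple):
    kind: str                  # "path", "url" or "git"
    location: str              # resolved path or url, without the git+ prefix
    ref: Optional[str] = None  # branch, tag, commit hash or git ref


def _resolve(location, dataset_path):
    """Leave urls and absolute paths alone; anchor relative ones at dataset_path."""
    if "://" in location or posixpath.isabs(location):
        return location
    return posixpath.normpath(posixpath.join(dataset_path, location))


def parse_specifier(spec, dataset_path="./../../"):
    """Parse one base dataset specifier (sketch, not the real implementation)."""
    if spec.startswith("git+"):
        rest = spec[len("git+"):]
        scheme, sep, tail = rest.partition("://")
        if sep:
            # Keep scheme and host (which may contain user@) out of the ref split.
            host, slash, path = tail.partition("/")
            prefix = scheme + sep + host + slash
        else:
            prefix, path = "", rest
        repo, at, ref = path.rpartition("@")
        if not at:
            repo, ref = path, None
        return BaseDataset("git", _resolve(prefix + repo, dataset_path), ref or None)
    if spec.startswith(("http://", "https://")):
        return BaseDataset("url", spec)
    return BaseDataset("path", _resolve(spec, dataset_path))
```

For example, `parse_specifier("git+open-numbers/ddf--gapminder--wdi@master", "/data")` yields a git dataset located at `/data/open-numbers/ddf--gapminder--wdi` with ref `master`.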
Maybe we don't need --private if we define the base branches/commits in the recipe? I don't recall the exact problems we had in datakitchen that led us to think we'd need it. We should look that up in the chat logs.