Commit 9c18d8c

Merge pull request PolusAI#200 from vjaganat90/gconf2
coalesce different txt based config files into one json
2 parents 68e3bf7 + 9b18dbc commit 9c18d8c

18 files changed: +292 -216 lines changed

.gitignore

Lines changed: 1 addition & 4 deletions
@@ -17,11 +17,8 @@ docs/_build/
 gromacs_mdp.html
 NCI*
 error_*.txt
+*config.json
 validation_*.txt
-cwl_dirs.txt
-yml_dirs.txt
-inference_rules.txt
-renaming_conventions.txt
 .hypothesis/
 .env
 node_modules/

docs/advanced.md

Lines changed: 23 additions & 17 deletions
@@ -3,32 +3,38 @@
 ## Edge Inference Configuration

 ### Naming Conventions
-If `--inference_use_naming_conventions` is enabled, matches can be refined based on the naming conventions of the inputs and outputs in the CWL CommandLineTools. Specifically, first `input_` is removed from the input name and `output_` is removed from all output names. Then, the default renamings contained in `renaming_conventions.txt` (shown below) are iteratively applied to the input name (only), and then the modified input name and all of the output names are checked for equality.
+If `--inference_use_naming_conventions` is enabled, matches can be refined based on the naming conventions of the inputs and outputs in the CWL CommandLineTools. Specifically, first `input_` is removed from the input name and `output_` is removed from all output names. Then, the default renamings contained in `renaming_conventions` tag of `config.json` (shown below) are iteratively applied to the input name (only), and then the modified input name and all of the output names are checked for equality.

 ```
-# The biobb CWL files do not always use consistent naming
-# conventions, so we need to perform some renamings here.
-# Eventually, the CWL files themselves should be fixed.
-
-energy_ edr_
-structure_ tpr_
-traj_ trr_
+"renaming_conventions": [
+    [
+        "energy_",
+        "edr_"
+    ],
+    [
+        "structure_",
+        "tpr_"
+    ],
+    [
+        "traj_",
+        "trr_"
+    ]
+]
 ```

 If there is now a unique match, then great! If there are still multiple matches, it chooses the first (i.e. most recent) match. If there are now no matches, it ignores the naming conventions and chooses the first (i.e. most recent) match based on types and formats only. If there are still multiple matches, it again chooses the first (i.e. most recent) match. Note that there are cases (i.e. file format conversions) where using naming conventions may not yield the desired behavior, so again ***`users should always check that edge inference actually produces the intended DAG`***.

 ### Inference Rules

-Users can customize the inference algorithm using inference rules. The default inference rules stored in inference_rules.txt are shown below:
+Users can customize the inference algorithm using inference rules. The default inference rules stored in `inference_rules` tag of `config.json` are shown below:

 ```
-# Amber, gromacs (zipped 3880) topology
-edam:format_3881 continue
-edam:format_3987 continue
-
-# Amber, gromacs coordinates
-edam:format_3878 break
-edam:format_2033 break
+"inference_rules": {
+    "edam:format_3881": "continue",
+    "edam:format_3987": "continue",
+    "edam:format_3878": "break",
+    "edam:format_2033": "break"
+}
 ```

 Currently, the only inference rule implemented is `break`, which stops the inference algorithm from considering any further outputs beyond the current output from matching the current input. (The current output is allowed, i.e. break is inclusive.) This is useful when the most recent output file is desired, but the inference algorithm for some reason doesn't match it and chooses a subsequent / earlier file. This can happen when converting from one file format, performing a workflow step, and converting back to the original format, where in some cases the inference algorithm may choose the original file, thus accidentally skipping the workflow step.
@@ -229,7 +235,7 @@ wic:

 ## Namespaces

-Namespaces can be used to distinguish two different tools / workflows with the same name from different sources. For example, suppose a collaborator has shared an alternative minimization protocol, which we have downloaded to `bar/min.yml`. We can use their protocol by adding the line `foo bar/` to `yml_dirs.txt` and annotating the call site in `basic.yml` with `namespace: foo` as shown below.
+Namespaces can be used to distinguish two different tools / workflows with the same name from different sources. For example, suppose a collaborator has shared an alternative minimization protocol, which we have downloaded to `bar/min.yml`. We can use their protocol by adding the namespace tag `foo` to `search_paths_yml` tag of `config.json` and annotating the call site in `basic.yml` with `namespace: foo` as shown below.

 ```yaml
 ...
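
The naming-convention step described in the docs/advanced.md hunk above can be illustrated with a minimal Python sketch. This is not the project's implementation: `strip_prefix` and `match_by_name` are hypothetical names, and two details are assumptions, namely that `input_` / `output_` are removed only as leading prefixes and that each `[old, new]` pair is applied as a plain substring replacement in the order listed.

```python
from typing import List, Tuple

# Default pairs copied from the `renaming_conventions` tag shown in the diff.
RENAMING_CONVENTIONS: List[Tuple[str, str]] = [
    ("energy_", "edr_"),
    ("structure_", "tpr_"),
    ("traj_", "trr_"),
]


def strip_prefix(name: str, prefix: str) -> str:
    """Remove `prefix` from the start of `name` if present (assumed behavior)."""
    return name[len(prefix):] if name.startswith(prefix) else name


def match_by_name(input_name: str, output_names: List[str],
                  renamings: List[Tuple[str, str]] = RENAMING_CONVENTIONS) -> List[str]:
    """Return the outputs whose normalized name equals the normalized input name."""
    in_name = strip_prefix(input_name, "input_")
    # The renamings are applied to the input name only, one after another.
    for old, new in renamings:
        in_name = in_name.replace(old, new)
    # Output names only lose their `output_` prefix before the comparison.
    return [out for out in output_names
            if strip_prefix(out, "output_") == in_name]
```

With the default pairs, `match_by_name("input_energy_path", ["output_edr_path", "output_tpr_path"])` keeps only `output_edr_path`. An empty result corresponds to the documented fallback of matching on types and formats only.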

docs/tutorials/config_ci.json

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-// NOTE: This file should be in one of the directories listed in yml_dirs.txt
+// NOTE: This file should be in one of the directories listed in 'search_paths_yml'
+// tag of config.json
 // (Technically comments are not allowed in JSON, but we are manually
 // stripping out all lines that start with // before parsing.)
 {
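
The comment in config_ci.json says that lines starting with `//` are stripped before parsing. A minimal sketch of that idea follows; `loads_with_line_comments` is a hypothetical helper, not a function from the repository, and tolerating leading whitespace before `//` is an assumption.

```python
import json
from typing import Any


def loads_with_line_comments(text: str) -> Any:
    """Parse JSON after dropping whole lines that begin with `//`."""
    kept = [line for line in text.splitlines()
            # Assumption: leading whitespace before `//` is also tolerated.
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))
```

Only whole-line comments are removed, so a `//` inside a string value (for example in a URL) is left alone.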

docs/userguide.md

Lines changed: 22 additions & 10 deletions
@@ -10,24 +10,36 @@ See [overview](overview.md)

 Many software packages have a way of automatically discovering files which they can use. (examples: [pytest](https://docs.pytest.org/en/latest/explanation/goodpractices.html#conventions-for-python-test-discovery) [pylint](https://pylint.pycqa.org/en/latest/user_guide/usage/run.html))

-By default, wic will recursively search for tools / workflows within the directories (and subdirectories) listed in `~/wic/cwl_dirs.txt` and `~/wic/yml_dirs.txt`. The default cwl_dirs.txt is shown.
+By default, wic will recursively search for tools / workflows within the directories (and subdirectories) listed in the config file's json tags `search_paths_cwl` and `search_paths_yml`. The paths listed can be absolute or relative. The default `config.json` is shown.

 ***`We strongly recommend placing all repositories of tools / workflows in the same parent directory.`***

 (All your repos should be side-by-side in sibling directories, as shown.)

 ```
-# Namespace Directory
-global ../workflow-inference-compiler/cwl_adapters/
-global ../image-workflows/cwl_adapters/
-global ../biobb_adapters/biobb_adapters/
-global ../mm-workflows/cwl_adapters/
-gpu ../mm-workflows/gpu/
-# foo a/relative/path/
-# bar /an/absolute/path/
+......
+"search_paths_cwl": {
+    "global": [
+        "../workflow-inference-compiler/cwl_adapters",
+        "../image-workflows/cwl_adapters",
+        "../biobb_adapters/biobb_adapters",
+        "../mm-workflows/cwl_adapters"
+    ],
+    "gpu": [
+        "../mm-workflows/gpu"
+    ]
+},
+"search_paths_yml": {
+    "global": [
+        "./workflow-inference-compiler/docs/tutorials",
+        "../image-workflows/workflows",
+        "../mm-workflows/examples"
+    ]
+}
+.....
 ```

-If you do not have these files in your `~/wic/` directory, they will be automatically created for you the first time you run wic. (Because of this, the first time you run wic you should be in the root directory of any one of your repos.) Then you can manually edit these files with additional sources of tools / workflows.
+If you do not specify config file using the command line argument `--config`, it will be automatically created for you the first time you run wic in `~/wic/global_config.json`. (Because of this, the first time you run wic you should be in the root directory of any one of your repos.) Then you can manually edit this file with additional sources of tools / workflows.

 To avoid dealing with relative file paths in YAML files, by default
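
To make the recursive search over `search_paths_cwl` / `search_paths_yml` concrete, here is a minimal sketch of namespace-keyed discovery. `discover` is a hypothetical function; resolving relative paths against a caller-supplied `base_dir`, keying results by file stem, and letting later duplicates overwrite earlier ones are all assumptions, not documented behavior of wic.

```python
from pathlib import Path
from typing import Dict, List


def discover(search_paths: Dict[str, List[str]], suffix: str,
             base_dir: Path) -> Dict[str, Dict[str, Path]]:
    """Map namespace -> file stem -> path for every file ending in `suffix`."""
    found: Dict[str, Dict[str, Path]] = {}
    for namespace, dirs in search_paths.items():
        stems = found.setdefault(namespace, {})
        for d in dirs:
            root = Path(d)
            if not root.is_absolute():
                # Assumption: relative entries are resolved against base_dir.
                root = base_dir / root
            if not root.is_dir():
                continue  # skip search paths that do not exist
            # rglob walks the directory and all of its subdirectories.
            for path in sorted(root.rglob(f"*{suffix}")):
                stems[path.stem] = path
    return found
```

For example, `discover(config["search_paths_cwl"], ".cwl", Path.cwd())` would collect the tools of each namespace, which is why the working directory matters when the listed paths are relative.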

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -128,14 +128,15 @@ version = {attr = "wic.__version__"}

 [tool.setuptools]
 package-dir = {"" = "src"}
-include-package-data = false
+include-package-data = true

 [tool.setuptools.packages.find]
 where = ["src"]
-namespaces = false
+namespaces = true

 [tool.setuptools.package-data]
 "*" = ["*.txt"]
+"wic" = ["*.json"]

 [tool.aliases]
 test = "pytest --workers 8"

src/wic/ast.py

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ def read_ast_from_disk(homedir: str,
     if paths_ns_i == {}:
         wicdir = Path(homedir) / 'wic'
         raise Exception(
-            f'Error! namespace {plugin_ns} not found in yaml paths. Check {wicdir / "yml_dirs.txt"}')
+            f"Error! namespace {plugin_ns} not found in yaml paths. Check 'search_paths_yml' in your config file")
     if stem not in paths_ns_i:
         msg = f'Error! {stem} not found in namespace {plugin_ns} when attempting to read {step_id.stem}.yml'
         if stem == 'in':

src/wic/cli.py

Lines changed: 3 additions & 1 deletion
@@ -7,11 +7,13 @@
 parser = argparse.ArgumentParser(prog='main', description='Convert a high-level yaml workflow file to CWL.')
 parser.add_argument('--yaml', type=str, required=('--generate_schemas_only' not in sys.argv),
                     help='Yaml workflow file')
+parser.add_argument('--config_file', type=str, required=False, default=str(Path().home()/'wic'/'global_config.json'),
+                    help='User provided (JSON) config file')
 # version action exits the parser Ref : https://github.com/python/cpython/blob/1f515e8a109204f7399d85b7fd806135166422d9/Lib/argparse.py#L1167
 parser.add_argument('--version', action='version', version=__version__,
                     default='==SUPPRESS==', help='Current version of the Workflow Inference Compiler')
 parser.add_argument('--generate_schemas_only', default=False, action="store_true",
-                    help='Generate schemas for the files in ~/wic/cwl_dirs.txt and ~/wic/yml_dirs.txt')
+                    help='Generate schemas for the files in config.json (search_paths_yml and search_paths_cwl)')
 parser.add_argument('--homedir', type=str, required=False, default=str(Path().home()),
                     help='The users home directory. This is necessary because CWL clears environment variables (e.g. HOME)')
 # Change default to True for now. See comment in compiler.py

src/wic/config.json

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+{
+    "search_paths_cwl": {
+        "global": [
+            "../workflow-inference-compiler/cwl_adapters",
+            "../image-workflows/cwl_adapters",
+            "../biobb_adapters/biobb_adapters",
+            "../mm-workflows/cwl_adapters"
+        ],
+        "gpu": [
+            "../mm-workflows/gpu"
+        ]
+    },
+    "search_paths_yml": {
+        "global": [
+            "../workflow-inference-compiler/docs/tutorials",
+            "../image-workflows/workflows",
+            "../mm-workflows/examples"
+        ]
+    },
+    "renaming_conventions": [
+        ["energy_", "edr_"],
+        ["structure_", "tpr_"],
+        ["traj_", "trr_"]
+    ],
+    "inference_rules": {
+        "edam:format_3881": "continue",
+        "edam:format_3987": "continue",
+        "edam:format_3878": "break",
+        "edam:format_2033": "break"
+    }
+}

src/wic/cwl_dirs.txt

Lines changed: 0 additions & 16 deletions
This file was deleted.

src/wic/cwl_watcher.py

Lines changed: 23 additions & 5 deletions
@@ -258,14 +258,32 @@ def main() -> None:
     logfile.touch()

     args_vals = json.loads(args.config)
+    # In CWL all env variables are hidden by default so Path().home() doesn't work
+    # Also User may specify a different homedir
+    default_config_file = Path(args.homedir)/'wic'/'global_config.json'
+    global_config: Json = {}
+    if not Path(args.config_file).exists():
+        if Path(args.config_file) == default_config_file:
+            global_config = io.get_default_config()
+            # write the default config object to the 'global_config.json' file in user's ~/wic directory
+            # for user to inspect and or modify the config json file
+            io.write_config_to_disk(global_config, default_config_file)
+            print(f'default config file : {default_config_file} generated')
+        else:
+            print(f"Error user specified config file {args.config_file} doesn't exist")
+            sys.exit()
+    else:
+        # reading user specified config file only if it exists
+        # never overwrite user's config file or generate another file in user's non-default directory
+        # TODO : Validate the json inside 'read_config_from_disk' function
+        global_config = io.read_config_from_disk(Path(args.config_file))

-    tools_cwl = get_tools_cwl(args.homedir, quiet=args.quiet)
-    yml_paths = get_yml_paths(args.homedir)
+    tools_cwl = get_tools_cwl(global_config, quiet=args.quiet)
+    yml_paths = get_yml_paths(global_config)

     # Perform initialization via mutating global variables (This is not ideal)
-    wicdir = Path(args.homedir) / 'wic'
-    compiler.inference_rules = dict(io.read_lines_pairs(wicdir / 'inference_rules.txt'))
-    inference.renaming_conventions = io.read_lines_pairs(wicdir / 'renaming_conventions.txt')
+    compiler.inference_rules = global_config.get('inference_rules', {})
+    inference.renaming_conventions = global_config.get('renaming_conventions', [])

     # Generate schemas for validation
     yaml_stems = utils.flatten([list(p) for p in yml_paths.values()])
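
The branching added to `main()` above (read an existing file, generate the default only at the default location, otherwise fail) can be restated as a standalone sketch. `load_or_create_config` is a hypothetical function: the bodies of `io.get_default_config`, `io.write_config_to_disk` and `io.read_config_from_disk` are not shown in this commit, so plain `json` calls stand in for them, and the sketch raises `FileNotFoundError` where the diff prints a message and calls `sys.exit()`.

```python
import json
from pathlib import Path
from typing import Any, Dict

Json = Dict[str, Any]


def load_or_create_config(config_file: Path, default_config_file: Path,
                          default_config: Json) -> Json:
    """Read config_file, or create the default file if that is what is missing."""
    if config_file.exists():
        # An existing file is only read, never overwritten.
        with open(config_file, encoding="utf-8") as f:
            return json.load(f)
    if config_file != default_config_file:
        # A missing user-specified file is an error; nothing is generated
        # in a non-default location.
        raise FileNotFoundError(f"config file {config_file} doesn't exist")
    # Missing default file: write it so the user can inspect and edit it.
    default_config_file.parent.mkdir(parents=True, exist_ok=True)
    with open(default_config_file, "w", encoding="utf-8") as f:
        json.dump(default_config, f, indent=4)
    return default_config
```

Passing the default config in as an argument keeps the sketch free of package data lookups. In the diff the default location is built from `args.homedir` and the file to load comes from `args.config_file`.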
