Commit 9c18d8c

Merge pull request PolusAI#200 from vjaganat90/gconf2
coalesce different txt based config files into one json
2 parents 68e3bf7 + 9b18dbc commit 9c18d8c

18 files changed, +292 -216 lines changed

.gitignore

+1-4
@@ -17,11 +17,8 @@ docs/_build/
1717
gromacs_mdp.html
1818
NCI*
1919
error_*.txt
20+
*config.json
2021
validation_*.txt
21-
cwl_dirs.txt
22-
yml_dirs.txt
23-
inference_rules.txt
24-
renaming_conventions.txt
2522
.hypothesis/
2623
.env
2724
node_modules/

docs/advanced.md

+23-17
@@ -3,32 +3,38 @@
33
## Edge Inference Configuration
44

55
### Naming Conventions
6-
If `--inference_use_naming_conventions` is enabled, matches can be refined based on the naming conventions of the inputs and outputs in the CWL CommandLineTools. Specifically, first `input_` is removed from the input name and `output_` is removed from all output names. Then, the default renamings contained in `renaming_conventions.txt` (shown below) are iteratively applied to the input name (only), and then the modified input name and all of the output names are checked for equality.
6+
If `--inference_use_naming_conventions` is enabled, matches can be refined based on the naming conventions of the inputs and outputs in the CWL CommandLineTools. Specifically, first `input_` is removed from the input name and `output_` is removed from all output names. Then, the default renamings contained in the `renaming_conventions` tag of `config.json` (shown below) are iteratively applied to the input name (only), and then the modified input name and all of the output names are checked for equality.
77

88
```
9-
# The biobb CWL files do not always use consistent naming
10-
# conventions, so we need to perform some renamings here.
11-
# Eventually, the CWL files themselves should be fixed.
12-
13-
energy_ edr_
14-
structure_ tpr_
15-
traj_ trr_
9+
"renaming_conventions": [
10+
[
11+
"energy_",
12+
"edr_"
13+
],
14+
[
15+
"structure_",
16+
"tpr_"
17+
],
18+
[
19+
"traj_",
20+
"trr_"
21+
]
22+
]
1623
```
1724

1825
If there is now a unique match, then great! If there are still multiple matches, it chooses the first (i.e. most recent) match. If there are now no matches, it ignores the naming conventions and chooses the first (i.e. most recent) match based on types and formats only. If there are still multiple matches, it again chooses the first (i.e. most recent) match. Note that there are cases (i.e. file format conversions) where using naming conventions may not yield the desired behavior, so again ***`users should always check that edge inference actually produces the intended DAG`***.
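The refinement step described above can be sketched in Python as follows. This is a minimal illustration and not wic's actual implementation: the function names `normalize_input_name`, `normalize_output_name`, and `matching_outputs` are hypothetical, and the sketch assumes that each `[old, new]` pair means "replace `old` with `new`" and that `input_` / `output_` are removed wherever they occur in a name.

```python
from typing import List, Tuple

# Default renamings, copied from the `renaming_conventions` tag of config.json.
RENAMING_CONVENTIONS: List[Tuple[str, str]] = [
    ("energy_", "edr_"),
    ("structure_", "tpr_"),
    ("traj_", "trr_"),
]


def normalize_input_name(name: str, renamings: List[Tuple[str, str]]) -> str:
    """Remove `input_`, then apply each renaming in order (input names only)."""
    name = name.replace("input_", "")
    for old, new in renamings:
        # Assumption: the first element of each pair is replaced by the second.
        name = name.replace(old, new)
    return name


def normalize_output_name(name: str) -> str:
    """Remove `output_`; renamings are NOT applied to output names."""
    return name.replace("output_", "")


def matching_outputs(input_name: str, output_names: List[str],
                     renamings: List[Tuple[str, str]]) -> List[str]:
    """Return the outputs whose normalized name equals the normalized input name."""
    target = normalize_input_name(input_name, renamings)
    return [out for out in output_names if normalize_output_name(out) == target]


candidates = ["output_trr_path", "output_edr_path"]
print(matching_outputs("input_energy_path", candidates, RENAMING_CONVENTIONS))
# prints ['output_edr_path']
```

Here `input_energy_path` becomes `energy_path` and then `edr_path`, which equals `output_edr_path` once its `output_` part is removed. The fallback behaviour for zero or multiple matches is not modelled in this sketch.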
1926

2027
### Inference Rules
2128

22-
Users can customize the inference algorithm using inference rules. The default inference rules stored in inference_rules.txt are shown below:
29+
Users can customize the inference algorithm using inference rules. The default inference rules stored in the `inference_rules` tag of `config.json` are shown below:
2330

2431
```
25-
# Amber, gromacs (zipped 3880) topology
26-
edam:format_3881 continue
27-
edam:format_3987 continue
28-
29-
# Amber, gromacs coordinates
30-
edam:format_3878 break
31-
edam:format_2033 break
32+
"inference_rules": {
33+
"edam:format_3881": "continue",
34+
"edam:format_3987": "continue",
35+
"edam:format_3878": "break",
36+
"edam:format_2033": "break"
37+
}
3238
```
3339

3440
Currently, the only inference rule implemented is `break`, which stops the inference algorithm from considering any further outputs beyond the current output from matching the current input. (The current output is allowed, i.e. break is inclusive.) This is useful when the most recent output file is desired, but the inference algorithm for some reason doesn't match it and chooses a subsequent / earlier file. This can happen when converting from one file format, performing a workflow step, and converting back to the original format, where in some cases the inference algorithm may choose the original file, thus accidentally skipping the workflow step.
@@ -229,7 +235,7 @@ wic:
229235

230236
## Namespaces
231237

232-
Namespaces can be used to distinguish two different tools / workflows with the same name from different sources. For example, suppose a collaborator has shared an alternative minimization protocol, which we have downloaded to `bar/min.yml`. We can use their protocol by adding the line `foo bar/` to `yml_dirs.txt` and annotating the call site in `basic.yml` with `namespace: foo` as shown below.
238+
Namespaces can be used to distinguish two different tools / workflows with the same name from different sources. For example, suppose a collaborator has shared an alternative minimization protocol, which we have downloaded to `bar/min.yml`. We can use their protocol by adding the namespace tag `foo` to the `search_paths_yml` tag of `config.json` and annotating the call site in `basic.yml` with `namespace: foo` as shown below.
233239

234240
```yaml
235241
...

docs/tutorials/config_ci.json

+2-1
@@ -1,4 +1,5 @@
1-
// NOTE: This file should be in one of the directories listed in yml_dirs.txt
1+
// NOTE: This file should be in one of the directories listed in 'search_paths_yml'
2+
// tag of config.json
23
// (Technically comments are not allowed in JSON, but we are manually
34
// stripping out all lines that start with // before parsing.)
45
{
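The comment handling mentioned in the header of this file can be sketched as below. This is an illustration under assumptions, not the project's code: the helper name `load_json_with_line_comments` is hypothetical, the sample content is made up, and tolerating leading whitespace before `//` is a choice of this sketch.

```python
import json
from typing import Any


def load_json_with_line_comments(text: str) -> Any:
    """Drop every line that starts with `//`, then parse the remainder as JSON."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


# Hypothetical file content; only whole-line comments are supported.
example = """// NOTE: a whole-line comment
// (another comment line)
{
    "key": "value"
}"""

print(load_json_with_line_comments(example))
# prints {'key': 'value'}
```

Because only whole lines are removed, a `//` that appears inside a string value (for example in a URL) is left untouched, while trailing comments after a value would still make the file invalid JSON.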

docs/userguide.md

+22-10
@@ -10,24 +10,36 @@ See [overview](overview.md)
1010

1111
Many software packages have a way of automatically discovering files which they can use. (examples: [pytest](https://docs.pytest.org/en/latest/explanation/goodpractices.html#conventions-for-python-test-discovery) [pylint](https://pylint.pycqa.org/en/latest/user_guide/usage/run.html))
1212

13-
By default, wic will recursively search for tools / workflows within the directories (and subdirectories) listed in `~/wic/cwl_dirs.txt` and `~/wic/yml_dirs.txt`. The default cwl_dirs.txt is shown.
13+
By default, wic will recursively search for tools / workflows within the directories (and subdirectories) listed under the `search_paths_cwl` and `search_paths_yml` tags of the JSON config file. The paths listed can be absolute or relative. The default `config.json` is shown.
1414

1515
***`We strongly recommend placing all repositories of tools / workflows in the same parent directory.`***
1616

1717
(All your repos should be side-by-side in sibling directories, as shown.)
1818

1919
```
20-
# Namespace Directory
21-
global ../workflow-inference-compiler/cwl_adapters/
22-
global ../image-workflows/cwl_adapters/
23-
global ../biobb_adapters/biobb_adapters/
24-
global ../mm-workflows/cwl_adapters/
25-
gpu ../mm-workflows/gpu/
26-
# foo a/relative/path/
27-
# bar /an/absolute/path/
20+
......
21+
"search_paths_cwl": {
22+
"global": [
23+
"../workflow-inference-compiler/cwl_adapters",
24+
"../image-workflows/cwl_adapters",
25+
"../biobb_adapters/biobb_adapters",
26+
"../mm-workflows/cwl_adapters"
27+
],
28+
"gpu": [
29+
"../mm-workflows/gpu"
30+
]
31+
},
32+
"search_paths_yml": {
33+
"global": [
34+
"../workflow-inference-compiler/docs/tutorials",
35+
"../image-workflows/workflows",
36+
"../mm-workflows/examples"
37+
]
38+
}
39+
.....
2840
```
2941

30-
If you do not have these files in your `~/wic/` directory, they will be automatically created for you the first time you run wic. (Because of this, the first time you run wic you should be in the root directory of any one of your repos.) Then you can manually edit these files with additional sources of tools / workflows.
42+
If you do not specify a config file using the command line argument `--config_file`, a default one will be automatically created for you at `~/wic/global_config.json` the first time you run wic. (Because of this, the first time you run wic you should be in the root directory of any one of your repos.) Then you can manually edit this file with additional sources of tools / workflows.
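The lookup behaviour described above can be sketched as follows. This is a simplified illustration modelled on the logic this commit adds to `cwl_watcher.py`, not the code itself: the function name `resolve_config` is hypothetical, the default config is passed in as a plain dictionary, and the conversion of search paths to absolute paths is omitted.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict


def resolve_config(config_file: Path, default_config_file: Path,
                   default_config: Dict[str, Any]) -> Dict[str, Any]:
    """Read an existing config, or create the default one if it is missing."""
    if config_file.exists():
        # An existing file is only read, never overwritten.
        with open(config_file, "r", encoding="utf-8") as f:
            return json.load(f)
    if config_file == default_config_file:
        # First run: write the defaults so the user can inspect and edit them.
        default_config_file.parent.mkdir(parents=True, exist_ok=True)
        with open(default_config_file, "w", encoding="utf-8") as f:
            json.dump(default_config, f)
        return default_config
    # A user-specified file that does not exist is an error. Exiting with a
    # message (and therefore a non-zero status) is a choice of this sketch.
    sys.exit(f"Error: user specified config file {config_file} doesn't exist")
```

The key design point is the asymmetry between the two missing-file cases: only the default location is ever created automatically, so a mistyped path never silently produces a new config file in an unexpected directory.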
3143

3244
To avoid dealing with relative file paths in YAML files, by default
3345

pyproject.toml

+3-2
@@ -128,14 +128,15 @@ version = {attr = "wic.__version__"}
128128

129129
[tool.setuptools]
130130
package-dir = {"" = "src"}
131-
include-package-data = false
131+
include-package-data = true
132132

133133
[tool.setuptools.packages.find]
134134
where = ["src"]
135-
namespaces = false
135+
namespaces = true
136136

137137
[tool.setuptools.package-data]
138138
"*" = ["*.txt"]
139+
"wic" = ["*.json"]
139140

140141
[tool.aliases]
141142
test = "pytest --workers 8"

src/wic/ast.py

+1-1
@@ -93,7 +93,7 @@ def read_ast_from_disk(homedir: str,
9393
if paths_ns_i == {}:
9494
wicdir = Path(homedir) / 'wic'
9595
raise Exception(
96-
f'Error! namespace {plugin_ns} not found in yaml paths. Check {wicdir / "yml_dirs.txt"}')
96+
f"Error! namespace {plugin_ns} not found in yaml paths. Check 'search_paths_yml' in your config file")
9797
if stem not in paths_ns_i:
9898
msg = f'Error! {stem} not found in namespace {plugin_ns} when attempting to read {step_id.stem}.yml'
9999
if stem == 'in':

src/wic/cli.py

+3-1
@@ -7,11 +7,13 @@
77
parser = argparse.ArgumentParser(prog='main', description='Convert a high-level yaml workflow file to CWL.')
88
parser.add_argument('--yaml', type=str, required=('--generate_schemas_only' not in sys.argv),
99
help='Yaml workflow file')
10+
parser.add_argument('--config_file', type=str, required=False, default=str(Path().home()/'wic'/'global_config.json'),
11+
help='User provided (JSON) config file')
1012
# version action exits the parser Ref : https://github.com/python/cpython/blob/1f515e8a109204f7399d85b7fd806135166422d9/Lib/argparse.py#L1167
1113
parser.add_argument('--version', action='version', version=__version__,
1214
default='==SUPPRESS==', help='Current version of the Workflow Inference Compiler')
1315
parser.add_argument('--generate_schemas_only', default=False, action="store_true",
14-
help='Generate schemas for the files in ~/wic/cwl_dirs.txt and ~/wic/yml_dirs.txt')
16+
help='Generate schemas for the files in config.json (search_paths_yml and search_paths_cwl)')
1517
parser.add_argument('--homedir', type=str, required=False, default=str(Path().home()),
1618
help='The users home directory. This is necessary because CWL clears environment variables (e.g. HOME)')
1719
# Change default to True for now. See comment in compiler.py

src/wic/config.json

+31
@@ -0,0 +1,31 @@
1+
{
2+
"search_paths_cwl": {
3+
"global": [
4+
"../workflow-inference-compiler/cwl_adapters",
5+
"../image-workflows/cwl_adapters",
6+
"../biobb_adapters/biobb_adapters",
7+
"../mm-workflows/cwl_adapters"
8+
],
9+
"gpu": [
10+
"../mm-workflows/gpu"
11+
]
12+
},
13+
"search_paths_yml": {
14+
"global": [
15+
"../workflow-inference-compiler/docs/tutorials",
16+
"../image-workflows/workflows",
17+
"../mm-workflows/examples"
18+
]
19+
},
20+
"renaming_conventions": [
21+
["energy_", "edr_"],
22+
["structure_", "tpr_"],
23+
["traj_", "trr_"]
24+
],
25+
"inference_rules": {
26+
"edam:format_3881": "continue",
27+
"edam:format_3987": "continue",
28+
"edam:format_3878": "break",
29+
"edam:format_2033": "break"
30+
}
31+
}
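To relate the new nested layout to the `namespace directory` pairs that the deleted `*_dirs.txt` files stored one per line, the sketch below flattens a config of this shape back into such pairs. It is illustrative only: the helper `iter_search_paths` is hypothetical, and the embedded JSON is an abbreviated copy of the file above.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Abbreviated copy of src/wic/config.json, for illustration.
config_text = """
{
    "search_paths_cwl": {
        "global": [
            "../workflow-inference-compiler/cwl_adapters",
            "../mm-workflows/cwl_adapters"
        ],
        "gpu": ["../mm-workflows/gpu"]
    },
    "renaming_conventions": [["energy_", "edr_"]],
    "inference_rules": {"edam:format_3878": "break"}
}
"""
config: Dict[str, Any] = json.loads(config_text)


def iter_search_paths(config: Dict[str, Any], tag: str) -> Iterator[Tuple[str, str]]:
    """Yield (namespace, directory) pairs for one search-path tag."""
    for namespace, directories in config.get(tag, {}).items():
        for directory in directories:
            yield namespace, directory


pairs = list(iter_search_paths(config, "search_paths_cwl"))
for namespace, directory in pairs:
    print(namespace, directory)

# JSON has no tuple type, so each renaming arrives as a two-element list.
renamings = [tuple(pair) for pair in config.get("renaming_conventions", [])]
```

Grouping directories under a namespace key removes the need to repeat the namespace on every line, as the old text format did for `global`.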

src/wic/cwl_dirs.txt

-16
This file was deleted.

src/wic/cwl_watcher.py

+23-5
@@ -258,14 +258,32 @@ def main() -> None:
258258
logfile.touch()
259259

260260
args_vals = json.loads(args.config)
261+
# In CWL all env variables are hidden by default so Path().home() doesn't work
262+
# Also User may specify a different homedir
263+
default_config_file = Path(args.homedir)/'wic'/'global_config.json'
264+
global_config: Json = {}
265+
if not Path(args.config_file).exists():
266+
if Path(args.config_file) == default_config_file:
267+
global_config = io.get_default_config()
268+
# write the default config object to the 'global_config.json' file in user's ~/wic directory
269+
# for user to inspect and or modify the config json file
270+
io.write_config_to_disk(global_config, default_config_file)
271+
print(f'default config file : {default_config_file} generated')
272+
else:
273+
print(f"Error user specified config file {args.config_file} doesn't exist")
274+
sys.exit()
275+
else:
276+
# reading user specified config file only if it exists
277+
# never overwrite user's config file or generate another file in user's non-default directory
278+
# TODO : Validate the json inside 'read_config_from_disk' function
279+
global_config = io.read_config_from_disk(Path(args.config_file))
261280

262-
tools_cwl = get_tools_cwl(args.homedir, quiet=args.quiet)
263-
yml_paths = get_yml_paths(args.homedir)
281+
tools_cwl = get_tools_cwl(global_config, quiet=args.quiet)
282+
yml_paths = get_yml_paths(global_config)
264283

265284
# Perform initialization via mutating global variables (This is not ideal)
266-
wicdir = Path(args.homedir) / 'wic'
267-
compiler.inference_rules = dict(io.read_lines_pairs(wicdir / 'inference_rules.txt'))
268-
inference.renaming_conventions = io.read_lines_pairs(wicdir / 'renaming_conventions.txt')
285+
compiler.inference_rules = global_config.get('inference_rules', {})
286+
inference.renaming_conventions = global_config.get('renaming_conventions', [])
269287

270288
# Generate schemas for validation
271289
yaml_stems = utils.flatten([list(p) for p in yml_paths.values()])

src/wic/inference_rules.txt

-10
This file was deleted.

src/wic/input_output.py

+53-31
@@ -1,14 +1,13 @@
11
import argparse
2-
import logging
2+
import copy
33
import json
44
from pathlib import Path
5-
import subprocess as sub
65
from typing import Any, List, Tuple
76

87
import yaml
98

109
from . import auto_gen_header
11-
from .wic_types import (Namespaces, NodeData, RoseTree, Yaml, ExplicitEdgeCalls)
10+
from .wic_types import (Namespaces, NodeData, RoseTree, Yaml, ExplicitEdgeCalls, Json)
1211

1312

1413
def read_lines_pairs(filename: Path) -> List[Tuple[str, str]]:
@@ -112,43 +111,69 @@ def write_to_disk(rose_tree: RoseTree, path: Path, relative_run_path: bool) -> N
112111
write_to_disk(sub_rose_tree, subpath, relative_run_path)
113112

114113

115-
logger_wicad = logging.getLogger("wicautodiscovery")
114+
def write_config_to_disk(config: Json, config_file: Path) -> None:
115+
"""Writes config json object to config_file
116116
117+
Args:
118+
config (Json): The json object that is to be written to disk
119+
config_file (Path): The file path where it is to be written
120+
"""
121+
config_dir = Path(config_file).parent
122+
# make the full path if it doesn't exist
123+
config_dir.mkdir(parents=True, exist_ok=True)
124+
with open(config_file, 'w', encoding='utf-8') as f:
125+
json.dump(config, f)
117126

118-
def copy_config_files(homedir: str) -> None:
119-
"""Copies the following configuration files to ~/wic/\n
120-
cwl_dirs.txt, yml_dirs.txt, renaming_conventions.txt, inference_rules.txt
127+
128+
def read_config_from_disk(config_file: Path) -> Json:
129+
"""Returns the config json object from config_file with absolute paths
121130
122131
Args:
123-
homedir (str): The users home directory
132+
config_file (Path): The path of json file where it is to be read from
133+
134+
Returns:
135+
Json: The config json object with absolute filepaths
124136
"""
125-
files = ['cwl_dirs.txt', 'yml_dirs.txt', 'renaming_conventions.txt', 'inference_rules.txt']
126-
src_dir = Path(__file__).parent
127-
wicdir = Path(homedir) / 'wic'
128-
wicdir.mkdir(exist_ok=True)
137+
config: Json = {}
138+
# config_file can contain absolute or relative paths
139+
with open(config_file, 'r', encoding='utf-8') as f:
140+
config = json.load(f)
141+
conf_tags = ['search_paths_cwl', 'search_paths_yml']
142+
for tag in conf_tags:
143+
config[tag] = get_absolute_paths(config[tag])
144+
return config
129145

130-
for file in files:
131-
if not (wicdir / file).exists():
132-
logger_wicad.warning(f'Writing {str(wicdir / file)}')
133-
logger_wicad.warning('Please check this file and make sure that the paths in it are correct.')
134-
cmd = ['cp', str(src_dir / file), str(wicdir / file)]
135-
sub.run(cmd, check=True)
136146

137-
write_absolute_config_files(wicdir / 'cwl_dirs.txt')
138-
write_absolute_config_files(wicdir / 'yml_dirs.txt')
147+
def get_default_config() -> Json:
148+
"""Returns the default config with absolute paths
149+
150+
Returns:
151+
Json: The config json object with absolute filepaths
152+
"""
153+
src_dir = Path(__file__).parent
154+
conf_tags = ['search_paths_cwl', 'search_paths_yml']
155+
default_config: Json = {}
156+
# config.json can contain absolute or relative paths
157+
default_config = read_config_from_disk(src_dir/'config.json')
158+
for tag in conf_tags:
159+
default_config[tag] = get_absolute_paths(default_config[tag])
160+
return default_config
139161

140162

141-
def write_absolute_config_files(dirs_file: Path) -> None:
142-
"""Makes the paths within the \*_dirs.txt files absolute
163+
def get_absolute_paths(sub_config: Json) -> Json:
164+
"""Returns a copy of the sub_config object in which all paths are made absolute.
143165
144166
Args:
145-
dirs_file (Path): The path to the \*_dirs.txt file
146-
dirs_file_abs (str): The path to the absolute \*_dirs.txt file
167+
sub_config (dict): The json (sub)object where filepaths are stored
168+
169+
Returns:
170+
Json: The json (sub)object with absolute filepaths
147171
"""
148-
ns_paths = read_lines_pairs(dirs_file)
149-
pairs_abs = [ns + ' ' + str(Path(path).absolute()) for ns, path in ns_paths]
150-
with open(dirs_file, mode='w', encoding='utf-8') as f:
151-
f.write('\n'.join(pairs_abs))
172+
abs_sub_config = copy.deepcopy(sub_config)
173+
for ns in abs_sub_config:
174+
abs_paths = [str(Path(path).absolute()) for path in abs_sub_config[ns]]
175+
abs_sub_config[ns] = abs_paths
176+
return abs_sub_config
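A standalone sketch of the same idea, with a usage example, is given below. The function body follows the diff above, but the type hints use plain `Dict` / `List` instead of the project's `Json` alias, and the sample namespace entry is only for illustration.

```python
import copy
from pathlib import Path
from typing import Dict, List


def get_absolute_paths(sub_config: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of sub_config in which every path is absolute."""
    abs_sub_config = copy.deepcopy(sub_config)
    for ns in abs_sub_config:
        abs_sub_config[ns] = [str(Path(p).absolute()) for p in abs_sub_config[ns]]
    return abs_sub_config


relative = {"gpu": ["../mm-workflows/gpu"]}
absolute = get_absolute_paths(relative)

print(absolute["gpu"][0])  # depends on the current working directory
print(relative["gpu"][0])  # the input object is left unchanged
```

`Path.absolute()` resolves a relative path against the current working directory and does not collapse `..` components. This is consistent with the user guide's advice to run wic for the first time from the root directory of one of the sibling repositories, since the default relative paths are made absolute from wherever the command is run.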
152177

153178

154179
def write_absolute_yaml_tags(args: argparse.Namespace, in_dict_in: Yaml, namespaces: Namespaces,
@@ -178,6 +203,3 @@ def write_absolute_yaml_tags(args: argparse.Namespace, in_dict_in: Yaml, namespa
178203
for arg_key_ in arg_keys_:
179204
in_name_ = f'{step_name_i}___{arg_key_}' # {step_name_i}_input___{arg_key}
180205
explicit_edge_calls_copy.update({in_name_: (namespaces + [step_name_i], arg_key_)})
181-
182-
write_absolute_config_files(Path(args.homedir) / 'wic' / 'cwl_dirs.txt')
183-
write_absolute_config_files(Path(args.homedir) / 'wic' / 'yml_dirs.txt')
