- Update documentation and ML pipeline examples
- Update examples and visualization tasks
- Update queries for getting inputs/outputs of a task
- Add JSON files for Stats and Visu example pipelines
- Add placeholder for paths in examples/VisuPipeline.json
- Remove "estimator" kwarg name in run_method() of Train task
- Make output_names optional when creating serializable Task from dict
- Add caught error texts into raised error messages
- Add ExeKGEditMixin for updating metric values and input dataset of ExeKG
- Add method for updating param values of ExeKG's tasks
- Add docstrings to ExeKG edit methods
- Add ExeKG edit example and update README.md
- Minor changes to update_pipeline_name() and fix clear_created_kg()
- Handle processing of multiple data splits in a single ExeKG task
- Add check to allow optional inputs for ExeKG tasks
- Update supported-tasks-and-methods.md
- Add flag to query inherited method params only for visu methods
- Add requests package
- Add catching of OSError when reading pipeline as JSON (see the sketch after this list)
- Update dependencies in poetry.lock and pyproject.toml
  - Upgrade `click` from 8.1.7 to 8.1.8
  - Upgrade `jinja2` from 3.1.4 to 3.1.5
  - Upgrade `pandas` from 1.5.2 to 2.0.3
  - Add new `numpy` versions 1.25.2 and 1.26.4
  - Update `numpy` dependency constraints in `pandas`
  - Add `tzdata` package version 2024.2
  - Upgrade `python-dateutil` to 2.8.2
  - Update `scipy` package to version 1.9.3
  - Adjust dependency specifications for compatibility
- Add missing copyright headers
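Two commits above concern error handling ("Add caught error texts into raised error messages" and "Add catching of OSError when reading pipeline as JSON"). A minimal sketch of that general pattern follows; the function name and message wording are illustrative assumptions, not ExeKGLib's actual code:

```python
# Illustrative sketch of the pattern the two error-handling commits describe:
# catch OSError while reading a pipeline JSON and embed the caught error's
# text in the raised message. Function name and wording are assumptions.
import json


def read_pipeline_json(json_path: str) -> dict:
    try:
        with open(json_path, "r") as f:
            return json.load(f)
    except OSError as e:
        # keep the caught error's text in the raised message
        raise ValueError(f"Failed to read pipeline JSON '{json_path}': {e}") from e
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid pipeline JSON '{json_path}': {e}") from e
```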
**README.md** (19 additions, 11 deletions)
```diff
@@ -17,11 +17,11 @@ ExeKGLib is a Python library that simplifies the construction and execution of M
 
 ## 🌟 Key Benefits of ExeKGLib
 
-1. 🚀 **No-code ML Pipeline Creation**: With ExeKGLib, the user can specify the pipeline's structure and the operations to be performed using a simple JSON file (see [Creating an ML pipeline](#🚀-creating-an-ml-pipeline)), which is then automatically converted to an ExeKG. This ExeKG can be executed to perform the specified operations on the input data (see [Executing an ML pipeline](#🚀-executing-an-ml-pipeline)).
-2. 📦 **Batch Pipeline Creation**: ExeKGLib allows users to create pipelines in a batch fashion through its simple coding interface (see [Creating an ML pipeline](#🚀-creating-an-ml-pipeline)). This enables automatic creation of multiple pipelines as ExeKGs, which can then be queried and analyzed.
+1. 🚀 **No-code ML Pipeline Creation**: With ExeKGLib, the user can specify the pipeline's structure and the operations to be performed using a simple JSON file (see [Creating an ML pipeline](https://boschresearch.github.io/ExeKGLib/usage/#creating-an-ml-pipeline)), which is then automatically converted to an ExeKG. This ExeKG can be executed to perform the specified operations on the input data (see [Executing an ML pipeline](https://boschresearch.github.io/ExeKGLib/usage/#executing-an-ml-pipeline)).
+2. 📦 **Batch Pipeline Creation and Edit**: ExeKGLib allows users to create and edit pipelines in a batch fashion through its simple coding interface (see [Creating an ML pipeline](https://boschresearch.github.io/ExeKGLib/usage/#creating-an-ml-pipeline) and [Editing an ML pipeline](https://boschresearch.github.io/ExeKGLib/usage/#editing-an-ml-pipeline)). This enables automatic creation of multiple pipelines as ExeKGs, which can then be queried and analyzed.
 3. 🔗 **Linked Open Data Integration**: ExeKGLib is a tool that leverages linked open data (LOD) in several significant ways:
-   - 📚 **Pipeline Creation Guidance**: It helps guide the user through the pipeline creation process. This is achieved by using a predefined hierarchy of tasks, along with their compatible inputs, outputs, methods, and method parameters (see [available tasks and methods]([task_hierarchy.md](https://boschresearch.github.io/ExeKGLib/supported-methods/))).
-   - 🧠 **Enhancing User Understanding**: It enhances the user's understanding of Data Science and the pipeline's functionality. This is achieved by linking the generated pipelines to Knowledge Graph (KG) schemata that encapsulate various Data Science concepts (see [KG schemata](#📜-kg-schemata)).
+   - 📚 **Pipeline Creation Guidance**: It helps guide the user through the pipeline creation process. This is achieved by using a predefined hierarchy of tasks, along with their compatible inputs, outputs, methods, and method parameters (see [available tasks and methods](https://boschresearch.github.io/ExeKGLib/supported-tasks-and-methods/)).
+   - 🧠 **Enhancing User Understanding**: It enhances the user's understanding of Data Science and the pipeline's functionality. This is achieved by linking the generated pipelines to Knowledge Graph (KG) schemata that encapsulate various Data Science concepts (see [KG schemata](https://boschresearch.github.io/ExeKGLib/external-sources/#kg-schemata)).
    - ✅ **Validation of ExeKGs**: It validates the generated ExeKGs to ensure their executability.
    - 🔄 **Automatic Conversion and Execution**: It automatically converts the ExeKGs to Python code and executes them.
```
```diff
@@ -48,12 +48,15 @@ For detailed installation instructions, refer to the [installation page](https:/
 We provide [example Python and JSON files](https://github.com/boschresearch/ExeKGLib/tree/main/examples) that can be used to create the following pipelines:
 
 1. **🧠 ML pipeline**:
-   1. `ml_pipeline_creation[from_json].py` and `MLPipeline.json`: Loads a CSV dataset, concatenates selected features, splits the data into training and testing sets, trains a Support Vector Classifier model, tests the model, calculates performance metrics (accuracy, F1 score, precision, and recall), and visualizes the results in bar plots.
-   2. `MLPipelineExtended.json`: An extended version of the above ML pipeline that adds a data splitting step for Stratified K-Fold Cross-Validation. Then, it trains and tests the model using the cross-validation technique and visualizes the validation and test F1 scores in bar plots.
+   1. **MLPipelineSimple**: Loads a CSV dataset, concatenates selected features, splits the data into training and testing sets, trains a Support Vector Classifier (SVC) model, tests the model, calculates performance metrics (accuracy, F1 score, precision, and recall), and visualizes the results in bar plots.
+   2. **MLPipelineCrossValidation**: An extended version of **MLPipelineSimple** that adds a data splitting step for Stratified K-Fold Cross-Validation. Then, it trains and tests the model using the cross-validation technique and visualizes the validation and test F1 scores in bar plots.
+   3. **MLPipelineModelSelection**: A modified version of **MLPipelineSimple** that replaces the training step with a model selection step. Rather than using a fixed model, this pipeline involves training and cross-validating a Support Vector Classifier (SVC) model with various hyperparameters to optimize performance.
 2. **📊 Statistics pipeline**:
-   - `stats_pipeline_creation.py`: Loads a specific feature from a CSV dataset, calculates its mean and standard deviation, and visualizes the feature's values using a line plot and the calculated statistics using a bar plot.
+   - **StatsPipeline**: Loads a specific feature from a CSV dataset, calculates its mean and standard deviation, and visualizes the feature's values using a line plot and the calculated statistics using a bar plot.
 3. **📈 Visualization pipeline**:
-   - `visu_pipeline_creation.py`: The pipeline loads two numerical features from a CSV dataset and visualizes each feature's values using separate line plots.
+   - **VisuPipeline**: The pipeline loads two numerical features from a CSV dataset and visualizes each feature's values using separate line plots.
+
+> 💡 **Tip**: To fetch the examples into your working directory for easy access, run `typer exe_kg_lib.cli.main run get-examples`.
 
 > 🗒️ **Note**: The naming convention for output names (used as inputs for subsequent tasks) in `.json` files can be found in `exe_kg_lib/utils/string_utils.py`. Look for `TASK_OUTPUT_NAME_REGEX`.
 
```
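The `TASK_OUTPUT_NAME_REGEX` note in the hunk above can be tried directly. A minimal sketch, assuming the constant is a plain regex string; only the module path and constant name come from the note, and the candidate name is a hypothetical illustration:

```python
# Minimal sketch: check a task output name against the library's convention.
# Assumes TASK_OUTPUT_NAME_REGEX is a plain regex string, as the note implies;
# the candidate value below is a hypothetical example, not the real format.
import re

from exe_kg_lib.utils.string_utils import TASK_OUTPUT_NAME_REGEX

candidate = "Concatenation1_DataOutConcatenatedData1"  # hypothetical name
print(bool(re.match(TASK_OUTPUT_NAME_REGEX, candidate)))
```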
```diff
@@ -69,15 +72,20 @@ See [relevant website page](https://boschresearch.github.io/ExeKGLib/supported-t
 ### 🚀 Creating an ML pipeline
 
 #### 💻 Via code
-See `ml_pipeline_creation.py`, `stats_pipeline_creation.py`, `visu_pipeline_creation.py` in the [provided examples](https://github.com/boschresearch/ExeKGLib/tree/main/examples).
+See the Python files in the [provided examples](https://github.com/boschresearch/ExeKGLib/tree/main/examples).
 
 #### 📄 Using JSON
-See `MLPipeline.json` and `ml_pipeline_creation_from_json.py` in the [provided examples](https://github.com/boschresearch/ExeKGLib/tree/main/examples).
+Run `typer exe_kg_lib.cli.main run create-pipeline <json_path>` after replacing `<json_path>` to point to a pipeline's JSON file. See the [provided example JSONs](https://github.com/boschresearch/ExeKGLib/tree/main/examples)
+
+> 🗒️ **Note**: Replace `input_data_path` with the path to a dataset and `output_plots_dir` with the directory path where the plots will be saved.
 
 #### 🖥️ Step-by-step via CLI
 Run `typer exe_kg_lib.cli.main run create-pipeline`.
 
-> 🗒️ **Note**: To fetch the [provided examples](https://github.com/boschresearch/ExeKGLib/tree/main/examples) to your working directory for easy access, run `typer exe_kg_lib.cli.main run get-examples`.
+### 🚀 Editing an ML pipeline
+
+#### 💻 Via code
+See the [provided sample script](https://github.com/boschresearch/ExeKGLib/tree/main/examples/ml_pipeline_simple_edit.py).
```
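The new editing section above only points at a sample script, so here is a minimal sketch of what batch editing could look like. Only the `ExeKG` class and `update_pipeline_name()` appear elsewhere on this page; the constructor keyword and the other method names are hypothetical stand-ins for the `ExeKGEditMixin` functionality described in the commit list, so treat `ml_pipeline_simple_edit.py` as the authoritative reference:

```python
# Hedged sketch of batch-editing ExeKG pipelines. update_pipeline_name() is
# named in the commit list; ExeKG is the library's entry-point class. The
# constructor keyword and the update_* method names below are hypothetical
# stand-ins for the ExeKGEditMixin methods (updating the input dataset,
# metric values, and task param values). See examples/ml_pipeline_simple_edit.py.
from exe_kg_lib import ExeKG

exe_kg = ExeKG(kg_schema_name="Machine Learning")      # assumed signature
exe_kg.update_pipeline_name("MLPipelineSimpleV2")      # named in commits
exe_kg.update_input_dataset("data/new_dataset.csv")    # hypothetical name
exe_kg.update_param_values({"Train1": {"C": 10.0}})    # hypothetical name
```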
**docs/adding-new-task-and-method.md** (5 additions, 5 deletions)
````diff
@@ -70,15 +70,15 @@ To add the required semantic components:
 ```turtle
 ml:{Input1}
     a owl:Class ;
-    rdfs:subClassOf ds:DataEntity,
-        {Input1DataStructures} .
+    rdfs:subClassOf ds:DataEntity, # "ds:DataEntity" can be replaced with a subclass of "ds:Method" like "ml:TrainMethod"
+        {Input1DataStructures} . # in case of the above replacement, data structures are not needed
 
 ...
 
 ml:{InputN}
     a owl:Class ;
-    rdfs:subClassOf ds:DataEntity,
-        {InputNDataStructures} .
+    rdfs:subClassOf ds:DataEntity, # "ds:DataEntity" can be replaced with a subclass of "ds:Method" like "ml:TrainMethod"
+        {InputNDataStructures} . # in case of the above replacement, data structures are not needed
 
 ml:{Output1}
     a owl:Class ;
````
```diff
@@ -267,7 +267,7 @@ To add the required semantic components:
 8. Modify `config.py` in `exe_kg_lib` package to update the value of `KG_SCHEMAS_DIR` to point to the cloned repo's directory from Step 1.
 
 ## B) Modifying the relevant Python code
-🗒️ **Note**: While modifying the code, consider refering to the conventions mentioned in the [tasks package's documentation](../tasks-package-documentation).
+🗒️ **Note**: While modifying the code, consider refering to the conventions mentioned in the [tasks package's documentation](https://github.com/boschresearch/ExeKGLib/tasks-package-documentation).
```
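Step 8 in the hunk above names only the constant to update. As a hedged sketch of that edit (only the `config.py` module and the `KG_SCHEMAS_DIR` name come from the docs; the `Path` form and the value are placeholders):

```python
# exe_kg_lib/config.py - sketch of the Step-8 edit. Only the module and the
# KG_SCHEMAS_DIR name come from the docs; the Path usage and the value are
# placeholders for wherever the KG-schemas repo from Step 1 was cloned.
from pathlib import Path

KG_SCHEMAS_DIR = Path.home() / "repos" / "ExeKG-schemas"  # placeholder path
```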