Commit d0fdab5

Authored by Eyal-Danieli, danielperezz, iguazio-cicd, guylei-code, and amitnGiniApps
Cherry Pick from Development (#938)
* replace author to Iguazio manually (#905)
* Organize CLI directory + new CLI for generating item.yaml files (#906)
* create a CLI for generating item.yaml and organize the CLI directory
* modify comments to module
* PR fixes
* Update cli/common/generate_item_yaml.py

Co-authored-by: Eyal Danieli <eyal_danieli@mckinsey.com>

---------

Co-authored-by: Eyal Danieli <eyal_danieli@mckinsey.com>

* fill count events notebook (#908)
* avoid noise reduction unit test (#909)
* Add histogram-data-drift monitoring application module (without example) (#911)
* histogram data drift module with empty example notebook
* post review fixes
* chore(readme): auto-update asset tables [skip ci]
* Fill histogram-data-drift example notebook (#912)
* fill data-drift nb
* post review fixes
* Add evidently demo app monitoring application module (without example) (#913)
* sphinx build docs bug fix
* add evidently demo app module (empty example notebook)
* post review changes
* chore(readme): auto-update asset tables [skip ci]
* [Translate] Require torch>=2.6 for the translate function to work properly (#915)
* lock torch valid version
* edit the item.yaml and generated function.yaml
* update mlrun version
* [CLI] Generated READMEs are produced with broken links to the items (#918)
* fix
* test fix
* test fix
* test fix
* test fix
* final workflow
* chore(readme): auto-update asset tables [skip ci]
* OpenAI Module without notebook (#917)
* First commit OpenAI Module
* First commit OpenAI Module
* Update example filename in item.yaml
* Delete modules/src/openai_proxy/requirements.txt

  No need due to no unitest

* Update item.yaml for OpenAI application configuration
* Update modules/src/openai_proxy/openai.py

Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>

* Change category name from 'GenAI' to 'genai'
* Update package requirements with version constraints
* Second commit adding notebook
* Refactor OpenAI proxy to use base64 encoded script

  Refactor OpenAI proxy implementation to use base64 encoded script and update FastAPI app configuration.

* Change deployment method to OpenAIModule
* Third commit adding notebook
* Third commit adding notebook
* Remove package requirements from item.yaml

  Removed specific requirements for fastapi and requests.

* Rename item and update kind in YAML
* Update openai.py
* Third commit adding notebook
* Fix after review
* Fix after review

---------

Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>

* chore(readme): auto-update asset tables [skip ci]
* [Evidently] Fill example notebook (#919)
* add notebook + rename directory + correct evidently version
* remove extra cell
* chore(readme): auto-update asset tables [skip ci]
* chore(readme): auto-update asset tables [skip ci]
* [CLI + Modules] Fix time format in generate item yaml script (#922)
* fix time format for evidently and hist
* fix cli script
* fix datetime format
* chore(readme): auto-update asset tables [skip ci]
* chore(readme): auto-update asset tables [skip ci]
* Fix CMD first commit
* Fix CMD second commit
* remove max-width restriction from the main content (#929)
* add test, requirement file and notebook
* fix cli/utils/helpers.py
* [Modules] Modify Evidently & Histogram monitoring apps example notebooks to the change in evaluate() (#934)
* histogram_data_drift.ipynb
* fix to histogram_data_drift.ipynb
* fix to histogram_data_drift.ipynb
* evidently_iris.ipynb
* fix evidently_iris.ipynb
* fix evidently_iris.ipynb
* fix evidently dependency
* add dependency
* remove [ui] from evidently dependency
* change notebook name to: openai_proxy_app
* [Docs] Add guidelines for contributing new functions or modules (#931)
* CONTRIBUTING.md
* CONTRIBUTING.md
* improvements

---------

Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>
Co-authored-by: iguazio-cicd <iguaziocicd@gmail.com>
Co-authored-by: guylei-code <guyleibu@gmail.com>
Co-authored-by: amitnGiniApps <amitn@gini-apps.com>
1 parent b1300be commit d0fdab5

3 files changed: +121 -40 lines changed

CONTRIBUTING.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
1+
# Contributing To MLRun's Hub
2+
3+
## Types of Assets You Can Contribute
4+
- Modules (either a generic module or a model monitoring application)
5+
- Functions (each one is converted to an MLRun runtime)
6+
7+
## How to Contribute
8+
1. Fork this repository on GitHub and create a new branch for your new asset.
9+
2. Add a new directory for your asset under the appropriate directory (`functions/src` for functions, `modules/src` for modules).
10+
3. Populate the directory with your asset files (see the [Asset Structure](#asset-structure) section below).
11+
4. Open a pull request to merge your changes into the **development** branch of the main repository.
12+
13+
## Asset Structure
14+
### Functions
15+
```txt
functions
├── src
│   ├── your_function_name
│   │   ├── item.yaml
│   │   ├── function.yaml
│   │   ├── your_function_name.py
│   │   ├── your_function_name.ipynb
│   │   ├── test_your_function_name.py
│   │   └── requirements.txt
```
26+
#### item.yaml
27+
Metadata about the function. You can generate it with the following CLI command:
28+
```bash
python -m cli.cli generate-item-yaml function your_function_name
```
31+
Then fill in the relevant details, for example `kind` (`nuclio:serving`, `serving`, or `job`) and `categories`. Browse the [MLRun hub UI](https://www.mlrun.org/hub/functions/) to see the existing categories; a function can have more than one category.
32+
Important: Use the same function name for the directory name, all relevant `item.yaml` fields, and the file names.
33+
34+
#### function.yaml
35+
The MLRun function definition. You can generate it from `item.yaml` with:
36+
```bash
python -m cli.cli item-to-function --item-path functions/src/your_function_name
```
39+
#### your_function_name.py
40+
The main code file for your function. (Note: keep the code well documented; the hub UI uses the docstrings as the function's documentation.)
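As a rough illustration of the level of documentation meant here, the sketch below shows a small function with a complete docstring. The function name `count_rows_above`, its parameters, and the `:param`/`:returns:` field style are assumptions made for this example; this guide does not prescribe a docstring format.

```python
def count_rows_above(records: list[dict], key: str = "value", threshold: float = 0.0) -> int:
    """
    Count the records whose `key` field is greater than or equal to `threshold`.

    Hypothetical example function, used only to illustrate docstring style.

    :param records:   Rows to scan, each given as a dictionary.
    :param key:       Name of the numeric field to compare. Rows without it are skipped.
    :param threshold: Minimum value a row must reach to be counted.

    :returns: The number of rows that meet the threshold.
    """
    return sum(1 for row in records if key in row and row[key] >= threshold)


if __name__ == "__main__":
    rows = [{"value": 3}, {"value": -1}, {"other": 5}]
    print(count_rows_above(rows))  # prints 1
```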
41+
42+
#### your_function_name.ipynb
43+
A Jupyter notebook demonstrating the function's usage. (Note: the notebook must run end to end without manual intervention.)
44+
45+
#### test_your_function_name.py
46+
Unit tests that cover as much of the function's functionality as possible. (They run on every change to your function.)
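A minimal sketch of what such a test file might contain, assuming pytest-style test functions that use plain `assert` statements (the test runner is an assumption, since this guide does not name one). The `normalize` function is a hypothetical stand-in defined inline so the example is self-contained; in a real asset it would be imported from `your_function_name.py`.

```python
def normalize(values: list[float]) -> list[float]:
    """Scale values linearly into the range [0, 1] (hypothetical function under test)."""
    low, high = min(values), max(values)
    if high == low:
        # Constant input has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_normalize_scales_to_unit_range():
    # The smallest value maps to 0.0 and the largest to 1.0.
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_normalize_handles_constant_input():
    # Edge case: avoid dividing by zero when all values are equal.
    assert normalize([5.0, 5.0]) == [0.0, 0.0]


if __name__ == "__main__":
    test_normalize_scales_to_unit_range()
    test_normalize_handles_constant_input()
    print("all tests passed")
```

Covering one typical case and one edge case per behavior keeps the file short while still exercising most of the code paths.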
47+
48+
#### requirements.txt
49+
Any additional Python dependencies required by your function's unit tests. (Note: specify the function's own dependencies in the `item.yaml` file, not here.)
50+
51+
### Modules
52+
```txt
modules
├── src
│   ├── your_module_name
│   │   ├── item.yaml
│   │   ├── your_module_name.py
│   │   ├── your_module_name.ipynb
│   │   ├── test_your_module_name.py
│   │   └── requirements.txt
```
62+
#### item.yaml
63+
Metadata about the module. You can generate it with the following CLI command:
64+
```bash
python -m cli.cli generate-item-yaml module your_module_name
```
67+
Then fill in the relevant details, for example `kind` (`generic` or `monitoring_application`) and `categories`. Browse the [MLRun hub UI](https://www.mlrun.org/hub/functions/) to see the existing categories; a module can have more than one category.
68+
Important: Use the same module name for the directory name, all relevant `item.yaml` fields, and the file names.
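The file-naming part of this rule can be checked mechanically. The sketch below is a hypothetical helper (it is not part of the repository's CLI) that lists which of the files shown in the directory trees above are missing from an asset directory, assuming the directory name is the asset name.

```python
from pathlib import Path


def expected_files(name: str, asset_type: str) -> list[str]:
    """Return the file names the asset trees list for a function or a module."""
    files = [
        "item.yaml",
        f"{name}.py",
        f"{name}.ipynb",
        f"test_{name}.py",
        "requirements.txt",
    ]
    if asset_type == "function":
        # Only functions carry a generated function.yaml.
        files.insert(1, "function.yaml")
    return files


def missing_files(asset_dir: Path, asset_type: str) -> list[str]:
    """List the expected files that do not exist in `asset_dir`."""
    name = asset_dir.name  # assumption: the directory is named after the asset
    return [f for f in expected_files(name, asset_type) if not (asset_dir / f).is_file()]


if __name__ == "__main__":
    # Example path taken from the tree above; adjust to your own asset.
    print(missing_files(Path("modules/src/your_module_name"), "module"))
```

An empty list means every expected file is present under the expected name. The helper does not inspect the fields inside `item.yaml`, so those still need a manual check.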
69+
70+
#### your_module_name.py
71+
The main code file for your module. (Note: keep the code well documented; the hub UI uses the docstrings as the module's documentation.)
72+
For model-monitoring modules, see our [guidelines for writing model monitoring applications](https://docs.mlrun.org/en/stable/model-monitoring/applications.html).
73+
74+
#### your_module_name.ipynb
75+
A Jupyter notebook demonstrating the module's usage.
76+
77+
#### test_your_module_name.py
78+
Unit tests that cover as much of the module's functionality as possible. (They run on every change to your module.)
79+
80+
#### requirements.txt
81+
Any additional Python dependencies required by your module's unit tests. (Note: specify the module's own dependencies in the `item.yaml` file, not here.)

functions/README.md

Lines changed: 36 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -9,40 +9,40 @@ it is expected that contributors follow certain guidelines/protocols (please chi
99
<!-- AUTOGEN:START (do not edit below) -->
1010
| Name | Description | Kind | Categories |
1111
| --- | --- | --- | --- |
12-
| [aggregate](https://github.com/mlrun/functions/tree/master/functions/src/aggregate) | Rolling aggregation over Metrics and Lables according to specifications | job | data-preparation |
13-
| [arc_to_parquet](https://github.com/mlrun/functions/tree/master/functions/src/arc_to_parquet) | retrieve remote archive, open and save as parquet | job | utils |
14-
| [auto_trainer](https://github.com/mlrun/functions/tree/master/functions/src/auto_trainer) | Automatic train, evaluate and predict functions for the ML frameworks - Scikit-Learn, XGBoost and LightGBM. | job | machine-learning, model-training |
15-
| [azureml_serving](https://github.com/mlrun/functions/tree/master/functions/src/azureml_serving) | AzureML serving function | serving | machine-learning, model-serving |
16-
| [azureml_utils](https://github.com/mlrun/functions/tree/master/functions/src/azureml_utils) | Azure AutoML integration in MLRun, including utils functions for training models on Azure AutoML platfrom. | job | model-serving, utils |
17-
| [batch_inference](https://github.com/mlrun/functions/tree/master/functions/src/batch_inference) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
18-
| [batch_inference_v2](https://github.com/mlrun/functions/tree/master/functions/src/batch_inference_v2) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
19-
| [describe](https://github.com/mlrun/functions/tree/master/functions/src/describe) | describe and visualizes dataset stats | job | data-analysis |
20-
| [describe_dask](https://github.com/mlrun/functions/tree/master/functions/src/describe_dask) | describe and visualizes dataset stats | job | data-analysis |
21-
| [describe_spark](https://github.com/mlrun/functions/tree/master/functions/src/describe_spark) | | job | data-analysis |
22-
| [feature_selection](https://github.com/mlrun/functions/tree/master/functions/src/feature_selection) | Select features through multiple Statistical and Model filters | job | data-preparation, machine-learning |
23-
| [gen_class_data](https://github.com/mlrun/functions/tree/master/functions/src/gen_class_data) | Create a binary classification sample dataset and save. | job | data-generation |
24-
| [github_utils](https://github.com/mlrun/functions/tree/master/functions/src/github_utils) | add comments to github pull request | job | utils |
25-
| [hugging_face_serving](https://github.com/mlrun/functions/tree/master/functions/src/hugging_face_serving) | Generic Hugging Face model server. | serving | genai, model-serving |
26-
| [load_dataset](https://github.com/mlrun/functions/tree/master/functions/src/load_dataset) | load a toy dataset from scikit-learn | job | data-preparation |
27-
| [mlflow_utils](https://github.com/mlrun/functions/tree/master/functions/src/mlflow_utils) | Mlflow model server, and additional utils. | serving | model-serving, utils |
28-
| [model_server](https://github.com/mlrun/functions/tree/master/functions/src/model_server) | generic sklearn model server | nuclio:serving | model-serving, machine-learning |
29-
| [model_server_tester](https://github.com/mlrun/functions/tree/master/functions/src/model_server_tester) | test model servers | job | monitoring, model-serving |
30-
| [noise_reduction](https://github.com/mlrun/functions/tree/master/functions/src/noise_reduction) | Reduce noise from audio files | job | data-preparation, audio |
31-
| [onnx_utils](https://github.com/mlrun/functions/tree/master/functions/src/onnx_utils) | ONNX intigration in MLRun, some utils functions for the ONNX framework, optimizing and converting models from different framework to ONNX using MLRun. | job | utils, deep-learning |
32-
| [open_archive](https://github.com/mlrun/functions/tree/master/functions/src/open_archive) | Open a file/object archive into a target directory | job | utils |
33-
| [pii_recognizer](https://github.com/mlrun/functions/tree/master/functions/src/pii_recognizer) | This function is used to recognize PII in a directory of text files | job | data-preparation, NLP |
34-
| [pyannote_audio](https://github.com/mlrun/functions/tree/master/functions/src/pyannote_audio) | pyannote's speech diarization of audio files | job | deep-learning, audio |
35-
| [question_answering](https://github.com/mlrun/functions/tree/master/functions/src/question_answering) | GenAI approach of question answering on a given data | job | genai |
36-
| [send_email](https://github.com/mlrun/functions/tree/master/functions/src/send_email) | Send Email messages through SMTP server | job | utils |
37-
| [silero_vad](https://github.com/mlrun/functions/tree/master/functions/src/silero_vad) | Silero VAD (Voice Activity Detection) functions. | job | deep-learning, audio |
38-
| [sklearn_classifier](https://github.com/mlrun/functions/tree/master/functions/src/sklearn_classifier) | train any classifier using scikit-learn's API | job | machine-learning, model-training |
39-
| [sklearn_classifier_dask](https://github.com/mlrun/functions/tree/master/functions/src/sklearn_classifier_dask) | train any classifier using scikit-learn's API over Dask | job | machine-learning, model-training |
40-
| [structured_data_generator](https://github.com/mlrun/functions/tree/master/functions/src/structured_data_generator) | GenAI approach of generating structured data according to a given schema | job | data-generation, genai |
41-
| [test_classifier](https://github.com/mlrun/functions/tree/master/functions/src/test_classifier) | test a classifier using held-out or new data | job | machine-learning, model-testing |
42-
| [text_to_audio_generator](https://github.com/mlrun/functions/tree/master/functions/src/text_to_audio_generator) | Generate audio file from text using different speakers | job | data-generation, audio |
43-
| [tf2_serving](https://github.com/mlrun/functions/tree/master/functions/src/tf2_serving) | tf2 image classification server | nuclio:serving | model-serving, machine-learning |
44-
| [transcribe](https://github.com/mlrun/functions/tree/master/functions/src/transcribe) | Transcribe audio files into text files | job | audio, genai |
45-
| [translate](https://github.com/mlrun/functions/tree/master/functions/src/translate) | Translate text files from one language to another | job | genai, NLP |
46-
| [v2_model_server](https://github.com/mlrun/functions/tree/master/functions/src/v2_model_server) | generic sklearn model server | serving | model-serving, machine-learning |
47-
| [v2_model_tester](https://github.com/mlrun/functions/tree/master/functions/src/v2_model_tester) | test v2 model servers | job | model-testing, machine-learning |
12+
| [aggregate](https://github.com/mlrun/functions/tree/development/functions/src/aggregate) | Rolling aggregation over Metrics and Lables according to specifications | job | data-preparation |
13+
| [arc_to_parquet](https://github.com/mlrun/functions/tree/development/functions/src/arc_to_parquet) | retrieve remote archive, open and save as parquet | job | utils |
14+
| [auto_trainer](https://github.com/mlrun/functions/tree/development/functions/src/auto_trainer) | Automatic train, evaluate and predict functions for the ML frameworks - Scikit-Learn, XGBoost and LightGBM. | job | machine-learning, model-training |
15+
| [azureml_serving](https://github.com/mlrun/functions/tree/development/functions/src/azureml_serving) | AzureML serving function | serving | machine-learning, model-serving |
16+
| [azureml_utils](https://github.com/mlrun/functions/tree/development/functions/src/azureml_utils) | Azure AutoML integration in MLRun, including utils functions for training models on Azure AutoML platfrom. | job | model-serving, utils |
17+
| [batch_inference](https://github.com/mlrun/functions/tree/development/functions/src/batch_inference) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
18+
| [batch_inference_v2](https://github.com/mlrun/functions/tree/development/functions/src/batch_inference_v2) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
19+
| [describe](https://github.com/mlrun/functions/tree/development/functions/src/describe) | describe and visualizes dataset stats | job | data-analysis |
20+
| [describe_dask](https://github.com/mlrun/functions/tree/development/functions/src/describe_dask) | describe and visualizes dataset stats | job | data-analysis |
21+
| [describe_spark](https://github.com/mlrun/functions/tree/development/functions/src/describe_spark) | | job | data-analysis |
22+
| [feature_selection](https://github.com/mlrun/functions/tree/development/functions/src/feature_selection) | Select features through multiple Statistical and Model filters | job | data-preparation, machine-learning |
23+
| [gen_class_data](https://github.com/mlrun/functions/tree/development/functions/src/gen_class_data) | Create a binary classification sample dataset and save. | job | data-generation |
24+
| [github_utils](https://github.com/mlrun/functions/tree/development/functions/src/github_utils) | add comments to github pull request | job | utils |
25+
| [hugging_face_serving](https://github.com/mlrun/functions/tree/development/functions/src/hugging_face_serving) | Generic Hugging Face model server. | serving | genai, model-serving |
26+
| [load_dataset](https://github.com/mlrun/functions/tree/development/functions/src/load_dataset) | load a toy dataset from scikit-learn | job | data-preparation |
27+
| [mlflow_utils](https://github.com/mlrun/functions/tree/development/functions/src/mlflow_utils) | Mlflow model server, and additional utils. | serving | model-serving, utils |
28+
| [model_server](https://github.com/mlrun/functions/tree/development/functions/src/model_server) | generic sklearn model server | nuclio:serving | model-serving, machine-learning |
29+
| [model_server_tester](https://github.com/mlrun/functions/tree/development/functions/src/model_server_tester) | test model servers | job | monitoring, model-serving |
30+
| [noise_reduction](https://github.com/mlrun/functions/tree/development/functions/src/noise_reduction) | Reduce noise from audio files | job | data-preparation, audio |
31+
| [onnx_utils](https://github.com/mlrun/functions/tree/development/functions/src/onnx_utils) | ONNX intigration in MLRun, some utils functions for the ONNX framework, optimizing and converting models from different framework to ONNX using MLRun. | job | utils, deep-learning |
32+
| [open_archive](https://github.com/mlrun/functions/tree/development/functions/src/open_archive) | Open a file/object archive into a target directory | job | utils |
33+
| [pii_recognizer](https://github.com/mlrun/functions/tree/development/functions/src/pii_recognizer) | This function is used to recognize PII in a directory of text files | job | data-preparation, NLP |
34+
| [pyannote_audio](https://github.com/mlrun/functions/tree/development/functions/src/pyannote_audio) | pyannote's speech diarization of audio files | job | deep-learning, audio |
35+
| [question_answering](https://github.com/mlrun/functions/tree/development/functions/src/question_answering) | GenAI approach of question answering on a given data | job | genai |
36+
| [send_email](https://github.com/mlrun/functions/tree/development/functions/src/send_email) | Send Email messages through SMTP server | job | utils |
37+
| [silero_vad](https://github.com/mlrun/functions/tree/development/functions/src/silero_vad) | Silero VAD (Voice Activity Detection) functions. | job | deep-learning, audio |
38+
| [sklearn_classifier](https://github.com/mlrun/functions/tree/development/functions/src/sklearn_classifier) | train any classifier using scikit-learn's API | job | machine-learning, model-training |
39+
| [sklearn_classifier_dask](https://github.com/mlrun/functions/tree/development/functions/src/sklearn_classifier_dask) | train any classifier using scikit-learn's API over Dask | job | machine-learning, model-training |
40+
| [structured_data_generator](https://github.com/mlrun/functions/tree/development/functions/src/structured_data_generator) | GenAI approach of generating structured data according to a given schema | job | data-generation, genai |
41+
| [test_classifier](https://github.com/mlrun/functions/tree/development/functions/src/test_classifier) | test a classifier using held-out or new data | job | machine-learning, model-testing |
42+
| [text_to_audio_generator](https://github.com/mlrun/functions/tree/development/functions/src/text_to_audio_generator) | Generate audio file from text using different speakers | job | data-generation, audio |
43+
| [tf2_serving](https://github.com/mlrun/functions/tree/development/functions/src/tf2_serving) | tf2 image classification server | nuclio:serving | model-serving, machine-learning |
44+
| [transcribe](https://github.com/mlrun/functions/tree/development/functions/src/transcribe) | Transcribe audio files into text files | job | audio, genai |
45+
| [translate](https://github.com/mlrun/functions/tree/development/functions/src/translate) | Translate text files from one language to another | job | genai, NLP |
46+
| [v2_model_server](https://github.com/mlrun/functions/tree/development/functions/src/v2_model_server) | generic sklearn model server | serving | model-serving, machine-learning |
47+
| [v2_model_tester](https://github.com/mlrun/functions/tree/development/functions/src/v2_model_tester) | test v2 model servers | job | model-testing, machine-learning |
4848
<!-- AUTOGEN:END -->

modules/README.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,8 @@
66
<!-- AUTOGEN:START (do not edit below) -->
77
| Name | Description | Kind | Categories |
88
| --- | --- | --- | --- |
9-
| [count_events](https://github.com/mlrun/functions/tree/master/modules/src/count_events) | Count events in each time window | monitoring_application | model-serving |
10-
| [evidently_iris](https://github.com/mlrun/functions/tree/master/modules/src/evidently_iris) | Demonstrates Evidently integration in MLRun for data quality and drift monitoring using the Iris dataset | monitoring_application | model-serving, structured-ML |
11-
| [histogram_data_drift](https://github.com/mlrun/functions/tree/master/modules/src/histogram_data_drift) | Model-monitoring application for detecting and visualizing data drift | monitoring_application | model-serving, structured-ML |
12-
| [openai_proxy_app](https://github.com/mlrun/functions/tree/master/modules/src/openai_proxy_app) | OpenAI application runtime based on fastapi | generic | genai |
9+
| [count_events](https://github.com/mlrun/functions/tree/development/modules/src/count_events) | Count events in each time window | monitoring_application | model-serving |
10+
| [evidently_iris](https://github.com/mlrun/functions/tree/development/modules/src/evidently_iris) | Demonstrates Evidently integration in MLRun for data quality and drift monitoring using the Iris dataset | monitoring_application | model-serving, structured-ML |
11+
| [histogram_data_drift](https://github.com/mlrun/functions/tree/development/modules/src/histogram_data_drift) | Model-monitoring application for detecting and visualizing data drift | monitoring_application | model-serving, structured-ML |
12+
| [openai_proxy_app](https://github.com/mlrun/functions/tree/development/modules/src/openai_proxy_app) | OpenAI application runtime based on fastapi | generic | genai |
1313
<!-- AUTOGEN:END -->
