Cherry Pick from Development (#938)

Eyal-Danieli · danielperezz · iguazio-cicd · web-flow · commit d0fdab557ce3 · 2025-12-09T13:41:52.000+02:00
* replace author to Iguazio manually (#905)
* Organize CLI directory + new CLI for generating item.yaml files (#906)
* create a CLI for generating item.yaml and organize the CLI directory
* modify comments to module
* PR fixes
* Update cli/common/generate_item_yaml.py
Co-authored-by: Eyal Danieli <eyal_danieli@mckinsey.com>
---------
Co-authored-by: Eyal Danieli <eyal_danieli@mckinsey.com>
* fill count events notebook (#908)
* avoid noise reduction unit test (#909)
* Add histogram-data-drift monitoring application module (without example) (#911)
* histogram data drift module with empty example notebook
* post review fixes
* chore(readme): auto-update asset tables [skip ci]
* Fill histogram-data-drift example notebook (#912)
* fill data-drift nb
* post review fixes
* Add evidently demo app monitoring application module (without example) (#913)
* sphinx build docs bug fix
* add evidently demo app module (empty example notebook)
* post review changes
* chore(readme): auto-update asset tables [skip ci]
* [Translate] Require torch>=2.6 for the translate function to work properly (#915)
* lock torch valid version
* edit the item.yaml and generated function.yaml
* update mlrun version
* [CLI] Generated READMEs are produced with broken links to the items (#918)
* fix
* test fix
* test fix
* test fix
* test fix
* final workflow
* chore(readme): auto-update asset tables [skip ci]
* OpenAI Module without notebook (#917)
* First commit OpenAI Module
* First commit OpenAI Module
* Update example filename in item.yaml
* Delete modules/src/openai_proxy/requirements.txt
  No need due to no unitest
* Update item.yaml for OpenAI application configuration
* Update modules/src/openai_proxy/openai.py
Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>
* Change category name from 'GenAI' to 'genai'
* Update package requirements with version constraints
* Second commit adding notebook
* Refactor OpenAI proxy to use base64 encoded script
  Refactor OpenAI proxy implementation to use base64 encoded script and update FastAPI app configuration.
* Change deployment method to OpenAIModule
* Third commit adding notebook
* Third commit adding notebook
* Remove package requirements from item.yaml
  Removed specific requirements for fastapi and requests.
* Rename item and update kind in YAML
* Update openai.py
* Third commit adding notebook
* Fix after review
* Fix after review
---------
Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>
* chore(readme): auto-update asset tables [skip ci]
* [Evidently] Fill example notebook (#919)
* add notebook + rename directory + correct evidently version
* remove extra cell
* chore(readme): auto-update asset tables [skip ci]
* chore(readme): auto-update asset tables [skip ci]
* [CLI + Modules] Fix time format in generate item yaml script (#922)
* fix time format for evidently and hist
* fix cli script
* fix datetime format
* chore(readme): auto-update asset tables [skip ci]
* chore(readme): auto-update asset tables [skip ci]
* Fix CMD first commit
* Fix CMD second commit
* remove max-width restriction from the main content (#929)
* add test, requirement file and notebook
* fix cli/utils/helpers.py
* [Modules] Modify Evidently & Histogram monitoring apps example notebooks to the change in evaluate() (#934)
* histogram_data_drift.ipynb
* fix to histogram_data_drift.ipynb
* fix to histogram_data_drift.ipynb
* evidently_iris.ipynb
* fix evidently_iris.ipynb
* fix evidently_iris.ipynb
* fix evidently dependency
* add dependency
* remove [ui] from evidently dependency
* change notebook name to: openai_proxy_app
* [Docs] Add guidelines for contributing new functions or modules (#931)
* CONTRIBUTING.md
* CONTRIBUTING.md
* improvements
---------
Co-authored-by: Daniel Perez <100069700+danielperezz@users.noreply.github.com>
Co-authored-by: iguazio-cicd <iguaziocicd@gmail.com>
Co-authored-by: guylei-code <guyleibu@gmail.com>
Co-authored-by: amitnGiniApps <amitn@gini-apps.com>
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,81 @@
+# Contributing To MLRun's Hub
+
+## Types of Assets You Can Contribute
+- Modules (either a generic module or a model monitoring application)
+- Functions (each one is converted to an MLRun runtime)
+
+## How to Contribute
+1. Fork this repository on GitHub and create a new branch for your new asset.
+2. Add a new directory for your asset under the appropriate directory (`functions/src` for functions, `modules/src` for modules).
+3. Populate the directory with your asset files (see the [Asset Structure](#asset-structure) section below).
+4. Open a pull request that merges your changes into the **development** branch of the main repository.
+
+## Asset Structure
+### Functions
+```txt
+functions
+├── src
+│   ├── your_function_name
+│   │   ├── item.yaml
+│   │   ├── function.yaml
+│   │   ├── your_function_name.py
+│   │   ├── your_function_name.ipynb
+│   │   ├── test_your_function_name.py 
+│   │   └── requirements.txt
+```
+#### item.yaml 
+Metadata about the function. Can be generated using the following CLI command:
+  ```bash
+    python -m cli.cli generate-item-yaml function your_function_name
+  ```
+  Then, fill in all the relevant details, for example `kind` (`nuclio:serving`, `serving`, or `job`) and `categories`. Browse the [MLRun hub UI](https://www.mlrun.org/hub/functions/) to see the existing categories; a function can have more than one category.
+  Important: Use the function name consistently across the directory name, all relevant `item.yaml` fields, and the file names.
+
+#### function.yaml
+The MLRun function definition. Can be generated from `item.yaml` using:
+```bash
+  python -m cli.cli item-to-function --item-path functions/src/your_function_name
+```
+#### your_function_name.py
+The main code file for your function. (Note: keep the code well documented; the docstrings are used in the hub UI as documentation for the function.)
+
+#### your_function_name.ipynb
+A Jupyter notebook demonstrating the function's usage. (Note: the notebook must run end-to-end automatically, without manual intervention.)
+
+#### test_your_function_name.py
+Unit tests that cover as much of the function's functionality as possible. (They run on each change to your function.)
+
+#### requirements.txt
+Any additional Python dependencies required by your function's unit tests. (Note: the function's own dependencies should be specified in the `item.yaml` file, not here.)
+
+### Modules
+```txt
+modules
+├── src
+│   ├── your_module_name
+│   │   ├── item.yaml
+│   │   ├── your_module_name.py
+│   │   ├── your_module_name.ipynb
+│   │   ├── test_your_module_name.py 
+│   │   └── requirements.txt
+```
+#### item.yaml
+Metadata about the module. Can be generated using the following CLI command:
+```bash
+  python -m cli.cli generate-item-yaml module your_module_name
+```
+Then, fill in all the relevant details, for example `kind` (`generic` or `monitoring_application`) and `categories`. Browse the [MLRun hub UI](https://www.mlrun.org/hub/functions/) to see the existing categories; a module can have more than one category.
+Important: Use the module name consistently across the directory name, all relevant `item.yaml` fields, and the file names.
+
+#### your_module_name.py
+The main code file for your module. (Note: keep the code well documented; the docstrings are used in the hub UI as documentation for the module.)
+For model-monitoring modules, you can see our [guidelines for writing model monitoring applications](https://docs.mlrun.org/en/stable/model-monitoring/applications.html).
+
+#### your_module_name.ipynb
+A Jupyter notebook demonstrating the module's usage. 
+
+#### test_your_module_name.py
+Unit tests that cover as much of the module's functionality as possible. (They run on each change to your module.)
+
+#### requirements.txt
+Any additional Python dependencies required by your module's unit tests. (Note: the module's own dependencies should be specified in the `item.yaml` file, not here.)
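
The directory layouts listed in CONTRIBUTING.md, together with its rule that the asset name must match across the directory name and the file names, can be checked mechanically before a pull request is opened. The following is a minimal sketch under stated assumptions and is not part of the repository's CLI: `EXPECTED_FILES` and `missing_asset_files` are hypothetical names, and the file lists are copied from the two directory trees in the guide.

```python
from pathlib import Path

# File-name patterns per asset type, copied from the trees in CONTRIBUTING.md.
# "{name}" stands for the asset name, which is assumed to equal the directory name.
EXPECTED_FILES = {
    "function": [
        "item.yaml",
        "function.yaml",
        "{name}.py",
        "{name}.ipynb",
        "test_{name}.py",
        "requirements.txt",
    ],
    "module": [
        "item.yaml",
        "{name}.py",
        "{name}.ipynb",
        "test_{name}.py",
        "requirements.txt",
    ],
}


def missing_asset_files(asset_dir, asset_type):
    """Return the expected file names that are absent from ``asset_dir``.

    Hypothetical helper: it only checks that files exist and does not
    inspect the fields inside item.yaml.
    """
    asset_dir = Path(asset_dir)
    name = asset_dir.name
    expected = [pattern.format(name=name) for pattern in EXPECTED_FILES[asset_type]]
    return [file_name for file_name in expected if not (asset_dir / file_name).is_file()]
```

Calling `missing_asset_files("functions/src/your_function_name", "function")` returns an empty list when every expected file is present. A file whose name does not follow the directory name shows up as missing, which is how a naming mismatch would surface.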
diff --git a/functions/README.md b/functions/README.md
@@ -9,40 +9,40 @@ it is expected that contributors follow certain guidelines/protocols (please chi
 <!-- AUTOGEN:START (do not edit below) -->
 | Name | Description | Kind | Categories |
 | --- | --- | --- | --- |
-| [aggregate](https://github.com/mlrun/functions/tree/master/functions/src/aggregate) | Rolling aggregation over Metrics and Lables according to specifications | job | data-preparation |
-| [arc_to_parquet](https://github.com/mlrun/functions/tree/master/functions/src/arc_to_parquet) | retrieve remote archive, open and save as parquet | job | utils |
-| [auto_trainer](https://github.com/mlrun/functions/tree/master/functions/src/auto_trainer) | Automatic train, evaluate and predict functions for the ML frameworks - Scikit-Learn, XGBoost and LightGBM. | job | machine-learning, model-training |
-| [azureml_serving](https://github.com/mlrun/functions/tree/master/functions/src/azureml_serving) | AzureML serving function | serving | machine-learning, model-serving |
-| [azureml_utils](https://github.com/mlrun/functions/tree/master/functions/src/azureml_utils) | Azure AutoML integration in MLRun, including utils functions for training models on Azure AutoML platfrom. | job | model-serving, utils |
-| [batch_inference](https://github.com/mlrun/functions/tree/master/functions/src/batch_inference) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
-| [batch_inference_v2](https://github.com/mlrun/functions/tree/master/functions/src/batch_inference_v2) | Batch inference (also knows as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
-| [describe](https://github.com/mlrun/functions/tree/master/functions/src/describe) | describe and visualizes dataset stats | job | data-analysis |
-| [describe_dask](https://github.com/mlrun/functions/tree/master/functions/src/describe_dask) | describe and visualizes dataset stats | job | data-analysis |
-| [describe_spark](https://github.com/mlrun/functions/tree/master/functions/src/describe_spark) |  | job | data-analysis |
-| [feature_selection](https://github.com/mlrun/functions/tree/master/functions/src/feature_selection) | Select features through multiple Statistical and Model filters | job | data-preparation, machine-learning |
-| [gen_class_data](https://github.com/mlrun/functions/tree/master/functions/src/gen_class_data) | Create a binary classification sample dataset and save. | job | data-generation |
-| [github_utils](https://github.com/mlrun/functions/tree/master/functions/src/github_utils) | add comments to github pull request | job | utils |
-| [hugging_face_serving](https://github.com/mlrun/functions/tree/master/functions/src/hugging_face_serving) | Generic Hugging Face model server. | serving | genai, model-serving |
-| [load_dataset](https://github.com/mlrun/functions/tree/master/functions/src/load_dataset) | load a toy dataset from scikit-learn | job | data-preparation |
-| [mlflow_utils](https://github.com/mlrun/functions/tree/master/functions/src/mlflow_utils) | Mlflow model server, and additional utils. | serving | model-serving, utils |
-| [model_server](https://github.com/mlrun/functions/tree/master/functions/src/model_server) | generic sklearn model server | nuclio:serving | model-serving, machine-learning |
-| [model_server_tester](https://github.com/mlrun/functions/tree/master/functions/src/model_server_tester) | test model servers | job | monitoring, model-serving |
-| [noise_reduction](https://github.com/mlrun/functions/tree/master/functions/src/noise_reduction) | Reduce noise from audio files | job | data-preparation, audio |
-| [onnx_utils](https://github.com/mlrun/functions/tree/master/functions/src/onnx_utils) | ONNX intigration in MLRun, some utils functions for the ONNX framework, optimizing and converting models from different framework to ONNX using MLRun. | job | utils, deep-learning |
-| [open_archive](https://github.com/mlrun/functions/tree/master/functions/src/open_archive) | Open a file/object archive into a target directory | job | utils |
-| [pii_recognizer](https://github.com/mlrun/functions/tree/master/functions/src/pii_recognizer) | This function is used to recognize PII in a directory of text files | job | data-preparation, NLP |
-| [pyannote_audio](https://github.com/mlrun/functions/tree/master/functions/src/pyannote_audio) | pyannote's speech diarization of audio files | job | deep-learning, audio |
-| [question_answering](https://github.com/mlrun/functions/tree/master/functions/src/question_answering) | GenAI approach of question answering on a given data | job | genai |
-| [send_email](https://github.com/mlrun/functions/tree/master/functions/src/send_email) | Send Email messages through SMTP server | job | utils |
-| [silero_vad](https://github.com/mlrun/functions/tree/master/functions/src/silero_vad) | Silero VAD (Voice Activity Detection) functions. | job | deep-learning, audio |
-| [sklearn_classifier](https://github.com/mlrun/functions/tree/master/functions/src/sklearn_classifier) | train any classifier using scikit-learn's API | job | machine-learning, model-training |
-| [sklearn_classifier_dask](https://github.com/mlrun/functions/tree/master/functions/src/sklearn_classifier_dask) | train any classifier using scikit-learn's API over Dask | job | machine-learning, model-training |
-| [structured_data_generator](https://github.com/mlrun/functions/tree/master/functions/src/structured_data_generator) | GenAI approach of generating structured data according to a given schema | job | data-generation, genai |
-| [test_classifier](https://github.com/mlrun/functions/tree/master/functions/src/test_classifier) | test a classifier using held-out or new data | job | machine-learning, model-testing |
-| [text_to_audio_generator](https://github.com/mlrun/functions/tree/master/functions/src/text_to_audio_generator) | Generate audio file from text using different speakers | job | data-generation, audio |
-| [tf2_serving](https://github.com/mlrun/functions/tree/master/functions/src/tf2_serving) | tf2 image classification server | nuclio:serving | model-serving, machine-learning |
-| [transcribe](https://github.com/mlrun/functions/tree/master/functions/src/transcribe) | Transcribe audio files into text files | job | audio, genai |
-| [translate](https://github.com/mlrun/functions/tree/master/functions/src/translate) | Translate text files from one language to another | job | genai, NLP |
-| [v2_model_server](https://github.com/mlrun/functions/tree/master/functions/src/v2_model_server) | generic sklearn model server | serving | model-serving, machine-learning |
-| [v2_model_tester](https://github.com/mlrun/functions/tree/master/functions/src/v2_model_tester) | test v2 model servers | job | model-testing, machine-learning |
+| [aggregate](https://github.com/mlrun/functions/tree/development/functions/src/aggregate) | Rolling aggregation over Metrics and Labels according to specifications | job | data-preparation |
+| [arc_to_parquet](https://github.com/mlrun/functions/tree/development/functions/src/arc_to_parquet) | retrieve remote archive, open and save as parquet | job | utils |
+| [auto_trainer](https://github.com/mlrun/functions/tree/development/functions/src/auto_trainer) | Automatic train, evaluate and predict functions for the ML frameworks - Scikit-Learn, XGBoost and LightGBM. | job | machine-learning, model-training |
+| [azureml_serving](https://github.com/mlrun/functions/tree/development/functions/src/azureml_serving) | AzureML serving function | serving | machine-learning, model-serving |
+| [azureml_utils](https://github.com/mlrun/functions/tree/development/functions/src/azureml_utils) | Azure AutoML integration in MLRun, including utils functions for training models on Azure AutoML platform. | job | model-serving, utils |
+| [batch_inference](https://github.com/mlrun/functions/tree/development/functions/src/batch_inference) | Batch inference (also known as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
+| [batch_inference_v2](https://github.com/mlrun/functions/tree/development/functions/src/batch_inference_v2) | Batch inference (also known as prediction) for the common ML frameworks (SciKit-Learn, XGBoost and LightGBM) while performing data drift analysis. | job | model-serving |
+| [describe](https://github.com/mlrun/functions/tree/development/functions/src/describe) | describe and visualize dataset stats | job | data-analysis |
+| [describe_dask](https://github.com/mlrun/functions/tree/development/functions/src/describe_dask) | describe and visualize dataset stats | job | data-analysis |
+| [describe_spark](https://github.com/mlrun/functions/tree/development/functions/src/describe_spark) |  | job | data-analysis |
+| [feature_selection](https://github.com/mlrun/functions/tree/development/functions/src/feature_selection) | Select features through multiple Statistical and Model filters | job | data-preparation, machine-learning |
+| [gen_class_data](https://github.com/mlrun/functions/tree/development/functions/src/gen_class_data) | Create a binary classification sample dataset and save. | job | data-generation |
+| [github_utils](https://github.com/mlrun/functions/tree/development/functions/src/github_utils) | add comments to github pull request | job | utils |
+| [hugging_face_serving](https://github.com/mlrun/functions/tree/development/functions/src/hugging_face_serving) | Generic Hugging Face model server. | serving | genai, model-serving |
+| [load_dataset](https://github.com/mlrun/functions/tree/development/functions/src/load_dataset) | load a toy dataset from scikit-learn | job | data-preparation |
+| [mlflow_utils](https://github.com/mlrun/functions/tree/development/functions/src/mlflow_utils) | Mlflow model server, and additional utils. | serving | model-serving, utils |
+| [model_server](https://github.com/mlrun/functions/tree/development/functions/src/model_server) | generic sklearn model server | nuclio:serving | model-serving, machine-learning |
+| [model_server_tester](https://github.com/mlrun/functions/tree/development/functions/src/model_server_tester) | test model servers | job | monitoring, model-serving |
+| [noise_reduction](https://github.com/mlrun/functions/tree/development/functions/src/noise_reduction) | Reduce noise from audio files | job | data-preparation, audio |
+| [onnx_utils](https://github.com/mlrun/functions/tree/development/functions/src/onnx_utils) | ONNX integration in MLRun, some utils functions for the ONNX framework, optimizing and converting models from different frameworks to ONNX using MLRun. | job | utils, deep-learning |
+| [open_archive](https://github.com/mlrun/functions/tree/development/functions/src/open_archive) | Open a file/object archive into a target directory | job | utils |
+| [pii_recognizer](https://github.com/mlrun/functions/tree/development/functions/src/pii_recognizer) | This function is used to recognize PII in a directory of text files | job | data-preparation, NLP |
+| [pyannote_audio](https://github.com/mlrun/functions/tree/development/functions/src/pyannote_audio) | pyannote's speech diarization of audio files | job | deep-learning, audio |
+| [question_answering](https://github.com/mlrun/functions/tree/development/functions/src/question_answering) | GenAI approach of question answering on a given data | job | genai |
+| [send_email](https://github.com/mlrun/functions/tree/development/functions/src/send_email) | Send Email messages through SMTP server | job | utils |
+| [silero_vad](https://github.com/mlrun/functions/tree/development/functions/src/silero_vad) | Silero VAD (Voice Activity Detection) functions. | job | deep-learning, audio |
+| [sklearn_classifier](https://github.com/mlrun/functions/tree/development/functions/src/sklearn_classifier) | train any classifier using scikit-learn's API | job | machine-learning, model-training |
+| [sklearn_classifier_dask](https://github.com/mlrun/functions/tree/development/functions/src/sklearn_classifier_dask) | train any classifier using scikit-learn's API over Dask | job | machine-learning, model-training |
+| [structured_data_generator](https://github.com/mlrun/functions/tree/development/functions/src/structured_data_generator) | GenAI approach of generating structured data according to a given schema | job | data-generation, genai |
+| [test_classifier](https://github.com/mlrun/functions/tree/development/functions/src/test_classifier) | test a classifier using held-out or new data | job | machine-learning, model-testing |
+| [text_to_audio_generator](https://github.com/mlrun/functions/tree/development/functions/src/text_to_audio_generator) | Generate audio file from text using different speakers | job | data-generation, audio |
+| [tf2_serving](https://github.com/mlrun/functions/tree/development/functions/src/tf2_serving) | tf2 image classification server | nuclio:serving | model-serving, machine-learning |
+| [transcribe](https://github.com/mlrun/functions/tree/development/functions/src/transcribe) | Transcribe audio files into text files | job | audio, genai |
+| [translate](https://github.com/mlrun/functions/tree/development/functions/src/translate) | Translate text files from one language to another | job | genai, NLP |
+| [v2_model_server](https://github.com/mlrun/functions/tree/development/functions/src/v2_model_server) | generic sklearn model server | serving | model-serving, machine-learning |
+| [v2_model_tester](https://github.com/mlrun/functions/tree/development/functions/src/v2_model_tester) | test v2 model servers | job | model-testing, machine-learning |
 <!-- AUTOGEN:END -->
diff --git a/modules/README.md b/modules/README.md
@@ -6,8 +6,8 @@
 <!-- AUTOGEN:START (do not edit below) -->
 | Name | Description | Kind | Categories |
 | --- | --- | --- | --- |
-| [count_events](https://github.com/mlrun/functions/tree/master/modules/src/count_events) | Count events in each time window | monitoring_application | model-serving |
-| [evidently_iris](https://github.com/mlrun/functions/tree/master/modules/src/evidently_iris) | Demonstrates Evidently integration in MLRun for data quality and drift monitoring using the Iris dataset | monitoring_application | model-serving, structured-ML |
-| [histogram_data_drift](https://github.com/mlrun/functions/tree/master/modules/src/histogram_data_drift) | Model-monitoring application for detecting and visualizing data drift | monitoring_application | model-serving, structured-ML |
-| [openai_proxy_app](https://github.com/mlrun/functions/tree/master/modules/src/openai_proxy_app) | OpenAI application runtime based on fastapi | generic | genai |
+| [count_events](https://github.com/mlrun/functions/tree/development/modules/src/count_events) | Count events in each time window | monitoring_application | model-serving |
+| [evidently_iris](https://github.com/mlrun/functions/tree/development/modules/src/evidently_iris) | Demonstrates Evidently integration in MLRun for data quality and drift monitoring using the Iris dataset | monitoring_application | model-serving, structured-ML |
+| [histogram_data_drift](https://github.com/mlrun/functions/tree/development/modules/src/histogram_data_drift) | Model-monitoring application for detecting and visualizing data drift | monitoring_application | model-serving, structured-ML |
+| [openai_proxy_app](https://github.com/mlrun/functions/tree/development/modules/src/openai_proxy_app) | OpenAI application runtime based on fastapi | generic | genai |
 <!-- AUTOGEN:END -->
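
The `AUTOGEN:START` / `AUTOGEN:END` markers in both READMEs delimit a table with one row per asset, and this commit switches the row links from the `master` branch to `development`. A rough sketch of how such a block could be regenerated is shown here. It is an illustration under assumptions, not the repository's actual generator: `render_table`, `replace_autogen_block`, and the keys of the item dictionaries are hypothetical, while the marker strings, column headers, and URL layout are taken from the hunks above.

```python
# Marker strings and URL prefix as they appear in the README hunks.
START = "<!-- AUTOGEN:START (do not edit below) -->"
END = "<!-- AUTOGEN:END -->"
BASE_URL = "https://github.com/mlrun/functions/tree"


def render_table(items, branch, asset_root):
    """Build the Markdown table; ``asset_root`` is "functions" or "modules"."""
    lines = [
        "| Name | Description | Kind | Categories |",
        "| --- | --- | --- | --- |",
    ]
    # Rows are sorted by name, matching the alphabetical order in the READMEs.
    for item in sorted(items, key=lambda entry: entry["name"]):
        link = f"{BASE_URL}/{branch}/{asset_root}/src/{item['name']}"
        categories = ", ".join(item["categories"])
        lines.append(
            f"| [{item['name']}]({link}) | {item['description']} "
            f"| {item['kind']} | {categories} |"
        )
    return "\n".join(lines)


def replace_autogen_block(readme_text, table):
    """Swap everything between the two markers for ``table``, keeping the markers."""
    start = readme_text.index(START) + len(START)
    end = readme_text.index(END, start)
    return readme_text[:start] + "\n" + table + "\n" + readme_text[end:]
```

Passing the branch as a parameter keeps the link target in one place, so a change like the `master` to `development` switch in this diff would be a single-argument change in such a design.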