
Commit 93327d2

Automatically generated by github-workflow[bot] for commit: c40e384 (#499)
Co-authored-by: iguazio-cicd <iguaziocicd@gmail.com>
1 parent abbca64 commit 93327d2

File tree

26 files changed: +3361 −5 lines changed


README.md

Lines changed: 50 additions & 0 deletions
@@ -1,3 +1,53 @@
### Change log [2026-01-26 14:39:32]
1. Item Updated: `verify_schema` (from version: `1.0.0` to `1.0.0`)


### Change log [2026-01-26 14:39:27]
1. Item Updated: `agent_deployer` (from version: `1.0.0` to `1.0.0`)
2. Item Updated: `histogram_data_drift` (from version: `1.0.0` to `1.0.0`)
3. Item Updated: `openai_proxy_app` (from version: `1.0.0` to `1.0.0`)
4. New item created: `oai_spo` (version: `1.0.0`)
5. Item Updated: `vllm_module` (from version: `1.0.0` to `1.0.0`)
6. Item Updated: `count_events` (from version: `1.0.0` to `1.0.0`)
7. Item Updated: `evidently_iris` (from version: `1.0.0` to `1.0.0`)


### Change log [2026-01-26 14:39:18]
1. Item Updated: `test_classifier` (from version: `1.1.0` to `1.1.0`)
2. Item Updated: `sklearn_classifier` (from version: `1.2.0` to `1.2.0`)
3. Item Updated: `model_server_tester` (from version: `1.1.0` to `1.1.0`)
4. Item Updated: `azureml_serving` (from version: `1.1.0` to `1.1.0`)
5. Item Updated: `describe_dask` (from version: `1.2.0` to `1.2.0`)
6. Item Updated: `batch_inference` (from version: `1.8.0` to `1.8.0`)
7. Item Updated: `v2_model_server` (from version: `1.2.0` to `1.2.0`)
8. Item Updated: `gen_class_data` (from version: `1.3.0` to `1.3.0`)
9. Item Updated: `send_email` (from version: `1.2.0` to `1.2.0`)
10. Item Updated: `tf2_serving` (from version: `1.1.0` to `1.1.0`)
11. Item Updated: `aggregate` (from version: `1.4.0` to `1.4.0`)
12. Item Updated: `open_archive` (from version: `1.2.0` to `1.2.0`)
13. Item Updated: `describe` (from version: `1.4.0` to `1.4.0`)
14. Item Updated: `v2_model_tester` (from version: `1.1.0` to `1.1.0`)
15. Item Updated: `text_to_audio_generator` (from version: `1.3.0` to `1.3.0`)
16. Item Updated: `pii_recognizer` (from version: `0.4.0` to `0.4.0`)
17. Item Updated: `github_utils` (from version: `1.1.0` to `1.1.0`)
18. Item Updated: `sklearn_classifier_dask` (from version: `1.1.1` to `1.1.1`)
19. Item Updated: `azureml_utils` (from version: `1.4.0` to `1.4.0`)
20. Item Updated: `question_answering` (from version: `0.5.0` to `0.5.0`)
21. Item Updated: `structured_data_generator` (from version: `1.6.0` to `1.6.0`)
22. Item Updated: `arc_to_parquet` (from version: `1.5.0` to `1.5.0`)
23. Item Updated: `silero_vad` (from version: `1.4.0` to `1.4.0`)
24. Item Updated: `load_dataset` (from version: `1.2.0` to `1.2.0`)
25. Item Updated: `auto_trainer` (from version: `1.8.0` to `1.8.0`)
26. Item Updated: `feature_selection` (from version: `1.6.0` to `1.6.0`)
27. Item Updated: `translate` (from version: `0.3.0` to `0.3.0`)
28. Item Updated: `describe_spark` (from version: `1.1.0` to `1.1.0`)
29. Item Updated: `pyannote_audio` (from version: `1.3.0` to `1.3.0`)
30. Item Updated: `onnx_utils` (from version: `1.3.0` to `1.3.0`)
31. Item Updated: `batch_inference_v2` (from version: `2.6.0` to `2.6.0`)
32. Item Updated: `transcribe` (from version: `1.2.0` to `1.2.0`)
33. Item Updated: `model_server` (from version: `1.2.0` to `1.2.0`)
34. Item Updated: `mlflow_utils` (from version: `1.2.0` to `1.2.0`)
35. Item Updated: `noise_reduction` (from version: `1.1.0` to `1.1.0`)
36. Item Updated: `hugging_face_serving` (from version: `1.1.0` to `1.1.0`)

### Change log [2026-01-26 12:54:54]
1. Item Updated: `verify_schema` (from version: `1.0.0` to `1.0.0`)
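The generated changelog entries above follow a fixed line format. A minimal sketch of parsing them (a hypothetical helper for illustration, not part of the actual github-workflow[bot] generator):

```python
import re

# Patterns for the two entry shapes seen in the changelog, e.g.
#   1. Item Updated: `verify_schema` (from version: `1.0.0` to `1.0.0`)
#   4. New item created: `oai_spo` (version: `1.0.0`)
UPDATED = re.compile(
    r"\d+\. Item Updated: `(?P<name>[^`]+)` "
    r"\(from version: `(?P<old>[^`]+)` to `(?P<new>[^`]+)`\)"
)
CREATED = re.compile(
    r"\d+\. New item created: `(?P<name>[^`]+)` \(version: `(?P<ver>[^`]+)`\)"
)

def parse_entry(line: str) -> dict:
    """Classify one changelog line as an update, a creation, or unknown."""
    if m := UPDATED.match(line):
        return {"kind": "updated", **m.groupdict()}
    if m := CREATED.match(line):
        return {"kind": "created", **m.groupdict()}
    return {"kind": "unknown", "raw": line}

print(parse_entry("4. New item created: `oai_spo` (version: `1.0.0`)"))
```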

catalog.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

functions/development/tags.json

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{"kind": ["job", "serving", "nuclio:serving"], "categories": ["deep-learning", "model-training", "audio", "data-generation", "utilities", "monitoring", "data-preparation", "data-analysis", "NLP", "genai", "machine-learning", "model-serving", "model-testing"]}
+{"kind": ["serving", "job", "nuclio:serving"], "categories": ["model-testing", "model-serving", "genai", "monitoring", "data-analysis", "deep-learning", "machine-learning", "utilities", "model-training", "data-preparation", "NLP", "data-generation", "audio"]}
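The tags.json change above only reorders the `kind` and `categories` arrays; no value is added or removed. A quick stdlib check (the two payloads are copied verbatim from the diff):

```python
import json

# Before and after, as they appear in the tags.json diff.
old = json.loads(
    '{"kind": ["job", "serving", "nuclio:serving"], '
    '"categories": ["deep-learning", "model-training", "audio", '
    '"data-generation", "utilities", "monitoring", "data-preparation", '
    '"data-analysis", "NLP", "genai", "machine-learning", "model-serving", '
    '"model-testing"]}'
)
new = json.loads(
    '{"kind": ["serving", "job", "nuclio:serving"], '
    '"categories": ["model-testing", "model-serving", "genai", "monitoring", '
    '"data-analysis", "deep-learning", "machine-learning", "utilities", '
    '"model-training", "data-preparation", "NLP", "data-generation", "audio"]}'
)

# Sorted comparison ignores ordering, which is all that changed here.
for field in ("kind", "categories"):
    assert sorted(old[field]) == sorted(new[field]), field
print("only ordering changed")
```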

modules/development/catalog.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
apiVersion: v1
categories:
- model-serving
- structured-ML
description: OAI SPO use case for industrial optimization
example: oai_spo.ipynb
generationDate: 2026-01-26:12-25
hidden: false
labels:
  author: Iguazio
mlrunVersion: 1.9.2
name: oai_spo
spec:
  filename: oai_spo.py
  image: mlrun/mlrun
  kind: generic
  requirements:
version: 1.0.0
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# OAI Hub Demo\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook demonstrates how to use the OAI Hub class for managing MLRun project setup and artifact logging.\n",
    "\n",
    "The OaiHub class handles:\n",
    "- Environment configuration\n",
    "- Project creation/retrieval\n",
    "- Input data processing and logging\n",
    "- Configuration dataset logging\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Demo\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize OaiHub\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from oai_spo import OaiHub\n",
    "\n",
    "# Initialize OaiHub instance\n",
    "oai_hub = OaiHub(\n",
    "    project_name=\"oai-spo-demo\",\n",
    "    data_dir=\"./data\",\n",
    "    default_env_file=\"default.env\",\n",
    "    local_env_file=\"local.env\",\n",
    "    pipeline_config_path=\"pipeline_config.yaml\",\n",
    "    default_image=\"mlrun/mlrun\",\n",
    "    source=\"s3://your-bucket/path\",\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run Setup\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run complete setup process\n",
    "oai_hub.setup()\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
# Copyright 2025 Iguazio
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import sys
import os
import datetime as dt
from pathlib import Path
from typing import Optional

import pandas as pd

import mlrun
from config.config import load_config


class OaiHub:
    """
    OAI Hub class for managing MLRun project setup and artifact logging.

    This class handles:
    - Environment configuration
    - Project creation/retrieval
    - Input data processing and logging
    - Configuration dataset logging
    """

    def __init__(
        self,
        project_name: str,
        data_dir: str,
        default_env_file: str,
        local_env_file: str,
        pipeline_config_path: str,
        default_image: str,
        source: str,
    ):
        """
        Initialize OaiHub instance.

        Args:
            project_name: Name of the MLRun project
            data_dir: Directory containing data files
            default_env_file: Default environment file path
            local_env_file: Local environment file path (takes precedence)
            pipeline_config_path: Path to pipeline configuration YAML
            default_image: Default Docker image for the project
            source: Source location for the project (S3 path)
        """
        self.project_name = project_name
        self.data_dir = data_dir
        self.default_env_file = default_env_file
        self.local_env_file = local_env_file
        self.pipeline_config_path = pipeline_config_path
        self.default_image = default_image
        self.source = source

        self.project: Optional[mlrun.projects.MlrunProject] = None
        self.pipeline_config = None

    def setup_environment(self) -> str:
        """
        Load environment variables from env file.
        Prefers local env file if it exists, otherwise uses default.

        Returns:
            Path to the env file that was loaded
        """
        env_file = (
            self.local_env_file
            if os.path.exists(self.local_env_file)
            else self.default_env_file
        )
        print(f"Loading environment from: {env_file}")
        mlrun.set_env_from_file(env_file)
        return env_file

    def load_configuration(self):
        """Load pipeline configuration from YAML file."""
        print(f"Loading configuration from: {self.pipeline_config_path}")
        self.pipeline_config = load_config(self.pipeline_config_path)

    def get_or_create_project(self) -> mlrun.projects.MlrunProject:
        """
        Get or create the MLRun project.

        Returns:
            MLRun project instance
        """
        print(f"Getting or creating project '{self.project_name}'...")
        self.project = mlrun.get_or_create_project(
            self.project_name,
            parameters={
                "source": self.source,
                "pipeline_config_path": self.pipeline_config_path,
                "default_image": self.default_image,
            },
        )
        return self.project

    def process_and_log_input_data(self) -> Optional[str]:
        """
        Process input data (shift dates to current time) and log as artifact.

        Returns:
            Artifact key if successful, None otherwise
        """
        print("Processing 'input_data'...")
        input_data_path = os.path.join(self.data_dir, "01_raw/sample_input_data.csv")

        if not os.path.exists(input_data_path):
            print(f"Warning: {input_data_path} not found.")
            return None

        orig_input_data = pd.read_csv(input_data_path)

        # Shift dates to current time
        if "date" in orig_input_data.columns:
            # Convert to datetime if not already
            orig_input_data["date"] = pd.to_datetime(orig_input_data["date"])

            # Calculate delta to shift max date to now
            max_date = orig_input_data["date"].max()
            delta = dt.datetime.now() - max_date

            # Apply shift and floor to hour
            orig_input_data["date"] = orig_input_data["date"] + delta
            orig_input_data["date"] = orig_input_data["date"].dt.floor("h")

        print("Logging 'input_data' artifact...")
        artifact = self.project.log_dataset(
            key="input_data",
            df=orig_input_data,
            format="csv",
        )
        return artifact.key if artifact else None

    def log_config_dataset(
        self, key: str, filename: str, label_schema: str
    ) -> Optional[str]:
        """
        Log a configuration dataset from a CSV file.

        Args:
            key: Artifact key name
            filename: Name of the CSV file (relative to data_dir/01_raw/)
            label_schema: Schema label for the artifact

        Returns:
            Artifact key if successful, None otherwise
        """
        file_path = os.path.join(self.data_dir, f"01_raw/{filename}")

        if not os.path.exists(file_path):
            print(f"Warning: {file_path} not found, skipping {key}.")
            return None

        print(f"Logging '{key}' from {filename}...")
        df = pd.read_csv(file_path, sep=";")
        artifact = self.project.log_dataset(
            key=key,
            df=df,
            format="csv",
            labels={"parameters_schema": label_schema},
        )
        return artifact.key if artifact else None

    def log_all_config_datasets(self):
        """Log all configuration datasets."""
        config_datasets = [
            ("sample_tags_raw", "sample_tags_raw_config.csv", "raw"),
            ("sample_tags_meta", "sample_tags_meta_config.csv", "meta"),
            ("sample_tags_outliers", "sample_tags_outliers_config.csv", "outliers"),
            ("sample_tags_imputation", "sample_tags_imputation_config.csv", "impute"),
            (
                "sample_tags_on_off_dependencies",
                "sample_tags_on_off_dependencies_config.csv",
                "on_off",
            ),
            ("sample_tags_resample", "sample_tags_resample_config.csv", "resample"),
        ]

        for key, filename, label_schema in config_datasets:
            self.log_config_dataset(key, filename, label_schema)

    def setup(self):
        """
        Complete setup process:
        1. Setup environment
        2. Load configuration
        3. Get or create project
        4. Process and log input data
        5. Log all configuration datasets
        """
        self.setup_environment()
        self.load_configuration()
        self.get_or_create_project()
        self.process_and_log_input_data()
        self.log_all_config_datasets()
        print("Artifact logging completed.")
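The date-shifting step in `process_and_log_input_data` moves the newest timestamp in the sample data up to "now" while preserving the spacing between rows, then floors each timestamp to the hour. A stdlib-only sketch of the same idea (pandas-free, so `.dt.floor("h")` is approximated with `replace`; the sample dates are made up for illustration):

```python
import datetime as dt

def shift_to_now(dates: list[dt.datetime]) -> list[dt.datetime]:
    """Shift timestamps so the latest one lands at the current time,
    keeping relative spacing, then floor each to the hour."""
    # Delta that maps the maximum date onto now.
    delta = dt.datetime.now() - max(dates)
    return [
        (d + delta).replace(minute=0, second=0, microsecond=0)
        for d in dates
    ]

# Hypothetical sample timestamps, 26h15m apart.
sample = [dt.datetime(2024, 1, 1, 9, 30), dt.datetime(2024, 1, 2, 11, 45)]
shifted = shift_to_now(sample)
```

Because each value is floored independently, the gap between shifted rows can move by up to an hour relative to the original spacing, which is the same behavior as the pandas version in the class above.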

modules/development/oai_spo/1.0.0/src/requirements.txt

Whitespace-only changes.
