feat: Add LMEval systematic testing with pod log collection and configmap parsing #382
artemsa223 wants to merge 3 commits into opendatahub-io:main
Conversation
- Implemented parse_configmap.py to fetch trustyai-lmeval-tasks.yaml.
- Added log-saving helpers to utils.py that write logs to tests/model_explainability/lm_eval/task_lists/lmeval_task_logs/.
- Added test_lm_eval_timing.py with a mock test and a mock LMEvalJob to avoid a NameError.
Walkthrough

New utilities and scripts have been added to automate extraction, selection, and benchmarking of language model evaluation (LMEval) tasks. These include parsing a ConfigMap from GitHub, selecting benchmark tasks, saving pod logs, and introducing parameterized timing benchmark tests. The changes primarily affect test infrastructure and utilities for LMEval jobs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Parse as parse_configmap.py
    participant GitHub
    participant LocalFS
    User->>Parse: Run script
    Parse->>GitHub: Fetch ConfigMap YAML
    GitHub-->>Parse: Return YAML
    Parse->>Parse: Parse YAML, extract tasks
    Parse->>LocalFS: Save tasks as JSON, TXT, Python files
    Parse->>User: Print next steps and summary
```

```mermaid
sequenceDiagram
    participant User
    participant Select as select_benchmark_tasks.py
    participant LocalFS
    User->>Select: Run script
    Select->>LocalFS: Load all tasks from ConfigMap JSON
    Select->>Select: Select benchmark tasks
    Select->>LocalFS: Save selected tasks (Python, JSON)
    Select->>LocalFS: Generate pytest test code
    Select->>User: Print summary and instructions
```

```mermaid
sequenceDiagram
    participant Pytest
    participant Test as test_lmeval_timing_benchmark
    participant LMEvalJob
    participant Pod
    participant Logs as save_pod_logs
    participant LocalFS
    Pytest->>Test: Run test with task parameter
    Test->>LMEvalJob: Update and run job
    Test->>Pod: Wait for job to run, measure time
    Test->>Logs: Save pod logs
    Logs->>Pod: Retrieve logs
    Logs->>LocalFS: Write logs to file
    Test->>Pytest: Assert results
```
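The flow in the last diagram can be sketched as a small, cluster-independent helper. All names here (`job`, `wait_until_running`, `save_logs`) are illustrative stand-ins for the PR's fixtures, not its actual API:

```python
import time


def run_timing_benchmark(job, pod, task_name, save_logs):
    """Run one LMEval task, measure wall-clock time, then persist the pod logs.

    `job`, `pod`, and `save_logs` are hypothetical stand-ins for the real
    fixtures and helpers used by the test.
    """
    start = time.time()
    # Point the job at a single task with 1% sampling, mirroring the PR's test.
    job.update(task_list={"taskNames": [task_name], "sample_size": "1%"})
    job.wait_until_running()  # blocks until the LMEvalJob reports Running
    elapsed = time.time() - start
    log_file = save_logs(pod, task_name)
    return elapsed, log_file
```

The timing and the log path are returned together so the test can assert on both.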
Actionable comments posted: 4
🧹 Nitpick comments (7)
tests/model_explainability/lm_eval/utils.py (1)

6-6: Remove unused import. The Timeout import from utilities.constants is not used in this file.

```diff
-from utilities.constants import Timeout
```

tests/model_explainability/lm_eval/test_lm_eval_timing.py (2)
13-22: Consider using proper mocking instead of a custom mock class. The custom LMEvalJob mock class seems like a workaround. Consider using unittest.mock.MagicMock or importing the actual class with conditional handling.

```diff
-# Mock LMEvalJob class for local testing
-class LMEvalJob:
-    """Mock LMEvalJob class for type hinting in skipped GPU tests"""
-
-    def __init__(self):
-        pass
-
-    def update(self, task_list):
-        pass
+try:
+    from ocp_resources.lm_eval_job import LMEvalJob
+except ImportError:
+    # Mock for environments where ocp_resources is not available
+    from unittest.mock import MagicMock
+    LMEvalJob = MagicMock
```
86-94: Simplify assertion logic. The path verification assertion is overly complex and could be simplified for better readability.

```diff
-    assert (
-        Path(log_file)
-        == Path(__file__).parent / "task_lists" / "lmeval_task_logs" / namespace / f"{task_name}_logs.txt"
-    ), f"Log file saved to incorrect path: {log_file}"
+    expected_path = Path(__file__).parent / "task_lists" / "lmeval_task_logs" / namespace / f"{task_name}_logs.txt"
+    assert Path(log_file) == expected_path, f"Log file saved to incorrect path: {log_file}"
```

tests/model_explainability/lm_eval/utils/select_benchmark_tasks.py (2)
10-11: Fix PEP 8 formatting: add blank lines before the function definition. Top-level functions should be preceded by two blank lines according to PEP 8.

```diff
 from pathlib import Path
+
+
 def load_configmap_tasks():
```
77-85: Remove unnecessary f-string prefixes. Several f-strings contain no placeholders and should be plain strings for clarity.

```diff
-        f.write(f"# Selected tasks for timing benchmark\n")
-        f.write(f"# These are popular/representative LMEval tasks\n\n")
+        f.write("# Selected tasks for timing benchmark\n")
+        f.write("# These are popular/representative LMEval tasks\n\n")
         f.write("TIMING_BENCHMARK_TASKS = [\n")
         for task in tasks:
             f.write(f'    "{task}",\n')
         f.write("]\n\n")
         f.write(f"# Total: {len(tasks)} tasks\n")
-        f.write(f"# Estimated time: ~25min × 10 = 4+ hours for 1% sampling\n")
+        f.write("# Estimated time: ~25min × 10 = 4+ hours for 1% sampling\n")
```

tests/model_explainability/lm_eval/utils/parse_configmap.py (2)
195-195: Remove the unused variable. The files variable is assigned but never used.

```diff
     # Save results
-    files = save_extracted_tasks(tasks, task_details, configmap)
+    save_extracted_tasks(tasks, task_details, configmap)
```
14-15: Fix PEP 8 formatting: add blank lines before the function definition. Top-level functions should be preceded by two blank lines according to PEP 8.

```diff
 import yaml
+
+
 def parse_configmap(
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/model_explainability/lm_eval/test_lm_eval_timing.py (1 hunks)
- tests/model_explainability/lm_eval/utils.py (2 hunks)
- tests/model_explainability/lm_eval/utils/parse_configmap.py (1 hunks)
- tests/model_explainability/lm_eval/utils/select_benchmark_tasks.py (1 hunks)
🧰 Additional context used
🪛 Flake8 (7.2.0)
tests/model_explainability/lm_eval/utils/select_benchmark_tasks.py
[error] 10-10: expected 2 blank lines, found 1
(E302)
[error] 21-21: expected 2 blank lines, found 1
(E302)
[error] 60-60: expected 2 blank lines, found 1
(E302)
[error] 81-81: f-string is missing placeholders
(F541)
[error] 85-85: expected 2 blank lines, found 1
(E302)
[error] 132-132: expected 2 blank lines, found 1
(E302)
[error] 137-137: f-string is missing placeholders
(F541)
[error] 144-144: f-string is missing placeholders
(F541)
[error] 145-145: f-string is missing placeholders
(F541)
[error] 147-147: f-string is missing placeholders
(F541)
[error] 148-148: f-string is missing placeholders
(F541)
[error] 151-151: expected 2 blank lines after class or function definition, found 1
(E305)
tests/model_explainability/lm_eval/test_lm_eval_timing.py
[error] 8-8: 'utilities.constants.Timeout' imported but unused
(F401)
tests/model_explainability/lm_eval/utils/parse_configmap.py
[error] 14-14: expected 2 blank lines, found 1
(E302)
[error] 47-47: f-string is missing placeholders
(F541)
[error] 108-108: expected 2 blank lines, found 1
(E302)
[error] 137-137: f-string is missing placeholders
(F541)
[error] 155-155: f-string is missing placeholders
(F541)
[error] 168-168: expected 2 blank lines, found 1
(E302)
[error] 170-170: f-string is missing placeholders
(F541)
[error] 171-171: f-string is missing placeholders
(F541)
[error] 172-172: f-string is missing placeholders
(F541)
[error] 173-173: f-string is missing placeholders
(F541)
[error] 174-174: f-string is missing placeholders
(F541)
[error] 175-175: f-string is missing placeholders
(F541)
[error] 177-177: expected 2 blank lines, found 1
(E302)
[error] 197-197: local variable 'files' is assigned to but never used
(F841)
[error] 204-204: expected 2 blank lines after class or function definition, found 1
(E305)
🪛 Pylint (3.3.7)
tests/model_explainability/lm_eval/test_lm_eval_timing.py
[refactor] 15-15: Too few public methods (1/2)
(R0903)
tests/model_explainability/lm_eval/utils/parse_configmap.py
[refactor] 14-14: Too many local variables (20/15)
(R0914)
[refactor] 16-106: Too many nested blocks (6/5)
(R1702)
[refactor] 16-106: Too many nested blocks (7/5)
(R1702)
[refactor] 14-14: Too many statements (55/50)
(R0915)
🪛 Ruff (0.11.9)
tests/model_explainability/lm_eval/utils/parse_configmap.py
197-197: Local variable files is assigned to but never used
Remove assignment to unused variable files
(F841)
tests/model_explainability/lm_eval/utils.py:

```python
def save_pod_logs(pod: Pod, task_name: str, namespace: str):
    """
    Save pod logs to a file
    """
    output_dir = Path(__file__).parent.parent / "lm_eval" / "task_lists" / "lmeval_task_logs" / namespace
    output_dir.mkdir(parents=True, exist_ok=True)
    log_file = output_dir / f"{task_name}_logs.txt"
    logs = pod.get_logs()
    with open(log_file, "w") as f:
        f.write(logs)
    LOGGER.info(f"Saved logs for task {task_name} (pod: {pod.name}) to {log_file}")
    return str(log_file)
```
🛠️ Refactor suggestion
Add error handling for pod log retrieval.
The function should handle potential failures when calling pod.get_logs(), as this operation can fail due to network issues, pod status, or permissions.
```diff
 def save_pod_logs(pod: Pod, task_name: str, namespace: str):
     """
     Save pod logs to a file
     """
     output_dir = Path(__file__).parent.parent / "lm_eval" / "task_lists" / "lmeval_task_logs" / namespace
     output_dir.mkdir(parents=True, exist_ok=True)
     log_file = output_dir / f"{task_name}_logs.txt"
-    logs = pod.get_logs()
+    try:
+        logs = pod.get_logs()
+    except Exception as e:
+        LOGGER.error(f"Failed to retrieve logs for pod {pod.name}: {e}")
+        logs = f"Error retrieving logs: {e}"
     with open(log_file, "w") as f:
         f.write(logs)
     LOGGER.info(f"Saved logs for task {task_name} (pod: {pod.name}) to {log_file}")
     return str(log_file)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def save_pod_logs(pod: Pod, task_name: str, namespace: str):
    """
    Save pod logs to a file
    """
    output_dir = Path(__file__).parent.parent / "lm_eval" / "task_lists" / "lmeval_task_logs" / namespace
    output_dir.mkdir(parents=True, exist_ok=True)
    log_file = output_dir / f"{task_name}_logs.txt"
    try:
        logs = pod.get_logs()
    except Exception as e:
        LOGGER.error(f"Failed to retrieve logs for pod {pod.name}: {e}")
        logs = f"Error retrieving logs: {e}"
    with open(log_file, "w") as f:
        f.write(logs)
    LOGGER.info(f"Saved logs for task {task_name} (pod: {pod.name}) to {log_file}")
    return str(log_file)
```
🤖 Prompt for AI Agents
In tests/model_explainability/lm_eval/utils.py around lines 36 to 47, the
function save_pod_logs calls pod.get_logs() without error handling, which can
fail due to network or permission issues. Wrap the pod.get_logs() call in a
try-except block to catch exceptions, log an error message if retrieval fails,
and handle the failure gracefully, such as returning None or an empty string
instead of crashing.
tests/model_explainability/lm_eval/utils/select_benchmark_tasks.py:

```python
def generate_test_code(tasks):
    """Generate pytest code for the timing benchmark"""
    test_code = """# Timing benchmark test for LMEval systematic testing
import pytest
import time
from kubernetes.dynamic import DynamicClient
from ocp_resources.pod import Pod
from tests.model_explainability.lm_eval.utils import verify_lmevaljob_running, save_pod_logs
from utilities.constants import Timeout
from simple_logger.logger import get_logger

LOGGER = get_logger(name=__name__)

@pytest.mark.gpu
@pytest.mark.parametrize(
    "model_namespace, task_name",
    [
"""
    for task in tasks:
        test_code += f'''
        pytest.param(
            {{"name": "test-lmeval-timing-{task.replace("_", "-")}"}},
            "{task}",
            id="{task}"
        ),'''
    test_code += '''
    ],
    indirect=["model_namespace"],
)
def test_lmeval_timing_benchmark(admin_client: DynamicClient, model_namespace, task_name: str, lmevaljob_vllm_emulator: LMEvalJob, lmevaljob_vllm_emulator_pod: Pod):
    """Timing benchmark test - measures execution time for representative tasks"""
    start_time = time.time()
    lmevaljob_vllm_emulator.update(task_list={"taskNames": [task_name], "sample_size": "1%"})
    verify_lmevaljob_running(client=admin_client, lmevaljob=lmevaljob_vllm_emulator)
    execution_time = time.time() - start_time
    log_file = save_pod_logs(
        pod=lmevaljob_vllm_emulator_pod,
        task_name=task_name,
        namespace=model_namespace.name,
    )
    LOGGER.info(f"Task {task_name} took {execution_time:.2f} seconds with 1% sampling. Logs saved to {log_file}")
'''
    output_dir = Path(__file__).parent.parent
    with open(output_dir / "test_lm_eval_timing.py", "w") as f:
        f.write(test_code)
    print(f" 🧪 Test code: {output_dir / 'test_lm_eval_timing.py'}")
```
🛠️ Refactor suggestion
Refactor test code generation for better maintainability.
The string concatenation approach for generating test code is hard to maintain. Consider using a template engine or at least break it into smaller, more readable chunks.
Consider using a template-based approach:
```diff
 def generate_test_code(tasks):
     """Generate pytest code for the timing benchmark"""
-    test_code = """# Timing benchmark test for LMEval systematic testing
-import pytest
-import time
-from kubernetes.dynamic import DynamicClient
-from ocp_resources.pod import Pod
-from tests.model_explainability.lm_eval.utils import verify_lmevaljob_running, save_pod_logs
-from utilities.constants import Timeout
-from simple_logger.logger import get_logger
-
-LOGGER = get_logger(name=__name__)
-
-@pytest.mark.gpu
-@pytest.mark.parametrize(
-    "model_namespace, task_name",
-    [
-"""
+    # Use a template approach or break the code into logical sections
+    header = _generate_test_header()
+    parametrize_section = _generate_parametrize_section(tasks)
+    test_function = _generate_test_function()
+    test_code = header + parametrize_section + test_function
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 132-132: expected 2 blank lines, found 1
(E302)
[error] 137-137: f-string is missing placeholders
(F541)
[error] 144-144: f-string is missing placeholders
(F541)
[error] 145-145: f-string is missing placeholders
(F541)
[error] 147-147: f-string is missing placeholders
(F541)
[error] 148-148: f-string is missing placeholders
(F541)
🤖 Prompt for AI Agents
In tests/model_explainability/lm_eval/utils/select_benchmark_tasks.py around
lines 103 to 149, the test code generation uses extensive string concatenation
which reduces maintainability and readability. Refactor this by using a template
engine like Jinja2 or Python's built-in string.Template to separate the static
test code structure from the dynamic task data. Alternatively, break the code
into smaller string chunks and join them cleanly instead of appending to a
single string, improving clarity and ease of updates.
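As a concrete illustration of the template idea in the prompt above, the parametrize entries could be rendered with the standard-library string.Template. The names below (`PARAM_TMPL`, `render_parametrize_entries`) are hypothetical, not part of the PR:

```python
from string import Template

# Hypothetical sketch: one template per generated pytest.param entry.
# Template only treats "$" specially, so literal braces need no escaping.
PARAM_TMPL = Template('    pytest.param({"name": "test-lmeval-timing-$slug"}, "$task", id="$task"),')


def render_parametrize_entries(tasks):
    """Render one pytest.param line per task; the id slug swaps underscores for dashes."""
    return "\n".join(
        PARAM_TMPL.substitute(slug=task.replace("_", "-"), task=task)
        for task in tasks
    )
```

Compared to f-string concatenation, the static structure lives in one place and the per-task substitution is explicit.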
tests/model_explainability/lm_eval/utils/parse_configmap.py:

```python
def parse_configmap(
    github_url: str = "https://raw.githubusercontent.com/ruivieira/trustyai-service-operator/RHOAIENG-26883/config/configmaps/trustyai-lmeval-tasks.yaml",
):
    """Parse the ConfigMap from GitHub to extract LMEval tasks"""
    try:
        # Fetch YAML content from GitHub
        response = requests.get(github_url)
        response.raise_for_status()  # Raise exception for bad status codes
        configmap = yaml.safe_load(response.text)

        print(f"✅ Successfully loaded ConfigMap from: {github_url}")

        # Print structure to understand the format
        print("\n📋 ConfigMap Structure:")
        print(f"   Kind: {configmap.get('kind')}")
        print(f"   Metadata name: {configmap.get('metadata', {}).get('name')}")

        data_keys = list(configmap.get("data", {}).keys())
        print(f"   Data keys: {data_keys}")

        # Extract task information
        tasks = []
        task_details = []
        data = configmap.get("data", {})

        print(f"\n🔍 Found data keys: {list(data.keys())}")

        if "tasks" in data:
            tasks_json = data["tasks"]
            print(f"   📝 Tasks data type: {type(tasks_json)}")
            print(f"   📝 Tasks data preview: {tasks_json[:300]}...")

            try:
                # Parse the JSON string
                parsed_data = json.loads(tasks_json)
                print("   ✅ Successfully parsed JSON")
                print(f"   📊 Top-level keys: {list(parsed_data.keys())}")

                # Extract tasks from "lm-evaluation-harness" array
                if "lm-evaluation-harness" in parsed_data:
                    lm_eval_tasks = parsed_data["lm-evaluation-harness"]
                    print(f"   📋 Found {len(lm_eval_tasks)} lm-evaluation-harness tasks")

                    for task_obj in lm_eval_tasks:
                        if isinstance(task_obj, dict) and "name" in task_obj:
                            task_name = task_obj["name"]
                            task_desc = task_obj.get("description", "No description")
                            tasks.append(task_name)
                            task_details.append({"name": task_name, "description": task_desc})

                # Check for other task categories
                for key, value in parsed_data.items():
                    if key != "lm-evaluation-harness" and isinstance(value, list):
                        print(f"   📋 Found additional category '{key}' with {len(value)} tasks")
                        for task_obj in value:
                            if isinstance(task_obj, dict) and "name" in task_obj:
                                task_name = task_obj["name"]
                                task_desc = task_obj.get("description", "No description")
                                tasks.append(task_name)
                                task_details.append({"name": task_name, "description": task_desc, "category": key})

            except json.JSONDecodeError as e:
                print(f"   ❌ JSON parsing failed: {e}")
                # Fallback to regex extraction
                name_matches = re.findall(r'"name":\s*"([^"]+)"', tasks_json)
                tasks.extend(name_matches)
                print(f"   🔄 Fallback: extracted {len(name_matches)} task names using regex")

        # Clean and deduplicate tasks
        clean_tasks = [task.strip() for task in tasks if isinstance(task, str) and task.strip()]
        unique_tasks = sorted(set(clean_tasks))

        print(f"\n✅ Extracted {len(unique_tasks)} unique tasks from ConfigMap")

        # Show categories if found
        categories = {detail["category"] for detail in task_details if "category" in detail}
        if categories:
            print(f"   📂 Task categories found: {', '.join(categories)}")

        return unique_tasks, task_details, configmap

    except requests.exceptions.RequestException as e:
        print(f"❌ Failed to fetch ConfigMap from GitHub: {e}")
        print(f"💡 Ensure the GitHub URL is accessible: {github_url}")
        return None, None, None
    except yaml.YAMLError as e:
        print(f"❌ Error parsing YAML: {e}")
        return None, None, None
```
🛠️ Refactor suggestion

Refactor the function to reduce complexity. The parse_configmap function is too complex, with excessive nesting and multiple responsibilities. Consider breaking it into smaller functions:

```python
def fetch_configmap_data(github_url: str) -> dict:
    """Fetch and parse YAML from GitHub"""
    # URL validation and fetching logic


def extract_tasks_from_json(tasks_json: str) -> tuple:
    """Extract tasks from the JSON string"""
    # JSON parsing and task extraction logic


def parse_configmap(github_url: str = "...") -> tuple:
    """Main orchestration function"""
    configmap = fetch_configmap_data(github_url)
    tasks, task_details = extract_tasks_from_json(configmap["data"]["tasks"])
    return clean_and_deduplicate_tasks(tasks), task_details, configmap
```

🧰 Tools
return clean_and_deduplicate_tasks(tasks), task_details, configmap🧰 Tools
🪛 Flake8 (7.2.0)
[error] 47-47: f-string is missing placeholders
(F541)
🤖 Prompt for AI Agents
In tests/model_explainability/lm_eval/utils/parse_configmap.py from lines 15 to
103, the parse_configmap function is overly complex with deep nesting and
multiple responsibilities. Refactor by splitting it into smaller functions: one
to fetch and parse the YAML configmap from GitHub (fetch_configmap_data),
another to extract tasks from the JSON string in the configmap data
(extract_tasks_from_json), and a main orchestration function (parse_configmap)
that calls these helpers and returns cleaned, deduplicated tasks along with task
details and the configmap. This will improve readability and maintainability.
tests/model_explainability/lm_eval/utils/parse_configmap.py:

```python
def parse_configmap(
    github_url: str = "https://raw.githubusercontent.com/ruivieira/trustyai-service-operator/RHOAIENG-26883/config/configmaps/trustyai-lmeval-tasks.yaml",
):
    """Parse the ConfigMap from GitHub to extract LMEval tasks"""
    try:
        # Fetch YAML content from GitHub
        response = requests.get(github_url)
        response.raise_for_status()  # Raise exception for bad status codes
        configmap = yaml.safe_load(response.text)
```
Add URL validation and security considerations.
Fetching data from external URLs poses security risks. Consider adding URL validation and timeout settings.
```diff
 def parse_configmap(
     github_url: str = "https://raw.githubusercontent.com/ruivieira/trustyai-service-operator/RHOAIENG-26883/config/configmaps/trustyai-lmeval-tasks.yaml",
 ):
     """Parse the ConfigMap from GitHub to extract LMEval tasks"""
     try:
+        # Validate URL is from expected domain
+        if not github_url.startswith("https://raw.githubusercontent.com/"):
+            raise ValueError(f"URL must be from GitHub raw content: {github_url}")
+
         # Fetch YAML content from GitHub
-        response = requests.get(github_url)
+        response = requests.get(github_url, timeout=30)
         response.raise_for_status()  # Raise exception for bad status codes
```

📝 Committable suggestion
```python
def parse_configmap(
    github_url: str = "https://raw.githubusercontent.com/ruivieira/trustyai-service-operator/RHOAIENG-26883/config/configmaps/trustyai-lmeval-tasks.yaml",
):
    """Parse the ConfigMap from GitHub to extract LMEval tasks"""
    try:
        # Validate URL is from expected domain
        if not github_url.startswith("https://raw.githubusercontent.com/"):
            raise ValueError(f"URL must be from GitHub raw content: {github_url}")

        # Fetch YAML content from GitHub
        response = requests.get(github_url, timeout=30)
        response.raise_for_status()  # Raise exception for bad status codes
        configmap = yaml.safe_load(response.text)
```
🤖 Prompt for AI Agents
In tests/model_explainability/lm_eval/utils/parse_configmap.py around lines 15
to 23, the function fetches data from an external URL without validating the URL
or setting a timeout, which can cause security and reliability issues. Add
validation to ensure the github_url is a valid and expected URL format, and
include a timeout parameter in the requests.get call to prevent hanging
requests. This will improve security and robustness when fetching the ConfigMap.
Add LMEval systematic testing with pod log collection and configmap parsing

Description

- Implemented parse_configmap.py to fetch trustyai-lmeval-tasks.yaml.
- Added log-saving helpers to utils.py that write logs to tests/model_explainability/lm_eval/task_lists/lmeval_task_logs/.
- Added test_lm_eval_timing.py with a mock test and a mock LMEvalJob to avoid a NameError.
How Has This Been Tested?

The changes were tested locally using Python 3.12 in a virtual environment (venv3.12) on macOS. The test suite includes mock tests to validate functionality without GPU cluster access.

Mock test in test_lm_eval_timing.py:

```shell
pytest tests/model_explainability/lm_eval/test_lm_eval_timing.py -k test_lmeval_timing_benchmark_mock -v
```

Verified logging for the arc_easy task, confirming logs are saved to tests/model_explainability/lm_eval/task_lists/lmeval_task_logs/test-namespace/arc_easy_logs.txt with the expected content ("Mock arc_easy task output Completed successfully").
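The namespace-scoped log layout the mock test checks can be sketched as a small helper. This is an illustrative reconstruction of the mock behavior, not the PR's actual `save_pod_logs` implementation:

```python
import tempfile
from pathlib import Path


def save_mock_pod_logs(logs: str, task_name: str, namespace: str, base_dir: Path) -> Path:
    """Write logs under <base_dir>/<namespace>/<task_name>_logs.txt, creating dirs as needed."""
    out_dir = base_dir / namespace
    out_dir.mkdir(parents=True, exist_ok=True)
    log_file = out_dir / f"{task_name}_logs.txt"
    log_file.write_text(logs)
    return log_file
```

A test can then assert on the returned path and the file content without any cluster access.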
ConfigMap parsing:

Ran parse_configmap.py, fetching from https://raw.githubusercontent.com/ruivieira/trustyai-service-operator/RHOAIENG-26883/config/configmaps/trustyai-lmeval-tasks.yaml:

```shell
python tests/model_explainability/lm_eval/utils/parse_configmap.py
```

Verified generation of configmap_tasks.py, configmap_tasks.json, and task_names_only.txt in tests/model_explainability/lm_eval/task_lists/, containing ~6,800 tasks.
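Downstream consumers can load the generated task list from configmap_tasks.json. The file's exact shape is an assumption here (a flat list of names, or a mapping with a "tasks" key), so adjust to match the actual artifact:

```python
import json
from pathlib import Path


def load_task_names(task_lists_dir: Path) -> list:
    """Load task names from the generated configmap_tasks.json.

    Assumed shape: either a flat list of names or a mapping with a
    "tasks" key; this is a sketch, not the PR's loader.
    """
    data = json.loads((task_lists_dir / "configmap_tasks.json").read_text())
    return data if isinstance(data, list) else data.get("tasks", [])
```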