Commit ad2819e: Merge branch 'main' into fix-tls-on-fips
2 parents: cb75d9a + 7fd9561
File tree: 18 files changed, +632 −172 lines

AGENTS.md

Lines changed: 95 additions & 17 deletions
@@ -1,19 +1,97 @@
 # Overview
+
 This is a testing repo for OpenDataHub and OpenShift AI, which are MLOps platforms for OpenShift.
-The tests contained in the repo are high-level integration tests at the Kubernetes API level.
-
-# Documentation
-All the general information about the repo is contained in the /docs directory.
-At the start of each session, consider if you need to consult any of these files in order to answer:
-- [Guidelines for Getting Started](./docs/GETTING_STARTED.md)
-- [Developer Guide](./docs/DEVELOPER_GUIDE.md)
-- [Style Guide](./docs/STYLE_GUIDE.md)
-
-# Specific Instructions
-- Avoid unnecessary complexity: Aim for the simplest solution that works, while keeping the code clean.
-- Avoid obvious comments: Only add comments to explain especially complex code blocks.
-- Maintain code consistency: Follow existing code patterns and architecture.
-- Maintain locality of behavior: Keep code close to where it's used.
-- Make small, focused changes, unless explicitly asked otherwise.
-- Keep security in mind: Avoid filtering sensitive information and running destructive commands.
-- When in doubt about something, ask the user.
+The tests are high-level integration tests at the Kubernetes API level.
+
+You are an expert QE engineer writing maintainable pytest tests that other engineers can understand without deep domain knowledge.
+
+## Commands
+
+### Validation (run before committing)
+```bash
+# Run all pre-commit checks
+pre-commit run --all-files
+
+# Run tox (CI validation)
+tox
+```
+
+### Test Execution
+```bash
+# Collect tests without running (verify structure)
+uv run pytest --collect-only
+
+# Run specific marker
+uv run pytest -m smoke
+uv run pytest -m "model_serving and tier1"
+
+# Run with setup plan (debug fixtures)
+uv run pytest --setup-plan tests/model_serving/
+```
+
+## Project Structure
+
+```text
+tests/                   # Test modules by component
+├── conftest.py          # All shared fixtures
+├── <component>/         # Component test directories
+│   ├── conftest.py      # Component-scoped fixtures
+│   ├── test_*.py        # Test files
+│   └── utils.py         # Component-specific utility functions
+utilities/               # Shared utility functions
+└── <topic>_utils.py     # Topic-specific utility functions
+```
+
+## Essential Patterns
+
+### Tests
+- Every test MUST have a docstring explaining what it tests (see `tests/cluster_health/test_cluster_health.py`)
+- Apply relevant markers from `pytest.ini`: tier (`smoke`, `sanity`, `tier1`, `tier2`), component (`model_serving`, `model_registry`, `llama_stack`), infrastructure (`gpu`, `parallel`, `slow`)
+- Use Given-When-Then format in docstrings for behavioral clarity
+
+### Fixtures
+- Fixture names MUST be nouns: `storage_secret` not `create_secret`
+- Use context managers for resource lifecycle (see `tests/conftest.py:544-550` for pattern)
+- Fixtures do one thing only—compose them rather than nesting
+- Use narrowest scope that meets the need: function > class > module > session
+
+### Kubernetes Resources
+- Use [openshift-python-wrapper](https://github.com/RedHatQE/openshift-python-wrapper) for all K8s API calls
+- Resource lifecycle MUST use context managers to ensure cleanup
+- Use `oc` CLI only when wrapper is not relevant (e.g., must-gather)
+
+## Common Pitfalls
+
+- **ERROR vs FAILED**: Pytest reports fixture failures as ERROR, test failures as FAILED
+- **Heavy imports**: Don't import heavy resources at module level; defer to fixture scope
+- **Flaky tests**: Use `pytest.skip()` with `@pytest.mark.jira("PROJ-123")`, never delete
+- **Fixture scope**: Session fixtures in `tests/conftest.py` run once for entire suite—modify carefully
+
+## Boundaries
+
+### ✅ Always
+- Follow existing patterns before introducing new approaches
+- Add type annotations (mypy strict enforced)
+- Write Google-format docstrings for tests and fixtures
+- Run `pre-commit run --all-files` before suggesting changes
+
+### ⚠️ Ask First
+- Adding new dependencies to `pyproject.toml`
+- Creating new `conftest.py` files
+- Moving fixtures to shared locations
+- Adding new markers to `pytest.ini`
+- Modifying session-scoped fixtures
+
+### 🚫 Never
+- Remove or modify existing tests without explicit request
+- Add code that isn't immediately used (YAGNI)
+- Log secrets, tokens, or credentials
+- Skip pre-commit or type checking
+- Create abstractions for single-use code
+
+## Documentation Reference
+
+Consult these for detailed guidance:
+- [Constitution](./CONSTITUTION.md) - Non-negotiable principles (supersedes all other docs)
+- [Developer Guide](./docs/DEVELOPER_GUIDE.md) - Contribution workflow, fixture examples
+- [Style Guide](./docs/STYLE_GUIDE.md) - Naming, typing, docstrings
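The fixture and resource-lifecycle rules introduced in this AGENTS.md hunk can be sketched outside pytest with a plain context manager. The `storage_secret` name and the dict payload below are hypothetical stand-ins for the wrapper's resource classes, shown only to illustrate the guaranteed-teardown shape:

```python
from contextlib import contextmanager


@contextmanager
def storage_secret(name: str):
    """Yield a fake secret and always tear it down (hypothetical example).

    A real fixture in the repo would wrap an openshift-python-wrapper
    resource; the dict here only models the setup/teardown lifecycle.
    """
    secret = {"name": name, "created": True}
    try:
        yield secret
    finally:
        # Teardown runs even if the test body raises.
        secret["created"] = False
```

A noun-named fixture would then simply `with storage_secret(...) as s: yield s`, keeping setup and cleanup in one place.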

CONSTITUTION.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+# OpenDataHub-Tests Constitution
+
+This constitution defines the non-negotiable principles and governance rules for the opendatahub-tests repository. It applies to all test development, whether performed by humans or AI assistants.
+
+## Core Principles
+
+### I. Simplicity First
+
+All changes MUST favor the simplest solution that works. Complexity MUST be justified.
+
+- Aim for the simplest solution that works while keeping the code clean
+- Do not prepare code for the future just because it may be useful (YAGNI)
+- Every function, variable, fixture, and test written MUST be used, or else removed
+- Flexible code MUST NOT come at the expense of readability
+
+**Rationale**: The codebase is maintained by multiple teams; simplicity ensures maintainability and reduces cognitive load.
+
+### II. Code Consistency
+
+All changes MUST follow existing code patterns and architecture.
+
+- Follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
+- Use pre-commit hooks to enforce style (ruff, mypy, flake8)
+- Use absolute import paths; import specific functions rather than modules
+- Use descriptive names; meaningful names are better than short names
+- Add type annotations to all new code; follow the rules defined in [pyproject.toml](./pyproject.toml)
+
+**Rationale**: Consistent patterns reduce the learning curve and prevent architectural drift.
+
+### III. Test Clarity and Dependencies
+
+Each test MUST verify a single aspect of the product and may be dependent on other tests.
+
+- Tests MUST have a clear purpose and be easy to understand
+- Tests MUST be properly documented with docstrings explaining what the test does
+- When test dependencies exist, declare them explicitly with the pytest-dependency plugin's dependency marker(s) whenever possible
+- Group related tests in classes only when they share fixtures; never group unrelated tests
+
+### IV. Fixture Discipline
+
+Fixtures MUST do one thing only and follow proper scoping.
+
+- Fixture names MUST be nouns describing what they provide (e.g., `storage_secret` not `create_secret`)
+- Fixtures MUST handle setup and teardown using context managers where appropriate
+- Use the narrowest fixture scope that meets the need (function > class > module > session)
+- Conftest.py files MUST contain fixtures only; no utility functions or constants
+- Use `request.param` with dict structures for parameterized fixtures
+
+**Rationale**: Single-responsibility fixtures are easier to debug, reuse, and compose.
+
+### V. Interacting with Kubernetes Resources
+
+All cluster interactions MUST use openshift-python-wrapper or oc CLI.
+
+- Use [openshift-python-wrapper](https://github.com/RedHatQE/openshift-python-wrapper) for all K8s API calls
+- For missing resources, generate them using class_generator and contribute to wrapper
+- Use oc CLI only when wrapper is not relevant (e.g., must-gather generation)
+- Resource lifecycle MUST be managed via context managers to ensure cleanup
+
+**Rationale**: Consistent API abstraction ensures portability between ODH (upstream) and RHOAI (downstream).
+
+### VI. Locality of Behavior
+
+Keep code close to where it is used.
+
+- Keep functions and fixtures close to where they're used initially
+- Move to shared locations (utilities, common conftest) only when multiple modules need them
+- Avoid creating abstractions prematurely
+- Small, focused changes are preferred unless explicitly asked otherwise
+
+**Rationale**: Locality reduces navigation overhead and makes the impact of changes obvious.
+
+### VII. Security Awareness
+
+All code MUST consider security implications.
+
+- Never log/expose secrets; redact/mask if printing is unavoidable
+- Avoid running destructive commands without explicit user confirmation
+- Use detect-secrets and gitleaks pre-commit hooks to prevent secret leakage
+- Test code MUST NOT introduce vulnerabilities into the tested systems
+
+**Rationale**: Tests interact with production-like clusters; security lapses can have real consequences.
+
+## Test Development Standards
+
+### Test Documentation
+
+- Every test or test class MUST have a docstring explaining what it tests
+- Docstrings MUST be understandable by engineers from other components, managers, or PMs
+- Use Google-format docstrings
+- Comments are allowed only for complex code blocks (e.g., complex regex)
+
+### Test Markers
+
+- All tests MUST apply relevant markers from pytest.ini
+- Use tier markers (smoke, sanity, tier1, tier2) to indicate test priority
+- Use component markers (model_explainability, llama_stack, rag) for ownership
+- Use infrastructure markers (gpu, parallel, slow) for execution filtering
+
+### Test Organization
+
+- Tests are organized by component in `tests/<component>/`
+- Each component has its own conftest.py for scoped fixtures
+- Utilities go in `utilities/` with topic-specific modules
+
+## AI-Assisted Development Guidelines
+
+### Developer Responsibility
+
+Developers are ultimately responsible for all code, regardless of whether AI tools assisted.
+
+- Always assume AI-generated code is unsafe and incorrect until verified
+- Double-check all AI suggestions against project patterns and this constitution
+- AI tools MUST be guided by AGENTS.md (symlink to CLAUDE.md if needed)
+
+### AI Code Generation Rules
+
+- AI MUST follow existing patterns; never introduce new architectural concepts without justification
+- AI MUST NOT add unnecessary complexity or "helpful" abstractions
+- AI-generated tests MUST have proper docstrings and markers
+- AI MUST ask when in doubt about requirements or patterns
+
+### Specification-Driven Development
+
+When adopting AI-driven spec development:
+
+- Specifications MUST be in structured format (YAML/JSON with defined schema)
+- Tests MUST include requirement traceability (Polarion, Jira markers)
+- Docstrings MUST follow Given-When-Then pattern for behavioral clarity
+- Generated tests MUST pass pre-commit checks before review
+
+## Governance
+
+### Constitution Authority
+
+This constitution supersedes all other practices when there is a conflict. All PRs and reviews MUST verify compliance.
+
+### Amendment Process
+
+1. Propose changes via PR to `CONSTITUTION.md`
+2. Changes require review by at least two maintainers
+3. Breaking changes (principle removal/redefinition) require team discussion
+
+### Versioning Policy
+
+No versioning policy is enforced.
+
+### Compliance Review
+
+- All PRs MUST be verified against constitution principles
+- Pre-commit hooks enforce code quality standards
+- CI (tox) validates test structure and typing
+- Two reviewers required; verified label required before merge
+
+### Guidance Reference
+
+For development runtime guidance, consult:
+- [AGENTS.md](./AGENTS.md) for AI assistant instructions
+- [DEVELOPER_GUIDE.md](./docs/DEVELOPER_GUIDE.md) for contribution details
+- [STYLE_GUIDE.md](./docs/STYLE_GUIDE.md) for code style
+
+**Version**: 1.0.0 | **Ratified**: 2026-01-08 | **Last Amended**: 2026-01-08
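The constitution's fixture rules (noun names, `request.param` with dict structures, narrow scope) can be illustrated with a minimal sketch. The config names and the `replicas` field are hypothetical, invented only to show the pattern:

```python
import pytest

# Hypothetical parameter dicts; real tests would describe actual serving runtimes.
MODEL_CONFIGS = [
    {"name": "small-llm", "replicas": 1},
    {"name": "large-llm", "replicas": 3},
]


@pytest.fixture(params=MODEL_CONFIGS, ids=lambda cfg: cfg["name"])
def model_config(request):
    """Noun-named, function-scoped fixture handing each test one parameter dict."""
    return request.param


def test_replicas_positive(model_config):
    """Verify every configured runtime requests at least one replica.

    Given: a model configuration
    When: the replica count is read
    Then: it is a positive integer
    """
    assert model_config["replicas"] > 0
```

Because the parameters are dicts, adding a field later does not change the fixture's signature, which keeps parameterized fixtures stable as tests evolve.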

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ dependencies = [
     "marshmallow==3.26.2,<4", # this version is needed for pytest-jira
     "pytest-html>=4.1.1",
     "fire",
-    "llama_stack_client>=0.3.0,<0.4",
+    "llama_stack_client>=0.4.0,<0.5",
     "pytest-xdist==3.8.0",
     "dictdiffer>=0.9.0",
     "pytest>=9.0.0",

tests/llama_stack/conftest.py

Lines changed: 9 additions & 5 deletions
@@ -652,7 +652,7 @@ def llama_stack_models(unprivileged_llama_stack_client: LlamaStackClient) -> Mod
     """
     models = unprivileged_llama_stack_client.models.list()
 
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
+    model_id = next(m for m in models if m.custom_metadata["model_type"] == "llm").id
 
     # Ensure getting the right embedding model depending on the available providers
     providers = unprivileged_llama_stack_client.providers.list()
@@ -664,11 +664,15 @@ def llama_stack_models(unprivileged_llama_stack_client: LlamaStackClient) -> Mod
     else:
         raise ValueError("No embedding provider found")
 
-    embedding_model = next(m for m in models if m.api_model_type == "embedding" and m.provider_id == target_provider_id)
-    embedding_dimension = float(embedding_model.metadata["embedding_dimension"])
+    embedding_model = next(
+        m
+        for m in models
+        if m.custom_metadata["model_type"] == "embedding" and m.custom_metadata["provider_id"] == target_provider_id
+    )
+    embedding_dimension = int(embedding_model.custom_metadata["embedding_dimension"])
 
     LOGGER.info(f"Detected model: {model_id}")
-    LOGGER.info(f"Detected embedding_model: {embedding_model.identifier}")
+    LOGGER.info(f"Detected embedding_model: {embedding_model.id}")
     LOGGER.info(f"Detected embedding_dimension: {embedding_dimension}")
 
     return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
@@ -700,7 +704,7 @@ def vector_store(
     vector_store = unprivileged_llama_stack_client.vector_stores.create(
         name="test_vector_store",
         extra_body={
-            "embedding_model": llama_stack_models.embedding_model.identifier,
+            "embedding_model": llama_stack_models.embedding_model.id,
             "embedding_dimension": llama_stack_models.embedding_dimension,
             "provider_id": vector_io_provider,
         },
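The `next(...)` lookups in this fixture raise a bare `StopIteration` when no model matches, which surfaces as a confusing fixture ERROR. A defensive variant, sketched here with plain dicts standing in for client model objects, fails with a clearer message:

```python
from typing import Any


def first_model_of_type(models: list[dict[str, Any]], model_type: str) -> dict[str, Any]:
    """Return the first model whose custom_metadata marks it as model_type.

    Dicts stand in for llama_stack_client model objects here; the
    `custom_metadata` key mirrors the 0.4 client shape used in this diff.
    """
    try:
        return next(m for m in models if m["custom_metadata"]["model_type"] == model_type)
    except StopIteration:
        # Re-raise as ValueError so pytest reports a readable failure reason.
        raise ValueError(f"No model of type {model_type!r} found") from None
```

The same `try`/`except StopIteration` wrapping could be applied to the attribute-based lookups in the fixture above if clearer diagnostics are wanted.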

tests/llama_stack/constants.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ class ModelInfo(NamedTuple):
 
     model_id: str
     embedding_model: Model
-    embedding_dimension: float  # API returns float (e.g., 768.0) despite being conceptually an integer
+    embedding_dimension: int  # API returns integer (e.g., 768)
 
 
 LLS_CORE_POD_FILTER: str = "app=llama-stack"
tests/llama_stack/inference/test_embeddings.py

Lines changed: 4 additions & 4 deletions
@@ -50,7 +50,7 @@ def test_inference_embeddings(
 
     # Embed single input text with encoding_format=float (the returned embedding item is a list of floats)
     embeddings_response = unprivileged_llama_stack_client.embeddings.create(
-        model=llama_stack_models.embedding_model.identifier,
+        model=llama_stack_models.embedding_model.id,
         input="The food was delicious and the waiter...",
         encoding_format="float",
     )
@@ -63,7 +63,7 @@ def test_inference_embeddings(
     # Embed single input text with encoding_format=base64 (the returned embedding item is
     # a single base64-encoded string)
     embeddings_response = unprivileged_llama_stack_client.embeddings.create(
-        model=llama_stack_models.embedding_model.identifier,
+        model=llama_stack_models.embedding_model.id,
         input="The food was delicious and the waiter...",
         encoding_format="base64",
     )
@@ -74,7 +74,7 @@ def test_inference_embeddings(
     # Embed multiple input sets with encoding_format=float (each returned embedding item is a list of floats)
     input_list = ["Input text 1", "Input text 1", "Input text 1"]
     embeddings_response = unprivileged_llama_stack_client.embeddings.create(
-        model=llama_stack_models.embedding_model.identifier, input=input_list, encoding_format="float"
+        model=llama_stack_models.embedding_model.id, input=input_list, encoding_format="float"
     )
     assert isinstance(embeddings_response, CreateEmbeddingsResponse)
     assert len(embeddings_response.data) == len(input_list)
@@ -86,7 +86,7 @@ def test_inference_embeddings(
     # Embed multiple input sets with base64 encoding format (each returned embedding a single base64-encoded string)
     input_list = ["Input text 1", "Input text 1", "Input text 1"]
     embeddings_response = unprivileged_llama_stack_client.embeddings.create(
-        model=llama_stack_models.embedding_model.identifier, input=input_list, encoding_format="base64"
+        model=llama_stack_models.embedding_model.id, input=input_list, encoding_format="base64"
    )
     assert isinstance(embeddings_response, CreateEmbeddingsResponse)
     assert len(embeddings_response.data) == len(input_list)