# Model Explainability Tests

This directory contains tests for AI/ML model explainability, trustworthiness, evaluation, and safety components in OpenDataHub/RHOAI. It covers the TrustyAI Service, Guardrails Orchestrator, LM Eval, EvalHub, and the TrustyAI Operator.

## Directory Structure

```text
model_explainability/
├── conftest.py                          # Shared fixtures (PVC, TrustyAI configmap)
├── utils.py                             # Image validation utilities
│
├── evalhub/                             # EvalHub service tests
│   ├── conftest.py
│   ├── constants.py
│   ├── test_evalhub_health.py           # Health endpoint validation
│   └── utils.py
│
├── guardrails/                          # AI Safety Guardrails tests
│   ├── conftest.py                      # Detectors, Tempo, OpenTelemetry fixtures
│   ├── constants.py
│   ├── test_guardrails.py               # Built-in, HuggingFace, autoconfig tests
│   ├── upgrade/
│   │   └── test_guardrails_upgrade.py   # Pre/post-upgrade tests
│   └── utils.py
│
├── lm_eval/                             # Language Model Evaluation tests
│   ├── conftest.py                      # LMEvalJob fixtures (HF, local, vLLM, S3, OCI)
│   ├── constants.py                     # Task definitions (UNITXT, LLMAAJ)
│   ├── data/                            # Test data files
│   ├── test_lm_eval.py                  # HuggingFace, offline, vLLM, S3 tests
│   └── utils.py
│
├── trustyai_operator/                   # TrustyAI Operator validation
│   ├── test_trustyai_operator.py        # Operator image validation
│   └── utils.py
│
└── trustyai_service/                    # TrustyAI Service core tests
    ├── conftest.py                      # MariaDB, KServe, ISVC fixtures
    ├── constants.py                     # Storage configs, model formats
    ├── trustyai_service_utils.py        # TrustyAI REST client, metrics validation
    ├── utils.py                         # Service creation, RBAC, MariaDB utilities
    │
    ├── drift/                           # Drift detection tests
    │   ├── model_data/                  # Test data batches
    │   └── test_drift.py                # Meanshift, KSTest, ApproxKSTest, FourierMMD
    │
    ├── fairness/                        # Fairness metrics tests
    │   ├── conftest.py
    │   ├── model_data/                  # Fairness test data
    │   └── test_fairness.py             # SPD, DIR fairness metrics
    │
    ├── service/                         # Core service tests
    │   ├── conftest.py
    │   ├── test_trustyai_service.py     # Image validation, DB migration, DB cert tests
    │   ├── utils.py
    │   └── multi_ns/                    # Multi-namespace tests
    │       └── test_trustyai_service_multi_ns.py
    │
    └── upgrade/                         # Upgrade compatibility tests
        └── test_trustyai_service_upgrade.py
```
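
Each `conftest.py` in the tree exposes shared pytest fixtures to the tests beside it. As a minimal, hypothetical sketch of that pattern (the fixture name and its contents are illustrative, not the actual implementation):

```python
import pytest


@pytest.fixture(scope="session")
def trustyai_config():
    """Illustrative session-scoped fixture: build a shared resource once,
    hand it to every test that requests it, then tear it down."""
    config = {"storage": "PVC", "configmap": "trustyai-service-config"}
    yield config    # tests receive the shared resource
    config.clear()  # teardown runs after the last test in the session
```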

### Current Test Suites

- **`evalhub/`** - EvalHub service health endpoint validation via kube-rbac-proxy
- **`guardrails/`** - Guardrails Orchestrator tests with built-in regex detectors (PII), HuggingFace detectors (prompt injection, HAP), auto-configuration, and gateway routing. Includes OpenTelemetry/Tempo trace integration
- **`lm_eval/`** - Language Model Evaluation tests covering HuggingFace models, local/offline tasks, vLLM integration, S3 storage, and OCI registry artifacts
- **`trustyai_operator/`** - TrustyAI operator container image validation (SHA256 digests, CSV relatedImages)
- **`trustyai_service/`** - TrustyAI Service tests for drift detection (4 metrics), fairness metrics (SPD, DIR), database migration, multi-namespace support, and upgrade scenarios. Tests run against both PVC and database storage backends

## Test Markers

```python
@pytest.mark.model_explainability   # Module-level marker
@pytest.mark.smoke                  # Critical smoke tests
@pytest.mark.tier1                  # Tier 1 tests
@pytest.mark.tier2                  # Tier 2 tests
@pytest.mark.pre_upgrade            # Pre-upgrade tests
@pytest.mark.post_upgrade           # Post-upgrade tests
@pytest.mark.rawdeployment          # KServe raw deployment mode
@pytest.mark.skip_on_disconnected   # Requires internet connectivity
```
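
As a sketch of how these markers combine on a single test (the test name and body are illustrative, not taken from the suite):

```python
import pytest


@pytest.mark.model_explainability
@pytest.mark.smoke
def test_drift_metric_smoke():
    """Carries both the module-level and smoke markers, so it is selected
    by a marker expression like -m "model_explainability and smoke"."""
    assert True  # placeholder body for illustration
```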

## Running Tests

### Run All Model Explainability Tests

```bash
uv run pytest tests/model_explainability/
```

### Run Tests by Component

```bash
# Run TrustyAI Service tests
uv run pytest tests/model_explainability/trustyai_service/

# Run Guardrails tests
uv run pytest tests/model_explainability/guardrails/

# Run LM Eval tests
uv run pytest tests/model_explainability/lm_eval/

# Run EvalHub tests
uv run pytest tests/model_explainability/evalhub/

# Run drift detection tests
uv run pytest tests/model_explainability/trustyai_service/drift/

# Run fairness tests
uv run pytest tests/model_explainability/trustyai_service/fairness/
```

### Run Tests with Markers

```bash
# Run only smoke tests
uv run pytest -m "model_explainability and smoke" tests/model_explainability/
```

## Additional Resources

- [TrustyAI Documentation](https://github.com/trustyai-explainability)