sync: main to incubation #69
Merged
saichandrapandraju merged 85 commits into incubation from main on Mar 13, 2026
* feat: eval-hub-sdk integration poc
* fix: update probe_tags parameter type
* fix(garak): install eval-hub SDK in Containerfile
* fix(garak): Update Job Spec to new location
* fix(garak): align adapter with recent evalhub SDK contracts
* fix(garak): resolve benchmark_id to probe profile in adapter
* fix(garak): read registry settings from env vars directly
* fix(garak): align adapter with eval-hub SDK latest (OCIArtifactSpec, DefaultCallbacks)
* fix(garak): instantiate GarakScanConfig to access Pydantic model fields
* feat(garak): Enhanced GarakAdapter to build and utilize the new configuration structure
* Empty commit

---------

Co-authored-by: saichandrapandraju <saichandrapandraju@gmail.com>
…ility/evalhub-kfp-poc feat(evalhub): Add preliminary KFP execution mode to eval-hub Garak adapter
…tes Garak typology and intent stubs
better example jsonl run
The Jinja2 template and Vega chart specs under resources/ were not included when building the package because setuptools only discovers Python packages by default. Add package-data configuration so that resources/* is bundled with the distribution.
…el logic. parse_detector now mirrors _is_rejected from earlystop.py: any single safe score from any detector in any generation makes the attempt "refused". Only when every score exceeds the threshold is the attempt "complied". The Vega chart's max-across-attempts aggregation then matches _update_attempt_status.

- Friendly attack and scenario names
- Charts cosmetic improvements
- Strip newlines from stubs to prevent the prompts being split on stub loading (expects one per line).
- Add funnel property tests verifying refused(stage N) == total(stage N+1) and max-across-attempts aggregation.
- Add per-strategy subsections, each containing a summary table and a variant breakdown appropriate to the probe type.
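The refusal rule and funnel property described in this commit can be illustrated with a minimal sketch. This is not the code from parse_detector or earlystop.py: the function names `classify_attempt`, `aggregate_attempts`, and `funnel_is_consistent`, the input shapes, the default threshold of 0.5, and the handling of empty input are all assumptions made for illustration.

```python
from typing import Dict, List


def classify_attempt(detector_scores: Dict[str, List[float]],
                     threshold: float = 0.5) -> str:
    """Classify one attempt from per-detector, per-generation scores.

    Hypothetical input shape: detector name -> list of scores, one per generation.
    """
    scores = [s for per_generation in detector_scores.values() for s in per_generation]
    if not scores:
        # Assumption: the commit message does not say how empty input is handled.
        raise ValueError("no detector scores to classify")
    # Any single score at or below the threshold counts as safe, so the attempt is refused.
    if any(s <= threshold for s in scores):
        return "refused"
    # Only when every score exceeds the threshold is the attempt complied.
    return "complied"


def aggregate_attempts(statuses: List[str]) -> str:
    """Max-across-attempts: one complied attempt marks the whole group complied."""
    rank = {"refused": 0, "complied": 1}
    return max(statuses, key=rank.__getitem__)


def funnel_is_consistent(stages: List[Dict[str, int]]) -> bool:
    """Check refused(stage N) == total(stage N+1) for consecutive stages."""
    return all(cur["refused"] == nxt["total"] for cur, nxt in zip(stages, stages[1:]))
```

Under these assumptions, a property test would generate random stage counts and score lists and assert that `funnel_is_consistent` holds after each stage forwards only its refused attempts.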
…ed high_level_stats labels ("Jailbroken questions" / "Safe questions") and pass earlystop_data for full-pipeline comparison.
…bility/mlflow-callback feat(evalhub): Add MLflow artifact saving functionality
Implements test coverage for issue trustyai-explainability#113 to verify that shields work correctly with intents (ART) benchmarks. The test suite covers:

- Configuration tests for shield_ids and shield_config
- Validation tests for shields API requirements and error handling
- Integration tests for the full workflow (register, validate, build command)
- Tests verify function-based generator configuration with shield mappings

Also fixes pytest configuration typo (python_paths → pythonpath).

Fixes trustyai-explainability#113

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…bility/fix-api-keys fix: Secure model API key handling via Kubernetes Secrets with volume mount
Addresses code review comments by:

- Extracting shared `create_adapter` context manager to reduce repeated GarakRemoteEvalAdapter initialization pattern with patch.object calls
- Adding `create_benchmark_config` factory helper for reusable test BenchmarkConfig instances
- Removing duplicate `remote_config` fixture from TestIntentsWithShieldsValidation class (uses module-level fixture)
- Updating all test methods to use the new helpers

This makes the test suite more maintainable and DRY while keeping all 7 tests passing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
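The pattern of wrapping repeated patch.object calls in a shared context manager might look roughly like the sketch below. It does not reproduce the actual `create_adapter` helper or GarakRemoteEvalAdapter: `FakeAdapter`, its `_connect` method, and the helper's signature are hypothetical stand-ins chosen so the example runs on its own.

```python
from contextlib import contextmanager
from unittest.mock import patch


class FakeAdapter:
    """Hypothetical stand-in for an adapter whose constructor has side effects."""

    def __init__(self, config):
        self.config = config
        self._connect()

    def _connect(self):
        # In a real adapter this might contact a remote service.
        raise RuntimeError("would contact a remote service")


@contextmanager
def create_adapter(config):
    """Build an adapter with its side-effecting setup patched out.

    Yields the adapter and the mock so tests can assert on both.
    """
    with patch.object(FakeAdapter, "_connect", return_value=None) as mock_connect:
        yield FakeAdapter(config), mock_connect


def demo():
    # Each test reuses the helper instead of repeating the patch.object block.
    with create_adapter({"shield_ids": ["s1"]}) as (adapter, mock_connect):
        mock_connect.assert_called_once()
        return adapter.config
```

Keeping the patching in one place means a change to the adapter's setup only has to be reflected in the helper, not in every test method.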
…bility/test-intents-shields Add comprehensive tests for intents benchmarks with shields
…bility/remove-evalhub-kfp-prefix Remove EVALHUB_ prefix from KFP environment variables
…bility/art-defaults Update default detector for Garak intents
…bility/bump-0.3.0 bump version to 0.3.0
…bility/pypi-publish-fix add requirements-inline-extra.txt and update pyproject.toml to fix pypi release
…bility/artifacts-evalhub introduce _GarakCallbacks to surface S3 artifact URLs in job response
[pull] main from trustyai-explainability:main
saichandrapandraju
approved these changes
Mar 12, 2026
tarilabs
approved these changes
Mar 13, 2026
Member
tarilabs
left a comment
main -> incubation sync approval.
sync-branches: New code has just landed in main, so let's bring incubation up to speed!