
chore: sync rhoai-3.4 with main#392

Merged
chaitanya1731 merged 18 commits into red-hat-data-services:rhoai-3.4 from chaitanya1731:sync-main-to-rhoai-3.4 on Apr 22, 2026


chaitanya1731 commented on Apr 22, 2026

## Summary

Sync rhoai-3.4 branch with the latest changes from main to keep the release branch up to date.

## Description

This PR merges main into rhoai-3.4, bringing in 17 commits. The merge completed cleanly with no conflicts.

33 files changed across controller code, CRDs, RBAC, dashboards, docs, e2e tests, and Dockerfiles.

## Commits included

| Commit | Description | Upstream PR |
|---|---|---|
| cfc8964 | chore(deps): update ubi9/ubi-minimal docker digest to 7d4e475 | #384 |
| cbcbd25 | Merge remote-tracking branch 'upstream/rhoai' | |
| fad486e | chore: promote stable to rhoai | #789 |
| fb75981 | chore: promote main to stable | #788 |
| fb2ea25 | feat: add tenant CRD to e2e artifact collection and debug report | #787 |
| 1b8f212 | chore: restrict rbac for db secret | #779 |
| e746008 | docs: add/update documentation for Maas Tenant | #773 |
| b9a8979 | chore(deps): update ubi9/go-toolset docker digest to d637b9d | #371 |
| 147eaa2 | fix: per-model(s) top-level values in usage dashboard | #772 |
| 5928f54 | chore(deps): update ubi9/ubi-minimal docker digest to 175bafd | #361 |
| 6bea2fb | chore(deps): update ubi9/go-toolset docker digest to 1e1c895 | #360 |
| b327b34 | feat: add OIDC token support for model discovery via /v1/models | #703 |
| dbf6d03 | fix: validate token rate limits and skip invalid subs in TRLP aggregation | #752 |
| 89fba29 | chore: promote main to stable | #770 |
| fae753e | chore: add .worktrees/ to .gitignore | #774 |
| c01dc5b | fix: minor updates for external model | #771 |
| 65ca551 | fix: add explicit command to v0.8.2 simulator models | #765 |

## How It Was Tested

- Clean merge with no conflicts verified locally.
- Existing CI checks will validate the merged content.

jrhyness and others added 18 commits April 17, 2026 21:30
opendatahub-io#765)

…wrapper

## Description

KServe's LLMInferenceServiceConfig template injects `command: ["/bin/bash", "-c"]` into containers that lack an explicit command, causing
v0.8.2 simulator models to crash with "invalid option" errors. This adds
the explicit command back to all v0.8.2 simulator model YAMLs and
updates trlp-test to v0.8.2 with consistent args.


## How Has This Been Tested?
Smoke tests are still running, but the models are no longer in CrashLoopBackOff.
```
$ oc get pods -n llm
NAME                                                              READY   STATUS    RESTARTS   AGE
e2e-distinct-2-simulated-kserve-7f849f6b56-kpwp9                  1/1     Running   0          21s
e2e-distinct-simulated-kserve-7bb4cdb4d7-frnz5                    1/1     Running   0          87s
e2e-trlp-test-simulated-kserve-84db68679b-t98f7                   1/1     Running   0          64s
e2e-unconfigured-facebook-opt-125m-simulated-kserve-75cdcctjp2d   1/1     Running   0          66s
facebook-opt-125m-simulated-kserve-8f8dc67b7-4x7g9                1/1     Running   0          57s
premium-simulated-simulated-premium-kserve-6b97b89985-ln8r2       1/1     Running   0          70s
```


## Merge criteria:

- [ ] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that
the changes work



## Summary by CodeRabbit

## Release Notes

* **Documentation**
  * Updated sample model configurations to explicitly specify container execution commands for improved clarity and consistency across all sample deployments.

* **Tests**
  * Upgraded test simulator fixture to version 0.8.2 with enhanced configuration options.


Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The CRD was initially very restrictive, with an enum specifying the
allowed providers.
This PR makes the field less strict to avoid issues in payload
processing.
In general, this CRD will need to be updated; the current change is a
temporary solution.

## Description

## How Has This Been Tested?

## Merge criteria:

- [ ] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that
the changes work



## Summary by CodeRabbit

* **New Features**
  * Extended provider support to accept custom provider values beyond the previously hardcoded set, enabling greater flexibility in model configuration.

* **Bug Fixes**
  * Refined resource access permissions to improve security posture by limiting scope to essential resources.


Signed-off-by: Nir Rozenbaum <nrozenba@redhat.com>
This is a minor change, relevant for people working with Git worktrees.
Each worktree needs its own working directory, and `.worktrees/` in the
project root is a popular location for them.

## How Has This Been Tested?
Verified that Git ignores the directory once `.gitignore` is updated.

## Merge criteria:
- [x] The commits are squashed in a cohesive manner and have meaningful
messages.
- [x] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [x] The developer has manually tested the changes and verified that
the changes work



## Summary by CodeRabbit

* **Chores**
  * Updated `.gitignore` to exclude Git worktree directories from version control.


Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated promotion of **1 commit(s)** from `main` to `stable`.

```
65ca551 fix: add explicit command to v0.8.2 simulator models to prevent bash … (opendatahub-io#765)
```

---------

Signed-off-by: Nir Rozenbaum <nrozenba@redhat.com>
Co-authored-by: Jim Rhyness <jrhyness@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Nir Rozenbaum <nrozenba@redhat.com>
Co-authored-by: Egor Lunin <egor@centerya.com>
…tion (opendatahub-io#752)


## Description
Prevent invalid token rate limits from poisoning the aggregated
TokenRateLimitPolicy and blocking deletion of subscriptions that share a
model.

Changes:
- Add validateTokenRateLimit() that rejects unreasonable values before
they reach Kuadrant (limit > 1B tokens, window > 365 days, bad format)
- Skip subscriptions with invalid limits during TRLP aggregation instead
of including them and causing Kuadrant validation failures
- Log clear error messages identifying the offending subscription
- Add Maximum=1000000000 CRD validation on TokenRateLimit.Limit to
reject extreme values at admission time
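The validation described above can be sketched in Go as follows. This is a hypothetical reconstruction, not the controller's code: the `TokenRateLimit` field layout and JSON tags, the `validateTokenRateLimit(limit, window)` signature, and the window format (a positive integer followed by `s`, `m`, `h` or `d`) are assumptions. Only the bounds (at most 1,000,000,000 tokens, at most 365 days) and the `Maximum=1000000000` marker come from the description; `Minimum=1` is assumed from the documented 1 to 1,000,000,000 range.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// TokenRateLimit is a hypothetical shape for the CRD type. The Maximum marker
// is the admission-time check from the PR; Minimum=1 is an assumption.
type TokenRateLimit struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=1000000000
	Limit  int64  `json:"limit"`
	Window string `json:"window"`
}

const (
	maxTokenLimit    = 1_000_000_000      // limits above 1B tokens are rejected
	maxWindowSeconds = 365 * 24 * 60 * 60 // windows longer than 365 days are rejected
)

// Assumed window format: positive integer followed by s, m, h or d.
var windowPattern = regexp.MustCompile(`^([1-9][0-9]*)([smhd])$`)

var secondsPerUnit = map[string]int64{"s": 1, "m": 60, "h": 3600, "d": 86400}

// validateTokenRateLimit rejects unreasonable values before they would be
// written into an aggregated TokenRateLimitPolicy.
func validateTokenRateLimit(limit int64, window string) error {
	if limit < 1 || limit > maxTokenLimit {
		return fmt.Errorf("limit %d is outside the allowed range [1, %d]", limit, maxTokenLimit)
	}
	m := windowPattern.FindStringSubmatch(window)
	if m == nil {
		return fmt.Errorf("window %q does not match <integer><s|m|h|d>", window)
	}
	n, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		// e.g. a digit string too large to fit in int64
		return fmt.Errorf("window %q: %w", window, err)
	}
	// Compare against the per-unit maximum instead of multiplying, so a huge
	// count such as 9007199254740991 cannot overflow int64.
	if n > maxWindowSeconds/secondsPerUnit[m[2]] {
		return fmt.Errorf("window %q exceeds 365 days", window)
	}
	return nil
}
```

Under these assumptions, the values from the root-cause example in this description (limit `9007199254740991`, window `9007199254740991d`) fail the limit check before the window is parsed, while `365d` and `8760h` are the largest accepted windows.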

Root cause: When one subscription had invalid limits (e.g. limit:
9007199254740991, window: 9007199254740991d), the controller built an
aggregated TRLP that Kuadrant rejected. On delete, the controller tried
to rebuild the TRLP (still including the bad sub if others shared the
model), which failed again, preventing finalizer removal and leaving
subscriptions stuck in Terminating.
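The aggregation fix (skip invalid subscriptions instead of failing the whole policy) can be sketched as follows. `Subscription`, `limitInRange`, and `aggregateLimits` are hypothetical names; the real controller works with the MaaS CRD types and Kuadrant's TokenRateLimitPolicy, which are not modeled here, and `limitInRange` deliberately checks only the limit range.

```go
package main

import (
	"fmt"
	"log"
)

// Subscription is a hypothetical, simplified view of one subscription's
// token rate limit for a model that several subscriptions share.
type Subscription struct {
	Name   string
	Limit  int64
	Window string
}

// limitInRange is a minimal example validator (limit range only); a fuller
// one would also validate the window format and duration.
func limitInRange(s Subscription) error {
	if s.Limit < 1 || s.Limit > 1_000_000_000 {
		return fmt.Errorf("limit %d out of range", s.Limit)
	}
	return nil
}

// aggregateLimits collects the subscriptions that go into one model's
// aggregated policy. Invalid entries are logged and skipped, so a single bad
// subscription cannot make the whole aggregated policy invalid.
func aggregateLimits(subs []Subscription, validate func(Subscription) error) []Subscription {
	valid := make([]Subscription, 0, len(subs))
	for _, s := range subs {
		if err := validate(s); err != nil {
			log.Printf("skipping subscription %q in token rate limit aggregation: %v", s.Name, err)
			continue
		}
		valid = append(valid, s)
	}
	return valid
}
```

Because invalid entries are filtered out on every rebuild, including the rebuild triggered by a delete, the rebuilt policy no longer fails validation and the finalizer can be removed. An empty result corresponds to the case where all candidates are invalid; per the release notes below, the controller then removes the existing aggregated rate limits.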



## How Has This Been Tested?

## Merge criteria:

- [ ] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that
the changes work


## Summary by CodeRabbit

* **Bug Fixes**
  * Enforced token rate limit validation (allowed range 1–1,000,000,000) and stricter window format/duration checks.
  * Subscriptions with invalid limits are skipped during aggregation; if all candidates are invalid, existing aggregated rate limits are removed.
* **Documentation**
  * Clarified CRD schema description to state the maximum allowed token limit.
* **Tests**
  * Updated test input to reflect the adjusted maximum window value.

Signed-off-by: Wen Liang <liangwen12year@gmail.com>
…datahub-io#703)

## Description

Enable users to discover available models using OIDC tokens without
first minting an API key. OIDC tokens behave identically to K8s tokens:
`/v1/models` returns all accessible models from all subscriptions based
on group membership.

For https://redhat.atlassian.net/browse/RHOAIENG-51554



Changes:
- Controller reads OIDC config from ModelsAsService CR
(spec.externalOIDC)
- Dynamically generates oidc-identities authentication in model
AuthPolicies
- Updated CEL expressions to extract username/groups from both OIDC and
K8s tokens
- Updated OPA rules to handle OIDC token structure (auth.identity.sub,
preferred_username)
- Added unit tests for OIDC config fetching and CEL expression
validation
- Added E2E tests for OIDC token authentication and negative test cases

OIDC authentication uses JWT signature validation (no per-request
callout). Authorino validates tokens using public keys from the OIDC
provider's issuerUrl.
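The username/groups handling described above (one identity extracted from either an OIDC JWT or a Kubernetes token) can be illustrated with a small Go sketch. It is not the CEL or OPA code from this change: the identity shapes are assumptions (OIDC claims at the top level as `preferred_username`, `sub`, `groups`; a TokenReview-style Kubernetes identity nested under `user`), the `groups` claim name varies between identity providers, and the fallback order is hypothetical.

```go
package main

// resolveIdentity returns the username and groups for a request, given the
// identity object produced by authentication. The object is assumed to be
// JSON-decoded, so arrays arrive as []any. Hypothetical sketch.
func resolveIdentity(identity map[string]any) (string, []string) {
	// Kubernetes TokenReview-style identity: {"user": {"username": ..., "groups": [...]}}.
	if user, ok := identity["user"].(map[string]any); ok {
		name, _ := user["username"].(string)
		return name, stringSlice(user["groups"])
	}
	// OIDC JWT claims: prefer preferred_username, fall back to the sub claim.
	name, _ := identity["preferred_username"].(string)
	if name == "" {
		name, _ = identity["sub"].(string)
	}
	return name, stringSlice(identity["groups"])
}

// stringSlice converts a JSON-decoded array into []string, dropping any
// non-string elements. A missing value yields an empty slice.
func stringSlice(v any) []string {
	items, _ := v.([]any)
	out := make([]string, 0, len(items))
	for _, item := range items {
		if s, ok := item.(string); ok {
			out = append(out, s)
		}
	}
	return out
}
```

Normalizing both token types to the same (username, groups) pair is what lets the rest of the pipeline (subscription selection, cache keys) stay token-agnostic.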



## How Has This Been Tested?
- unit test
- new e2e tests 


## Merge criteria:

- [ ] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that
the changes work


## Summary by CodeRabbit

* **New Features**
  * OIDC bearer-token support for API requests; controller discovers cluster OIDC issuer and optional client ID to enable JWT-based auth for model listing. Username/groups are derived from API keys, OIDC claims, or Kubernetes identity for subscription selection and cache keys.

* **Chores**
  * RBAC updated and controller now watches cluster service configuration to refresh auth policies.

* **Tests**
  * Added unit and e2e tests for OIDC config handling, claim composition, CEL expressions, and token-based access.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
… digest to 1e1c895 (opendatahub-io#360)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…r digest to 175bafd (opendatahub-io#361)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…#772)


## Description
The top-level values of Total Requests, Total Errors, Success Rate, and
Active Users are now also calculated based on the selected model(s).

## How Has This Been Tested?
Tested manually

## Merge criteria:

- [x] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [x] The developer has manually tested the changes and verified that
the changes work


## Summary by CodeRabbit

* **Chores**
  * Improved Usage dashboard to be model-aware and aggregated per user/subscription/namespace, enhancing accuracy for Active Users, Success Rate, Token Consumption, Total Errors, and Total Requests.
  * Updated gating and fallback logic for authorization and rate-limit signals to produce more reliable time-series and per-user token consumption reporting.

Signed-off-by: Arik Hadas <ahadas@redhat.com>
… digest to d637b9d (opendatahub-io#371)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>

## Description
Add or update documentation for the MaaS Tenant-based architecture change.

[RHOAIENG-58939](https://redhat.atlassian.net/browse/RHOAIENG-58939)

## How Has This Been Tested?
N/A - docs only changes

## Merge criteria:

- [x] The commits are squashed in a cohesive manner and have meaningful
messages.
- [ ] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that
the changes work


## Summary by CodeRabbit

* **Documentation**
  * Updated installation and setup to use the Tenant CR as the primary platform configuration and documented controller self-bootstrapping of a default-tenant
  * Clarified architecture: Tenant reconciler responsibilities, resource ownership/cleanup, and reconciler split
  * Updated quickstart and verification steps, namespace/resource locations, and command examples
  * Added Tenant CR details to release notes and configuration guidance

## Description
Restrict the maas-controller-role ClusterRole so that get on secrets is
scoped to only `maas-db-config` via resourceNames, rather than granting
get on all secrets cluster-wide.

Why list/watch remains unrestricted: the controller-runtime informer
started by `Watches(&corev1.Secret{})` requires list and watch on secrets.
Kubernetes RBAC only honors resourceNames on list/watch when the client
sends a matching `metadata.name` field selector, which a cluster-wide
informer does not do, so these verbs cannot be usefully scoped here. The
Watch already has a client-side predicate
(`secretNamedMaaSDB`) that filters to only `maas-db-config`.
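A hypothetical Go sketch of this split: the two kubebuilder RBAC markers show how `get` can be scoped with `resourceNames` while `list`/`watch` stay collection-wide, and `isMaaSDBSecret` stands in for the `secretNamedMaaSDB` predicate. The exact marker layout and the helper name are assumptions rather than copies from the PR, and the helper matches by name only (any namespace check the real predicate performs is omitted). In controller-runtime, such a function would typically be wrapped with `predicate.NewPredicateFuncs` and attached to the `Watches` call via `builder.WithPredicates`.

```go
package main

// controller-gen turns markers like these into ClusterRole rules when
// `make manifests` runs: get is limited to one named secret, while list and
// watch stay unrestricted because the informer needs them cluster-wide.
//
// +kubebuilder:rbac:groups="",resources=secrets,resourceNames=maas-db-config,verbs=get
// +kubebuilder:rbac:groups="",resources=secrets,verbs=list;watch

// maasDBSecretName is the one secret the controller reads.
const maasDBSecretName = "maas-db-config"

// isMaaSDBSecret mirrors the idea behind the secretNamedMaaSDB predicate: the
// informer receives events for every Secret, but only the DB secret's events
// are passed on to the reconciler.
func isMaaSDBSecret(name string) bool {
	return name == maasDBSecretName
}
```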

Addresses lphiri's feedback on PR opendatahub-io#735: "The DB secret has a specific
name we know ahead of time. I suggest creating a role with a
resourceName constraint."

https://redhat.atlassian.net/browse/RHOAIENG-58934

## How Has This Been Tested?
```
make -C maas-controller manifests        # generated YAML matches updated markers
make -C maas-controller test             # all unit tests pass (no logic changes)
make -C maas-controller verify-codegen   # confirms markers and YAML are in sync
```
To verify on a cluster after deployment:
```
# Should return "yes"
kubectl auth can-i get secrets/maas-db-config \
  --as=system:serviceaccount:opendatahub:maas-controller -n opendatahub
# Should return "no"
kubectl auth can-i get secrets/some-other-secret \
  --as=system:serviceaccount:opendatahub:maas-controller -n opendatahub
```

## Merge criteria:

- [x] The commits are squashed in a cohesive manner and have meaningful
messages.
- [x] Testing instructions have been added in the PR body (for PRs
involving changes that are not immediately obvious).
- [x] The developer has manually tested the changes and verified that
the changes work


## Summary by CodeRabbit

* **Security Updates**
  * Restricted secret access: get permission is now limited to the specific maas-db-config secret.
  * List and watch permissions for secrets remain available, preserving discovery and monitoring capabilities while reducing exposure.
…ndatahub-io#787)

## Summary
Add the `tenants.maas.opendatahub.io` CRD to e2e must-gather artifact
collection and the auth debug report.

## Description
The Tenant CRD was recently introduced but was missing from the e2e
debugging and artifact-collection utilities in `auth_utils.sh`.

- Add `tenants.maas.opendatahub.io` to the `MAAS_CRDS` array so
`collect_maas_crs()` dumps Tenant CR YAML to `tenants.yaml`.
- Add `kubectl get tenants` to `collect_cluster_state()` alongside other
MaaS CRs.
- Add Tenant listing and status/condition detail to
`run_auth_debug_report()` under the MaaS CRs section.
- Update header comment to document the new `tenants.yaml` artifact.

## How it was tested
- Verified script syntax with `bash -n`.
- Confirmed the Tenant CRD name matches
`deployment/base/maas-controller/crd/bases/maas.opendatahub.io_tenants.yaml`.
- Confirmed namespace usage aligns with
`TenantReconciler.TenantNamespace` (sourced from
`--maas-subscription-namespace`).

Made with [Cursor](https://cursor.com)

Signed-off-by: Chaitanya Kulkarni <ckulkarn@redhat.com>
Signed-off-by: Chaitanya Kulkarni <chkulkar@redhat.com>
Automated promotion of **9 commit(s)** from `main` to `stable`.

```
fb2ea25 feat: add tenant CRD to e2e artifact collection and debug report (opendatahub-io#787)
1b8f212 chore: restrict rbac for db secret (opendatahub-io#779)
e746008 docs: add/update documentation for Maas Tenant (opendatahub-io#773)
147eaa2 fix: per-model(s) top-level values in usage dashboard (opendatahub-io#772)
b327b34 feat: add OIDC token support for model discovery via /v1/models (opendatahub-io#703)
dbf6d03 fix: validate token rate limits and skip invalid subs in TRLP aggregation (opendatahub-io#752)
fae753e chore: add .worktrees/ to .gitignore (opendatahub-io#774)
c01dc5b fix: minor updates for external model (opendatahub-io#771)
65ca551 fix: add explicit command to v0.8.2 simulator models to prevent bash … (opendatahub-io#765)
```
Automated promotion of **11 commit(s)** from `stable` to `rhoai`.

```
fb2ea25 feat: add tenant CRD to e2e artifact collection and debug report (opendatahub-io#787)
1b8f212 chore: restrict rbac for db secret (opendatahub-io#779)
e746008 docs: add/update documentation for Maas Tenant (opendatahub-io#773)
147eaa2 fix: per-model(s) top-level values in usage dashboard (opendatahub-io#772)
b327b34 feat: add OIDC token support for model discovery via /v1/models (opendatahub-io#703)
dbf6d03 fix: validate token rate limits and skip invalid subs in TRLP aggregation (opendatahub-io#752)
89fba29 chore: promote main to stable (opendatahub-io#770)
fae753e chore: add .worktrees/ to .gitignore (opendatahub-io#774)
c01dc5b fix: minor updates for external model (opendatahub-io#771)
65ca551 fix: add explicit command to v0.8.2 simulator models to prevent bash … (opendatahub-io#765)
```
…r digest to 7d4e475 (opendatahub-io#384)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
chaitanya1731 merged commit 703a624 into red-hat-data-services:rhoai-3.4 on Apr 22, 2026
10 checks passed