fix: enable positive_data_acceptance fuzz checks via spec tightening and test hooks by lugi0 · Pull Request #1720 · opendatahub-io/model-registry

lugi0 · 2026-04-24T11:02:21Z

Summary

Resolves RHOAIENG-58824: Schemathesis's positive_data_acceptance check generates schema-conformant requests and expects 2xx responses, but 53 tests fail with RejectedPositiveData. This PR closes the spec-server contract gaps and adds fuzz test hooks for server-side limitations that cannot be expressed in OpenAPI 3.0.

Result: 53 → 0 stateless fuzz test failures.

Root causes found and addressed

Spec tightening (37 failures)

Root cause	Failures	Fix
Path parameter IDs defined as `type: string` with no constraints; server requires numeric int32	19	Added `format: int64`, `pattern: "^[1-9][0-9]{0,8}$"` to all ID path params
`pageSize` query param defined as `type: string`; server parses as int32	8	Changed to `type: integer, format: int32, minimum: 1, maximum: 2147483647`
Singleton GET endpoints mark all params optional; server requires `externalId` OR `name+parentResourceId`	6	Schemathesis hook enforces valid param combinations
`filterQuery` allows any string; server lexer only accepts ASCII grammar	3	Added `pattern: "^[\x20-\x7E]*$"`
`nextPageToken` has no format constraint; server expects base64	1	Improved description (opaque cursor)

Spec + hook fixes (14 failures)

Root cause	Failures	Fix
Required string fields accept empty string `""`; server's `IsZeroValue()` treats it as missing	10	Added `minLength: 1` to required string fields in Create schemas
Catalog server missing validation middleware for null bytes	3	Added `middleware.ValidationMiddleware()` to `catalog/cmd/catalog.go`
Artifact `oneOf` discriminator not enforced	1	Hook randomizes across valid artifact types for POST; ensures required fields per type

Additional issues found during re-verification

Issue	Fix
`MetadataIntValue` spec says `format: int64` but server uses int32 (`StringToInt32`)	Changed to `format: int32`, `pattern: "^-?[0-9]{1,9}$"`
MetadataValue types allow fuzz-generated extra properties	Added `additionalProperties: false` to all 6 MetadataValue types
`MetadataProtoValue` not supported by EmbedMD converter (no case in switch)	Hook replaces ProtoValue with StringValue
`MetadataStructValue` base64 round-trip bug — write decodes, read doesn't re-encode; PATCHing resources with stored StructValues crashes	Hook replaces StructValue with StringValue (see server bug section for proposed fix)
Non-ASCII/surrogate characters in request bodies break Go JSON parser	Hook strips null bytes and surrogates via `encode("utf-8", errors="ignore")`
Non-ASCII characters in `customProperties` keys	Hook sanitizes keys to ASCII-only
Backslash/quotes in `name`/`externalId` params break server's internal filter query construction	Hook strips `\`, `'`, `"` from these params
Go strict JSON decoder rejects extra properties not in struct	Per-path property whitelist derived from spec, applied in `map_body` hook
`metadataType` field used `example` instead of `enum`	Changed to single-value `enum` for proper discriminator enforcement
Generated spec files (catalog.yaml, model-registry.yaml) were stale	Properly regenerated via `scripts/merge_openapi.sh`
Metric `step` is int64 in Go struct but `timestamp` is string — spec says both are `type: string, format: int64`	Hook sets `step` as integer, `timestamp` as string
Artifact `artifactType` is immutable but included in update schema without `readOnly`	Hook GETs existing artifact to match type on PATCH; randomizes type on POST for coverage
Schemathesis ignores `minLength`/`format` inherited through `allOf`	Hook replaces empty `name` with random value; validates numeric-string fields (IDs, timestamps)
`BaseResource.name` missing `minLength: 1`	Added at source so all Create schemas inherit it

`make test-fuzz` reliability fixes

Issue	Fix
Port-forwards start in background with no readiness check; tests fail on connection refused	Added curl retry loops (30s timeout) for model-registry (`:8080`), minio (`:9000`), local registry (`:5001`)
`$STATUS` variable never set in `test-fuzz`; target always exits 0	Combined shell commands to properly capture and propagate exit codes from both stateless and stateful runs
Prior fuzz runs leave corrupted `customProperties` in DB; PATCH tests fail loading existing resources	Added `scripts/cleanup.sh` before both stateless and stateful test runs
No stateful fuzz tests for the catalog API	Added `test_catalog_stateful.py` and included in `make test-fuzz`
Stateful test fails non-deterministically with `Unsatisfiable` after thousands of successful steps	Added `@pytest.mark.flaky(reruns=2)` — caused by tight spec constraints + allOf making data generation borderline

Design decisions

Per-path property whitelist vs additionalProperties: false: The Go server uses strict JSON decoding (DisallowUnknownFields), rejecting any property not in the struct. OpenAPI 3.0's allOf + additionalProperties: false is fundamentally broken — it evaluates per-subschema, so properties valid in the composite are rejected as "additional" in the base. Instead of fighting the spec, the map_body hook maintains a per-path whitelist (_PATH_PROPERTIES) derived from the spec, stripping fuzz-generated extra properties before they reach the server.

Hooks vs spec changes: Some server behaviors cannot be expressed in OpenAPI 3.0 (parameter dependencies, discriminator enforcement, strict decoding). These are handled via Schemathesis hooks in conftest.py rather than incorrect spec annotations.

⚠️ ID format: int64 in spec vs int32 in server — needs discussion

This PR annotates path parameter IDs with format: int64 and pattern: "^[1-9][0-9]{0,8}$". The pattern safely constrains values to the int32 range (max 999,999,999 < 2,147,483,647), so no functional issues arise. However, the server actually validates all path IDs as int32, not int64:

internal/apiutils/api_utils.go:42 — ValidateIDAsInt32() calls strconv.ParseInt(id, 10, 32).
Every path ID handler (registered_model.go, model_version.go, experiment.go, inference_service.go, serving_environment.go, artifact.go, serve_model.go, experiment_run.go) uses ValidateIDAsInt32, never ValidateIDAsInt64.
Meanwhile, the response body BaseResource.id at common.yaml:98 is already declared as format: int64 — this pre-dates this PR and is the existing convention.

We chose format: int64 for path params to stay consistent with the existing response body id field. But this creates a documented-vs-actual gap:

Layer	Declared format	Actual behavior
Response body `BaseResource.id`	`int64` (pre-existing)	Server generates int32-range IDs
Path parameters (this PR)	`int64` (matches response body)	Server validates as int32
`MetadataIntValue.int_value` (this PR)	`int32` (changed from int64)	Server uses `StringToInt32()`

Options for follow-up discussion:

Keep int64 everywhere (current state) — consistent spec, pattern constrains to safe range, but spec claims to support larger IDs than the server accepts
Change path params to int32 — accurate to server, but inconsistent with response body id which still says int64
Change everything to int32 — accurate to server across the board, but this is a larger API contract change that could affect generated clients
Fix the server to use int64 — the most forward-looking fix; the database (MySQL/Postgres) likely supports int64 already, and ValidateIDAsInt32 could be changed to ValidateIDAsInt64

Note: ValidateIDAsInt64 already exists in api_utils.go:66 — it's just never called for path parameters.

Server bugs identified (not fixed, worked around)

These are documented for separate follow-up:

MetadataProtoValue not supported — EmbedMD converter (openapi_embedmd_converter_util.go:42-78) has no case for ProtoValue in its switch statement
MetadataStructValue base64 round-trip bug — the write converter (openapi_embedmd_converter_util.go:61) base64-decodes struct_value, but the read converter (embdemd_openapi_converter_util.go:47) returns raw JSON without re-encoding. Any resource with a StructValue becomes un-PATCHable because the server re-processes stored customProperties through the write converter on PATCH. Proposed one-line fix: change line 47 from NewMetadataStructValue(string(asJSON)) to NewMetadataStructValue(base64.StdEncoding.EncodeToString(asJSON)) — identical to what the ByteValue read path already does at line 60-61
name/externalId filter injection — server constructs internal filter queries like name = '<value>' without escaping; backslash/quotes break the participle lexer
Non-ASCII in customProperties keys — Go JSON decoder fails on surrogate pairs and non-UTF-8 characters in property key names
MetadataIntValue int32/int64 mismatch — spec said int64, server uses StringToInt32() (fixed in this PR for int_value; see also ID format discussion above)
All path ID validation uses int32 (ValidateIDAsInt32) despite spec declaring int64 (see section above)
Metric step vs timestamp type inconsistency — both are type: string, format: int64 in spec, but Go struct has step as int64 (JSON number) and timestamp as string (JSON string)
Artifact artifactType immutable but in update schema — server rejects type changes on PATCH, and infers "unknown" when field is omitted; update schema should either exclude artifactType or mark it readOnly

Test plan

make openapi/validate — both specs pass
Stateless fuzz tests: 86 passed, 0 SUBFAILED, 63 subtests passed
Stateful fuzz test (model-registry): passes; @flaky(reruns=2) for occasional Unsatisfiable (see known limitation below)
Stateful fuzz test (catalog): added — test_catalog_stateful.py
make test-fuzz from scratch (kind cluster creation → deploy → test): no timing failures
Reviewer ran cd clients/python && make test-fuzz and stateless/stateful tests directly — confirmed green

⚠️ Known limitation: stateful test `Unsatisfiable` — needs further investigation

The model-registry stateful test (test_mr_api_stateful) can non-deterministically fail with hypothesis.errors.Unsatisfiable even after thousands of successful API calls. This happens when Hypothesis's state machine exhausts 1000 attempts to find a valid next transition. The failure is mitigated with @pytest.mark.flaky(reruns=2) but can still occur.

What we know:

The failure is non-deterministic — depends on the random seed. Some seeds produce runs that complete successfully; others reach a state where all generated transitions are filtered out.
It occurs AFTER extensive successful exploration (often 5000+ successful API calls), not at startup.
It is NOT caused by server errors — the state machine simply can't generate valid data for its next step.
It became more likely after tightening the OpenAPI spec constraints (patterns, minLength, format), because Schemathesis struggles to generate conformant data through allOf composition.

Root cause hypothesis:
Schemathesis does not fully enforce constraints inherited through allOf during data generation. When the spec has tight constraints (e.g., pattern: "^[1-9][0-9]{0,8}$" on IDs, minLength: 1 on names), Hypothesis generates many invalid values that get filtered, eventually hitting the 1000-filter hard limit. The hooks fix values AFTER generation, but Hypothesis's internal filtering happens BEFORE the hooks run.

Possible directions for investigation:

Schemathesis before_generate_body hook — could constrain generation strategies at the source rather than fixing values after the fact, reducing the filter rejection rate
Custom Hypothesis strategies — register custom strategies for string fields with format/pattern constraints so Hypothesis generates valid values directly
Increase Hypothesis filter tolerance — the 1000-attempt Unsatisfiable threshold is hardcoded in Hypothesis; a custom wrapper could retry with a different seed
Reduce spec constraint strictness — loosen some patterns (e.g., "^[0-9]+$" instead of "^[1-9][0-9]{0,8}$") to give Hypothesis more room, at the cost of less precise spec documentation

`--hypothesis-show-statistics` verbosity

The make test-fuzz target passes --hypothesis-show-statistics to both stateless and stateful pytest runs. This produces extensive per-test statistics output that is useful for debugging Hypothesis generation issues but noisy for routine runs. Consider removing it from the default target and keeping it as an opt-in flag (e.g., make test-fuzz STATS=1).

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added request validation middleware to the API server.
Bug Fixes
- Enhanced API robustness with stricter input validation for resource identifiers and query parameters.
- Improved deployment reliability with automatic health checks for service readiness.
- Strengthened test framework with improved data sanitization and validation.

…cceptance checks Resolves RHOAIENG-58824: Schemathesis API rejects valid requests that conform to the OpenAPI spec. Reduces stateless fuzz test failures from 53 to 0 by closing spec-server contract gaps and adding test hooks for server-side limitations that cannot be expressed in OpenAPI 3.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…verage Addresses issues found during stateful fuzz test stabilization: - Replace MetadataStructValue with StringValue in hooks (server base64 round-trip bug makes resources with StructValue un-PATCHable) - Randomize artifact types on POST for coverage instead of hardcoding doc-artifact; ensure per-type required fields and correct value types - GET existing artifact on PATCH to match immutable artifactType - Validate numeric-string fields (IDs, timestamps) that Schemathesis fills with arbitrary Unicode despite format constraints in allOf - Replace empty name with random value (Schemathesis ignores minLength through allOf); add minLength: 1 to BaseResource.name at source - Add catalog stateful test and include in make test-fuzz - Add @pytest.mark.flaky(reruns=2) for non-deterministic Unsatisfiable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci · 2026-04-24T11:02:29Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lugi0
Once this PR has been reviewed and has the lgtm label, please assign rareddy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-24T11:02:33Z

📝 Walkthrough

Walkthrough

This pull request tightens OpenAPI schema validation across three catalog and model-registry API specifications by enforcing stricter constraints on resource identifiers (minLength, readOnly, numeric patterns), metadata value objects (additionalProperties: false, enum-constrained metadataType), and query parameters (ASCII patterns, numeric formats). A ValidationMiddleware is integrated into the catalog HTTP server handler. Python fuzz tests are expanded with Schemathesis hooks to sanitize inputs (null bytes, invalid properties, type corrections) and a new stateful test suite for the catalog API is introduced. Test configuration enables positive data acceptance checks and deployment scripts add health-check polling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Security & Quality Observations

OpenAPI Schema Hardening

Input pattern constraints on filter queries ([\x20-\x7E] printable ASCII) and numeric ID fields (^[1-9][0-9]{0,8}$) reduce parser/injection surface area. Valid approach for specification-level defense.
additionalProperties: false on metadata schemas prevents schema-drift attacks; validates discriminator consistency via enum on metadataType.
readOnly: true on server-generated id fields correctly prevents client tampering at the spec level.

ValidationMiddleware Integration

Middleware addition in catalog/cmd/catalog.go (line 2/-1) is minimal and correct, but confirm the middleware actually validates against the schema—wrapping a handler doesn't enforce validation without explicit OpenAPI validator involvement.

Fuzz Test Conftest.py (250 new lines)

Sanitization logic is shallow: map_body recursively cleans null bytes but doesn't validate against a schema's actual type constraints before sending—for example, coercing numeric strings or rewriting invalid enums masks deeper API contract failures.
Hook function correctness: map_query removes tokens like "nextPageToken" from unsupported endpoints but the token removal logic is hardcoded and fragile; any OpenAPI path structure change could silently skip validation.
Non-determinism in test data: Use of secrets module for random name generation in fuzz tests improves coverage but should seed a deterministic random state for reproducibility (CWE-330).
Property-set restriction: Logic restricting properties to spec-allowed sets is correct but doesn't validate cardinality or required-field presence—misses catching API mutations that drop mandatory fields.

Stateful Fuzz Test

Configuration disables Hypothesis health checks for too_much_data, slow_runs, and large_data via settings.register_profile(...). This suppresses data-size and performance warnings; acceptable for stress testing but risks masking slow/memory-bound bugs.

Flaky Test Marker

@pytest.mark.flaky(reruns=2) on test_mr_api_stateful masks intermittency instead of fixing root cause. Acceptable as temporary mitigation but flag for root-cause investigation.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title accurately summarizes the main change: enabling positive_data_acceptance fuzz checks through OpenAPI spec tightening and test hooks.
Description check	✅ Passed	Description comprehensively covers root causes (37 spec tightening, 14 spec+hook fixes), design decisions, server bugs, test plan results, and known limitations. Follows template with adequate detail.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (3)

api/openapi/model-registry.yaml (1)
2070-2073: experimentId has both pattern and minLength constraints.

The pattern ^[1-9][0-9]{0,8}$ already ensures at least one character, making minLength: 1 redundant. Not harmful, but adds unnecessary validation overhead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/openapi/model-registry.yaml` around lines 2070 - 2073, The experimentId
property in model-registry.yaml redundantly includes minLength: 1 while its
regex pattern ^[1-9][0-9]{0,8}$ already enforces at least one digit; remove the
minLength: 1 entry from the experimentId schema to avoid duplicate validation
rules (locate the experimentId field in the OpenAPI model-registry.yaml and
delete the minLength line).
clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py (1)
5-6: Cross-module import couples catalog tests to model_registry test module.

Importing call_and_validate_with_null_byte_handling from tests.fuzz_api.model_registry.test_mr_stateless creates a dependency that can cause confusing failures if the model_registry module changes. Consider moving shared helpers to a common module like tests.fuzz_api.helpers or conftest.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py` around
lines 5 - 6, The test imports the helper
call_and_validate_with_null_byte_handling directly from
tests.fuzz_api.model_registry.test_mr_stateless which couples catalog tests to
the model_registry test module; extract that helper into a shared location
(e.g., create tests.fuzz_api.helpers or add it to conftest.py) and update the
import in clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py
to import call_and_validate_with_null_byte_handling from the new shared module;
also update any other tests that used the old import to reference the shared
helper so tests no longer depend on test_mr_stateless.
clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py (1)
8-8: Flaky reruns mask root cause of Hypothesis Unsatisfiable failures.

While @pytest.mark.flaky(reruns=2) mitigates CI noise, Unsatisfiable typically indicates over-constrained generation or conflicting schema rules. Track whether these failures persist and consider adjusting Hypothesis settings or schema hooks if they do.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py` at line 8,
The flaky decorator (`@pytest.mark.flaky`) is hiding Hypothesis Unsatisfiable
failures; remove that decorator from the test(s) in
clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py and instead
surface and debug Hypothesis failures by applying explicit Hypothesis settings
on the test function(s) (e.g. add
`@hypothesis.settings`(max_examples=<reasonable>, deadline=None,
suppress_health_check=[...]) or adjust strategies/assume calls) so the
Unsatisfiable error is reported and you can relax conflicting schema constraints
or generation limits accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/openapi/catalog.yaml`:
- Around line 2360-2362: The OpenAPI parameter currently declares type: string
and format: int64 but uses pattern: "^[1-9][0-9]{0,8}$", which incorrectly
limits IDs to 9 digits; update the pattern to match 64-bit signed integer widths
(for example use "^[1-9][0-9]{0,18}$") or remove the pattern entirely so it
aligns with BaseResource.id's int64 representation, and make the same change for
the other occurrences noted (the pattern at the other block around lines
2394-2396).
- Around line 1752-1755: The schema for int_value currently advertises int32 but
uses a restrictive pattern ("^-?[0-9]{1,9}$") that rejects valid int32 numbers;
update the schema for int_value to either remove the string pattern and use a
numeric representation (type: integer, format: int32) or, if you must keep it as
string, replace the pattern with one that matches the full signed 32-bit range
(e.g., a pattern that allows -2147483648 through 2147483647) so values like
1000000000 and -2147483648 validate correctly; locate the int_value entry and
change the type/format or widen the pattern accordingly.
- Around line 2405-2409: The OpenAPI schema for parameter "pageSize" currently
allows up to 2147483647 which effectively permits abusive large requests; change
the "maximum" value for the pageSize schema to the server's actual hard cap (the
real maximum entries your list handlers will accept) so the API contract rejects
oversized values at the edge, and update the schema description for pageSize to
document the enforced server maximum and any clamping behavior; make these edits
where "pageSize" is defined in api/openapi/catalog.yaml (schema type
integer/format int32).

In `@api/openapi/src/model-registry.yaml`:
- Around line 149-151: The OpenAPI schema declares fields as type: string with
format: int64 but applies a pattern "^[1-9][0-9]{0,8}$" which limits IDs to
1..999,999,999 and contradicts the int64 claim; fix by choosing one of two
consistent options: (A) if IDs must support full int64, remove the restrictive
pattern (or replace it with a pattern that allows the full signed 64-bit range)
and keep format: int64, or (B) if IDs are intentionally capped, change format:
int64 to format: string (or remove int64) and update the pattern and description
to document the actual max allowed value; apply the chosen fix wherever the same
trio appears (the block containing "type: string", "format: int64", "pattern:
\"^[1-9][0-9]{0,8}$\"" and the other occurrences called out in the comment).

In `@catalog/cmd/catalog.go`:
- Around line 199-202: The current ValidationMiddleware only filters null bytes
and does not enforce OpenAPI schema constraints, so requests routed via
openapi.NewRouter(ctrl, mcpCtrl) (e.g., handlers like GetModel) can receive
unvalidated input; update the server setup to wrap the router with a full
OpenAPI request validator (for example oapi-codegen's middleware or
kin-openapi’s router/validation middleware) instead of or in addition to
ValidationMiddleware so that formats, regexes, min/max lengths and enums are
enforced at runtime; specifically, replace or augment the Handler passed to the
http.Server (where server := &http.Server{ Handler:
middleware.ValidationMiddleware(openapi.NewRouter(...)) }) with a validator
middleware that loads the compiled OpenAPI spec and validates incoming requests
before they reach GetModel and other handlers, or alternatively extend
middleware.ValidationMiddleware to perform full schema validation and plug that
in.

In `@clients/python/tests/fuzz_api/conftest.py`:
- Around line 151-154: The PATCH handling in map_case() calls
_get_artifact_type() without the same request kwargs (Authorization header and
verify_ssl) used elsewhere, so the probe GET can 401 or fail TLS and map_case()
will incorrectly overwrite artifactType; update the call sites in map_case()
(and the similar block at lines ~185-193) to pass the same request kwargs (auth
headers, verify_ssl) through to _get_artifact_type(), or alternatively
short-circuit and do not mutate case.body["artifactType"] when the lookup
raises/auth-fails/unverifiable TLS (i.e., treat a failed probe as "unknown" and
leave the original artifactType intact). Ensure the unique symbols
_get_artifact_type and map_case are updated accordingly and add a small check
for probe failure to avoid silent fallback to fuzzed values.
- Around line 68-72: Replace the brittle int()-based validation for
numeric-string ID fields with a regex check matching the spec ^[1-9][0-9]{0,8}$:
add "import re" with the other imports, use re.fullmatch(r'^[1-9][0-9]{0,8}$',
value) instead of int(value) tests when validating fields listed in
_NUMERIC_STRING_FIELDS, and update any fallback random generation (e.g.,
randbelow(999999999)) to produce only 1..999999999 (for example
randbelow(999999999) + 1) so generated IDs never produce "0" or other
schema-invalid forms; apply the same change to the other occurrence around lines
91-96.

---

Nitpick comments:
In `@api/openapi/model-registry.yaml`:
- Around line 2070-2073: The experimentId property in model-registry.yaml
redundantly includes minLength: 1 while its regex pattern ^[1-9][0-9]{0,8}$
already enforces at least one digit; remove the minLength: 1 entry from the
experimentId schema to avoid duplicate validation rules (locate the experimentId
field in the OpenAPI model-registry.yaml and delete the minLength line).

In `@clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py`:
- Around line 5-6: The test imports the helper
call_and_validate_with_null_byte_handling directly from
tests.fuzz_api.model_registry.test_mr_stateless which couples catalog tests to
the model_registry test module; extract that helper into a shared location
(e.g., create tests.fuzz_api.helpers or add it to conftest.py) and update the
import in clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py
to import call_and_validate_with_null_byte_handling from the new shared module;
also update any other tests that used the old import to reference the shared
helper so tests no longer depend on test_mr_stateless.

In `@clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py`:
- Line 8: The flaky decorator (`@pytest.mark.flaky`) is hiding Hypothesis
Unsatisfiable failures; remove that decorator from the test(s) in
clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py and instead
surface and debug Hypothesis failures by applying explicit Hypothesis settings
on the test function(s) (e.g. add
`@hypothesis.settings`(max_examples=<reasonable>, deadline=None,
suppress_health_check=[...]) or adjust strategies/assume calls) so the
Unsatisfiable error is reported and you can relax conflicting schema constraints
or generation limits accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 546d8130-ecef-44c7-a4f8-358a6e29be10

📥 Commits

Reviewing files that changed from the base of the PR and between 35327a4 and bf0e61c.

📒 Files selected for processing (12)

api/openapi/catalog.yaml
api/openapi/model-registry.yaml
api/openapi/src/catalog.yaml
api/openapi/src/lib/common.yaml
api/openapi/src/model-registry.yaml
catalog/cmd/catalog.go
clients/python/Makefile
clients/python/schemathesis.toml
clients/python/tests/fuzz_api/conftest.py
clients/python/tests/fuzz_api/model_catalog/test_catalog_stateful.py
clients/python/tests/fuzz_api/model_catalog/test_catalog_stateless.py
clients/python/tests/fuzz_api/model_registry/test_mr_stateful.py

coderabbitai · 2026-04-24T11:08:43Z

        int_value:
-          format: int64
+          format: int32
          type: string
+          pattern: "^-?[0-9]{1,9}$"


⚠️ Potential issue | 🟠 Major

int32 constraint rejects valid int32 values.

^-?[0-9]{1,9}$ caps int_value at 9 digits, so values like 1000000000 and -2147483648 become schema-invalid even though this field is documented as int32. Either widen the constraint to the full signed int32 domain or stop advertising int32 here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/openapi/catalog.yaml` around lines 1752 - 1755, The schema for int_value currently advertises int32 but uses a restrictive pattern ("^-?[0-9]{1,9}$") that rejects valid int32 numbers; update the schema for int_value to either remove the string pattern and use a numeric representation (type: integer, format: int32) or, if you must keep it as string, replace the pattern with one that matches the full signed 32-bit range (e.g., a pattern that allows -2147483648 through 2147483647) so values like 1000000000 and -2147483648 validate correctly; locate the int_value entry and change the type/format or widen the pattern accordingly.

coderabbitai · 2026-04-24T11:08:43Z

        type: string
+        format: int64
+        pattern: "^[1-9][0-9]{0,8}$"


⚠️ Potential issue | 🟠 Major

ID regex narrows advertised int64 IDs to 9 digits.

Both parameters are marked format: int64, but the new regex only permits 1..999999999. Once IDs cross 9 digits, valid routes and parent filters will fail schema validation while BaseResource.id still advertises int64 strings. Align the regex with the real ID width everywhere, or downgrade the format consistently.

Also applies to: 2394-2396

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/openapi/catalog.yaml` around lines 2360 - 2362, The OpenAPI parameter currently declares type: string and format: int64 but uses pattern: "^[1-9][0-9]{0,8}$", which incorrectly limits IDs to 9 digits; update the pattern to match 64-bit signed integer widths (for example use "^[1-9][0-9]{0,18}$") or remove the pattern entirely so it aligns with BaseResource.id's int64 representation, and make the same change for the other occurrences noted (the pattern at the other block around lines 2394-2396).

coderabbitai · 2026-04-24T11:08:43Z

      schema:
-        type: string
+        type: integer
+        format: int32
+        minimum: 1
+        maximum: 2147483647


⚠️ Potential issue | 🟠 Major

Major: pageSize is effectively unbounded (CWE-770).

Exploit scenario: a caller can send pageSize=2147483647; if validation middleware accepts it, downstream list handlers must clamp it themselves or absorb the query/load. Set this to the server’s real hard cap so the contract blocks abusive values at the edge.

🔧 Remediation

schema: type: integer format: int32 minimum: 1 - maximum: 2147483647 + maximum: 1000 # replace with the handler's actual enforced cap

As per coding guidelines, REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

schema:

type: string

type: integer

format: int32

minimum: 1

maximum: 2147483647

schema:

type: integer

format: int32

minimum: 1

maximum: 1000 # replace with the handler's actual enforced cap

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/openapi/catalog.yaml` around lines 2405 - 2409, The OpenAPI schema for parameter "pageSize" currently allows up to 2147483647 which effectively permits abusive large requests; change the "maximum" value for the pageSize schema to the server's actual hard cap (the real maximum entries your list handlers will accept) so the API contract rejects oversized values at the edge, and update the schema description for pageSize to document the enforced server maximum and any clamping behavior; make these edits where "pageSize" is defined in api/openapi/catalog.yaml (schema type integer/format int32).

coderabbitai · 2026-04-24T11:08:43Z

          type: string
+          format: int64
+          pattern: "^[1-9][0-9]{0,8}$"


⚠️ Potential issue | 🟠 Major

The new ID regex contradicts the declared int64 range.

These fields are now documented as int64, but the regex only allows 1..999999999. That makes the contract self-contradictory: either these IDs are truly int64-like and the pattern is too narrow, or the format is overstating the allowed range. As written, any legitimate ID beyond 9 digits becomes undeclarable in the spec.

Patch

- pattern: "^[1-9][0-9]{0,8}$" + pattern: "^[1-9][0-9]{0,18}$"

If the server is intentionally capped below full int64, encode that real bound consistently and drop the misleading int64 claim.

As per coding guidelines, **: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 1771-1779, 2000-2009, 2164-2169, 2277-2282, 2407-2412

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/openapi/src/model-registry.yaml` around lines 149 - 151, The OpenAPI schema declares fields as type: string with format: int64 but applies a pattern "^[1-9][0-9]{0,8}$" which limits IDs to 1..999,999,999 and contradicts the int64 claim; fix by choosing one of two consistent options: (A) if IDs must support full int64, remove the restrictive pattern (or replace it with a pattern that allows the full signed 64-bit range) and keep format: int64, or (B) if IDs are intentionally capped, change format: int64 to format: string (or remove int64) and update the pattern and description to document the actual max allowed value; apply the chosen fix wherever the same trio appears (the block containing "type: string", "format: int64", "pattern: \"^[1-9][0-9]{0,8}$\"" and the other occurrences called out in the comment).

coderabbitai · 2026-04-24T11:08:43Z

 	server := &http.Server{
 		Addr:    catalogCfg.ListenAddress,
-		Handler: openapi.NewRouter(ctrl, mcpCtrl),
+		Handler: middleware.ValidationMiddleware(openapi.NewRouter(ctrl, mcpCtrl)),
 	}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check if openapi.NewRouter or related code includes request validation rg -n "Validate|validate|Schema|schema" catalog/internal/server/openapi/

Repository: opendatahub-io/model-registry

Length of output: 525

🏁 Script executed:

#!/bin/bash # Find and read the ValidationMiddleware implementation find . -type f -name "*.go" | xargs grep -l "ValidationMiddleware" | head -5

Repository: opendatahub-io/model-registry

Length of output: 228

🏁 Script executed:

#!/bin/bash # Search for the middleware package and ValidationMiddleware definition rg -n "func.*ValidationMiddleware" --type go

Repository: opendatahub-io/model-registry

Length of output: 380

🏁 Script executed:

#!/bin/bash # Find openapi router implementation rg -n "func.*NewRouter" catalog/internal/server/openapi/ --type go

Repository: opendatahub-io/model-registry

Length of output: 166

🏁 Script executed:

#!/bin/bash # Read ValidationMiddleware implementation cat -n internal/server/middleware/validation.go

Repository: opendatahub-io/model-registry

Length of output: 3623

🏁 Script executed:

#!/bin/bash # Read NewRouter implementation cat -n catalog/internal/server/openapi/routers.go

Repository: opendatahub-io/model-registry

Length of output: 2027

🏁 Script executed:

#!/bin/bash # Read validation tests to understand what is being validated cat -n internal/server/middleware/validation_test.go

Repository: opendatahub-io/model-registry

Length of output: 7198

🏁 Script executed:

#!/bin/bash # Find OpenAPI service handler implementations to check if they validate schemas find catalog/internal/server/openapi -name "*.go" -not -name "*test.go" -not -name "routers.go" | head -10

Repository: opendatahub-io/model-registry

Length of output: 580

🏁 Script executed:

#!/bin/bash # Look for typical handler patterns with validation in the OpenAPI service rg -n "func.*Handler|func $.*$ " catalog/internal/server/openapi/api_model_catalog_service_service.go | head -20

Repository: opendatahub-io/model-registry

Length of output: 2221

🏁 Script executed:

#!/bin/bash # Read GetModel handler implementation sed -n '336,380p' catalog/internal/server/openapi/api_model_catalog_service_service.go

Repository: opendatahub-io/model-registry

Length of output: 1594

🏁 Script executed:

#!/bin/bash # Search for any validation of IDs or input parameters in handlers rg -A 5 "Validate|validate.*ID|validate.*[Pp]aram" catalog/internal/server/openapi/api_model_catalog_service_service.go | head -40

Repository: opendatahub-io/model-registry

Length of output: 253

OpenAPI schema validation missing; only null-byte filtering enforced by middleware.

ValidationMiddleware checks for null bytes but skips schema validation: format, regex patterns, minLength, maxLength, enums. Handlers (e.g., GetModel) receive unvalidated input and pass it directly to the provider. Implement request schema validation (e.g., via kin-openapi or oapi-codegen middleware) or document that OpenAPI spec constraints are not enforced server-side.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@catalog/cmd/catalog.go` around lines 199 - 202, The current ValidationMiddleware only filters null bytes and does not enforce OpenAPI schema constraints, so requests routed via openapi.NewRouter(ctrl, mcpCtrl) (e.g., handlers like GetModel) can receive unvalidated input; update the server setup to wrap the router with a full OpenAPI request validator (for example oapi-codegen's middleware or kin-openapi’s router/validation middleware) instead of or in addition to ValidationMiddleware so that formats, regexes, min/max lengths and enums are enforced at runtime; specifically, replace or augment the Handler passed to the http.Server (where server := &http.Server{ Handler: middleware.ValidationMiddleware(openapi.NewRouter(...)) }) with a validator middleware that loads the compiled OpenAPI spec and validates incoming requests before they reach GetModel and other handlers, or alternatively extend middleware.ValidationMiddleware to perform full schema validation and plug that in.

coderabbitai · 2026-04-24T11:08:43Z

+_NUMERIC_STRING_FIELDS = {
+    "registeredModelId", "servingEnvironmentId", "modelVersionId",
+    "experimentId", "experimentRunId", "startTimeSinceEpoch",
+    "endTimeSinceEpoch", "timestamp",
+}


⚠️ Potential issue | 🟠 Major

Validate numeric-string IDs against the actual regex, not int().

int() accepts values like "0", "-1", " 7", and "0001", which still violate the tightened ^[1-9][0-9]{0,8}$ contract in api/openapi/src/model-registry.yaml. The fallback also uses randbelow(999999999), so it can generate "0" and keep producing schema-invalid payloads.

Patch

+_POSITIVE_ID_RE = re.compile(r"^[1-9][0-9]{0,8}$") + for field in _NUMERIC_STRING_FIELDS: if field in body and isinstance(body[field], str): - try: - int(body[field]) - except ValueError: - body[field] = str(secrets.randbelow(999999999)) + if not _POSITIVE_ID_RE.fullmatch(body[field]): + body[field] = str(secrets.randbelow(999999999) + 1)

Add import re with the other imports.

As per coding guidelines, **: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 91-96

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@clients/python/tests/fuzz_api/conftest.py` around lines 68 - 72, Replace the brittle int()-based validation for numeric-string ID fields with a regex check matching the spec ^[1-9][0-9]{0,8}$: add "import re" with the other imports, use re.fullmatch(r'^[1-9][0-9]{0,8}$', value) instead of int(value) tests when validating fields listed in _NUMERIC_STRING_FIELDS, and update any fallback random generation (e.g., randbelow(999999999)) to produce only 1..999999999 (for example randbelow(999999999) + 1) so generated IDs never produce "0" or other schema-invalid forms; apply the same change to the other occurrence around lines 91-96.

coderabbitai · 2026-04-24T11:08:43Z

+        elif method == "PATCH":
+            existing_type = _get_artifact_type(case)
+            if existing_type:
+                case.body["artifactType"] = existing_type


⚠️ Potential issue | 🟠 Major

PATCH artifact-type preservation is bypassed in authenticated or self-signed environments.

_get_artifact_type() does its probe GET without the same Authorization header or verify_ssl setting used everywhere else. When that lookup 401s or fails TLS verification, map_case() falls back to the fuzzed artifactType, which is exactly the immutability failure this hook is trying to avoid.

Thread the test request kwargs into this lookup, or stop mutating artifactType when the lookup cannot be performed reliably.

As per coding guidelines, **: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 185-193

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@clients/python/tests/fuzz_api/conftest.py` around lines 151 - 154, The PATCH handling in map_case() calls _get_artifact_type() without the same request kwargs (Authorization header and verify_ssl) used elsewhere, so the probe GET can 401 or fail TLS and map_case() will incorrectly overwrite artifactType; update the call sites in map_case() (and the similar block at lines ~185-193) to pass the same request kwargs (auth headers, verify_ssl) through to _get_artifact_type(), or alternatively short-circuit and do not mutate case.body["artifactType"] when the lookup raises/auth-fails/unverifiable TLS (i.e., treat a failed probe as "unknown" and leave the original artifactType intact). Ensure the unique symbols _get_artifact_type and map_case are updated accordingly and add a small check for probe failure to avoid silent fallback to fuzzed values.

lugi0 and others added 2 commits April 24, 2026 10:00

openshift-ci Bot requested review from dbasunag and fege April 24, 2026 11:02

github-actions Bot added Area/MR Python client Area/Go REST server Area/Model Catalog labels Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

lugi0 closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: enable positive_data_acceptance fuzz checks via spec tightening and test hooks#1720

fix: enable positive_data_acceptance fuzz checks via spec tightening and test hooks#1720
lugi0 wants to merge 2 commits into
opendatahub-io:mainfrom
lugi0:fix/rhoaieng-58824-spec-tightening

lugi0 commented Apr 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

openshift-ci Bot commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Security & Quality Observations

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lugi0 commented Apr 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root causes found and addressed

Spec tightening (37 failures)

Spec + hook fixes (14 failures)

Additional issues found during re-verification

make test-fuzz reliability fixes

Design decisions

⚠️ ID format: int64 in spec vs int32 in server — needs discussion

Server bugs identified (not fixed, worked around)

Test plan

⚠️ Known limitation: stateful test Unsatisfiable — needs further investigation

--hypothesis-show-statistics verbosity

Summary by CodeRabbit

Release Notes

Uh oh!

openshift-ci Bot commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Security & Quality Observations

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lugi0 commented Apr 24, 2026 •

edited by coderabbitai Bot

Loading

`make test-fuzz` reliability fixes

⚠️ Known limitation: stateful test `Unsatisfiable` — needs further investigation

`--hypothesis-show-statistics` verbosity

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading