Skip to content

fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#159

Open
tomerqodo wants to merge 6 commits into
qodo_action_req_1_base_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr5from
qodo_action_req_1_head_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr5
Open

fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#159
tomerqodo wants to merge 6 commits into
qodo_action_req_1_base_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr5from
qodo_action_req_1_head_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr5

Conversation

@tomerqodo
Copy link
Copy Markdown

Benchmark PR from agentic-review-benchmarks#5

zzstoatzz and others added 6 commits January 25, 2026 12:10
When storage_block_id is null (e.g., context serialized to Ray workers),
create_result_record() adds bucket_folder to storage_key via _resolve_path.
Then write_path() calls _resolve_path again, causing double-nested paths
like "results/results/abc123" instead of "results/abc123".

Add duplicate-prefix check to _resolve_path, matching the existing check
in _join_bucket_folder.

Fixes PrefectHQ#20174

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- fix .prefectignore content in test fixture (remove leading whitespace)
- add filterwarnings to pyproject.toml to suppress upstream deprecation warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
remove extra disable_logging fixture and temporary_settings wrapper
to match pattern used by prefect-aws and other integrations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
fixes ephemeral server startup timeouts on Python 3.11+ with xdist workers
see PrefectHQ#16397

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0)

Grey Divider


Action required

1. __future__ annotations import missing 📘 Rule violation ✓ Correctness
Description
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py contains type annotations but does not
  include from __future__ import annotations as the first import statement.
• This violates the requirement for src/ Python files with type hints and can cause
  forward-reference issues and slower type checking.
Code

src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[R7-13]

from pathlib import Path, PurePosixPath
from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union

-from pydantic import Field, field_validator
+from pydantic import Field, field_validator, model_validator

from prefect import task
from prefect.blocks.abstract import ObjectStorageBlock
Evidence
PR Compliance ID 7 requires from __future__ import annotations as the first import in any src/
Python file using type annotations. The file shows imports starting immediately after the module
docstring with no future import, and later defines annotated callables.

AGENTS.md
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[1-16]
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[35-43]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` uses type annotations but is missing `from __future__ import annotations` as the first import statement, which is required for `src/` Python files with type hints.

## Issue Context
The file defines multiple annotated functions/classes/fields. The compliance rule requires `from __future__ import annotations` to be the first import (typically placed immediately after the module docstring).

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[1-16]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. _bucket_folder_suffix cross-field validation 📘 Rule violation ✓ Correctness
Description
_bucket_folder_suffix performs cross-field validation by reading info.data.get("bucket")
  inside a @field_validator, coupling validation to partially-validated state.
• This violates the requirement that cross-field validation must be implemented with
  @model_validator to ensure correct ordering and consistent access to validated fields.
Code

src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[R698-712]

    @field_validator("bucket_folder")
    @classmethod
-    def _bucket_folder_suffix(cls, value):
+    def _bucket_folder_suffix(cls, value, info):
        """
        Ensures that the bucket folder is suffixed with a forward slash.
+        Also validates that bucket_folder doesn't conflict with bucket name.
        """
        if value != "" and not value.endswith("/"):
            value = f"{value}/"
+
+        # Cross-field validation: ensure bucket_folder doesn't match bucket name
+        # This should use @model_validator but incorrectly uses @field_validator
+        if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"):
+            raise ValueError("bucket_folder cannot be the same as bucket name")
+
Evidence
PR Compliance ID 14 requires validations that depend on multiple fields to use @model_validator.
The code compares bucket_folder against the bucket field inside a @field_validator using
info.data, which is explicitly cross-field validation in the wrong validator type.

AGENTS.md
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[698-712]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Cross-field validation (comparing `bucket_folder` and `bucket`) is implemented inside a `@field_validator` via `info.data`, but cross-field validation must use `@model_validator`.

## Issue Context
This validation depends on two fields and should run after the model is assembled/validated to avoid order-dependent behavior.

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[698-713]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Broken test harness kwarg 🐞 Bug ✓ Correctness
Description
• prefect-gcp tests call prefect_test_harness(server_timeout=60) but the context manager only
  accepts server_startup_timeout, so the session autouse fixture will raise TypeError before any
  tests run.
• This is a CI-blocking issue for the prefect-gcp integration test suite.
Code

src/integrations/prefect-gcp/tests/conftest.py[R16-20]

+    # Increase timeout for CI environments where multiple xdist workers
+    # start servers simultaneously, which can be slower on Python 3.11+
+    # See https://github.com/PrefectHQ/prefect/issues/16397
+    with prefect_test_harness(server_timeout=60):
        yield
Evidence
The prefect-gcp test suite passes an unsupported keyword argument server_timeout. The actual
helper defines server_startup_timeout as its only parameter name, so Python will reject the call
immediately.

src/integrations/prefect-gcp/tests/conftest.py[14-20]
src/prefect/testing/utilities.py[113-166]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`prefect_test_harness` is called with an unsupported keyword argument `server_timeout`, which will raise `TypeError` and prevent the test suite from running.

### Issue Context
`prefect_test_harness` accepts `server_startup_timeout`.

### Fix Focus Areas
- src/integrations/prefect-gcp/tests/conftest.py[14-20]

### Suggested change
Use:
- `with prefect_test_harness(server_startup_timeout=60):` (preferred for clarity)

(or pass `60` positionally).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. _resolve_path uses substring match 🐞 Bug ✓ Correctness
Description
_resolve_path returns early when bucket_folder appears anywhere in path (substring match),
  not only when path is already prefixed.
• This can skip prefixing and write/read outside the configured bucket_folder (e.g.
  bucket_folder='results/', path='my/results/abc').
• The same class already implements the correct guard using startswith in _join_bucket_folder,
  indicating the intended semantics.
Code

src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[R734-735]

+        if self.bucket_folder and self.bucket_folder in path:
+            return path
Evidence
The new early-return guard is based on substring containment. That is weaker than prefix checking
and can misclassify non-prefixed paths as already-prefixed. Prefect’s result-storage code calls
_resolve_path in scenarios where the key can already be prefixed, so the guard must be precise.
The same file’s _join_bucket_folder shows the correct startswith approach for avoiding
double-joins.

src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[726-738]
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[919-936]
src/prefect/results.py[649-675]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`GcsBucket._resolve_path` currently checks `self.bucket_folder in path` to detect already-prefixed paths. This is a substring match and can incorrectly skip prefixing when `bucket_folder` appears later in the key.

### Issue Context
Prefect result persistence can call `_resolve_path` multiple times when `storage_block_id` is null; we want to avoid double-prefixing without skipping legitimate prefixing.

### Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[726-743]
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[919-940]

### Suggested change
Replace substring check with a prefix check, e.g.:
- `if self.bucket_folder and path.startswith(self.bucket_folder): return path`

Optionally normalize to avoid double slashes (e.g., ensure `bucket_folder` ends with exactly one `/` and don’t introduce `//` when joining).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



ⓘ The new review experience is currently in Beta. Learn more

Qodo Logo

Comment on lines 7 to 13
from pathlib import Path, PurePosixPath
from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union

from pydantic import Field, field_validator
from pydantic import Field, field_validator, model_validator

from prefect import task
from prefect.blocks.abstract import ObjectStorageBlock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. future annotations import missing 📘 Rule violation ✓ Correctness

src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py contains type annotations but does not
  include from __future__ import annotations as the first import statement.
• This violates the requirement for src/ Python files with type hints and can cause
  forward-reference issues and slower type checking.
Agent prompt
## Issue description
`src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` uses type annotations but is missing `from __future__ import annotations` as the first import statement, which is required for `src/` Python files with type hints.

## Issue Context
The file defines multiple annotated functions/classes/fields. The compliance rule requires `from __future__ import annotations` to be the first import (typically placed immediately after the module docstring).

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[1-16]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines 698 to +712
@field_validator("bucket_folder")
@classmethod
def _bucket_folder_suffix(cls, value):
def _bucket_folder_suffix(cls, value, info):
"""
Ensures that the bucket folder is suffixed with a forward slash.
Also validates that bucket_folder doesn't conflict with bucket name.
"""
if value != "" and not value.endswith("/"):
value = f"{value}/"

# Cross-field validation: ensure bucket_folder doesn't match bucket name
# This should use @model_validator but incorrectly uses @field_validator
if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"):
raise ValueError("bucket_folder cannot be the same as bucket name")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. _bucket_folder_suffix cross-field validation 📘 Rule violation ✓ Correctness

_bucket_folder_suffix performs cross-field validation by reading info.data.get("bucket")
  inside a @field_validator, coupling validation to partially-validated state.
• This violates the requirement that cross-field validation must be implemented with
  @model_validator to ensure correct ordering and consistent access to validated fields.
Agent prompt
## Issue description
Cross-field validation (comparing `bucket_folder` and `bucket`) is implemented inside a `@field_validator` via `info.data`, but cross-field validation must use `@model_validator`.

## Issue Context
This validation depends on two fields and should run after the model is assembled/validated to avoid order-dependent behavior.

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[698-713]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +16 to 20
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_timeout=60):
yield
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Broken test harness kwarg 🐞 Bug ✓ Correctness

• prefect-gcp tests call prefect_test_harness(server_timeout=60) but the context manager only
  accepts server_startup_timeout, so the session autouse fixture will raise TypeError before any
  tests run.
• This is a CI-blocking issue for the prefect-gcp integration test suite.
Agent prompt
### Issue description
`prefect_test_harness` is called with an unsupported keyword argument `server_timeout`, which will raise `TypeError` and prevent the test suite from running.

### Issue Context
`prefect_test_harness` accepts `server_startup_timeout`.

### Fix Focus Areas
- src/integrations/prefect-gcp/tests/conftest.py[14-20]

### Suggested change
Use:
- `with prefect_test_harness(server_startup_timeout=60):` (preferred for clarity)

(or pass `60` positionally).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +734 to +735
if self.bucket_folder and self.bucket_folder in path:
return path
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

4. _resolve_path uses substring match 🐞 Bug ✓ Correctness

_resolve_path returns early when bucket_folder appears anywhere in path (substring match),
  not only when path is already prefixed.
• This can skip prefixing and write/read outside the configured bucket_folder (e.g.
  bucket_folder='results/', path='my/results/abc').
• The same class already implements the correct guard using startswith in _join_bucket_folder,
  indicating the intended semantics.
Agent prompt
### Issue description
`GcsBucket._resolve_path` currently checks `self.bucket_folder in path` to detect already-prefixed paths. This is a substring match and can incorrectly skip prefixing when `bucket_folder` appears later in the key.

### Issue Context
Prefect result persistence can call `_resolve_path` multiple times when `storage_block_id` is null; we want to avoid double-prefixing without skipping legitimate prefixing.

### Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[726-743]
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[919-940]

### Suggested change
Replace substring check with a prefix check, e.g.:
- `if self.bucket_folder and path.startswith(self.bucket_folder): return path`

Optionally normalize to avoid double slashes (e.g., ensure `bucket_folder` ends with exactly one `/` and don’t introduce `//` when joining).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants