19 changes: 17 additions & 2 deletions src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
@@ -7,7 +7,7 @@
from pathlib import Path, PurePosixPath
from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union

from pydantic import Field, field_validator
from pydantic import Field, field_validator, model_validator

from prefect import task
from prefect.blocks.abstract import ObjectStorageBlock
Comment on lines 7 to 13
Action required

1. future annotations import missing 📘 Rule violation ✓ Correctness

• src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py contains type annotations but does not
  include from __future__ import annotations as the first import statement.
• This violates the requirement for src/ Python files with type hints and can cause
  forward-reference issues and slower type checking.
Agent prompt
## Issue description
`src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` uses type annotations but is missing `from __future__ import annotations` as the first import statement, which is required for `src/` Python files with type hints.

## Issue Context
The file defines multiple annotated functions/classes/fields. The compliance rule requires `from __future__ import annotations` to be the first import (typically placed immediately after the module docstring).

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[1-16]
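
As an illustration of the forward-reference point only, here is a minimal sketch of where the import goes and what it changes. The `Node` class is hypothetical and unrelated to `cloud_storage.py`; the import is placed directly after the module docstring, before every other import.

```python
"""Hypothetical module docstring; the __future__ import comes right after it."""

from __future__ import annotations


class Node:
    # With postponed evaluation, annotations are stored as strings, so
    # referring to Node inside its own class body is not evaluated here.
    def __init__(self, parent: Node | None = None) -> None:
        self.parent = parent


root = Node()
child = Node(parent=root)
```

Without the import, the annotation `Node | None` would be evaluated while the class body is still executing on interpreters that evaluate annotations eagerly, which is the forward-reference problem the comment mentions.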


@@ -697,12 +697,19 @@ def basepath(self) -> str:

@field_validator("bucket_folder")
@classmethod
def _bucket_folder_suffix(cls, value):
def _bucket_folder_suffix(cls, value, info):
"""
Ensures that the bucket folder is suffixed with a forward slash.
Also validates that bucket_folder doesn't conflict with bucket name.
"""
if value != "" and not value.endswith("/"):
value = f"{value}/"

# Cross-field validation: ensure bucket_folder doesn't match bucket name
# This should use @model_validator but incorrectly uses @field_validator
if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"):
raise ValueError("bucket_folder cannot be the same as bucket name")

Comment on lines 698 to +712
Action required

2. _bucket_folder_suffix cross-field validation 📘 Rule violation ✓ Correctness

• _bucket_folder_suffix performs cross-field validation by reading info.data.get("bucket")
  inside a @field_validator, coupling validation to partially-validated state.
• This violates the requirement that cross-field validation must be implemented with
  @model_validator to ensure correct ordering and consistent access to validated fields.
Agent prompt
## Issue description
Cross-field validation (comparing `bucket_folder` and `bucket`) is implemented inside a `@field_validator` via `info.data`, but cross-field validation must use `@model_validator`.

## Issue Context
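This validation depends on two fields. Inside a `@field_validator`, `info.data` contains only the fields that were declared earlier and passed validation, so the check is order-dependent. Running it in a `@model_validator` after the model is assembled avoids that.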

## Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[698-713]
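
One possible shape for the split, as a sketch under assumptions: `BucketConfig` is a hypothetical stand-in carrying only the two relevant fields (it is not `GcsBucket`), and the helper `is_valid` and the validator name `_folder_differs_from_bucket` are invented for the example. It assumes pydantic v2.

```python
from pydantic import BaseModel, ValidationError, field_validator, model_validator


class BucketConfig(BaseModel):
    """Hypothetical stand-in for the two relevant GcsBucket fields."""

    bucket: str
    bucket_folder: str = ""

    @field_validator("bucket_folder")
    @classmethod
    def _bucket_folder_suffix(cls, value: str) -> str:
        # Single-field concern only: ensure a trailing forward slash.
        if value != "" and not value.endswith("/"):
            value = f"{value}/"
        return value

    @model_validator(mode="after")
    def _folder_differs_from_bucket(self) -> "BucketConfig":
        # Runs once all fields are validated, so declaration order is irrelevant.
        if self.bucket and self.bucket_folder.strip("/") == self.bucket:
            raise ValueError("bucket_folder cannot be the same as bucket name")
        return self


def is_valid(**kwargs: str) -> bool:
    """Hypothetical helper: True when the model accepts the given fields."""
    try:
        BucketConfig(**kwargs)
    except ValidationError:
        return False
    return True
```

With this split the field validator keeps its original one-argument signature, and the cross-field rule reads both values from the fully validated instance.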


return value

def _resolve_path(self, path: str) -> str:
@@ -718,6 +725,14 @@ def _resolve_path(self, path: str) -> str:
"""
# If bucket_folder provided, it means we won't write to the root dir of
# the bucket. So we need to add it on the front of the path.
#
# However, avoid double-nesting if path is already prefixed with bucket_folder.
# This can happen when storage_block_id is null (e.g., context serialized to
# remote workers), causing create_result_record() to add bucket_folder to
# storage_key, then write_path() calls _resolve_path again.
# See https://github.com/PrefectHQ/prefect/issues/20174
if self.bucket_folder and self.bucket_folder in path:
return path
Comment on lines +734 to +735
Action required

4. _resolve_path uses substring match 🐞 Bug ✓ Correctness

• _resolve_path returns early when bucket_folder appears anywhere in path (substring match),
  not only when path is already prefixed.
• This can skip prefixing and write/read outside the configured bucket_folder (e.g.
  bucket_folder='results/', path='my/results/abc').
• The same class already implements the correct guard using startswith in _join_bucket_folder,
  indicating the intended semantics.
Agent prompt
### Issue description
`GcsBucket._resolve_path` currently checks `self.bucket_folder in path` to detect already-prefixed paths. This is a substring match and can incorrectly skip prefixing when `bucket_folder` appears later in the key.

### Issue Context
Prefect result persistence can call `_resolve_path` multiple times when `storage_block_id` is null; we want to avoid double-prefixing without skipping legitimate prefixing.

### Fix Focus Areas
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[726-743]
- src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py[919-940]

### Suggested change
Replace substring check with a prefix check, e.g.:
- `if self.bucket_folder and path.startswith(self.bucket_folder): return path`

Optionally normalize to avoid double slashes (e.g., ensure `bucket_folder` ends with exactly one `/` and don’t introduce `//` when joining).
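
The difference between the two checks can be sketched with free functions. Both names below are hypothetical and take `bucket_folder` as an argument instead of reading it from the block; the joining line is copied from the diff.

```python
from pathlib import PurePosixPath


def resolve_with_substring(bucket_folder: str, path: str) -> str:
    # Mirrors the check in the diff: skips prefixing on any substring match.
    if bucket_folder and bucket_folder in path:
        return path
    return str(PurePosixPath(bucket_folder, path)) if bucket_folder else path


def resolve_with_prefix(bucket_folder: str, path: str) -> str:
    # Suggested check: skips prefixing only when path already starts with the folder.
    if bucket_folder and path.startswith(bucket_folder):
        return path
    return str(PurePosixPath(bucket_folder, path)) if bucket_folder else path
```

For `bucket_folder='results/'` and `path='my/results/abc'`, the substring version returns the path unchanged, while the prefix version places it under `results/`. An already-prefixed path such as `results/abc` is returned unchanged by both.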


path = (
str(PurePosixPath(self.bucket_folder, path)) if self.bucket_folder else path
)
3 changes: 2 additions & 1 deletion src/integrations/prefect-gcp/pyproject.toml
@@ -101,7 +101,8 @@ asyncio_default_fixture_loop_scope = "session"
asyncio_mode = "auto"
env = ["PREFECT_TEST_MODE=1"]
filterwarnings = [
"ignore:Type google._upb._message.* uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14:DeprecationWarning",
"ignore:'.*' deprecated - use .*:DeprecationWarning:httplib2",
"ignore:GitWildMatchPattern .* is deprecated:DeprecationWarning:pathspec",
]

[tool.uv.sources]
12 changes: 4 additions & 8 deletions src/integrations/prefect-gcp/tests/conftest.py
@@ -8,19 +8,15 @@
from google.cloud.exceptions import NotFound
from prefect_gcp.credentials import GcpCredentials

from prefect.settings import PREFECT_LOGGING_TO_API_ENABLED, temporary_settings
from prefect.testing.utilities import prefect_test_harness


@pytest.fixture(scope="session", autouse=True)
def prefect_db():
with prefect_test_harness():
yield


@pytest.fixture(scope="session", autouse=True)
def disable_logging():
with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}):
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_timeout=60):
yield
Comment on lines +16 to 20
Action required

3. Broken test harness kwarg 🐞 Bug ✓ Correctness

• prefect-gcp tests call prefect_test_harness(server_timeout=60) but the context manager only
  accepts server_startup_timeout, so the session autouse fixture will raise TypeError before any
  tests run.
• This is a CI-blocking issue for the prefect-gcp integration test suite.
Agent prompt
### Issue description
`prefect_test_harness` is called with an unsupported keyword argument `server_timeout`, which will raise `TypeError` and prevent the test suite from running.

### Issue Context
`prefect_test_harness` accepts `server_startup_timeout`.

### Fix Focus Areas
- src/integrations/prefect-gcp/tests/conftest.py[14-20]

### Suggested change
Use:
- `with prefect_test_harness(server_startup_timeout=60):` (preferred for clarity)

(or pass `60` positionally).
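
The failure mode can be reproduced without Prefect. In the sketch below, `harness_stand_in` is a hypothetical context manager that mirrors only the keyword name cited in this comment; its default value and empty body are assumptions and say nothing about the real `prefect_test_harness`.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def harness_stand_in(server_startup_timeout: Optional[int] = 30) -> Iterator[None]:
    """Hypothetical stand-in; only the parameter name matters here."""
    yield


def accepts(**kwargs: int) -> bool:
    """Hypothetical helper: True when the stand-in accepts the keywords."""
    try:
        with harness_stand_in(**kwargs):
            pass
    except TypeError:
        # An unknown keyword is rejected as soon as the function is called.
        return False
    return True
```

Here `accepts(server_timeout=60)` is false and `accepts(server_startup_timeout=60)` is true. In a session-scoped autouse fixture, the same `TypeError` would surface during fixture setup, before any test body runs.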

7 changes: 1 addition & 6 deletions src/integrations/prefect-gcp/tests/projects/test_steps.py
@@ -54,12 +54,7 @@ def tmp_files(tmp_path: Path):
"testdir2/testfile5.txt",
]

(tmp_path / ".prefectignore").write_text(
"""
testdir1/*
.prefectignore
"""
)
(tmp_path / ".prefectignore").write_text("testdir1/*\n.prefectignore\n")

for file in files:
filepath = tmp_path / file
24 changes: 24 additions & 0 deletions src/integrations/prefect-gcp/tests/test_cloud_storage.py
@@ -140,6 +140,30 @@ def test_resolve_path(self, gcs_bucket, path):
expected = None
assert actual == expected

def test_resolve_path_no_double_nesting(self, gcs_bucket):
"""
Regression test for https://github.com/PrefectHQ/prefect/issues/20174

When storage_block_id is null (e.g., context serialized to Ray workers),
create_result_record() adds bucket_folder to storage_key via _resolve_path.
Then write_path() calls _resolve_path again. Without the duplicate check,
this causes double-nested paths like "results/results/abc123".
"""
bucket_folder = gcs_bucket.bucket_folder
if not bucket_folder:
pytest.skip("Test only applies when bucket_folder is set")

# Simulate path that already has bucket_folder prefix
# (as would happen when create_result_record calls _resolve_path)
already_prefixed_path = f"{bucket_folder}/abc123"

# When write_path calls _resolve_path again, it should NOT double-nest
result = gcs_bucket._resolve_path(already_prefixed_path)

# Should return the same path, not bucket_folder/bucket_folder/abc123
assert result == already_prefixed_path
assert not result.startswith(f"{bucket_folder}{bucket_folder}")

def test_read_path(self, gcs_bucket):
assert gcs_bucket.read_path("blob") == b"bytes"
