[FSTORE-2036] PR 4a/4 — Python SDK support for Unity Catalog OAuth M2M#966

Merged: jimdowling merged 5 commits into logicalclocks:main from jimdowling:fstore-2036-uc-oauth-m2m-sdk on May 26, 2026
Conversation

@jimdowling

Summary

PR 4 of 4 for FSTORE-2036, hopsworks-api half. Extends UnityCatalogConnector so the Python SDK round-trips the new OAuth fields that the backend (#3032 / #3033) and frontend (logicalclocks/hopsworks-front#1919) added. Legacy PAT-only construction keeps working unchanged.

Companion loadtest PR: logicalclocks/loadtest (link to be posted in this thread once the loadtest PR is opened).

Spec: uc-oauth2/uc-oauth2.md in the per-feature workspace. Ticket: https://hopsworks.atlassian.net/browse/FSTORE-2036

What changes

  • Constructor gains auth_method, client_id, client_secret, oauth_endpoint, account_id, account_host, has_access_token, has_client_secret. auth_method defaults to "PAT" when absent so existing code paths (and downstream fixtures) keep producing PAT connectors. OAUTH_M2M without an explicit oauth_endpoint defaults to "WORKSPACE", matching the frontend default.
  • Write-only-friendly booleans: has_access_token and has_client_secret come from the server (hasAccessToken / hasClientSecret in camelCase). When a caller builds a connector locally with a secret in hand, the booleans fall back to "is the secret non-None" so client code that constructs in-process still reports correct state.
  • from_response_json is unchanged — it already uses humps.decamelize + **kwargs splat, which picks up the new fields by name once they're declared on the constructor.
  • Existing get_unity_catalog fixture updated to match the post-PR-1 backend wire format (hasAccessToken: true; no decrypted access_token in GET responses). Two new fixtures (get_unity_catalog_oauth_workspace, get_unity_catalog_oauth_account) cover the OAuth modes.
  • Tests extended from 4 to 8 in TestUnityCatalogConnector — round-trip for both OAuth modes; legacy construction defaulting to PAT; OAuth construction defaulting oauth_endpoint to WORKSPACE.
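
The defaulting rules in the list above can be sketched roughly as follows. This is a minimal stand-in, not the hsfs implementation: the class name `UnityCatalogConnectorSketch` is hypothetical, `account_id` and `account_host` are left out for brevity, and the precedence of a server-provided flag over the local fallback is an assumption drawn from the description.

```python
from typing import Optional


class UnityCatalogConnectorSketch:
    """Hypothetical stand-in that illustrates the defaulting rules only."""

    def __init__(
        self,
        access_token: Optional[str] = None,
        auth_method: Optional[str] = None,
        client_id: Optional[str] = None,
        client_secret: Optional[str] = None,
        oauth_endpoint: Optional[str] = None,
        has_access_token: Optional[bool] = None,
        has_client_secret: Optional[bool] = None,
    ) -> None:
        # An absent auth_method means a legacy PAT connector.
        self.auth_method = auth_method or "PAT"
        # OAUTH_M2M without an explicit endpoint defaults to WORKSPACE.
        if self.auth_method == "OAUTH_M2M" and oauth_endpoint is None:
            oauth_endpoint = "WORKSPACE"
        self.oauth_endpoint = oauth_endpoint
        self.client_id = client_id
        self._access_token = access_token
        self._client_secret = client_secret
        # Assumption: use the server-provided flag when present, otherwise
        # infer presence from the secret supplied locally.
        self.has_access_token = (
            has_access_token
            if has_access_token is not None
            else access_token is not None
        )
        self.has_client_secret = (
            has_client_secret
            if has_client_secret is not None
            else client_secret is not None
        )


# Legacy construction: only a PAT is supplied.
legacy = UnityCatalogConnectorSketch(access_token="dapi-example")
# OAuth construction without an explicit oauth_endpoint.
oauth = UnityCatalogConnectorSketch(
    auth_method="OAUTH_M2M", client_id="abc", client_secret="s3cret"
)
```

With this sketch, `legacy.auth_method` is "PAT" and `oauth.oauth_endpoint` is "WORKSPACE", which is the behavior the new tests are described as covering.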

Test plan

  • uv run pytest python/tests/test_storage_connector.py::TestUnityCatalogConnector (8/8 passing).
  • uv run ruff check — clean.
  • uv run docsig python/hsfs/storage_connector.py — clean.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented May 21, 2026

Coverage report

Lines missing coverage, by file:

  • python/hsfs/feature_group.py: 5226-5227, 5362
  • python/hsfs/storage_connector.py: 3581, 3606-3610, 3617-3625, 3633, 3648-3742, 3754-3759, 3793-3798, 3804, 3810, 3816, 3822, 3828, 3834, 3849-3851, 3870-3872, 3876-3886
  • python/hsfs/core/storage_connector_api.py: 104-114

This report was generated by python-coverage-comment-action

Copilot AI left a comment

Pull request overview

Extends the Python SDK’s UnityCatalogConnector model to support Unity Catalog OAuth M2M by round-tripping new OAuth-related fields and adding “secret present” booleans that reflect backend write-only secrets, while keeping legacy PAT-only construction working unchanged.

Changes:

  • Add OAuth M2M fields (auth_method, client_id, client_secret, oauth_endpoint, account_id, account_host) and write-only-friendly flags (has_access_token, has_client_secret) to UnityCatalogConnector.
  • Update UC connector fixtures to reflect backend write-only secret behavior and add OAuth workspace/account fixtures.
  • Expand TestUnityCatalogConnector to cover PAT back-compat defaults and OAuth endpoint defaults + parsing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| python/hsfs/storage_connector.py | Adds UC OAuth M2M fields + secret-presence booleans and exposes them via public properties. |
| python/tests/test_storage_connector.py | Updates UC parsing assertions (write-only token) and adds OAuth + defaulting tests. |
| python/tests/fixtures/storage_connector_fixtures.json | Updates UC GET fixture to new wire format and adds OAuth workspace/account fixtures. |

Comment thread python/tests/fixtures/storage_connector_fixtures.json Outdated
Comment thread python/tests/fixtures/storage_connector_fixtures.json Outdated
@jimdowling jimdowling force-pushed the fstore-2036-uc-oauth-m2m-sdk branch from da4116c to bb3b56b Compare May 24, 2026 09:11
@jimdowling jimdowling marked this pull request as ready for review May 24, 2026 09:55
@jimdowling jimdowling force-pushed the fstore-2036-uc-oauth-m2m-sdk branch from 8539c5c to cb0242c Compare May 25, 2026 10:40
Comment thread python/hsfs/storage_connector.py
@jimdowling jimdowling force-pushed the fstore-2036-uc-oauth-m2m-sdk branch 3 times, most recently from ecb45df to d95b1cf Compare May 26, 2026 08:03
…ark reads

https://hopsworks.atlassian.net/browse/FSTORE-2036

SDK half of FSTORE-2036. Extends UnityCatalogConnector to round-trip
the new OAuth fields the backend (PR 1 / PR 2) and frontend (PR 3)
added, and rewires the PySpark read path so the SDK calls Databricks
directly for vended S3 temp-credentials. Legacy PAT-only construction
keeps working unchanged.

OAuth fields. The constructor gains auth_method, client_id,
client_secret, oauth_endpoint, account_id, account_host,
has_access_token, and has_client_secret. auth_method defaults to "PAT"
when absent so existing code paths and fixtures that construct
connectors with just access_token keep producing PAT connectors. When
the caller asks for OAUTH_M2M without specifying oauth_endpoint, it
defaults to "WORKSPACE", matching the frontend default.

has_access_token and has_client_secret are write-only-friendly
booleans: the server emits them on read so a caller can tell whether a
secret is on file without ever seeing it. When constructed locally
with a secret in hand, has_* falls back to "is the secret non-None"
so client code that builds a connector in-process still reports the
correct state.

from_response_json keeps using humps.decamelize + **kwargs splat; the
new fields are picked up by name. The existing get_unity_catalog
fixture is updated to match the post-PR-1 backend wire format
(hasAccessToken: true on read; no decrypted access_token in the
response). Two new fixtures (get_unity_catalog_oauth_workspace,
get_unity_catalog_oauth_account) cover the OAuth modes.
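
To illustrate why from_response_json needs no change, here is a rough sketch of the decamelize-then-splat pattern. It is not the SDK code: `decamelize_keys` is a stdlib stand-in for humps.decamelize that handles flat dicts only, `ConnectorSketch` is a hypothetical class, and the payload is invented for illustration rather than copied from the fixtures.

```python
import re
from typing import Any, Dict


def decamelize_keys(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert top-level camelCase keys to snake_case (flat dicts only)."""
    return {
        re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower(): value
        for key, value in payload.items()
    }


class ConnectorSketch:
    """Hypothetical connector; a field is picked up once it is declared here."""

    def __init__(
        self,
        name=None,
        auth_method=None,
        has_access_token=None,
        has_client_secret=None,
        **kwargs,
    ):
        self.name = name
        self.auth_method = auth_method
        self.has_access_token = has_access_token
        self.has_client_secret = has_client_secret

    @classmethod
    def from_response_json(cls, json_dict: Dict[str, Any]) -> "ConnectorSketch":
        # Keys are matched to constructor parameters by name after conversion.
        return cls(**decamelize_keys(json_dict))


# Invented GET-style payload: a presence flag, but no decrypted secret.
response = {"name": "uc", "authMethod": "PAT", "hasAccessToken": True}
connector = ConnectorSketch.from_response_json(response)
```

Because matching is by parameter name, declaring a new keyword argument on the constructor is enough for the corresponding camelCase field to round-trip.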

PySpark read path. Mirrors the existing Python / Arrow-Flight
architecture where the backend provides the bearer and flyingduck owns
the data plane. The backend exposes a single new endpoint
GET /storageconnectors/{name}/uc_bearer that returns
{access_token, expires_in_seconds} with Cache-Control: no-store; the
SDK takes that bearer, calls Databricks for the table metadata and
vended temp-table-credentials, validates Delta + AWS, builds per-bucket
S3A keys, and runs the Delta read locally. UnityCatalogConnector.read
auto-detects Databricks-hosted Spark (cluster usage tags or
DATABRICKS_RUNTIME_VERSION) and routes to spark.read.table() in that
case; force_vended=True overrides the detection when the cluster
identity lacks the SP's grants. _assert_delta_extension_loaded surfaces
an actionable error when the SparkSession was built without the Delta
extension, with a copy-paste fix block.
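
The routing decision and the per-bucket S3A keys described above might look roughly like this. It is a sketch under stated assumptions: both function names are hypothetical, only the DATABRICKS_RUNTIME_VERSION check is shown (the cluster-usage-tag check is omitted), and the credential field names `access_key_id`, `secret_access_key`, and `session_token` are assumed rather than taken from this PR. The `fs.s3a.bucket.<bucket>.*` keys and `TemporaryAWSCredentialsProvider` are standard Hadoop S3A per-bucket settings.

```python
import os
from typing import Dict, Mapping


def use_native_table_read(env: Mapping[str, str], force_vended: bool = False) -> bool:
    """Return True when the read should be routed to spark.read.table()."""
    # force_vended overrides detection, e.g. when the cluster identity
    # lacks the service principal's grants.
    if force_vended:
        return False
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


def s3a_bucket_conf(bucket: str, creds: Mapping[str, str]) -> Dict[str, str]:
    """Build per-bucket S3A settings from temporary credentials."""
    prefix = f"fs.s3a.bucket.{bucket}"
    return {
        f"{prefix}.aws.credentials.provider": (
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"
        ),
        f"{prefix}.access.key": creds["access_key_id"],
        f"{prefix}.secret.key": creds["secret_access_key"],
        f"{prefix}.session.token": creds["session_token"],
    }


# Decide the route from the current process environment.
native = use_native_table_read(os.environ)
# Placeholder credentials; a real caller would use the vended values.
conf = s3a_bucket_conf(
    "example-bucket",
    {"access_key_id": "AKIA-EXAMPLE", "secret_access_key": "secret", "session_token": "token"},
)
```

Scoping the keys to a single bucket keeps vended credentials for one table from overriding the credentials used for other buckets in the same SparkSession.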

Reviewed-by: GitHub Copilot <Copilot@users.noreply.github.com>
Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jimdowling jimdowling force-pushed the fstore-2036-uc-oauth-m2m-sdk branch from d95b1cf to d740327 Compare May 26, 2026 11:56
jimdowling and others added 4 commits May 26, 2026 23:41
https://hopsworks.atlassian.net/browse/FSTORE-2036

Post-merge formatting fixup for the CI ruff format check.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hopsworks-api into fstore-2036-uc-oauth-m2m-sdk
@jimdowling jimdowling enabled auto-merge (squash) May 26, 2026 21:55
@jimdowling jimdowling merged commit 2cce155 into logicalclocks:main May 26, 2026
17 checks passed