IcebergConfigurer.writeable check fails for HadoopFileIO tables (scheme s3a:// vs warehouse s3://) — signing rejects all commits #12426

@catcarreira-create

Version

  • Nessie server: 0.107.5 (Quarkus image ghcr.io/projectnessie/nessie:0.107.5)
  • Iceberg client: 1.7.0
  • Spark: 3.5.x
  • pyiceberg: 0.7.x (also affected when using HadoopFileIO via RestCatalog Python client)

Description

When a Spark client writes to a Nessie REST catalog using HadoopFileIO (the classic Iceberg FileIO backed by hadoop-aws, used here by default), the resulting metadata.location carries the s3a:// scheme, while the Nessie server registers the warehouse with the s3:// scheme. The writeable derivation in IcebergConfigurer.icebergConfigPerTable() appears to apply S3Utils.normalizeS3Scheme only to the warehouse path, not to the metadata.location it is compared against. As a result, the writeable list stays empty and all PUT/DELETE signing requests are rejected with HTTP 403 (unauthorized signing request).

End result: tables created via HadoopFileIO against a Nessie REST catalog backed by S3-compatible storage (MinIO, AWS S3) become unwriteable, and the error gives no hint of the scheme mismatch. Reads work fine.

Reproduction

Minimal Nessie config

nessie:
  image: ghcr.io/projectnessie/nessie:0.107.5
  environment:
    nessie.catalog.default-warehouse: dev
    nessie.catalog.warehouses.dev.location: s3://my-bucket/    # scheme s3://
    nessie.catalog.service.s3.default-options.endpoint: http://minio:9000
    nessie.catalog.service.s3.default-options.path-style-access: "true"
    nessie.catalog.service.s3.default-options.region: us-east-1

Spark client config that triggers the bug

# io-impl is omitted below -> defaults to HadoopFileIO, which triggers the bug.
spark-submit \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
  --conf spark.sql.catalog.demo.uri=http://nessie:19120/iceberg \
  --conf spark.sql.catalog.demo.warehouse=dev \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio:9000 \
  ...

Operation that fails

df = spark.createDataFrame([(1, "x")], ["id", "name"])
df.writeTo("demo.test_ns.test_table").create()

Observed error

org.apache.iceberg.exceptions.CommitFailedException:
  Unauthorized signing request
  caused by: org.apache.iceberg.aws.s3.signer.S3V4RestSignerClient:
    HTTP 403 — signing rejected, writeable=[]

The response of GET /iceberg/v1/config?warehouse=dev shows an empty writeable list:

{
  "overrides": {
    "s3.signer.uri": "http://nessie:19120/iceberg/v1/aws/s3/sign",
    "writeable": []
  }
}

With HadoopFileIO, Spark writes metadata.location: s3a://my-bucket/... into the commit, while the warehouse is registered as s3://my-bucket/. Tables written with io-impl=S3FileIO (AWS SDK v2 direct) have metadata.location: s3://... and work correctly.
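For anyone triaging the same symptom, the check against the config endpoint can be scripted. This is only a sketch: the helper names writeable_entries and fetch_config are hypothetical, the config URL is passed in by the caller, and the sample payload is the response reported above.

```python
import json
from urllib.request import urlopen


def writeable_entries(config: dict) -> list:
    """Return the 'writeable' list from the config overrides, or [] if absent."""
    return list(config.get("overrides", {}).get("writeable", []))


def fetch_config(config_url: str) -> dict:
    """Fetch and decode the catalog config JSON from the given URL."""
    with urlopen(config_url) as resp:
        return json.load(resp)


# Payload as reported in this issue; a live check would use fetch_config(...).
sample = json.loads("""
{
  "overrides": {
    "s3.signer.uri": "http://nessie:19120/iceberg/v1/aws/s3/sign",
    "writeable": []
  }
}
""")

entries = writeable_entries(sample)
if not entries:
    print("writeable is empty: signing requests for writes will be rejected")
```

An empty list here, combined with a metadata.location that starts with s3a://, points at the scheme mismatch rather than at credentials.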

Root cause hypothesis

In IcebergConfigurer.icebergConfigPerTable() (servers/quarkus-server/src/main/java/.../catalog/IcebergConfigurer.java or equivalent; approximate lines 225-320 of 0.107.5), the writeable derivation appears to be roughly:

String warehouseNormalized = S3Utils.normalizeS3Scheme(warehouse.getLocation());
// "s3://my-bucket/"

String metadataLoc = table.metadataLocation();
// "s3a://my-bucket/.../00000-...metadata.json"

if (metadataLoc.startsWith(warehouseNormalized)) {
  // "s3a://...".startsWith("s3://...") -> FALSE
  writeable.add(metadataLoc);
}
// -> writeable[] stays empty

S3Utils.normalizeS3Scheme() is called on the warehouse but not on the metadataLoc before the startsWith check. The scheme mismatch (s3:// vs s3a://) makes the predicate always false, so the writeable array stays empty and the signing endpoint then rejects every PUT/DELETE.
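The effect of the suspected predicate can be reproduced outside the server with plain string comparison. This is a minimal Python sketch, not the Nessie code: the table paths are made up for illustration, and the warehouse prefix is assumed to already be in its normalized s3:// form.

```python
# Warehouse prefix after normalization (unchanged, since it was already s3://).
warehouse_prefix = "s3://my-bucket/"

# Hypothetical metadata locations, one per FileIO implementation.
locations = {
    "HadoopFileIO": "s3a://my-bucket/test_ns/test_table/metadata/00000-example.metadata.json",
    "S3FileIO": "s3://my-bucket/test_ns/test_table/metadata/00000-example.metadata.json",
}

# Same shape as the suspected check: raw metadata location vs. normalized warehouse.
results = {io: loc.startswith(warehouse_prefix) for io, loc in locations.items()}

for io, matched in results.items():
    print(f"{io}: matches warehouse prefix = {matched}")
# HadoopFileIO: matches warehouse prefix = False
# S3FileIO: matches warehouse prefix = True
```

Only the scheme differs between the two locations, which matches the observation that S3FileIO tables work and HadoopFileIO tables do not.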

Suggested fix

Normalize both sides before the comparison:

String warehouseNormalized = S3Utils.normalizeS3Scheme(warehouse.getLocation());
String metadataLocNormalized = S3Utils.normalizeS3Scheme(metadataLoc);

if (metadataLocNormalized.startsWith(warehouseNormalized)) {
  writeable.add(metadataLoc);
}

A more defensive variant would normalize the warehouse scheme once, when the warehouse is stored, and normalize every metadata location at a single point instead of at each comparison.
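The intended behaviour of the fix can be sketched in Python for quick experimentation. Both functions below are hypothetical stand-ins: normalize_s3_scheme only mimics what S3Utils.normalizeS3Scheme is assumed to do (rewrite s3a:// and s3n:// to s3://), and writeable_locations is an illustrative helper, not a Nessie API.

```python
from typing import List


def normalize_s3_scheme(location: str) -> str:
    """Rewrite s3a:// and s3n:// to s3://; leave other locations unchanged."""
    for alias in ("s3a://", "s3n://"):
        if location.startswith(alias):
            return "s3://" + location[len(alias):]
    return location


def writeable_locations(warehouse: str, metadata_locations: List[str]) -> List[str]:
    """Keep locations under the warehouse, comparing normalized forms on both sides."""
    prefix = normalize_s3_scheme(warehouse)
    return [
        loc  # the original, un-normalized location is kept, as in the suggested fix
        for loc in metadata_locations
        if normalize_s3_scheme(loc).startswith(prefix)
    ]


candidates = [
    "s3a://my-bucket/test_ns/a/metadata/00000-example.metadata.json",
    "s3://my-bucket/test_ns/b/metadata/00000-example.metadata.json",
    "s3a://other-bucket/test_ns/c/metadata/00000-example.metadata.json",
]
accepted = writeable_locations("s3://my-bucket/", candidates)
print(accepted)
```

With both sides normalized, the s3a:// and s3:// locations under my-bucket are accepted, and the location in the other bucket is still excluded.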

Workaround

Force io-impl=org.apache.iceberg.aws.s3.S3FileIO on every Spark client. S3FileIO writes s3:// natively, eliminating the mismatch.
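For PySpark jobs, the workaround amounts to one extra catalog property. The sketch below reuses the catalog name, URI, and warehouse from the reproduction; s3fileio_conf and build_session are hypothetical helper names, and any S3 endpoint or credential settings a given deployment needs are not covered here.

```python
def s3fileio_conf(catalog: str, uri: str, warehouse: str) -> dict:
    """Build Spark conf entries for a REST catalog that pins io-impl to S3FileIO."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        # The workaround itself: set the FileIO explicitly instead of omitting it.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def build_session(conf: dict):
    """Create a SparkSession with the given conf (requires pyspark to be installed)."""
    from pyspark.sql import SparkSession  # imported lazily so the helper above works without Spark

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = s3fileio_conf("demo", "http://nessie:19120/iceberg", "dev")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines can be pasted into a spark-submit command, or build_session(conf) can be called where pyspark is available.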

Impact

  • Severity: high for any user mixing Spark + Nessie REST + S3-compatible storage (MinIO, AWS S3) with default HadoopFileIO.
  • Workaround known: yes (io-impl=S3FileIO) but requires Iceberg 1.7+ and iceberg-aws-bundle.
  • Misleading failure mode: the error surfaces in the signing endpoint, not at the scheme check itself, so users typically spend hours debugging credentials before reaching the actual cause.

Happy to help test a fix against this scenario if useful.
