Delta Lake input source: add support for user-provided configuration #18596

@yurmix

Description

The Delta Kernel API accepts a Hadoop configuration through the Engine interface. However, DeltaInputSource does not expose this: it instantiates the engine with an empty Configuration(), so users have no way to supply Hadoop settings.

Proposal: add an optional hadoopProperties JSON field to the Delta Lake input source.
Example:

"inputSource": {
  "type": "delta",
  "tablePath": "s3a://bucket/path/to/table",
  "hadoopProperties": {
    "fs.s3a.access.key": "${AWS_ACCESS_KEY_ID}",
    "fs.s3a.secret.key": "${AWS_SECRET_ACCESS_KEY}",
    "fs.s3a.session.token": "${AWS_SESSION_TOKEN}",
    "fs.s3a.endpoint": "s3.amazonaws.com"
  }
}
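
A rough sketch of how the field might be applied follows; it is not the actual Druid implementation. The class `HadoopPropertiesSketch` and the method `applyProperties` are hypothetical names. The setter is passed in as a `BiConsumer` so the sketch runs without Hadoop on the classpath; inside DeltaInputSource it would presumably be `conf::set` on an `org.apache.hadoop.conf.Configuration` that is then handed to the engine in place of the empty one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, for illustration only; not part of Druid or Delta Kernel.
final class HadoopPropertiesSketch {
    private HadoopPropertiesSketch() {}

    /**
     * Copies each user-provided property into a configuration via the given setter.
     * A null map is treated as "field not set" and applies nothing.
     * Returns the number of properties applied.
     */
    static int applyProperties(Map<String, String> hadoopProperties,
                               BiConsumer<String, String> setter) {
        if (hadoopProperties == null) {
            return 0;
        }
        int applied = 0;
        for (Map.Entry<String, String> entry : hadoopProperties.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            if (key == null || key.isEmpty() || value == null) {
                throw new IllegalArgumentException(
                    "hadoopProperties entries need a non-empty key and a non-null value");
            }
            setter.accept(key, value);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        // Stand-in for the deserialized "hadoopProperties" field of the spec.
        Map<String, String> hadoopProperties = new LinkedHashMap<>();
        hadoopProperties.put("fs.s3a.endpoint", "s3.amazonaws.com");

        // Stand-in for a Hadoop Configuration; with Hadoop available (assumption)
        // this would be: Configuration conf = new Configuration();
        // applyProperties(hadoopProperties, conf::set);
        Map<String, String> conf = new LinkedHashMap<>();
        applyProperties(hadoopProperties, conf::put);
        System.out.println(conf);
    }
}
```

Treating a missing field as a no-op keeps existing ingestion specs working unchanged, since the engine would still receive the same empty configuration as today.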

Motivation

Allow providing Hadoop properties such as S3A credentials, endpoints, and similar options at the ingestion spec level rather than globally.

This enables ingesting from different environments and adding configuration specific to the ingestion task.

Example use case: support reading Delta tables from S3 with temporary AWS STS credentials without relying on global configuration.
