Conversation

@nicklan (Collaborator) commented Sep 25, 2025

What changes are proposed in this pull request?

We discussed that we want to be able to point examples (and a future "telemetry example") at tables in cloud storage. This introduces ways to pass all the credentials/config in our examples, and adds some usage examples to each one.

It also migrates write-table to use common.

How was this change tested?

Running the examples

help output for read-table-single-threaded:

An example program that dumps out the data of a delta table

Usage: read-table-single-threaded [OPTIONS] <PATH>

Arguments:
  <PATH>  Path to the table

Options:
      --region <REGION>         Region to specify to the cloud access store (only applies to S3)
      --option <OPTION>         Extra key-value pairs to pass to the ObjectStore builder. Note different object stores accept different configuration options, see the object_store types: AmazonS3Builder, MicrosoftAzureBuilder, and GoogleCloudStorageBuilder. Specify as "key=value", and pass multiple times to set more than one option
      --env-creds               Get credentials from the environment. For details see the object_store types: AmazonS3Builder, MicrosoftAzureBuilder, and GoogleCloudStorageBuilder. Specifically the `from_env` method
      --public                  Specify that the table is "public" (i.e. no cloud credentials are needed). This is required for things like s3 public buckets, otherwise the kernel will try and authenticate by talking to the aws metadata server, which will fail unless you're on an ec2 instance
  -l, --limit <LIMIT>           Limit to printing only LIMIT rows
      --schema-only             Only print the schema of the table
      --columns [<COLUMNS>...]  Comma separated list of columns to select
  -h, --help                    Print help
  -V, --version                 Print version

Examples:
  Read table at foo/bar/bazz, relative to where invoked:
    read-table-single-threaded foo/bar/bazz

  Get S3 credentials, region, etc. from the environment, and read table on S3:
    read-table-single-threaded --env-creds s3://path/to/table

  Specify azure credentials on the command line and read table in azure:
    read-table-single-threaded --option AZURE_STORAGE_ACCOUNT_NAME=my_account --option AZURE_STORAGE_ACCOUNT_KEY=my_key [more --option args] az://account/container/path

  Read a table in a public S3 bucket in us-west-2 region:
    read-table-single-threaded --region us-west-2 --public s3://my/public/table
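The repeated --option key=value flags shown above have to be split into pairs before they can be handed to an object_store builder. A minimal sketch of that parsing step, in the spirit of the examples but not the PR's actual code (parse_option is a hypothetical helper):

```rust
use std::collections::HashMap;

// Hypothetical helper: split one `--option` argument of the form
// "key=value" into an owned (key, value) pair.
fn parse_option(arg: &str) -> Result<(String, String), String> {
    match arg.split_once('=') {
        Some((k, v)) if !k.is_empty() => Ok((k.to_string(), v.to_string())),
        _ => Err(format!("expected key=value, got: {arg}")),
    }
}

fn main() {
    // As in the Azure example: --option may be passed multiple times.
    let raw = [
        "AZURE_STORAGE_ACCOUNT_NAME=my_account",
        "AZURE_STORAGE_ACCOUNT_KEY=my_key",
    ];
    let opts: HashMap<String, String> = raw
        .iter()
        .map(|s| parse_option(s).expect("invalid --option"))
        .collect();
    // Each pair would then be forwarded to the matching object_store
    // builder (AmazonS3Builder, MicrosoftAzureBuilder, or
    // GoogleCloudStorageBuilder) via its option-setting API.
    assert_eq!(opts["AZURE_STORAGE_ACCOUNT_NAME"], "my_account");
}
```

Since different object stores accept different configuration keys, keeping the pairs as plain strings and letting the builder validate them matches how the flag is documented above.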

@github-actions bot added the breaking-change label (Change that requires a major version bump) Sep 25, 2025
codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.79%. Comparing base (a3429b7) to head (58e5ec5).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1352   +/-   ##
=======================================
  Coverage   84.79%   84.79%           
=======================================
  Files         113      113           
  Lines       28613    28613           
  Branches    28613    28613           
=======================================
  Hits        24263    24263           
  Misses       3196     3196           
  Partials     1154     1154           


@zachschuermann (Member) left a comment:

LGTM! (For the future) should we consider making a "utility" an actual binary? I suppose cargo examples are basically that, but it might be 'nicer' to separate exemplary stuff from actual utils?

Quoted code context from the review:

    HashMap::from([("region", region.clone())])
    if args.env_creds {
    let (scheme, _path) = ObjectStoreScheme::parse(url).map_err(|e| {
        delta_kernel::Error::Generic(format!("Object store could not parse url: {}", e))
nit

Suggested change:

    - delta_kernel::Error::Generic(format!("Object store could not parse url: {}", e))
    + delta_kernel::Error::generic(format!("Object store could not parse url: {e}"))

@nicklan (Collaborator, author) replied:

Actually no, because it's already a String since I'm using format!. So we'd be doing an unneeded to_string() if we used generic.
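The author's point can be illustrated with a stand-in error type (MyError below is hypothetical, not delta_kernel's real Error): a generic-style constructor takes impl ToString, so calling it with a String already produced by format! pays for one extra copy via to_string(), whereas constructing the Generic variant directly just moves the String.

```rust
// Hypothetical stand-in for delta_kernel::Error, used only to
// illustrate the allocation point made above.
#[derive(Debug, PartialEq)]
enum MyError {
    Generic(String),
}

impl MyError {
    // Convenience constructor in the style of `Error::generic`:
    // accepts anything ToString, at the cost of a to_string() call.
    fn generic(msg: impl ToString) -> Self {
        MyError::Generic(msg.to_string())
    }
}

fn main() {
    let msg = format!("Object store could not parse url: {}", "bad://url");
    // Direct construction moves the already-owned String, no copy:
    let direct = MyError::Generic(msg.clone());
    // The helper copies the String again via to_string():
    let via_helper = MyError::generic(msg);
    // Same resulting value either way; the difference is one allocation.
    assert_eq!(direct, via_helper);
}
```

So both forms are correct; the variant constructor is simply the cheaper choice when the message is already a String.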

@OussamaSaoudi (Collaborator) commented:

Agreed with @zachschuermann we should have a cli tool for quickly reading/inspecting delta tables

@nicklan (Collaborator, author) commented Sep 29, 2025

So, these are actually their own crates; they just happen to live in a directory named examples. You can't even run them via cargo run --example. Would we want to just move them into a directory named something else?

@nicklan nicklan merged commit af78ccb into delta-io:main Sep 29, 2025
21 checks passed