Querying partitioned tables

I want to merge multiple tables with the same schema into a single one.

Say I have a table `Sales(transaction_id, date, amount)` stored as sharded files on my file system:

```
sales/
  - 2024/
    - 12/
      - 30.csv
      - 31.csv
  - 2025/
    - 01/
      - 01.csv
      - 02.csv
      - 03.csv
```

Is there a convenient way to treat `sales/**/*.csv` as a single table?

So far it seems that `bdt query` supports 2 flags for input tables,
- `--table path/to/single_file.csv`
  - A single table is read with name `single_file`
- `--tables path/to/directory/`
  - Multiple tables are read, each one with its own basename
    - This imports N tables
    - I don't see much value here: why not use the shell to expand something like `path/to/directory/*.csv`?
    
    
I kind of want a new input file flag that expects a table name, and a set of (compatible) files,

```fish
# The shell expands the glob before bdt sees it
bdt query \
  --partitioned_table sales sales/**/*.csv \
  --sql "
    select
      count(*)
    from
      sales
  "
```

This would be a flag taking 1+N arguments. After glob expansion it becomes `--partitioned_table sales sales/2024/12/30.csv sales/2024/12/31.csv sales/2025/01/01.csv sales/2025/01/02.csv sales/2025/01/03.csv`, and it would make the table `sales` available.

Is there a way to get this today? I tried the `--tables` flag, but instead got N different tables that were hard to work with as a unit.

It's not hard to create a single file that concatenates all the tables, but it'd be nice not to need one, since that would allow writing queries straight from the shell. With a tiny rewrite, `--partitioned_table sales sales/2024/12/*.csv` would get me info about sales in December 2024 without any intermediate disk writes.
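
For reference, the concatenation workaround I have in mind looks roughly like the sketch below. It is written in POSIX shell rather than fish. It assumes every shard starts with the same one-line header row and ends with a newline. The helper name `concat_csv` and the output path `/tmp/sales.csv` are made up for illustration.

```sh
# Hypothetical helper: concatenate CSV shards that share one header row.
# Assumes each file begins with an identical one-line header and ends with a newline.
concat_csv() {
  [ "$#" -gt 0 ] || return 1
  head -n 1 "$1"        # emit the header once, taken from the first shard
  for f in "$@"; do
    tail -n +2 "$f"     # emit the data rows of each shard, skipping its header
  done
}

# Intended usage (not run here; relies on the existing --table flag):
#   concat_csv sales/2024/12/*.csv > /tmp/sales.csv
#   bdt query --table /tmp/sales.csv --sql "select count(*) from sales"
```

This works, but it needs the intermediate file on disk, and that file has to be rebuilt whenever I change which shards I want to query.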
