DatabaseClient stream() and stream_to_file() #109

@stephen29xie

Current behaviour:

from omniduct.duct import Duct
duct = Duct.for_protocol(protocol='sqlalchemy')(...)

query = 'SELECT * FROM ...'

# 1
duct.stream(query, format='csv', batch=2)

# 2
duct.stream_to_file(query, '.../data.csv', batch=2)

# 3
duct.stream_to_file(query, '.../data.csv')

1: When batch is set, stream() to memory writes the column names again at the start of every batch.

2: Because stream_to_file() wraps stream(), the column names are also written to the file once per batch.

E.g.:

State,City
California,San Francisco
Oregon,Portland
State,City
Texas,Houston
California,Los Angeles

3: When batch=None, stream(), and therefore stream_to_file(), does not write the column names at all, so the output file has no header row.

E.g.:

California,San Francisco
Oregon,Portland
Texas,Houston
California,Los Angeles

In my opinion, the desired behaviour is:

  • When streaming to a CSV file, the column names are written exactly once, as a header, regardless of the batch setting.
  • When streaming to memory, the generator returns only row data (no column names), like a cursor would.
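
A minimal sketch of the proposed behaviour, written against sqlite3 and csv from the standard library rather than omniduct itself. The names stream_rows, stream_to_csv and demo are hypothetical and only illustrate the idea; they are not omniduct's API, and the sample table simply mirrors the rows shown above.

```python
import csv
import io
import sqlite3

# Sample rows taken from the example output in this issue.
ROWS = [
    ("California", "San Francisco"),
    ("Oregon", "Portland"),
    ("Texas", "Houston"),
    ("California", "Los Angeles"),
]


def stream_rows(cursor, batch=None):
    """Yield row tuples only (never column names), like a cursor would.

    Hypothetical helper: `batch` controls how many rows are fetched per
    round trip, but does not change what the caller sees.
    """
    if batch is None:
        yield from cursor
        return
    while True:
        rows = cursor.fetchmany(batch)
        if not rows:
            break
        yield from rows


def stream_to_csv(cursor, fileobj, batch=None):
    """Write the column names once as a header, then all rows."""
    writer = csv.writer(fileobj, lineterminator="\n")
    # DB-API cursors expose column names as the first item of each
    # entry in `description`.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(stream_rows(cursor, batch=batch))


def demo(batch=None):
    """Run the sketch against an in-memory SQLite table; return the CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (State TEXT, City TEXT)")
    conn.executemany("INSERT INTO places VALUES (?, ?)", ROWS)
    cursor = conn.execute("SELECT State, City FROM places ORDER BY rowid")
    buffer = io.StringIO()
    stream_to_csv(cursor, buffer, batch=batch)
    conn.close()
    return buffer.getvalue()


if __name__ == "__main__":
    print(demo(batch=2), end="")
```

With this split, the header is the file writer's responsibility and the row generator never sees it, so demo(batch=2) and demo(batch=None) should produce the same file.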

What do you think about this? I can open a PR to get this done.

Thanks.
