
Parallel file writing #311

Open
@mkeskells

Description


From reading the code, and from what I have observed, the file writing appears to be serial.
When there are many files to write, overall throughput then seems to be limited by the per-write latency.
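To illustrate why latency dominates, here is a small simulation. It is only a sketch under stated assumptions: each "file write" is modelled as a fixed 50 ms sleep standing in for a round trip to object storage, and no real GCS calls are made. Serially, total time grows roughly as `files * latency`; with `threads` workers it drops to roughly `ceil(files / threads) * latency`. The class and method names (`LatencyDemo`, `simulatedWrite`, etc.) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation: a "write" is a fixed-latency sleep, not a real upload.
public class LatencyDemo {
    static final long LATENCY_MS = 50;

    static void simulatedWrite() {
        try {
            Thread.sleep(LATENCY_MS); // stands in for one network round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Serial: roughly files * LATENCY_MS in total.
    static long serialMillis(int files) {
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            simulatedWrite();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Parallel: roughly ceil(files / threads) * LATENCY_MS in total.
    static long parallelMillis(int files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < files; i++) {
                futures.add(pool.submit(LatencyDemo::simulatedWrite));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every simulated write to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("serial:   " + serialMillis(20) + " ms");
        System.out.println("parallel: " + parallelMillis(20, 10) + " ms");
    }
}
```

Exact timings depend on the machine, but the serial figure should be close to 20 × 50 ms, and the parallel one closer to 2 × 50 ms.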

It seems to me that this could easily be changed.

The record writing is currently done like this (in the GCS sink):

recordGrouper.records().forEach(this::flushFile);

It could be changed to flush the files in parallel, like this:

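recordGrouper
        .records()
        .entrySet()
        .parallelStream()
        // entries (filename -> records) are flushed concurrently, by default using the common ForkJoinPool
        .forEach(entry -> flushFile(entry.getKey(), entry.getValue()));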
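A self-contained sketch of the parallel-stream variant follows. It assumes, as the serial `forEach(this::flushFile)` call implies, that `flushFile` takes a filename and the list of records for that file. Here the records are plain strings and the hypothetical `flushFile` only notes which thread handled each file. Two points worth keeping in mind: `flushFile`, and anything it touches, must be thread-safe once it runs concurrently; and by default a parallel stream runs on `ForkJoinPool.commonPool()`, whose parallelism defaults to the number of available processors minus one and which is shared with the rest of the JVM, so blocking uploads there can tie up those shared threads.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the sink: records grouped by target filename,
// flushed with a parallel stream over the map's entries.
public class ParallelFlushDemo {
    // Thread-safe map, because flushFile is now called from several threads.
    public final Map<String, String> writtenBy = new ConcurrentHashMap<>();

    void flushFile(String filename, List<String> records) {
        // A real sink would upload `records` to object storage here.
        writtenBy.put(filename, Thread.currentThread().getName());
    }

    public void flushAll(Map<String, List<String>> grouped) {
        grouped.entrySet()
               .parallelStream()
               .forEach(entry -> flushFile(entry.getKey(), entry.getValue()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (int i = 0; i < 8; i++) {
            grouped.put("topic-0-" + i, List.of("record-" + i));
        }
        ParallelFlushDemo sink = new ParallelFlushDemo();
        sink.flushAll(grouped);
        sink.writtenBy.forEach((file, thread) -> System.out.println(file + " <- " + thread));
    }
}
```

The thread names printed will vary between runs; typically a mix of the calling thread and `ForkJoinPool.commonPool-worker-*` threads.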

We probably want some more controls, e.g. limiting the grouping, making it optional, etc.
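One possible shape for such controls, as a sketch only: a dedicated fixed-size thread pool instead of the common ForkJoinPool, with a hypothetical `maxParallelWrites` setting (not an existing connector option) that bounds concurrency, and where a value of 1 keeps today's serial behaviour. `BoundedFlusher` is likewise a hypothetical name. It also waits for every write before reporting the first failure, so no upload is left running when an error is surfaced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical helper: bounded, optional parallel flushing of grouped records.
public final class BoundedFlusher<R> implements AutoCloseable {
    private final ExecutorService pool; // null means "run serially"

    public BoundedFlusher(int maxParallelWrites) {
        if (maxParallelWrites < 1) {
            throw new IllegalArgumentException("maxParallelWrites must be >= 1");
        }
        this.pool = maxParallelWrites == 1 ? null : Executors.newFixedThreadPool(maxParallelWrites);
    }

    public void flushAll(Map<String, List<R>> grouped, BiConsumer<String, List<R>> flushFile)
            throws InterruptedException, ExecutionException {
        if (pool == null) {
            // Serial mode: same as the current loop; exceptions propagate directly.
            grouped.forEach(flushFile);
            return;
        }
        List<Future<?>> futures = new ArrayList<>();
        for (Map.Entry<String, List<R>> e : grouped.entrySet()) {
            futures.add(pool.submit(() -> flushFile.accept(e.getKey(), e.getValue())));
        }
        // Wait for all writes, then rethrow the first failure in submission order.
        ExecutionException first = null;
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException ex) {
                if (first == null) {
                    first = ex;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> grouped = Map.of("a.jsonl", List.of("r1"), "b.jsonl", List.of("r2", "r3"));
        try (BoundedFlusher<String> flusher = new BoundedFlusher<>(4)) {
            flusher.flushAll(grouped, (file, recs) -> System.out.println(
                    Thread.currentThread().getName() + " wrote " + file + " (" + recs.size() + " records)"));
        }
    }
}
```

Using a dedicated pool keeps blocking uploads off the shared common pool, and the pool size gives a single knob for both "how parallel" and "parallel at all". In the connector, the pool would more likely live for the task's lifetime than be created per flush.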

What are the team's thoughts on this?

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests