Introduce Bulk Load CSV Files Mode #31

Open
@zprobst

Description

Users of nodestream want to leverage the performance of the bulk load functionality in Neptune DB while still relying on nodestream for data mapping and data source abstraction. It would be great to introduce a mode for the Neptune plugin that builds up bulk load CSV files during ingestion and then, once the ingestion is done, loads them into the graph. This would bypass the main bottleneck, OpenCypher query performance, and give a significant performance boost.

However, this would require a somewhat roundabout process: the database connector would take the nodes and edges, write them out as CSV files, copy those files to S3, and then invoke the bulk loader via the AWS SDK. Users would need to provide an S3 bucket to stage the CSV files and attach a role to their Neptune cluster that allows it to read from S3.
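As a rough illustration of what the connector side of this could look like, here is a minimal sketch. Assumptions to note: the `write_nodes_csv`/`write_edges_csv`/`start_bulk_load` helpers and the node/edge dict shapes are hypothetical, not nodestream APIs; the CSV headers follow Neptune's property-graph bulk load format (`~id`, `~label`, `~from`, `~to`); and the loader call uses boto3's `neptunedata` client, which wraps the Neptune loader HTTP API.

```python
import csv


def write_nodes_csv(nodes, path):
    """Write nodes in Neptune's property-graph bulk-load CSV format.

    Each node is assumed (hypothetically) to be a dict with "id",
    "label", and a "properties" dict; property names become columns.
    """
    prop_keys = sorted({k for n in nodes for k in n["properties"]})
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["~id", "~label", *prop_keys])
        for n in nodes:
            writer.writerow(
                [n["id"], n["label"],
                 *(n["properties"].get(k, "") for k in prop_keys)]
            )


def write_edges_csv(edges, path):
    """Write edges using the reserved ~id/~from/~to/~label columns."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["~id", "~from", "~to", "~label"])
        for e in edges:
            writer.writerow([e["id"], e["from"], e["to"], e["label"]])


def start_bulk_load(cluster_endpoint, s3_uri, iam_role_arn, region):
    """Kick off a Neptune loader job for CSVs already staged in S3.

    Imported lazily so the CSV helpers above run without boto3 or
    AWS access; this call requires network access to the cluster.
    """
    import boto3

    client = boto3.client("neptunedata", endpoint_url=cluster_endpoint)
    return client.start_loader_job(
        source=s3_uri,
        format="csv",
        s3BucketRegion=region,
        iamRoleArn=iam_role_arn,
    )
```

Between writing the files and starting the job, the connector would upload them with the usual S3 client (e.g. `boto3.client("s3").upload_file(...)`), and the role passed as `iamRoleArn` is the one users would attach to the cluster to grant read access to the staging bucket.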

The effort required is comparable to translating from OpenCypher to Gremlin, which is also expected to yield a performance boost and would allow nodestream to connect to any TinkerPop-compliant database.

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
