
Using a buffer for writing to a file during preprocessing #260

@bhanu77prakash

In the data preprocessing code, there is a function that writes a triple to a file:

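```python
def write_triple(f, ent, rel, t, S, P, O):
    """Write a triple to a file. """
    # Look up the items of triple t at positions S, P, O in the ent/rel
    # mappings and write them as one tab-separated line (one f.write per call).
    f.write(str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n")
```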

I suspect that writing this way will be slow when dealing with hundreds of millions of relations, since `write_triple` issues one `f.write` call per triple. A better approach would be to maintain a buffer (e.g. a string) and write it out whenever it reaches a certain size threshold.
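A minimal sketch of the proposed buffering, assuming the same `ent`, `rel`, `t`, `S`, `P`, `O` arguments as `write_triple`. The class name `TripleBuffer` and the `max_lines` threshold are hypothetical and not part of the project's code. It collects formatted lines in a list and writes them with a single `"".join(...)` call once the threshold is reached. Using a list here, rather than growing a string with `+=`, avoids repeatedly copying the buffer.

```python
import io
from typing import Dict, List, Sequence, TextIO


class TripleBuffer:
    """Hypothetical helper: accumulate triple lines and write them in batches."""

    def __init__(self, f: TextIO, max_lines: int = 100_000) -> None:
        self.f = f
        self.max_lines = max_lines  # assumed threshold, tune for your data
        self.lines: List[str] = []

    def add(self, ent: Dict, rel: Dict, t: Sequence, S: int, P: int, O: int) -> None:
        # Same tab-separated format as write_triple, but kept in memory.
        self.lines.append(f"{ent[t[S]]}\t{rel[t[P]]}\t{ent[t[O]]}\n")
        if len(self.lines) >= self.max_lines:
            self.flush()

    def flush(self) -> None:
        # One write call for the whole batch, then empty the buffer.
        if self.lines:
            self.f.write("".join(self.lines))
            self.lines.clear()

    def __enter__(self) -> "TripleBuffer":
        return self

    def __exit__(self, *exc) -> None:
        # Write out whatever is left so the tail of the data is not lost.
        self.flush()


if __name__ == "__main__":
    ent = {"a": 0, "b": 1}
    rel = {"knows": 0}
    out = io.StringIO()  # stands in for a real file opened with open(path, "w")
    with TripleBuffer(out, max_lines=2) as buf:
        for t in [("a", "knows", "b"), ("b", "knows", "a"), ("a", "knows", "a")]:
            buf.add(ent, rel, t, 0, 1, 2)
    print(repr(out.getvalue()))
```

Python's file objects returned by `open()` already buffer writes internally (see the `buffering` argument). The gain from this approach may therefore come mostly from fewer Python-level `write` calls rather than fewer system calls, so it is worth timing both versions on real data before switching.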
