Reddit Data #41

@KeremTurgutlu

Data preparation involves downloading Reddit comment and submission data from https://files.pushshift.io/reddit/, and it is stated that the total data is around 700GB. However, the actual size of the data is around 2TB. Up to which month (YYYY-MM) of Reddit data did you use for training GODEL?
