[Dataset] Clarification on Dataset processing #767

Open
@BowenYao18

cites_edge = add_self_loops(remove_self_loops(paper_paper_edges)[0])[0]
self.edge_dict = {
('paper', 'cites', 'paper'): (torch.cat([cites_edge[1, :], cites_edge[0, :]]), torch.cat([cites_edge[0, :], cites_edge[1, :]])),

Let me use an example.

  1. Assume we have an edge file like this:
[0, 1, 2]  # cites_edge[0, :]
[1, 2, 3]  # cites_edge[1, :]
  2. Then, we first apply

add_self_loops(remove_self_loops(paper_paper_edges)[0])[0]

which gives us this:

[0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :]
[1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :]
  3. Then, we have its reverse edges:
[1, 2, 3, 0, 1, 2, 3]  # cites_edge[1, :]
[0, 1, 2, 0, 1, 2, 3]  # cites_edge[0, :]
  4. If we follow this code

(torch.cat([cites_edge[1, :], cites_edge[0, :]]), torch.cat([cites_edge[0, :], cites_edge[1, :]]))

we should have this:

[1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]  # cites_edge[1, :] + cites_edge[0, :]
[0, 1, 2, 0, 1, 2, 3, 1, 2, 3, 0, 1, 2, 3]  # cites_edge[0, :] + cites_edge[1, :]

This is different from the result below. Because we must follow MLPerf exactly, we cannot use the opposite order, (torch.cat([cites_edge[0, :], cites_edge[1, :]]), torch.cat([cites_edge[1, :], cites_edge[0, :]])), which would give:

[0, 1, 2, 0, 1, 2, 3, 1, 2, 3, 0, 1, 2, 3]  # cites_edge[0, :] + cites_edge[1, :]
[1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]  # cites_edge[1, :] + cites_edge[0, :]
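
To make the comparison concrete, here is a minimal sketch of the example in plain Python. It is only an illustration under assumptions: lists stand in for the tensors, list concatenation stands in for torch.cat, and `with_self_loops` is a hypothetical helper that mimics the remove/add self-loop step for this 4-node example. It is not the actual dataset code.

```python
# Plain-Python stand-in for the tensor operations in the snippet above.
# Lists replace tensors; list concatenation replaces torch.cat.

def with_self_loops(src, dst, num_nodes):
    """Hypothetical helper: drop existing self-loops, then append one per node."""
    kept = [(s, d) for s, d in zip(src, dst) if s != d]
    new_src = [s for s, _ in kept] + list(range(num_nodes))
    new_dst = [d for _, d in kept] + list(range(num_nodes))
    return new_src, new_dst

# row0 plays the role of cites_edge[0, :], row1 of cites_edge[1, :].
row0, row1 = with_self_loops([0, 1, 2], [1, 2, 3], num_nodes=4)

# Ordering used in the snippet: (cat([row1, row0]), cat([row0, row1]))
src_a, dst_a = row1 + row0, row0 + row1
# Opposite ordering: (cat([row0, row1]), cat([row1, row0]))
src_b, dst_b = row0 + row1, row1 + row0

edges_a = list(zip(src_a, dst_a))
edges_b = list(zip(src_b, dst_b))

same_order = edges_a == edges_b                    # position-by-position comparison
same_multiset = sorted(edges_a) == sorted(edges_b) # comparison ignoring edge position

print(same_order, same_multiset)
```

In this toy example the two orderings contain the same (source, destination) pairs, and only the position of each edge in the list differs. The sketch does not show whether that positional difference matters for matching MLPerf.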

Am I understanding this correctly? Does the order matter here? Thank you!
