Inconsistent embedding shapes in embedding pkl file depending on batch size in process_pdbs.py #9

@ishaanganti

The shapes saved in the embedding pkl file for atom_embedding and block_embedding seem to be incorrect: for the exact same system (just different frames of an MD trajectory), I've found that different dimensions are saved. The behavior seems to depend on how many PDBs are processed at once (i.e. how many entries there are in the CSV passed to 'process_pdbs.py'). For example, I computed the embedding for one PDB within a batch of 60 PDBs and again within a batch of 2 PDBs. The dimensions from the batch of 60 were:

id: 1_2_f6.pdb_ABCDEFGHIJK_L_LIG
graph_embedding: shape (32,)
block_embedding: shape (0, 32)
atom_embedding: shape (635, 32)
block_id: shape (155,)
atom_id: shape (859,)

And from the batch of 2:

id: 1_2_f6.pdb_ABCDEFGHIJK_L_LIG
graph_embedding: shape (32,)
block_embedding: shape (155, 32)
atom_embedding: shape (859, 32)
block_id: shape (155,)
atom_id: shape (859,)

But both had the exact same embedding:
"1_2_f6.pdb_ABCDEFGHIJK_L_LIG": [-0.004074007738381624, 0.3173443675041199, -0.11195822060108185, -0.01592610776424408, -0.06759312003850937, 0.164351224899292, 0.024977976456284523, -0.16799166798591614, -0.09120690077543259, -0.1385965496301651, 0.46841853857040405, -0.018586348742246628, 0.17197035253047943, 0.13865599036216736, -0.06969504058361053, -0.2753834128379822, -0.00586903840303421, -0.12763790786266327, 0.051467981189489365, 0.41604381799697876, -0.26409682631492615, -0.12303176522254944, 0.010855807922780514, -0.09284979850053787, 0.03329434618353844, -0.015801483765244484, 0.2621055543422699, -0.10459930449724197, 0.10786740481853485, 0.03054526075720787, 0.12038268893957138, -0.23565126955509186]

So I don't think this seriously affects the final embeddings.
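
A quick way to flag affected entries is to compare the first dimension of each embedding against the length of its matching id array (block_embedding vs. block_id, atom_embedding vs. atom_id). Below is a minimal sketch of that check. It assumes each entry is a dict with the keys printed above; that layout and the helper names `find_shape_mismatches` and `make_entry` are assumptions for illustration, not something taken from the repo. The arrays are zero-filled placeholders that only reproduce the reported shapes.

```python
import numpy as np

# Assumed pairing: each embedding should have one row per id.
PAIRS = [("block_embedding", "block_id"), ("atom_embedding", "atom_id")]


def find_shape_mismatches(entries):
    """Return (id, embedding_key, n_rows, n_ids) for every inconsistent pair."""
    mismatches = []
    for entry in entries:
        for emb_key, id_key in PAIRS:
            n_rows = np.asarray(entry[emb_key]).shape[0]
            n_ids = len(entry[id_key])
            if n_rows != n_ids:
                mismatches.append((entry["id"], emb_key, n_rows, n_ids))
    return mismatches


def make_entry(n_block_rows, n_atom_rows):
    """Hypothetical entry with placeholder values and the reported shapes."""
    return {
        "id": "1_2_f6.pdb_ABCDEFGHIJK_L_LIG",
        "graph_embedding": np.zeros(32),
        "block_embedding": np.zeros((n_block_rows, 32)),
        "atom_embedding": np.zeros((n_atom_rows, 32)),
        "block_id": np.zeros(155, dtype=int),
        "atom_id": np.zeros(859, dtype=int),
    }


batch_of_60 = make_entry(0, 635)    # shapes reported for the batch of 60
batch_of_2 = make_entry(155, 859)   # shapes reported for the batch of 2

for name, entry in [("batch of 60", batch_of_60), ("batch of 2", batch_of_2)]:
    print(name, find_shape_mismatches([entry]))
```

On a real file the entries would come from `pickle.load`, but whether the pkl holds a list of such dicts or some other structure is not confirmed here, so the loading step would need to be adapted.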
