Hi, thanks for releasing this great work!
I’m currently exploring the FAISS index at:
faiss_index/SwissProt/ProTrek_650M_UniRef50/
Inside this directory, I noticed that:
- `sequence/ids.tsv` contains UniProt IDs, and each line corresponds to a protein sequence.
- Similarly, `structure/ids.tsv` also contains UniProt IDs for protein structures.
- There’s also a `text/` folder, which seems to contain textual annotations.
My question is:
How are these three parts (sequence, structure, and text) aligned with each other?
Is the matching done through a pointer file (e.g., `ids.tsv.pointer.npy`)?
I tried checking the correspondence by comparing line indices — for example, line 0 in sequence/ids.tsv vs. line 0 in text/ids.tsv — but they don’t seem to match.
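A minimal sketch of this kind of line-index check is below. It assumes each `ids.tsv` holds one entry per line with the ID in the first tab-separated column; that layout, and the helper names `read_ids` and `compare_by_line`, are assumptions for illustration and are not taken from the repository.

```python
from pathlib import Path


def read_ids(path):
    """Return the first tab-separated field of each non-empty line."""
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                ids.append(line.split("\t")[0])
    return ids


def compare_by_line(ids_a, ids_b):
    """Compare two ID lists position by position and as unordered sets."""
    compared = min(len(ids_a), len(ids_b))
    # zip stops at the shorter list, so only overlapping line indices count.
    same_position = sum(1 for a, b in zip(ids_a, ids_b) if a == b)
    shared = set(ids_a) & set(ids_b)
    return {
        "compared": compared,
        "same_position": same_position,
        "shared_ids": len(shared),
    }


if __name__ == "__main__":
    # Directory from the issue; the per-modality file names are assumed.
    root = Path("faiss_index/SwissProt/ProTrek_650M_UniRef50")
    seq_path = root / "sequence" / "ids.tsv"
    txt_path = root / "text" / "ids.tsv"
    if seq_path.exists() and txt_path.exists():
        print(compare_by_line(read_ids(seq_path), read_ids(txt_path)))
```

If `shared_ids` is high while `same_position` is low, the two files likely cover the same proteins in a different order (or with a different number of rows per protein), which would point to ID-based or pointer-based alignment rather than line-index alignment.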
Could you please clarify:
- How to correctly align entries between `sequence`, `structure`, and `text`?
- If a mapping file or pointer is used, where can I find it?
Thanks a lot for your help!