How are `sequence`, `structure`, and `text` files matched in the FAISS index?

Hi, thanks for releasing this great work!

I’m currently exploring the FAISS index at:

```
faiss_index/SwissProt/ProTrek_650M_UniRef50/
```

Inside this directory, I noticed that:

* `sequence/ids.tsv` contains UniProt IDs, and each line corresponds to a protein sequence.
* Similarly, `structure/ids.tsv` also contains UniProt IDs for protein structures.
* There’s also a `text/` folder, which seems to contain textual annotations.

My question is:
How are these three parts (`sequence`, `structure`, and `text`) aligned with each other?
Is the matching done through a **pointer** file (e.g., `ids.tsv.pointer.npy`)?

I tried checking the correspondence by comparing line indices (for example, line 0 in `sequence/ids.tsv` vs. line 0 in `text/ids.tsv`), but the entries don't seem to match.
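
A minimal sketch of such a check, comparing two ID lists both by line position and by ID overlap. It assumes each `ids.tsv` holds one entry per line with the ID in the first tab-separated column; that layout is my assumption and is not confirmed by the repository. The helper names `read_ids` and `compare_id_lists` are hypothetical, and the sketch does not try to interpret `ids.tsv.pointer.npy`.

```python
from pathlib import Path


def read_ids(path):
    """Return the first tab-separated field of every non-empty line.

    Assumption: the ID sits in the first column of ids.tsv.
    """
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                ids.append(line.split("\t")[0])
    return ids


def compare_id_lists(first, second):
    """Summarise how two ID lists relate, by position and by membership."""
    # Count positions where both lists carry the same ID.
    same_position = sum(a == b for a, b in zip(first, second))
    first_set, second_set = set(first), set(second)
    shared = first_set & second_set
    return {
        "same_position": same_position,
        "shared_ids": len(shared),
        "only_in_first": len(first_set - shared),
        "only_in_second": len(second_set - shared),
    }


if __name__ == "__main__":
    root = Path("faiss_index/SwissProt/ProTrek_650M_UniRef50")
    seq_path = root / "sequence" / "ids.tsv"
    text_path = root / "text" / "ids.tsv"
    # Only run the comparison when both files are actually present.
    if seq_path.exists() and text_path.exists():
        print(compare_id_lists(read_ids(seq_path), read_ids(text_path)))
```

If `shared_ids` is high while `same_position` is low, the two files would share IDs but not ordering, which would suggest that alignment needs a join on the ID (or some separate mapping) rather than on the line index.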

Could you please clarify:

1. How to correctly align entries between `sequence`, `structure`, and `text`?
2. If a mapping file or pointer is used, where can I find it?

Thanks a lot for your help!



