Pre-computed files need to be regenerated for each set of parameters #16

@shervinea

Description

Context. Real-time PDB parsing with the BioPython package, typically:

self.structure = PDBParser().get_structure(pdb_id.upper(), fullfilename)
is expensive and bottlenecks the training process if done on the fly.

For this reason, we put in place a "precomputation stage"

def check_precomputed(self) -> None:
that takes all enzymes beforehand and stores target volumes in a dedicated folder.

Current limitation. This process is repeated for each set of parameters {weights considered, interpolation level between atoms p, volume size}. This is inefficient in terms of:

  • total computations performed: PDB parsing is identical across all these configurations, yet it is repeated for each of them. The only configuration-dependent operations are relatively cheap, e.g. 2D -> 3D mapping and point interpolation. With a proper implementation, these last steps can easily be done on the fly without becoming a bottleneck.
  • space: the number and size of produced files grow at the same pace as the number of configurations the user tries out (!).
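To illustrate why the configuration-dependent steps are cheap, here is a minimal sketch of depositing weighted atom coordinates into a target volume. The function name `coords_to_volume`, the nearest-voxel assignment, and the parameter names are hypothetical; the repository's actual interpolation level `p` between atoms is not reproduced here.

```python
import numpy as np

def coords_to_volume(coords, weights, volume_size=32, voxel=1.0):
    """Deposit weighted atom coordinates into a cubic volume.

    Hypothetical sketch using nearest-voxel assignment; atoms falling
    outside the grid are simply dropped.
    """
    vol = np.zeros((volume_size,) * 3, dtype=np.float32)
    # Center the point cloud in the grid.
    centered = coords - coords.mean(axis=0)
    idx = np.round(centered / voxel).astype(int) + volume_size // 2
    inside = np.all((idx >= 0) & (idx < volume_size), axis=1)
    for (i, j, k), w in zip(idx[inside], weights[inside]):
        vol[i, j, k] += w
    return vol

coords = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
weights = np.array([6.0, 8.0])  # e.g. carbon and oxygen atomic numbers
vol = coords_to_volume(coords, weights)
```

On a few thousand atoms this is a sub-millisecond operation, so it can run inside the data loader for any `{weights, p, volume size}` choice without redoing the PDB parse.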

Desired behavior. Coordinates + weights precomputation from PDB files is done only once and produces a parsed version of the data that is:

  1. Light enough so that it can be transformed to target volumes on the fly
  2. Complete enough so that every configuration's data can be derived from it.
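The desired behavior above can be sketched as a one-off cache of coordinates and per-atom weights. The helper names `save_parsed`/`load_parsed` and the `.npz` format are assumptions for illustration, not the repository's actual API; the expensive `PDBParser` step would run once per structure before `save_parsed`.

```python
import os
import tempfile
import numpy as np

def save_parsed(path, coords, weights):
    # One-off cache written after the single PDB parse: light enough to
    # voxelize on the fly, complete enough to derive any configuration.
    np.savez_compressed(path, coords=coords, weights=weights)

def load_parsed(path):
    data = np.load(path)
    return data["coords"], data["weights"]

# Stand-in for coordinates/weights extracted from one parsed structure.
coords = np.random.rand(10, 3).astype(np.float32)
weights = np.ones(10, dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "1ABC.npz")
save_parsed(path, coords, weights)
cached_coords, cached_weights = load_parsed(path)
```

One such file per enzyme replaces the current per-configuration volume folders, so trying a new parameter set costs no additional disk space.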
