Commit 6b41fe6

Update README.md
1 parent b2b0111 commit 6b41fe6

1 file changed

+67
-29
lines changed

README.md

Lines changed: 67 additions & 29 deletions
@@ -44,6 +44,34 @@ Or alternatively run this command:
4444

4545
Please note there is another package called spectra which is not related to this tool. Spectrae (which stands for spectral evaluation) implements the spectral framework for model evaluation.
4646

47+
## Definition of terms
48+
49+
This work and GitHub repository use terms related to the **spectral framework for model evaluation**. Below is a quick refresher on these key concepts.
50+
51+
### **Spectral Property**
52+
Every dataset has an underlying property that, as it changes, causes model performance to decrease. This is referred to as the **spectral property**.
53+
54+
However, **not every property qualifies as a spectral property**.
55+
For example:
56+
- When predicting protein structure, the performance of a protein folding model does **not** change based on the number of methionine (**M**) residues in a sequence.
57+
- Instead, model performance **does** change based on **structural similarity**, which makes structural similarity an example of a **spectral property**.
58+
59+
### **Spectral Property Graph (SPG)**
60+
For a given dataset, a **spectral property graph (SPG)** is defined as:
61+
- **Nodes**: Samples in the dataset.
62+
- **Edges**: Connections between samples that share a spectral property.
63+
64+
Every SPG is stored as a flattened adjacency matrix, which saves memory and allows SPECTRA to use GPUs to speed up computation.
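As a rough illustration of why flattening saves memory, the sketch below stores only the upper triangle of a symmetric adjacency matrix, so each unordered pair of samples appears once. The helper names `flatten` and `flat_index` are hypothetical, and the row-major upper-triangle layout is an assumption made for this example. SPECTRA's actual storage layout may differ.

```python
def flatten(adjacency):
    """Keep one entry per unordered pair (i, j) with i < j, in row-major order."""
    n = len(adjacency)
    return [adjacency[i][j] for i in range(n) for j in range(i + 1, n)]


def flat_index(i, j, n):
    """Position of the pair (i, j) in the flattened upper triangle of an n x n matrix."""
    if i == j:
        raise ValueError("self-pairs are not stored in the flattened form")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order does not matter
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) entries before row i starts.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


# Hypothetical 4-sample binary SPG: 1 means the two samples share the spectral property.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
flat = flatten(adjacency)
edge_between_1_and_2 = flat[flat_index(1, 2, len(adjacency))]
```

For `n` samples the flattened form holds `n * (n - 1) / 2` values instead of `n * n`, and a one-dimensional array of this kind is straightforward to move onto a GPU as a single tensor.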
65+
66+
### **Spectral Parameter**
67+
The **spectral parameter** can be thought of as a **sparsification probability**.
68+
69+
When SPECTRA runs on an SPG:
70+
1. It selects a random node.
71+
2. It decides whether to **delete edges** with a certain probability. This probability is the **spectral parameter**.
72+
3. The closer the spectral parameter is to **1**, the **stricter** the splits generated by SPECTRA will be.
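To make the effect of the spectral parameter concrete, here is a toy sketch of the procedure described above. It is not SPECTRA's implementation: the function `sparsify` and the dict-of-sets graph are hypothetical, and the sketch assumes that deleting an edge means dropping the sample at the other end of that edge.

```python
import random


def sparsify(adjacency, spectral_parameter, seed=0):
    """Toy sparsification pass. Returns the set of samples that are kept."""
    rng = random.Random(seed)
    # Copy the graph so the caller's adjacency sets are left untouched.
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    unvisited = sorted(graph)
    retained = set()
    while unvisited:
        # Step 1: select a random node that has not been visited yet.
        node = unvisited.pop(rng.randrange(len(unvisited)))
        retained.add(node)
        # Step 2: for each edge of that node, delete it with probability
        # `spectral_parameter` by dropping the neighbouring sample (assumption).
        for neighbor in sorted(graph[node]):
            if neighbor in retained:
                continue  # this edge survived an earlier draw
            if rng.random() < spectral_parameter:
                for other in graph.pop(neighbor):
                    graph[other].discard(neighbor)
                unvisited.remove(neighbor)
    return retained


# Hypothetical SPG with four samples connected in a chain: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
keep_everything = sparsify(chain, 0.0)  # no edge is ever deleted
strict_subset = sparsify(chain, 1.0)    # every edge of a selected node is deleted
```

With a parameter of `0` every sample is kept, while with a parameter of `1` no two kept samples share an edge, which is the sense in which values near `1` give stricter splits.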
73+
74+
4775
## How to use spectra
4876

4977
### Step 1: Define the spectral property, cross-split overlap, and the spectra dataset wrapper
@@ -86,7 +114,7 @@ class [Name]_Dataset(SpectraDataset):
86114
pass
87115
```
88116

89-
Spectra implements the user definition of the spectra property and cross split overlap.
117+
Spectra implements the user's definition of the spectral property.
90118

91119

92120
```python
@@ -103,52 +131,62 @@ class [Name]_spectra(spectra):
103131
'''
104132
return similarity
105133

106-
def cross_split_overlap(self, train, test):
107-
'''
108-
Define this function to return the overlap between a list of train and test samples.
134+
```
135+
### Step 2: Initialize SPECTRA and calculate the flattened adjacency matrix
109136

110-
Example: Average pairwise similarity between train and test set protein sequences.
137+
1. **Initialize SPECTRA**
138+
- Initially, pass in no spectral property graph (`spg=None`).
111139

112-
'''
113-
140+
2. **Pass SPECTRA and dataset into the `Spectra_Property_Graph_Constructor`**
141+
- Additional arguments:
142+
- **`num_chunks`**: If your dataset is very large, you can split up the construction into chunks to allow multiple jobs to compute similarity. This parameter controls the number of chunks.
143+
- **`binary`**: If `True`, the similarity returns either `0` or `1`; otherwise, it returns a floating-point number.
114144

115-
return cross_split_overlap
116-
```
117-
### Step 2: Initialize SPECTRA and precalculate pairwise spectral properties
145+
3. **Call `create_adjacency_matrix`**
146+
- This function takes in the **chunk number** to calculate:
147+
- If `num_chunks = 0`, the pairwise similarity is calculated in one go, so the input to `create_adjacency_matrix` should be `0`.
148+
- If `num_chunks = 10`, the input should be the chunk number you want to calculate (e.g., `0` to `9`).
149+
150+
4. **Combine the adjacency matrices**
151+
- Call `combine_adjacency_matrices()` in the graph constructor to combine all the adjacency matrices into a single matrix.
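The chunking idea in the steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `pairs_for_chunk`, `compute_chunk`, `combine_chunks`, and `matching_fraction` are hypothetical helpers, not functions from `spectrae`, and the contiguous slicing of pairs is an assumed strategy. The point is that each chunk can be computed by a separate job and that concatenating the chunks reproduces the single-pass result.

```python
from itertools import combinations


def pairs_for_chunk(n_samples, chunk, num_chunks):
    """Return the (i, j) sample pairs assigned to one chunk."""
    all_pairs = list(combinations(range(n_samples), 2))
    if num_chunks <= 1:
        return all_pairs  # a value of 0 (or 1) means everything in one go
    size = -(-len(all_pairs) // num_chunks)  # ceiling division
    return all_pairs[chunk * size:(chunk + 1) * size]


def compute_chunk(samples, similarity, chunk, num_chunks):
    """Compute the similarity values for a single chunk of pairs."""
    pairs = pairs_for_chunk(len(samples), chunk, num_chunks)
    return [similarity(samples[i], samples[j]) for i, j in pairs]


def combine_chunks(chunks):
    """Concatenate per-chunk results, in chunk order, into one flat list."""
    return [value for chunk in chunks for value in chunk]


def matching_fraction(a, b):
    """Hypothetical spectral property: fraction of positions that match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


samples = ["AAAA", "AAAT", "TTTT", "AATT"]
one_go = compute_chunk(samples, matching_fraction, 0, 0)
chunked = combine_chunks(
    [compute_chunk(samples, matching_fraction, c, 3) for c in range(3)]
)
```

Listing every pair up front keeps the sketch short but would be too memory-hungry for a very large dataset, where the pair indices for each chunk would be generated lazily instead.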
118152

119-
Initialize SPECTRA, passing in True or False to the binary argument if the spectral property returns a binary or continuous value. Then precalculate the pairwise spectral properties.
120153

121154
```python
122-
init_spectra = [name]_spectra([name]_Dataset, binary = True)
123-
init_spectra.pre_calculate_spectra_properties([name])
155+
from spectrae import Spectra_Property_Graph_Constructor
156+
spectra = [name]_spectra([name]_Dataset, spg=None)
157+
construct_spg = Spectra_Property_Graph_Constructor(spectra, [name]_Dataset, num_chunks = 0, binary = [False/True])
158+
construct_spg.create_adjacency_matrix(0)
159+
construct_spg.combine_adjacency_matrices()
124160
```
125-
### Step 3: Initialize SPECTRA and precalculate pairwise spectral properties
126161

127-
Generate SPECTRA splits. The ```generate_spectra_splits``` function takes in 4 important parameters:
128-
1. ```number_repeats```: the number of times to rerun SPECTRA for the same spectral parameter, the number of repeats must equal the number of seeds as each rerun uses a different seed.
129-
2. ```random_seed```: the random seeds used by each SPECTRA rerun, [42, 44] indicates two reruns the first of which will use a random seed of 42, the second will use 44.
130-
3. ```spectra_parameters```: the spectral parameters to run on, they must range from 0 to 1 and be string formatted to the correct number of significant figures to avoid float formatting errors.
131-
4. ```force_reconstruct```: True to force the model to regenerate SPECTRA splits even if they have already been generated.
132162

163+
### Step 3: Generate SPECTRA Splits
133164

134-
```python
135-
spectra_parameters = {'number_repeats': 3,
136-
'random_seed': [42, 44, 46],
137-
'spectral_parameters': ["{:.2f}".format(i) for i in np.arange(0, 1.05, 0.05)],
138-
'force_reconstruct': True,
139-
}
165+
1. **Initialize the Spectral Property Graph**
166+
- Pass the flattened adjacency matrix you just generated to `Spectral_Property_Graph` to create the spectral property graph.
140167

141-
init_spectra.generate_spectra_splits(**spectra_parameters)
168+
2. **Recreate SPECTRA**
169+
- Use the SPECTRA dataset along with the created spectral property graph to reinstantiate SPECTRA.
142170

171+
3. **Call `generate_spectra_split`** with the following arguments:
172+
- **`spectra_param`**: The spectral parameter to run, must be between `0` and `1` (inclusive).
173+
- **`degree_choosing`**: Only applicable to binary graphs; optimizes the algorithm by prioritizing deletion of nodes with a low degree first.
174+
- **`num_splits`**: Number of splits to generate (usually `20`, which translates to spectral parameters between `0` and `1` in intervals of `0.05`).
175+
- **`path_to_save`**: Location to store generated SPECTRA splits.
176+
- **`debug_mode`**: Controls the amount of information to output.
177+
178+
```python
179+
spg = Spectral_Property_Graph(FlattenedAdjacency("flattened_adjacency_matrix.pt"))
180+
spectra = [name]_spectra(dataset, spg)
181+
spectra.generate_spectra_split(spectra_param, degree_choosing = [True/False], num_splits = [int], path_to_save="", debug_mode = [True/False])
143182
```
144183

145184
### Step 4: Investigate generated SPECTRA splits
146185

147-
After SPECTRA has completed, the user should investigate the generated splits. Specifically ensuring that on average the cross-split overlap decreases as the spectral parameter increases. This can be achieved by using ```return_all_split_stats``` to gather the cross_split_overlap, train size, and test size of each generated split. Example outputs can be seen in the tutorials.
186+
After SPECTRA has completed, investigate the generated splits. Specifically, check that on average the cross-split overlap decreases as the spectral parameter increases. Use `return_all_split_stats` to gather the cross-split overlap, train size, and test size of each generated split. Example outputs can be seen in the tutorials. The `path_to_save` argument should be the same path you used in the previous step.
148187

149188
```python
150-
stats = init_spectra.return_all_split_stats()
151-
plt.scatter(stats['SPECTRA_parameter'], stats['cross_split_overlap'])
189+
spectra.return_all_split_stats(show_progress = True, path_to_save = save_path)
152190
```
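One way to run the check described above is to average the overlap per spectral parameter and confirm that the averages do not increase. The sketch below assumes the statistics are available as a list of rows keyed by `SPECTRA_parameter` and `cross_split_overlap`. Those column names come from the earlier version of this README, the actual return format of `return_all_split_stats` may differ, and the helper names and numbers here are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_overlap_by_parameter(stats):
    """Average cross-split overlap for each spectral parameter, sorted by parameter."""
    grouped = defaultdict(list)
    for row in stats:
        # Parameters may be stored as strings such as "0.05", so convert to float.
        grouped[float(row["SPECTRA_parameter"])].append(row["cross_split_overlap"])
    return {param: mean(values) for param, values in sorted(grouped.items())}


def overlap_is_non_increasing(stats, tolerance=0.0):
    """True if the mean overlap never rises by more than `tolerance` between parameters."""
    means = list(mean_overlap_by_parameter(stats).values())
    return all(later <= earlier + tolerance for earlier, later in zip(means, means[1:]))


# Made-up statistics for three spectral parameters with two splits each.
example_stats = [
    {"SPECTRA_parameter": "0.00", "cross_split_overlap": 0.9},
    {"SPECTRA_parameter": "0.00", "cross_split_overlap": 0.7},
    {"SPECTRA_parameter": "0.50", "cross_split_overlap": 0.5},
    {"SPECTRA_parameter": "0.50", "cross_split_overlap": 0.3},
    {"SPECTRA_parameter": "1.00", "cross_split_overlap": 0.1},
    {"SPECTRA_parameter": "1.00", "cross_split_overlap": 0.1},
]
trend_ok = overlap_is_non_increasing(example_stats)
```

A small positive `tolerance` can be useful because individual splits are random, so the averages may wobble slightly even when the overall trend is downward.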
153191

154192
## Spectra tutorials
