Update README.md

yashaektefaie · web-flow · commit 6b41fe64ca27 · 2025-03-07T10:46:10.000-05:00
diff --git a/README.md b/README.md
@@ -44,6 +44,34 @@ Or alternatively run this command:
 
 Please note there is another package called spectra which is not related to this tool. Spectrae (which stands for spectral evaluation) implements the spectral framework for model evaluation.
 
+## Definition of terms
+
+This work and GitHub repository use terms related to the **spectral framework for model evaluation**. Below is a quick refresher on these key concepts.
+
+### **Spectral Property**
+Every dataset has an underlying property that, as it changes, causes model performance to decrease. This is referred to as the **spectral property**.  
+
+However, **not every property qualifies as a spectral property**.  
+For example:
+- When predicting protein structure, the performance of a protein folding model does **not** change based on the number of methionine (**M**) residues in a sequence.
+- Instead, model performance **does** change based on **structural similarity**—this is an example of a **spectral property**.
+
+### **Spectral Property Graph (SPG)**
+For a given dataset, a **spectral property graph (SPG)** is defined as:
+- **Nodes**: Samples in the dataset.
+- **Edges**: Connections between samples that share a spectral property.
+
+Every SPG is stored as a flattened adjacency matrix, which saves memory and allows SPECTRA to use GPUs to speed up computation.
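To make the idea of a flattened adjacency matrix concrete, here is a minimal sketch. It assumes a symmetric graph without self-edges stored as the row-major upper triangle; the actual layout used by SPECTRA may differ, and `flatten_adjacency` and `flat_index` are hypothetical helper names, not part of the package.

```python
def flatten_adjacency(matrix):
    """Keep only the upper triangle (i < j) of a symmetric matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def flat_index(i, j, n):
    """Map an unordered pair (i, j), i != j, to its position in the flat list."""
    if i == j:
        raise ValueError("self-edges are not stored in this sketch")
    if i > j:
        i, j = j, i
    # Row i starts after the (n - 1) + (n - 2) + ... entries of rows 0..i-1.
    return i * (n - 1) - i * (i - 1) // 2 + (j - i - 1)


# Toy 4-sample similarity matrix (symmetric, zero diagonal).
matrix = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
flat = flatten_adjacency(matrix)
```

For `n` samples this stores `n * (n - 1) / 2` values instead of `n * n`, which is where the memory saving comes from.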
+
+### **Spectral Parameter**
+The **spectral parameter** can be thought of as a **sparsification probability**.  
+
+When SPECTRA runs on an SPG:
+1. It selects a random node.
+2. It decides whether to **delete edges** with a certain probability—this probability is the **spectral parameter**.
+3. The closer the spectral parameter is to **1**, the **stricter** the splits generated by SPECTRA will be.
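The steps above can be sketched as a toy sparsification step. This is an illustration of the description only, not SPECTRA's implementation: the graph is a plain dictionary of neighbor sets, and `sparsify_once` is a hypothetical name.

```python
import random


def sparsify_once(adjacency, spectral_parameter, rng):
    """Pick a random node and delete each of its edges with the given probability."""
    node = rng.choice(sorted(adjacency))
    # sorted() copies the neighbor set, so it is safe to mutate while looping.
    for neighbor in sorted(adjacency[node]):
        if rng.random() < spectral_parameter:
            adjacency[node].discard(neighbor)
            adjacency[neighbor].discard(node)
    return node


# Fully connected toy graph on three samples.
strict_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
strict_node = sparsify_once(strict_graph, 1.0, random.Random(0))

# With a spectral parameter of 0 no edge is ever deleted.
loose_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
sparsify_once(loose_graph, 0.0, random.Random(0))
```

With a parameter of `1.0` the selected node loses every edge, and with `0.0` the graph is left untouched. Values in between delete each edge independently with that probability.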
+
+
 ## How to use spectra
 
 ### Step 1: Define the spectral property, cross-split overlap, and the spectra dataset wrapper
@@ -86,7 +114,7 @@ class [Name]_Dataset(SpectraDataset):
         pass
 ```
 
-Spectra implements the user definition of the spectra property and cross split overlap.
+Spectra implements the user definition of the spectral property.
 
 
 ```python 
@@ -103,52 +131,62 @@ class [Name]_spectra(spectra):
         '''
         return similarity
 
-    def cross_split_overlap(self, train, test):
-        '''
-            Define this function to return the overlap between a list of train and test samples.
+```
+### Step 2: Initialize SPECTRA and calculate the flattened adjacency matrix
 
-            Example: Average pairwise similarity between train and test set protein sequences.
+1. **Initialize SPECTRA**  
+   - Initially, pass in no spectral property graph.
 
-        '''
-        
+2. **Pass SPECTRA and dataset into the `Spectra_Property_Graph_Constructor`**  
+   - Additional arguments:
+     - **`num_chunks`**: If your dataset is very large, you can split up the construction into chunks to allow multiple jobs to compute similarity. This parameter controls the number of chunks.
+     - **`binary`**: If `True`, the similarity returns either `0` or `1`; otherwise, it returns a floating-point number.
 
-        return cross_split_overlap
-```
-### Step 2: Initialize SPECTRA and precalculate pairwise spectral properties
+3. **Call `create_adjacency_matrix`**  
+   - This function takes in the **chunk number** to calculate:
+     - If `num_chunks = 0`, the pairwise similarity is calculated in one go, so the input to `create_adjacency_matrix` should be `0`.
+     - If `num_chunks = 10`, the input should be the chunk number you want to calculate (e.g., `0` to `9`).
+    
+4. **Combine the adjacency matrices**  
+   - Call `combine_adjacency_matrices()` in the graph constructor to combine all the adjacency matrices into a single matrix.
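The chunked workflow above can be illustrated with a small, package-independent sketch. It assumes the pairs are split into contiguous blocks and that `num_chunks = 0` means a single block; how the graph constructor actually partitions the work is not stated here, and `chunk_pairs`, `compute_chunk`, and `combine_chunks` are hypothetical names.

```python
from itertools import combinations


def chunk_pairs(n_samples, num_chunks, chunk_number):
    """Return the sample pairs that belong to one chunk."""
    pairs = list(combinations(range(n_samples), 2))
    total = max(num_chunks, 1)  # treat 0 as "everything in one go"
    size = -(-len(pairs) // total)  # ceiling division
    return pairs[chunk_number * size:(chunk_number + 1) * size]


def compute_chunk(samples, similarity, num_chunks, chunk_number):
    """Compute pairwise similarities for a single chunk."""
    pairs = chunk_pairs(len(samples), num_chunks, chunk_number)
    return [similarity(samples[i], samples[j]) for i, j in pairs]


def combine_chunks(chunks):
    """Concatenate per-chunk results into one flat list."""
    return [value for chunk in chunks for value in chunk]


samples = [1, 2, 4, 8]
distance = lambda a, b: abs(a - b)

one_go = compute_chunk(samples, distance, 0, 0)
chunked = combine_chunks(
    [compute_chunk(samples, distance, 3, k) for k in range(3)]
)
```

Because each chunk depends only on its own pairs, the chunks can run as separate jobs and be combined afterwards, giving the same result as a single pass.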
 
-Initialize SPECTRA, passing in True or False to the binary argument if the spectral property returns a binary or continuous value. Then precalculate the pairwise spectral properties.
 
 ```python
-init_spectra = [name]_spectra([name]_Dataset, binary = True)
-init_spectra.pre_calculate_spectra_properties([name])
+from spectrae import Spectra_Property_Graph_Constructor
+spectra = [name]_spectra([name]_Dataset, spg=None)
+construct_spg = Spectra_Property_Graph_Constructor(spectra, [name]_Dataset, num_chunks = 0, binary = [False/True])
+construct_spg.create_adjacency_matrix(0)
+construct_spg.combine_adjacency_matrices()
 ```
-### Step 3: Initialize SPECTRA and precalculate pairwise spectral properties
 
-Generate SPECTRA splits. The ```generate_spectra_splits``` function takes in 4 important parameters: 
-1. ```number_repeats```: the number of times to rerun SPECTRA for the same spectral parameter, the number of repeats must equal the number of seeds as each rerun uses a different seed. 
-2. ```random_seed```: the random seeds used by each SPECTRA rerun, [42, 44] indicates two reruns the first of which will use a random seed of 42, the second will use 44. 
-3. ```spectra_parameters```: the spectral parameters to run on, they must range from 0 to 1 and be string formatted to the correct number of significant figures to avoid float formatting errors.
-4. ```force_reconstruct```: True to force the model to regenerate SPECTRA splits even if they have already been generated.
 
+### Step 3: Generate SPECTRA Splits
 
-```python
-spectra_parameters = {'number_repeats': 3, 
-                      'random_seed': [42, 44, 46],
-                      'spectral_parameters': ["{:.2f}".format(i) for i in np.arange(0, 1.05, 0.05)],
-                      'force_reconstruct': True,
-                                              }
+1. **Initialize the Spectral Property Graph**  
+   - Pass the flattened adjacency matrix you just generated to `Spectral_Property_Graph` to create the spectral property graph.
 
-init_spectra.generate_spectra_splits(**spectra_parameters)
+2. **Recreate SPECTRA**  
+   - Use the SPECTRA dataset along with the created spectral property graph to reinstantiate SPECTRA.
 
+3. **Call `generate_spectra_split`** with the following arguments:  
+   - **`spectra_param`**: The spectral parameter to run, must be between `0` and `1` (inclusive).  
+   - **`degree_choosing`**: Only applicable to binary graphs; optimizes the algorithm by prioritizing deletion of nodes with a low degree first.  
+   - **`num_splits`**: Number of splits to generate (usually `20`, which translates to spectral parameters between `0` and `1` in intervals of `0.05`).  
+   - **`path_to_save`**: Location to store generated SPECTRA splits.  
+   - **`debug_mode`**: Controls the amount of information to output. 
+
+```python
+spg = Spectral_Property_Graph(FlattenedAdjacency("flattened_adjacency_matrix.pt"))
+spectra = [name]_spectra(dataset, spg)
+spectra.generate_spectra_split(spectra_param, degree_choosing = [True/False], num_splits = [int], path_to_save="", debug_mode = [True/False])
 ```
 
 ### Step 4: Investigate generated SPECTRA splits
 
-After SPECTRA has completed, the user should investigate the generated splits. Specifically ensuring that on average the cross-split overlap decreases as the spectral parameter increases. This can be achieved by using ```return_all_split_stats``` to gather the cross_split_overlap, train size, and test size of each generated split. Example outputs can be seen in the tutorials. 
+After SPECTRA has completed, investigate the generated splits. Specifically, check that, on average, the cross-split overlap decreases as the spectral parameter increases. To do this, use ```return_all_split_stats``` to gather the cross_split_overlap, train size, and test size of each generated split. Example outputs can be seen in the tutorials. The `path_to_save` argument should be the same path you used in the previous step.
 
 ```python
-stats = init_spectra.return_all_split_stats()
-plt.scatter(stats['SPECTRA_parameter'], stats['cross_split_overlap'])
+spectra.return_all_split_stats(show_progress = True, path_to_save = save_path)
 ```
 
 ## Spectra tutorials