| 1 | +Frequently Asked Questions |
| 2 | +========================== |
| 3 | + |
| 4 | +General Questions |
| 5 | +----------------- |
| 6 | + |
| 7 | +What is TorchSOM? |
| 8 | +~~~~~~~~~~~~~~~~~ |
| 9 | + |
| 10 | +TorchSOM is a PyTorch-based implementation of Self-Organizing Maps (SOMs), designed for efficient training and for visualizing, clustering, and analyzing high-dimensional data. |
| 11 | + |
| 12 | +How does TorchSOM differ from other SOM implementations? |
| 13 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 14 | + |
| 15 | +TorchSOM offers several advantages: |
| 16 | + |
| 17 | +- **GPU acceleration** through PyTorch |
| 18 | +- **Modern Python practices** with type hints and Pydantic validation |
| 19 | +- **Comprehensive visualization suite** with matplotlib integration |
| 20 | +- **Flexible architecture** supporting multiple SOM variants |
| 21 | + |
| 22 | +Installation and Setup |
| 23 | +---------------------- |
| 24 | + |
| 25 | +Which Python versions are supported? |
| 26 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 27 | + |
| 28 | +We recommend using Python 3.9+. |
| 29 | + |
| 30 | +Do I need a GPU to use TorchSOM? |
| 31 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 32 | + |
| 33 | +No, TorchSOM works on both CPU and GPU. |
| 34 | +However, GPU acceleration significantly improves training speed for large datasets and maps. |
| 35 | +We recommend training on a GPU whenever one is available. |
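
The CPU/GPU choice can be made at runtime. The sketch below is illustrative and not part of TorchSOM: ``pick_device`` is a hypothetical helper that returns ``"cuda"`` only when PyTorch is importable and ``torch.cuda.is_available()`` reports a usable GPU, and falls back to ``"cpu"`` otherwise.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment; nothing to query.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

The resulting string can be used wherever a ``device`` value is expected, so the same script runs unchanged on machines with and without a GPU.
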
| 36 | + |
| 37 | +Data Preprocessing |
| 38 | +------------------ |
| 39 | + |
| 40 | +Should I always normalize my data? |
| 41 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 42 | + |
| 43 | +Yes, normalization is crucial because: |
| 44 | + |
| 45 | +- Features with larger scales dominate the distance calculation |
| 46 | +- SOM learning is sensitive to feature magnitudes |
| 47 | +- Scaling is easy to apply: ``StandardScaler`` or ``MinMaxScaler`` from scikit-learn both work well |
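
To make the first point concrete, here is a minimal standard-library sketch of z-score standardization (the same transformation ``StandardScaler`` applies, using the population standard deviation). The sample values and the helper names ``standardize`` and ``age_share`` are made up for illustration; in practice you would likely use scikit-learn directly.

```python
import statistics


def standardize(rows):
    """Z-score every column: subtract its mean and divide by its population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has std 0; dividing by 1.0 instead leaves it at zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def age_share(a, b):
    """Fraction of the squared Euclidean distance contributed by feature 1 (age)."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return squared[1] / sum(squared)


# Hypothetical samples: [income, age]. Income is far larger in magnitude.
data = [[30_000.0, 25.0], [32_000.0, 60.0], [90_000.0, 27.0]]
scaled = standardize(data)

raw_share = age_share(data[0], data[1])          # age is almost invisible
scaled_share = age_share(scaled[0], scaled[1])   # age now drives the distance
print(f"age share of squared distance: raw={raw_share:.4f}, scaled={scaled_share:.4f}")
```

Before scaling, the 35-year age gap between the first two samples contributes well under 1% of their squared distance; after scaling it contributes most of it.
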
| 48 | + |
| 49 | +What about categorical features? |
| 50 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 51 | + |
| 52 | +SOMs operate exclusively on numerical data. Therefore, it is essential to convert any categorical features into a numerical format before using them with TorchSOM. Common strategies include: |
| 53 | + |
| 54 | +1. **One-hot encoding** for nominal (unordered) categories |
| 55 | +2. **Ordinal encoding** for ordered categories |
| 56 | +3. **Target or frequency encoding** for high-cardinality categories |
| 57 | + |
| 58 | +If your dataset contains a mix of numerical and categorical features, ensure all features are numerically encoded prior to training. |
| 59 | + |
| 60 | +Similarly, when visualizing classification or label maps, assign numerical levels to each class or category to enable proper mapping and interpretation in the visualization outputs. |
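
The first two strategies can be sketched in plain Python as follows. The helpers ``one_hot`` and ``ordinal`` and the example columns are hypothetical and only illustrate the idea; libraries such as scikit-learn or pandas provide production-ready encoders.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector (categories sorted alphabetically)."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0.0] * len(categories)
        row[position[value]] = 1.0
        encoded.append(row)
    return categories, encoded


def ordinal(values, order):
    """Map each category to its rank in the explicitly given order."""
    rank = {category: float(i) for i, category in enumerate(order)}
    return [rank[value] for value in values]


colors = ["red", "green", "red", "blue"]        # nominal: no natural order
sizes = ["small", "large", "medium", "small"]   # ordinal: small < medium < large

categories, color_columns = one_hot(colors)
size_column = ordinal(sizes, order=["small", "medium", "large"])

# Combine into one numeric feature row per sample.
features = [cols + [size] for cols, size in zip(color_columns, size_column)]
print(categories, features[0])
```

Each row of ``features`` is now a fixed-length numeric vector. Remember that the encoded columns are subject to the same scaling considerations as any other feature.
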
| 61 | + |
| 62 | +Performance and Optimization |
| 63 | +---------------------------- |
| 64 | + |
| 65 | +My training is very slow. How can I speed it up? |
| 66 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 67 | + |
| 68 | +Try these optimizations: |
| 69 | + |
| 70 | +1. **Enable GPU**: Use ``device="cuda"`` if available |
| 71 | +2. **Increase batch size**: Try 64, 128, or 256 |
| 72 | +3. **Reduce map size**: Start smaller and scale up |
| 73 | +4. **Use PCA initialization**: ``initialization_mode="pca"`` |
| 74 | +5. **Reduce epochs**: Monitor convergence and stop early |
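
The last tip, stopping once training has converged, can be sketched as below. This is an illustrative helper and not a TorchSOM API: ``should_stop``, ``patience``, ``min_delta``, and the per-epoch error list are all assumptions, standing in for whatever error metric you record during training.

```python
def should_stop(history, patience=3, min_delta=1e-3):
    """Return True when the last `patience` epochs improved the best error
    seen before them by less than `min_delta`."""
    if len(history) <= patience:
        return False
    best_before = min(history[:-patience])
    best_recent = min(history[-patience:])
    return best_before - best_recent < min_delta


# Hypothetical per-epoch errors that flatten out after a few epochs.
errors = [0.90, 0.55, 0.40, 0.35, 0.3496, 0.3495, 0.3494, 0.3494]

stop_epoch = None
for epoch in range(1, len(errors) + 1):
    if should_stop(errors[:epoch]):
        stop_epoch = epoch
        break

print(f"Training could stop after epoch {stop_epoch}")
```

A larger ``patience`` tolerates longer plateaus before stopping, while a larger ``min_delta`` stops sooner on small improvements.
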
| 75 | + |
| 76 | +How much memory does TorchSOM use? |
| 77 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 78 | + |
| 79 | +Memory usage depends on: |
| 80 | + |
| 81 | +- **Map size**: O(x × y × num_features) |
| 82 | +- **Batch size**: Larger batches use more memory |
| 83 | +- **Data size**: Larger datasets need more memory, especially when loaded onto the device all at once |
| 84 | + |
| 85 | +For large datasets, consider processing the data in batches, |
| 86 | +falling back to the CPU when GPU memory is the limiting factor |
| 87 | +(system RAM is usually larger), or reducing precision |
| 88 | +(for example, float32 instead of float64). |
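
A back-of-the-envelope estimate can help when sizing a map. The sketch below only does arithmetic on the ``x × y × num_features`` term above. The second helper assumes that a distance computation materializes one broadcast tensor of shape ``[batch_size, x, y, num_features]``, as suggested by the custom distance signature in this FAQ. Whether that happens in practice depends on the distance function, so treat it as an upper bound. Both function names are hypothetical.

```python
def weight_bytes(x, y, num_features, bytes_per_value=4):
    """Bytes needed to store the x * y * num_features weight values."""
    return x * y * num_features * bytes_per_value


def broadcast_bytes(batch_size, x, y, num_features, bytes_per_value=4):
    """Upper bound for one [batch_size, x, y, num_features] intermediate tensor."""
    return batch_size * weight_bytes(x, y, num_features, bytes_per_value)


MB = 1_000_000
weights_fp32 = weight_bytes(100, 100, 50)                      # float32: 4 bytes
weights_fp64 = weight_bytes(100, 100, 50, bytes_per_value=8)   # float64: 8 bytes
batch_fp32 = broadcast_bytes(256, 100, 100, 50)

print(f"weights (float32): {weights_fp32 / MB:.1f} MB")
print(f"weights (float64): {weights_fp64 / MB:.1f} MB")
print(f"batch of 256 (float32, upper bound): {batch_fp32 / MB:.1f} MB")
```

For this 100 × 100 map with 50 features, the weights alone are small. The per-batch intermediate is what grows quickly, which is why batch size and precision are the first things to adjust.
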
| 89 | + |
| 90 | +Visualization Issues |
| 91 | +-------------------- |
| 92 | + |
| 93 | +Why are some neurons white in my visualizations? |
| 94 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 95 | + |
| 96 | +White neurons typically indicate: |
| 97 | + |
| 98 | +- **Unactivated neurons**: No data points assigned as BMU |
| 99 | +- **Zero values**: In some visualizations, zero values appear white |
| 100 | +- **NaN values**: Missing or invalid calculations |
| 101 | + |
| 102 | +This is normal for sparse data or oversized maps. |
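
To check how many neurons are unactivated, you can count how often each grid position is chosen as BMU. The sketch below works on a plain list of ``(row, col)`` BMU coordinates. Both the helper ``unactivated_neurons`` and the example coordinates are hypothetical, and how you obtain the BMUs from a trained map is left out.

```python
from collections import Counter


def unactivated_neurons(bmus, rows, cols):
    """Return grid coordinates that were never selected as a best matching unit."""
    hits = Counter(bmus)
    return [(r, c) for r in range(rows) for c in range(cols) if hits[(r, c)] == 0]


# Hypothetical BMU coordinates (row, col) for six samples on a 2 x 3 map.
bmus = [(0, 0), (0, 0), (1, 2), (0, 1), (1, 2), (1, 2)]
empty = unactivated_neurons(bmus, rows=2, cols=3)
print(f"{len(empty)} of 6 neurons have no assigned samples: {empty}")
```

A high share of empty neurons suggests the map may be larger than the data requires.
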
| 103 | + |
| 104 | +How do I interpret the distance map (D-Matrix)? |
| 105 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 106 | + |
| 107 | +In the D-Matrix: |
| 108 | + |
| 109 | +- **Light areas**: High distances between neighboring neurons (cluster boundaries) |
| 110 | +- **Dark areas**: Low distances (within clusters) |
| 111 | +- **Patterns**: Reveal cluster structure and boundaries |
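
The idea behind such a map can be illustrated on a toy grid. The sketch below uses one common definition, the mean Euclidean distance from each neuron to its 4-connected neighbours. TorchSOM's exact computation (neighbourhood, topology, scaling) may differ, and ``distance_map`` is a hypothetical helper written with the standard library only.

```python
import math


def distance_map(weights):
    """Mean Euclidean distance from each neuron to its 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                (r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            dists = [math.dist(weights[r][c], weights[nr][nc]) for nr, nc in neighbours]
            # A 1 x 1 map has no neighbours; report 0.0 in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


# Toy 2 x 4 map: the two left columns and the two right columns form two clusters.
left, right = [0.0, 0.0], [10.0, 10.0]
weights = [[left, left, right, right] for _ in range(2)]
dmap = distance_map(weights)

for row in dmap:
    print([round(value, 2) for value in row])
```

The two middle columns, where the clusters meet, get the largest values, while the outer columns stay at zero. On a real map the same pattern shows up as a boundary ridge between homogeneous regions.
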
| 112 | + |
| 113 | +Can I customize the visualization colors? |
| 114 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 115 | + |
| 116 | +Yes. Pass the desired settings to ``VisualizationConfig``: |
| 117 | + |
| 118 | +.. code-block:: python |
| 119 | + |
| 120 | +    from torchsom.visualization.config import VisualizationConfig |
| 121 | + |
| 122 | +    config = VisualizationConfig( |
| 123 | +        cmap="plasma",     # Use a different colormap |
| 124 | +        figsize=(15, 10),  # Set a larger figure size |
| 125 | +        dpi=300,           # Set a higher resolution |
| 126 | +    ) |
| 127 | + |
| 128 | +Advanced Topics |
| 129 | +--------------- |
| 130 | + |
| 131 | +Can I use TorchSOM for time series data? |
| 132 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 133 | + |
| 134 | +Yes. TorchSOM is designed to work with tabular data, so any data type, such as time series, images, or text, can be used as long as it is represented in a tabular (2D array) format. |
| 135 | +This typically means that each sample should be a fixed-length feature vector. |
| 136 | + |
| 137 | +For time series or other complex data types, you can preprocess your data to obtain such representations. |
| 138 | +Common approaches include extracting statistical features, flattening fixed-length windows, or generating embeddings (e.g., using autoencoders or other neural networks) before projecting them onto the SOM map. |
| 139 | +As long as your data can be converted into a matrix of shape ``[n_samples, n_features]``, it can be used with TorchSOM. |
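
As one example of the window-flattening approach, the sketch below cuts a univariate series into overlapping fixed-length windows, so that each window becomes one sample. The helper ``sliding_windows`` and its ``window`` and ``step`` parameters are hypothetical; they are not part of TorchSOM.

```python
def sliding_windows(series, window, step=1):
    """Cut a 1D series into fixed-length windows, one window per SOM sample."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        list(series[start:start + window])
        for start in range(0, len(series) - window + 1, step)
    ]


series = [float(t) for t in range(10)]  # hypothetical univariate time series
samples = sliding_windows(series, window=4, step=2)
n_samples, n_features = len(samples), len(samples[0])
print(f"shape: [{n_samples}, {n_features}]")
```

The resulting list of lists has the ``[n_samples, n_features]`` layout described above and can be converted to a tensor and normalized like any other tabular dataset.
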
| 140 | + |
| 141 | +How do I implement custom distance functions? |
| 142 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 143 | + |
| 144 | +Create a function with the following signature: |
| 145 | + |
| 146 | +.. code-block:: python |
| 147 | + |
| 148 | +    def custom_distance(data, weights): |
| 149 | +        """ |
| 150 | +        Args: |
| 151 | +            data: [batch_size, 1, 1, n_features] |
| 152 | +            weights: [1, row_neurons, col_neurons, n_features] |
| 153 | +        Returns: |
| 154 | +            distances: [batch_size, row_neurons, col_neurons] |
| 155 | +        """ |
| 156 | +        # Example body (Manhattan / L1 distance); replace with your own calculation. |
| 157 | +        return (data - weights).abs().sum(dim=-1) |
| 158 | + |
| 159 | +Can I save and load trained SOMs? |
| 160 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 161 | + |
| 162 | +Yes, use PyTorch's standard mechanisms: |
| 163 | + |
| 164 | +.. code-block:: python |
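| 165 | + |
| 166 | +    import torch  # SOM itself is assumed to be imported already |
| 167 | +    torch.save(som.state_dict(), "som_weights.pth")  # Save the trained weights |
| 168 | + |
| 169 | +    # Load: re-create the SOM with the same configuration, then restore the weights |
| 170 | +    som = SOM(x=10, y=10, num_features=4) |
| 171 | +    som.load_state_dict(torch.load("som_weights.pth")) |
| 172 | + |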
| 173 | +Integration Questions |
| 174 | +--------------------- |
| 175 | + |
| 176 | +How do I cite TorchSOM in my research? |
| 177 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 178 | + |
| 179 | +Please cite TorchSOM as: |
| 180 | + |
| 181 | +.. code-block:: bibtex |
| 182 | + |
| 183 | +    # GitHub Repository |
| 184 | +    @software{Berthier_TorchSOM_The_Reference_2025, |
| 185 | +        author={Berthier, Louis}, |
| 186 | +        title={TorchSOM: The Reference PyTorch Library for Self-Organizing Maps}, |
| 187 | +        url={https://github.com/michelin/TorchSOM}, |
| 188 | +        version={1.0.0}, |
| 189 | +        year={2025} |
| 190 | +    } |
| 191 | + |
| 192 | +    # Conference Paper |
| 193 | +    @inproceedings{Berthier2025TorchSOM, |
| 194 | +        title={TorchSOM: A Scalable PyTorch-Compatible Library for Self-Organizing Maps}, |
| 195 | +        author={Berthier, Louis}, |
| 196 | +        booktitle={Conference Name}, |
| 197 | +        year={2025} |
| 198 | +    } |
| 199 | + |
| 200 | +Getting Help |
| 201 | +------------ |
| 202 | + |
| 203 | +Where can I get more help? |
| 204 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 205 | + |
| 206 | +1. `Documentation <https://opensource.michelin.io/TorchSOM/>`_: Check our comprehensive guides |
| 207 | +2. `GitHub Issues <https://github.com/michelin/TorchSOM/issues>`_: Report bugs and request features |
| 208 | +3. `Notebooks <https://github.com/michelin/TorchSOM/tree/main/notebooks>`_: See our tutorial notebooks |
| 209 | + |
| 210 | +How do I report a bug? |
| 211 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 212 | + |
| 213 | +Please include: |
| 214 | + |
| 215 | +1. **TorchSOM version**: ``torchsom.__version__`` |
| 216 | +2. **Python version**: ``python --version`` |
| 217 | +3. **PyTorch version**: ``torch.__version__`` |
| 218 | +4. **Operating system**: Linux/macOS/Windows |
| 219 | +5. **Minimal reproduction example** |
| 220 | +6. **Full error traceback** |
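
The version details in items 1 to 4 can be gathered in one go. The sketch below uses only the standard library. It assumes the installed distributions are named ``torchsom`` and ``torch``, and reports ``not installed`` for any that cannot be found. The helper name ``package_version`` is hypothetical.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


report = {
    "torchsom": package_version("torchsom"),  # assumed distribution name
    "torch": package_version("torch"),
    "python": platform.python_version(),
    "os": platform.platform(),
}
for key, value in report.items():
    print(f"{key}: {value}")
```

Pasting this output into the issue, together with the reproduction example and traceback, covers most of the checklist above.
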
| 221 | + |
| 222 | +Can I contribute to TorchSOM? |
| 223 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 224 | + |
| 225 | +Yes! We welcome contributions: |
| 226 | + |
| 227 | +1. **Fork** the repository |
| 228 | +2. **Create** a feature branch |
| 229 | +3. **Add tests** for new functionality |
| 230 | +4. **Follow** our coding standards |
| 231 | +5. **Submit** a pull request |
| 232 | + |
| 233 | +See our `contributing guide <https://github.com/michelin/TorchSOM/blob/main/CONTRIBUTING.md>`_ for details. |