Commit cff8e60 (parent 8e69b1d)

Add training error visualization assets and comprehensive FAQ documentation

- Introduced a new image asset, ``michelin_training_errors.png``, to visualize training errors in the SOM.
- Added a detailed ``faq.rst`` file covering frequently asked questions about TorchSOM, including installation, data preprocessing, performance optimization, and visualization issues.
- The FAQ enhances user understanding of the library's capabilities and provides guidance on common challenges and best practices for effective usage.

File tree: 3 files changed, +617 −0 lines
michelin_training_errors.png (binary image, 236 KB; preview not rendered)
docs/source/faq.rst (233 additions, 0 deletions)

Frequently Asked Questions
==========================

General Questions
-----------------

What is TorchSOM?
~~~~~~~~~~~~~~~~~

TorchSOM is a modern PyTorch-based implementation of Self-Organizing Maps (SOMs), designed for efficient training and comprehensive visualization of high-dimensional data clustering and analysis.

How does TorchSOM differ from other SOM implementations?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TorchSOM offers several advantages:

- **GPU acceleration** through PyTorch
- **Modern Python practices** with type hints and Pydantic validation
- **Comprehensive visualization suite** with matplotlib integration
- **Flexible architecture** supporting multiple SOM variants

Installation and Setup
----------------------

Which Python versions are supported?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We recommend using Python 3.9 or newer.

Do I need a GPU to use TorchSOM?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, TorchSOM works on both CPU and GPU.
However, GPU acceleration significantly improves training speed for large datasets and maps, so we recommend using a GPU for training when one is available.
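
When both are available, a minimal device-selection sketch looks like this (``torch.cuda.is_available`` is standard PyTorch; pass the resulting string wherever TorchSOM accepts a ``device`` argument):

.. code-block:: python

    import torch

    # Fall back to CPU automatically when no CUDA device is present
    device = "cuda" if torch.cuda.is_available() else "cpu"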

Data Preprocessing
------------------

Should I always normalize my data?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, normalization is crucial because:

- Features with larger scales dominate the distance calculation
- SOM learning is sensitive to feature magnitudes
- StandardScaler or MinMaxScaler from scikit-learn both work well
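
As an illustration, standardization can be sketched in pure Python (in practice, scikit-learn's ``StandardScaler`` does this per feature column):

.. code-block:: python

    from statistics import mean, pstdev

    def standardize(column):
        """Scale one feature column to zero mean and unit variance."""
        mu, sigma = mean(column), pstdev(column)
        return [(value - mu) / sigma for value in column]

    scaled = standardize([200.0, 400.0, 600.0])
    # The scaled column now has mean 0 and (population) standard deviation 1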

What about categorical features?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SOMs operate exclusively on numerical data. Therefore, it is essential to convert any categorical features into a numerical format before using them with TorchSOM. Common strategies include:

1. **One-hot encoding** for nominal (unordered) categories
2. **Ordinal encoding** for ordered categories
3. **Target or frequency encoding** for high-cardinality categories

If your dataset contains a mix of numerical and categorical features, ensure all features are numerically encoded prior to training.

Similarly, when visualizing classification or label maps, assign numerical levels to each class or category to enable proper mapping and interpretation in the visualization outputs.
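
A dependency-free sketch of one-hot encoding (in practice, pandas ``get_dummies`` or scikit-learn's ``OneHotEncoder`` are the usual tools):

.. code-block:: python

    def one_hot(values):
        """Map each distinct category to a binary indicator vector."""
        categories = sorted(set(values))
        index = {c: i for i, c in enumerate(categories)}
        return [[1 if index[v] == i else 0 for i in range(len(categories))]
                for v in values]

    encoded = one_hot(["red", "green", "red"])
    # Columns are ordered alphabetically: ["green", "red"]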

Performance and Optimization
----------------------------

My training is very slow. How can I speed it up?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Try these optimizations:

1. **Enable GPU**: Use ``device="cuda"`` if available
2. **Increase batch size**: Try 64, 128, or 256
3. **Reduce map size**: Start smaller and scale up
4. **Use PCA initialization**: ``initialization_mode="pca"``
5. **Reduce epochs**: Monitor convergence and stop early

How much memory does TorchSOM use?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Memory usage depends on:

- **Map size**: O(x × y × num_features)
- **Batch size**: Larger batches use more memory
- **Data size**: Keep datasets to a reasonable size

For large datasets, consider:

- Processing in batches
- Using CPU instead of GPU
- Reducing precision (float32 instead of float64)
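
A back-of-envelope estimate for the weight tensor alone (a sketch; activations, gradients, and batched data add to this):

.. code-block:: python

    def weights_bytes(x, y, num_features, bytes_per_element=4):
        """Approximate size of the SOM weight tensor (float32 by default)."""
        return x * y * num_features * bytes_per_element

    # A 100 x 100 map with 64 features in float32 needs about 2.4 MiB
    size = weights_bytes(100, 100, 64)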

Visualization Issues
--------------------

Why are some neurons white in my visualizations?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

White neurons typically indicate:

- **Unactivated neurons**: No data points assigned as BMU
- **Zero values**: In some visualizations, zero values appear white
- **NaN values**: Missing or invalid calculations

This is normal for sparse data or oversized maps.

How do I interpret the distance map (D-Matrix)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the D-Matrix:

- **Light areas**: High distances between neighboring neurons (cluster boundaries)
- **Dark areas**: Low distances (within clusters)
- **Patterns**: Reveal cluster structure and boundaries
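
Conceptually, each D-Matrix entry is the average distance from a neuron's weight vector to those of its grid neighbours. A minimal sketch with 4-connectivity (TorchSOM's own implementation may differ, e.g. for hexagonal grids):

.. code-block:: python

    import math

    def d_matrix(weights):
        """weights: [rows][cols][features] nested lists -> [rows][cols] floats."""
        rows, cols = len(weights), len(weights[0])
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                dists = [
                    math.dist(weights[r][c], weights[r + dr][c + dc])
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols
                ]
                out[r][c] = sum(dists) / len(dists)
        return out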

Can I customize the visualization colors?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes! Use the VisualizationConfig:

.. code-block:: python

    from torchsom.visualization.config import VisualizationConfig

    config = VisualizationConfig(
        cmap="plasma",  # Use a different colormap
        figsize=(15, 10),  # Set a larger figure size
        dpi=300,  # Set a higher resolution
    )

Advanced Topics
---------------

Can I use TorchSOM for time series data?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TorchSOM is designed to work with tabular data, meaning any data type, such as time series, images, or text, can be used as long as it is represented in a tabular (2D array) format.
This typically means that each sample should be a fixed-length feature vector.

For time series or other complex data types, you can preprocess your data to obtain such representations.
Common approaches include extracting statistical features, flattening fixed-length windows, or generating embeddings (e.g., using autoencoders or other neural networks) before projecting them onto the SOM map.
As long as your data can be converted into a matrix of shape ``[n_samples, n_features]``, it can be used with TorchSOM.
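
For instance, flattening fixed-length windows of a univariate series into rows can be sketched as:

.. code-block:: python

    def windows(series, length, step=1):
        """Turn a 1D series into overlapping rows of shape [n_samples, length]."""
        return [series[i:i + length]
                for i in range(0, len(series) - length + 1, step)]

    X = windows([1, 2, 3, 4, 5], length=3)
    # X is [[1, 2, 3], [2, 3, 4], [3, 4, 5]]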

How do I implement custom distance functions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a function following the signature:

.. code-block:: python

    def custom_distance(data, weights):
        """
        Args:
            data: [batch_size, 1, 1, n_features]
            weights: [1, row_neurons, col_neurons, n_features]

        Returns:
            distances: [batch_size, row_neurons, col_neurons]
        """
        # Your custom distance calculation, reducing over the feature axis
        return distances
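
As a concrete (hypothetical) example, here is a Manhattan (L1) distance with the same shape contract, written with plain nested lists so the broadcasting stays explicit; a real implementation would vectorize this with torch operations:

.. code-block:: python

    def manhattan_distance(data, weights):
        """data: [batch, 1, 1, f]; weights: [1, rows, cols, f] (nested lists)."""
        batch = [sample[0][0] for sample in data]   # -> [batch, f]
        grid = weights[0]                           # -> [rows, cols, f]
        return [
            [[sum(abs(x - w) for x, w in zip(vec, neuron)) for neuron in row]
             for row in grid]
            for vec in batch
        ]

    d = manhattan_distance(
        data=[[[[0.0, 0.0]]]],                 # one sample with 2 features
        weights=[[[[1.0, 1.0], [3.0, 4.0]]]],  # a 1 x 2 grid of neurons
    )
    # d == [[[2.0, 7.0]]]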

Can I save and load trained SOMs?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, use PyTorch's standard mechanisms:

.. code-block:: python

    import torch

    # Save
    torch.save(som.state_dict(), 'som_weights.pth')

    # Load
    som = SOM(x=10, y=10, num_features=4)
    som.load_state_dict(torch.load('som_weights.pth'))

Integration Questions
---------------------

How do I cite TorchSOM in my research?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please cite TorchSOM as:

.. code-block:: bibtex

    % GitHub Repository
    @software{Berthier_TorchSOM_The_Reference_2025,
        author = {Berthier, Louis},
        title = {TorchSOM: The Reference PyTorch Library for Self-Organizing Maps},
        url = {https://github.com/michelin/TorchSOM},
        version = {1.0.0},
        year = {2025}
    }

    % Conference Paper
    @inproceedings{Berthier2025TorchSOM,
        title = {TorchSOM: A Scalable PyTorch-Compatible Library for Self-Organizing Maps},
        author = {Berthier, Louis},
        booktitle = {Conference Name},
        year = {2025}
    }

Getting Help
------------

Where can I get more help?
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. `Documentation <https://opensource.michelin.io/TorchSOM/>`_: Check our comprehensive guides
2. `GitHub Issues <https://github.com/michelin/TorchSOM/issues>`_: Report bugs and request features
3. `Notebooks <https://github.com/michelin/TorchSOM/tree/main/notebooks>`_: See our tutorial notebooks

How do I report a bug?
~~~~~~~~~~~~~~~~~~~~~~

Please include:

1. **TorchSOM version**: ``torchsom.__version__``
2. **Python version**: ``python --version``
3. **PyTorch version**: ``torch.__version__``
4. **Operating system**: Linux/macOS/Windows
5. **Minimal reproduction example**
6. **Full error traceback**
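
A small helper for collecting the environment details in one place (a sketch; add ``torch.__version__`` and ``torchsom.__version__`` to the dictionary when those packages are importable):

.. code-block:: python

    import platform
    import sys

    report = {
        "python": sys.version.split()[0],
        "os": platform.system(),
    }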

Can I contribute to TorchSOM?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes! We welcome contributions:

1. **Fork** the repository
2. **Create** a feature branch
3. **Add tests** for new functionality
4. **Follow** our coding standards
5. **Submit** a pull request

See our `contributing guide <https://github.com/michelin/TorchSOM/blob/main/CONTRIBUTING.md>`_ for details.
