BatchSOM
(syntax changed as of version 1.2)
This program implements the well-known Kohonen Self-Organizing Map using a training variant named "Batch training". It maps a set of high-dimensional input vectors onto a two-dimensional grid. For details, see T. Kohonen, Self-Organizing Maps, Second Edition, Springer-Verlag (1997).
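As a rough illustration of what batch training does, the sketch below runs the textbook batch-SOM update. At each iteration, every input vector is assigned to its closest code vector (the winning node). Each code vector is then replaced by a neighborhood-weighted average of all input vectors, with the weights computed from the map positions of the winners. This is not the classify_batch_som source. The names `batch_som` and `best_matching_unit` are hypothetical, and the Gaussian neighborhood, the linear shrinking of the radius, the random initialization and the row-major rectangular grid are all assumptions made for illustration.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(x, codes):
    """Index of the code vector closest to x (the winning node)."""
    return min(range(len(codes)), key=lambda j: squared_distance(x, codes[j]))


def batch_som(data, xdim, ydim, radius, iterations, seed=0):
    """Hypothetical batch-SOM sketch on a rectangular xdim x ydim grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: code vectors start as copies of randomly chosen inputs.
    codes = [list(rng.choice(data)) for _ in range(xdim * ydim)]
    # Assumption: unit j sits at (x, y) in row-major order, x varying fastest.
    pos = [(j % xdim, j // xdim) for j in range(xdim * ydim)]
    for it in range(iterations):
        # Assumption: the radius shrinks linearly and never drops below 1.
        r = max(1.0, radius * (1.0 - it / iterations))
        winners = [best_matching_unit(x, codes) for x in data]
        new_codes = []
        for j in range(len(codes)):
            num = [0.0] * dim
            den = 0.0
            for x, w in zip(data, winners):
                dx = pos[j][0] - pos[w][0]
                dy = pos[j][1] - pos[w][1]
                # Assumed Gaussian neighborhood weight between unit j and the winner.
                h = math.exp(-(dx * dx + dy * dy) / (2.0 * r * r))
                den += h
                for k in range(dim):
                    num[k] += h * x[k]
            # Batch rule: weighted mean of the inputs (keep the old vector if all weights underflow).
            new_codes.append([v / den for v in num] if den > 0 else codes[j])
        codes = new_codes
    return codes
```

Unlike the sequential SOM, no learning rate is involved. Each iteration recomputes every code vector directly from the whole data set, which is what makes the batch variant deterministic once the initial code vectors are fixed.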
$ classify_batch_som -i ...
Parameters
- `` The input data file (raw text file). Each row represents a data item and each column represents a variable. The file must have the following format:
3 1000
12 34 54
-12 45 76
...
32 45 76
The first line gives the dimension of the vectors (here 3) and the number of vectors (here 1000). Vector components (variables) are separated by whitespace. Optionally, an extra last column can hold a label for each vector. Example:
3 1000
12 34 54 labelA
-12 45 76 labelB
...
32 45 76 labelN
- `` The output code vectors file. This parameter sets the base name for the generated output files. BatchSOM produces several files with different information; all of them use this base name with different extensions. The generated files are:
- [basename].cod: The resulting code vectors. They follow the same format as the input data, except that some extra information is stored in the first line of the file. Example:
3 rect 10 7 gaussian
11 31 52 labelA
-10 43 71 labelB
...
29 39 71 labelN
The first line gives the dimension of the vectors (here 3), the topology of the map (here rectangular), the X and Y dimensions (here 10x7) and the label "gaussian", which is only there for full compatibility with Kohonen's SOM_PAK package (http://www.cis.hut.fi/research/som_lvq_pak.shtml).
- [basename].inf: Information file with the parameters used and the resulting quantization error. It looks like this:
Kohonen BatchSOM algorithm
Input data file : test.dat
Code vectors output file : test.cod
Algorithm information output file : test.inf
Number of feature vectors: 150
Number of variables: 4
Horizontal dimension (Xdim) = 10
Vertical dimension (Ydim) = 7
Hexagonal topology
Initial neighborhood radius (radius) = 10
Total number of iterations = 1000
Input data not normalized
Quantization error : 0.349357
- [basename].his: The number of input vectors assigned to each code vector, i.e. a histogram of the resulting code vectors. The file contains two columns: the code vector number and the number of input vectors assigned to it.
- [basename].err: The average quantization error for each code vector. The file contains two columns: the code vector number and its average quantization error.
- `` The input code vectors file. This parameter is optional. It is useful when the code vectors are to be initialized with a set of predefined values, typically when the algorithm is run several times and the output of one run is used as the input to the next.
- `` Save a file for each code vector with the list of input items assigned to it. Each generated file contains the indexes of the input vectors assigned to that code vector. For example, if a 10x7 map is used, 70 files named [basename].[codevector index] (`basename.0`, `basename.1`, etc.) will be generated.
- `` Horizontal size of the map
- `` Vertical size of the map
- `` Rectangular Topology (Default)
- `` Hexagonal Topology. The following picture helps in understanding the differences between both topologies and the map axis convention:
Xdim is ------>
HEXAGONAL:
 O O O O O O O O O
O O O & & & O O O
 O O & @ @ & O O O
O O & @ + @ & O O
 O O & @ @ & O O O
O O O & & & O O O
 O O O O O O O O O
RECTANGULAR:
O O O O O O O O O
O O O O & O O O O
O O O & @ & O O O
O O & @ + @ & O O
O O O & @ & O O O
O O O O & O O O O
O O O O O O O O O
- `` Initial neighborhood radius (default = max(xdim, ydim)). This defines the set of neighbors that are updated along with the winning node during training. The radius is decreased during training. By default, the largest of the map dimensions is used (the whole map).
- `` Number of iterations (Default = 1000)
- `` Normalize input data (Default = No)
- `` Information level while running:
- `` No information (default)
- `` Progress bar with the elapsed time and estimated time to finish
- `` Code vectors changes between iterations
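A reader for the input data format described for the input data file parameter can be sketched as follows. The header line holds the dimension and the number of vectors, and each following row holds one vector with an optional trailing label. The names `parse_data` and `read_data_file` are hypothetical. Rejecting rows with the wrong number of fields, and rejecting files whose row count disagrees with the header, are choices of this sketch rather than documented behavior of classify_batch_som.

```python
def parse_data(text):
    """Parse 'dim count' header plus one vector per row; an extra last field is a label."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    dim, count = int(rows[0][0]), int(rows[0][1])
    vectors, labels = [], []
    for fields in rows[1:]:
        # Each row must have exactly dim components, optionally followed by one label.
        if len(fields) not in (dim, dim + 1):
            raise ValueError(f"expected {dim} or {dim + 1} fields, got {len(fields)}")
        vectors.append([float(v) for v in fields[:dim]])
        labels.append(fields[dim] if len(fields) == dim + 1 else None)
    if len(vectors) != count:
        raise ValueError(f"header announces {count} vectors, found {len(vectors)}")
    return vectors, labels


def read_data_file(path):
    """Read and parse a data file such as test.dat."""
    with open(path) as f:
        return parse_data(f.read())
```

Since `str.split()` with no argument splits on any run of whitespace, rows separated by single spaces, several spaces or tabs are all accepted.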
Example 1: Maps a set of data stored in "test.dat" file into a 10x7 hexagonal map
$ classify_batch_som -i test.dat -o test -xdim 10 -ydim 7
In this case the following parameters are set by default:
Input data file : test.dat
Output file name : test
Horizontal dimension (Xdim) = 10
Vertical dimension (Ydim) = 7
Hexagonal topology
Initial neighborhood radius (radius) = 10
Total number of iterations = 10000
verbosity level = 0
Do not normalize input data
So, we are going to generate a 10x7 (-xdim 10 and -ydim 7) output map using 10000 iterations (-iter 10000). A hexagonal topology is used (-hexa), with an initial neighborhood radius of 10 (-radius 10). In this case no textual information is printed to the console (-verb 0).
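Example 1 uses a hexagonal topology. The rings around the winning node (+) in the HEXAGONAL and RECTANGULAR pictures of the parameter list can be reproduced with the small distance sketch below: @ marks distance 1 and & marks distance 2. The conventions are read off those pictures, not taken from the program source. Coordinates are assumed to be (x, y) with y counted from the top row. In the hexagonal grid, even rows (0, 2, 4, ...) are assumed to be shifted half a cell to the right. The rectangular rings in the picture match the city-block distance. All function names are hypothetical.

```python
def rect_distance(a, b):
    """City-block distance between two (x, y) units on a rectangular grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def _offset_to_axial(x, y):
    # Assumption: even rows are shifted right by half a cell ("even-r" offset layout).
    return x - (y + (y & 1)) // 2, y


def hex_distance(a, b):
    """Number of hexagonal steps between two (x, y) units."""
    q1, r1 = _offset_to_axial(*a)
    q2, r2 = _offset_to_axial(*b)
    dq, dr = q1 - q2, r1 - r2
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def ring(winner, xdim, ydim, dist, d):
    """All units of an xdim x ydim map at exactly distance d from the winner."""
    return [(x, y) for y in range(ydim) for x in range(xdim) if dist(winner, (x, y)) == d]
```

On the 9x7 maps of the pictures, with the winner at (4, 3), `ring(..., hex_distance, 1)` gives the six @ units and `ring(..., rect_distance, 1)` gives the four @ units. This is the practical difference between the two topologies: each interior node has six direct neighbors instead of four.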
As a result, the BatchSOM application generates the following output files:
- test.cod: The final code vector file, in the format described above.
- test.inf: Information file with the parameters used and the resulting quantization error.
- test.his: The number of input vectors assigned to each code vector (a histogram).
- test.err: The average quantization error for each code vector.
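The statistics stored in test.his, test.err and the last line of test.inf can be sketched from the input vectors and the final code vectors. The sketch assumes that quantization error means the Euclidean distance between an input vector and its closest code vector, averaged per code vector (for .err) and over all inputs (for .inf). It also assumes that code vectors are numbered from 0 and that a code vector with no assigned inputs gets an error of 0. None of this is confirmed by the page. The names `assignment_stats` and `write_two_columns` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def assignment_stats(data, codes):
    """Return (counts per code vector, average error per code vector, overall error)."""
    counts = [0] * len(codes)
    err_sums = [0.0] * len(codes)
    for x in data:
        dists = [euclidean(x, c) for c in codes]
        j = min(range(len(codes)), key=dists.__getitem__)  # winning code vector
        counts[j] += 1
        err_sums[j] += dists[j]
    # Assumption: empty code vectors report an average error of 0.
    avg_err = [s / n if n else 0.0 for s, n in zip(err_sums, counts)]
    quantization_error = sum(err_sums) / len(data)
    return counts, avg_err, quantization_error


def write_two_columns(path, values):
    """Write 'index value' rows, e.g. for a .his or .err style file."""
    with open(path, "w") as f:
        for j, v in enumerate(values):
            f.write(f"{j} {v}\n")
```

With this reading, `counts` corresponds to the histogram in test.his, `avg_err` to test.err, and `quantization_error` to the single value reported in test.inf.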
Example 2: Maps a set of data stored in "test.dat" file into a 10x7 rectangular map with other initialization values
$ classify_batch_som -i test.dat -o test -xdim 10 -ydim 7 -rect -radius 5 -norm -verb 1 -saveclusters
In this case the following parameters are used:
Input data file : test.dat
Output file name : test
Horizontal dimension (Xdim) = 10
Vertical dimension (Ydim) = 7
Rectangular topology
Initial neighborhood radius (radius) = 5
Total number of iterations = 10000
verbosity level = 1
Normalize input data
In this case we are going to generate a 10x7 (-xdim 10 and -ydim 7) output map using 10000 iterations (-iter 10000). A rectangular topology is used (-rect), with an initial neighborhood radius of 5 (-radius 5), and the input data is normalized (-norm). A progress bar with the elapsed and estimated time is shown in the console (-verb 1). Since the -saveclusters parameter is used, the list of input vectors assigned to each code vector is stored in the files test.0 to test.69.
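The cluster files written because of -saveclusters can be sketched as a grouping of input-vector indexes by winning code vector, followed by one file per code vector named [basename].[codevector index]. The layout inside each file is an assumption: one 0-based input index per line. The page only says each file holds a list of indexes. `cluster_members` and `save_clusters` are hypothetical names.

```python
def cluster_members(winners, n_codes):
    """Group input-vector indexes by the index of their winning code vector."""
    members = [[] for _ in range(n_codes)]
    for i, j in enumerate(winners):
        members[j].append(i)
    return members


def save_clusters(basename, members):
    """Write basename.0, basename.1, ... with one input index per line (assumed layout)."""
    paths = []
    for j, indexes in enumerate(members):
        path = f"{basename}.{j}"
        with open(path, "w") as f:
            for i in indexes:
                f.write(f"{i}\n")
        paths.append(path)
    return paths
```

For the 10x7 map of this example, `members` has 70 entries, so `save_clusters("test", members)` would produce test.0 to test.69. Code vectors that win no input vectors still get an (empty) file.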
The following files are going to be generated:
- test.cod: The final code vector file, in the format described above.
- test.inf: Information file with the parameters used and the resulting quantization error.
- test.his: The number of input vectors assigned to each code vector (a histogram).
- test.err: The average quantization error for each code vector.
- test.0 to test.69: One file per code vector, each listing the input data vectors assigned to that code vector.
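To relate test.cod and the test.0 to test.69 files back to positions on the map, a .cod reader can be sketched from the format described for [basename].cod. The header holds the dimension, topology, X and Y sizes and the "gaussian" compatibility label. It is followed by xdim*ydim code vectors, each with an optional label. The mapping from code vector index to (x, y) assumes row-major order with x varying fastest. That ordering is an assumption, not something this page states. `parse_cod` and `grid_position` are hypothetical names.

```python
def parse_cod(text):
    """Parse a .cod file: 'dim topology xdim ydim gaussian' header, then code vectors."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    dim, topology = int(header[0]), header[1]
    xdim, ydim = int(header[2]), int(header[3])
    codes, labels = [], []
    for fields in rows[1:]:
        codes.append([float(v) for v in fields[:dim]])
        # Any field after the first dim components is treated as the label.
        labels.append(fields[dim] if len(fields) > dim else None)
    if len(codes) != xdim * ydim:
        raise ValueError(f"expected {xdim * ydim} code vectors, found {len(codes)}")
    return {"dim": dim, "topology": topology, "xdim": xdim, "ydim": ydim,
            "codes": codes, "labels": labels}


def grid_position(index, xdim):
    """(x, y) of a code vector, assuming row-major order with x varying fastest."""
    return index % xdim, index // xdim
```

Under that assumption, test.69 on the 10x7 map belongs to the unit at (9, 6), the last unit of the last row.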
--Main.AlfredoSolano - 24 Jan 2007