Singular Value Decomposition is the method of finding the three components
MocUT SVD is an efficient, easy-to-use and highly portable stand-alone-implementation for C or C++ programs. It was completely redesigned from scratch and contains several improvements compared to traditional implementations. It is specially optimized for modern CPU architectures.
You only need the two files mocutsvd.h and mocutsvd.c.
Include mocutsvd.h in your program (C or C++ code) and compile mocutsvd.c among your sources.
#include "mocutsvd.h"
....
// Creates (20 x 10)-matrix A
mocut_mat_s* a = mocut_mat_s_create_alloc( 20, 10 );
// Fills matrix A with data ...
for( size_t i = 0; i < a->rows; i++ )
for( size_t j = 0; j < a->cols; j++ )
mocut_mat_s_set( a, i, j, /* your value at position [i][j] */ );
mocut_mat_s* u = mocut_mat_s_create(); // Matrix U*
mocut_mat_s* v = mocut_mat_s_create(); // Matrix V*
// SVD: A -> U*, Σ, V*; computing singular vectors U*, V* is optional; pass NULL where not needed.
mocut_svd( a, u, v );
// At this point:
// - 'a' represents Σ. The diagonal elements of 'a' contain singular values in descending order.
// - 'u', 'v', represent the matrices U*, V*:
// - U*: (10 x 20); its row vectors represent the left singular vectors
// - V*: (10 x 10); its row vectors represent the right singular vectors
// Access any matrix element via function mocut_mat_s_get or mocut_mat_s_ptr
....
// memory cleanup
mocut_mat_s_discard( a );
mocut_mat_s_discard( u );
mocut_mat_s_discard( v );
// For more details: See example.c
// For a comprehensive performance test: See test.c-
Compile with gcc and flags below or use compatible flags of your preferred compiler-tool-chain.
-
Compiler/Linker Flags (gcc):
- Required:
-lm. - Recommended for general optimization:
-O3 - Recommended for advanced CPU specific optimization:
-march=native. - Recommended to unlock outer parallelity:
-fopenmp(both: compiler and linker need this flag)
- Required:
-
Example:
bash $ gcc -o mocutsvd_example example.c mocutsvd.c -fopenmp -march=native -O3 -lm -
There is also a simple makefile, which you might want to inspect.
example.c contains a short and easy to learn example application. It outputs the decomposition to stdout:
$ make mocutsvd_example
$ # run on 5 x 10 matrix
$ ./mocutsvd_example 5 10test.c can be used for performance testing on larger matrices. It creates a random matrix, decomposes it with a time measurement, reconstructs the original form the decomposed factors and computes the RMS-error of the reconstruction:
$ make mocutsvd_test
$ # running without arguments outputs all options:
$ ./mocutsvd_test
$ # runs a test on a 10000 x 10000 matrix:
$ ./mocutsvd_test 10000 10000 - Stable: MocUT SVD has same or better convergence stability as the widely used Golub-Kahan-Reinsch SVD approach.
- True-Scalable: Fast and efficient computation for small up to very large matrices.
- Parallel: With (inner and outer) parallelity. By default all available CPU-cores and CPU-threads are used.
- Memory Efficient: Operates within the matrix-provided memory space and allocates no extra heap memory.
- Alignment: The Matrix Interface takes care of optimal data alignment.
- Portable: Platform-agnostic design.
MocUT SVD is an algorithm for singular value decomposition. Given a matrix
MocUT is a shortcut for Monoclinic Unitary Transformation, which is a special kind of recurring transformation pattern I designed for this SVD solution. More details can be found in the whitepaper: MocUT SVD: Singular Value Decomposition via Monoclinic Unitary Transformations.
MocUT SVD is adapted to optimally utilize modern CPU architectures. Specifically: Multiple-cores, multi-layered caching, intrinsic vectorization (inner paralelity) and hyper-threading.
At the same time the code maintains high portability: It only requires compliance to the C11 (or later) standard. Hence, the code is compilable by todays compiler-tool-chains for most operating systems and platforms.
Portability is achieved by utilizing generic coding paradigms that help the compiler applying platform specific optimizations. Outer parallelity is achieved via Open MP.
MocUT SVD generates the matrices of singular vectors in their transposed form: mocut_mat_s_copy_transposed.
An error condition is normally communicated by an integer error code as return value of a function. Returning 0 means No Error. Other conditions could be
- Incorrect usage of parameters.
- Not enough memory to allocate a matrix.
- (Warning) SVD function did not converge.
The following function converts an error value to a descriptive text.
const char* mocut_err_text( int err_code )- Converts an error code to a descriptive 0-terminated text string.
The matrix is represented by structure mocut_mat_s, which contains the following user-readable elements:
size_t rows: Number of rows of the matrixsize_t cols: Number of columns of the matrixsize_t stride: Number of elements between the start of two successive rows. It isstride >= cols.double* data: Location of matrix data in memory.
The matrix uses a 'strided row-major' data-layout. This means that the element [i][j] is accessed as data[ i * stride + j ];
The stride value is set automatically for optimal alignment during matrix-allocation. The user may also setup a matrix manually, referencing external data via matrix-setup function.
-
mocut_mat_s* mocut_mat_s_create( void );-
Constructs an empty instance. For destruction, see
mocut_mat_s_discard. -
Return:
- Pointer to constructed matrix.
- NULL in case matrix could not be constructed.
-
-
void mocut_mat_s_discard( mocut_mat_s* );- Destructs a matrix. Frees all memory this matrix owns. For construction, see
mocut_mat_s_create.
- Destructs a matrix. Frees all memory this matrix owns. For construction, see
-
void mocut_mat_s_init( mocut_mat_s* );- Initializes an empty matrix instance that was created on the stack.
- To tear it down, call
mocut_mat_s_down. - Do not use this function when you create a matrix via
mocut_mat_s_create.
-
void mocut_mat_s_down( mocut_mat_s* );- Tears down a matrix that was initialized by
mocut_mat_s_init. - Do not use this function when the matrix was created via
mocut_mat_s_create.
- Tears down a matrix that was initialized by
-
int mocut_mat_s_alloc( mocut_mat_s* o, size_t rows, size_t cols );- Allocates a (rows x cols )-matrix and initializes all values to zero.
- This function takes care of proper data-alignment for optimal SVD-performance.
- Return:
0: Success>0: Error Code (Allocation failed).
-
int mocut_mat_s_set( mocut_mat_s* o, size_t row, size_t col, double value );- Sets element value at position
[row][col]. - Return:
0: Success>0: Error (Index out of rage).
- Sets element value at position
-
double mocut_mat_s_get( const mocut_mat_s* o, size_t row, size_t col );- Returns element value at position
[row][col]. Ifroworcolis out of range,0is returned.
- Returns element value at position
-
double* mocut_mat_s_ptr( const mocut_mat_s* o, size_t row, size_t col );- Returns a pointer to element value at position
[row][col]for fast, compiler-optimizable read or write access. - Same as:
o->data[ i * o->stride + j ] - Note: This function has no boundary check for
row,col!
- Returns a pointer to element value at position
-
void mocut_mat_s_clear( mocut_mat_s* o );- Clears memory from the latest allocation without destroying the matrix.
-
int mocut_mat_s_setup( mocut_mat_s* o, size_t rows, size_t cols, size_t stride, double* data )- Makes the matrix use external data. The external matrix layout must be compatible to the
mocutlayout. The external matrix data is neither copied nor owned bymocut_mat_s. It will not be freed on matrix destruction. The external data must stay alive and valid during the lifetime of themocut_mat_sinstance. - The purpose of this function is to simplify the integration of
mocutsvdinto a codebase that prefers using its own matrix representation. - Note: Using
mocut_mat_s_setupwith unaligned external data is allowed but can limit the effectiveness of cache and vectorization in functionmocut_svd. - Return:
0: Success>0: Error Code.
- Makes the matrix use external data. The external matrix layout must be compatible to the
-
mocut_mat_s* mocut_mat_s_create_alloc( size_t rows, size_t cols )- Convenience function: Combination of
mocut_mat_s_createandmocut_mat_s_alloc. Returns NULL in case of error.
- Convenience function: Combination of
-
mocut_mat_s* mocut_mat_s_create_setup( size_t rows, size_t cols, size_t stride, double* data )- Convenience function: Combination of
mocut_mat_s_createandmocut_mat_s_setup. Returns NULL in case of error.
- Convenience function: Combination of
-
mocut_mat_s* mocut_mat_s_create_setup( size_t rows, size_t cols, size_t stride, double* data )- Convenience function: Combination of
mocut_mat_s_createandmocut_mat_s_setup. Returns NULL in case of error.
- Convenience function: Combination of
-
int mocut_mat_s_copy( mocut_mat_s* o, const mocut_mat_s* m )- Copies the matrix data from
mtoo. - Both matrices must be allocated to the same size:
(o.rows==m.rows) && (o.cols==m.cols) - Return:
0: Success>0: Error Code.
- Copies the matrix data from
-
int mocut_mat_s_copy_transposed( mocut_mat_s* o, const mocut_mat_s* m )- Copies the transposed matrix data from
mtoo. - Both matrices must be allocated to the respective transposed size:
(o.rows==m.cols) && (o.cols==m.rows) - In case of a square matrix
(rows == cols),oandmmay reference the same matrix: In-place transposition. - Return:
0: Success>0: Error Code.
- Copies the transposed matrix data from
int mocut_svd( mocut_mat_s* a, mocut_mat_s* u, mocut_mat_s* v );
mocut_svd performs the thin SVD on a (m x n)-Matrix:
The matrices are being modified during execution. Matrix a must be initialized as
Arguments u, v are optional. They represent NULL as argument when not needed.
If matrix-alloc or matrix-setup).
If it is empty, mocut_svd will allocate it to the correct size. If it is non-empty, the size must be (k x m) for u and (k x n) for v with k = min(m,n).
The final phase of the SVD is an iterative process that converges to the correct result. In the rare case of non-convrgence, SVD returns the warning MOCUT_WRN_CONVERGENCE. It indicates that the reconstruction
0: Success- MOCUT_WRN_CONVERGENCE: Insufficient convergence
Other >0: See Error Code.
MocUT-functions are thread-safe, provided the data passed as argument is not shared with other threads while the function is running.
Although mocut_svd may spawn its own threads, those are completely shielded from any multi-threaded environment you might be using in your application.
MocUT SVD offers some customization around memory handling.
Function mocut_svd uses only the memory space provided by the matrices. Per default the matrix interface uses stdlib functions aligned_alloc and free for allocation and destruction. Alternatively, you can declare your custom memory functions or supply external memory to a matrix via function mocut_mat_s_setup. You can also control memory alignment.
This opens possibilities for platforms with limited or non-standard memory management:
If you wish to use a specific memory manager, define macros MOCUT_MEM_ALLOC( size ) and MOCUT_MEM_FREE( data ) before including mocutsvd.h.
MOCUT_MEM_ALLOC( size ) should represent a dynamic allocation function similar to stdlib's malloc.
MOCUT_MEM_FREE( data ) should represent a memory release function similar to stdlib's free.
In the example below, the memory manager TBMAN would be used:
#define MOCUT_MEM_ALLOC( size ) tbman_malloc( size )
#define MOCUT_MEM_FREE( data ) tbman_free( data )
#include "mocutsvd.h"If you wish to prevent MocUT SVD from using any memory management at all, define MOCUT_NO_MEM_ALLOC before including mocutsvd.h.
#define MOCUT_NO_MEM_ALLOC
#include "mocutsvd.h"In this case you cannot use functions mocut_mat_s_create, mocut_mat_s_discard, mocut_mat_s_alloc.
Instead, place the matrix instance on the stack and use functions mocut_mat_s_init, mocut_mat_down, mocut_mat_setup.
mocut_mat_s a; // instance a placed on the stack
mocut_mat_s_init( &a ); // instance a is initialized
mocut_mat_s_setup( &a, rows, cols, stride, data ); // assigning an external matrix data area
... // working with a
mocut_mat_s_down( &a ); // instance a is cleaned upNote: Without memory management, alignment is your responsibility.
MocUT SVD aligns matrix data to improve on inner parallelity and cache usage for most processors. If you wish to experiment with your own custom alignment, define MOCUT_VAL_ALIGN with a constant indicating the alignment in bytes before including mucutsvd.h:
#define MOCUT_VAL_ALIGN 32 // aligns matrix rows to multiple of 32 double values (32*8 bytes)By default MocUT SVD spawns multiple threads according to the number of logical CPUs on the platform.
-
If you wish to run it only in the caller thread, avoid using the flag
-fopenmpwith compiler and linker. -
If you want to prevent MocUT SVD using OpenMP at all, comment out the compiler directives
#pragma omp parallel for
in mocutsvd.h and mocutsvd.c
- If you wish more specific thread controls, look up the Open MP documentation: https://www.openmp.org
On very large matrices, the function mocut_svd will put prolonged load on the CPU, running it near its rated power limits. Correctly configured machines can handle this type of load. Some platforms permit overriding safety limits at the users own risk (e.g. via overclocking, overvolting). Be aware that inadequately configured safety measures might cause malfunction or damage under high CPU load.
Advanced debugging tools like valgrind analyze the instructions and memory usage of a program:
-
Valgrind might not recognize newer native CPU instructions, which occur with certain native compiler optimizations (such as
-march=native). -
Valgrind might not be compatible to a compilers integration of OpenMP (
-fopenmp).
To analyze code with valgrind, do not use these compiler flags.
Absolute processing time in seconds for a full decomposition of a (
The following CPUs were used:
TR 7960: AMD Ryzen™ Threadripper™ 7960X, containing 24 coresRZ 5900: AMD Ryzen™ 9 5900X, containing 12 coresi7 7700: Intel® Core™ i7-7700, containing 4 cores
