libstats - Modern C++20 Statistical Distributions Library

A modern, high-performance C++20 statistical distributions library providing comprehensive statistical functionality with enterprise-grade safety features and zero external dependencies.

📖 Complete Documentation: For detailed information about building, architecture, parallel processing, and platform support, see the comprehensive guides below.

Features

🎯 Complete Statistical Interface

  • PDF/CDF/Quantiles: Full probability density, cumulative distribution, and quantile functions
  • Statistical Moments: Mean, variance, skewness, kurtosis with thread-safe access
  • Random Sampling: Integration with std:: distributions for high-quality random number generation
  • Parameter Estimation: Maximum Likelihood Estimation (MLE) with comprehensive diagnostics
  • Statistical Validation: Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) goodness-of-fit tests, model selection
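
As a rough illustration of what a density and a cumulative distribution function compute for the Gaussian case, here is a minimal standalone sketch using only `<cmath>`. The names `normal_pdf` and `normal_cdf` are hypothetical helpers for this example and are not part of the libstats API; the library's own classes are shown under Basic Usage.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not libstats functions.

// Density of N(mu, sigma^2) evaluated at x.
double normal_pdf(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double inv_sqrt_2pi = 0.3989422804014327;  // 1 / sqrt(2 * pi)
    const double z = (x - mu) / sigma;
    return inv_sqrt_2pi / sigma * std::exp(-0.5 * z * z);
}

// CDF of N(mu, sigma^2) via the complementary error function:
// Phi(z) = 0.5 * erfc(-z / sqrt(2)).
double normal_cdf(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double z = (x - mu) / sigma;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

Writing the CDF with `std::erfc` rather than `1 + std::erf` keeps precision in the lower tail, where `erf` is close to -1 and the sum would cancel.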

📊 Available Distributions

  • Gaussian (Normal): N(μ, σ²) - The cornerstone of statistics ✅
  • Exponential: Exp(λ) - Waiting times and reliability analysis ✅
  • Uniform: U(a, b) - Continuous uniform random variables ✅
  • Poisson: P(λ) - Count data and rare events ✅
  • Discrete: Custom discrete distributions with arbitrary support ✅
  • Gamma: Γ(α, β) - Positive continuous variables ✅

⚡ Modern C++20 Design

  • Thread-Safe: Concurrent read access with safe cache management
  • Zero Dependencies: Only standard library required
  • SIMD Optimized: Vectorized operations for bulk calculations
  • Memory Safe: RAII principles and smart pointer usage
  • Exception Safe: Robust error handling throughout
  • C++20 Concepts: Type-safe mathematical function interfaces
  • Parallel Processing: Traditional and work-stealing thread pools

🛡️ Safety & Numerical Stability

  • Memory Safety: Comprehensive bounds checking and overflow protection
  • Numerical Stability: Safe mathematical operations and edge case handling
  • Error Recovery: Multiple strategies for handling numerical failures
  • Convergence Detection: Advanced monitoring for iterative algorithms
  • Diagnostics: Automated numerical health assessment

🧪 Statistical Validation

  • Goodness-of-Fit Tests: Kolmogorov-Smirnov, Anderson-Darling (✅ implemented)
  • Model Selection: AIC/BIC information criteria (✅ implemented)
  • Residual Analysis: Standardized residuals and diagnostics (✅ implemented)
  • Cross-Validation: K-fold validation framework (✅ implemented)
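
To make the goodness-of-fit idea concrete, the sketch below computes the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF of a sample and a reference CDF. It is a standalone illustration under stated assumptions: `ks_statistic` and `uniform01_cdf` are hypothetical names, not libstats functions, and no p-value is computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not a libstats function.
// Returns D = sup_x |F_n(x) - F(x)| for a sample and a reference CDF.
template <typename Cdf>
double ks_statistic(std::vector<double> sample, Cdf cdf) {
    assert(!sample.empty());
    std::sort(sample.begin(), sample.end());
    const double n = static_cast<double>(sample.size());
    double d = 0.0;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        const double f = cdf(sample[i]);
        // The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic,
        // so the gap has to be checked on both sides of the jump.
        const double above = static_cast<double>(i + 1) / n - f;
        const double below = f - static_cast<double>(i) / n;
        d = std::max(d, std::max(above, below));
    }
    return d;
}

// Reference CDF of U(0, 1), used here only as an example target.
double uniform01_cdf(double x) {
    return std::min(1.0, std::max(0.0, x));
}
```

For the sample {0.7, 0.1, 0.4} tested against U(0, 1), the largest gap occurs at 0.7, where the empirical CDF reaches 1 while the reference CDF is 0.7, so D is 0.3 up to rounding.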

🚀 Performance Features

  • SIMD Operations: Vectorized statistical computations with cross-platform detection
  • Parallel Processing: Both traditional and work-stealing thread pools
  • C++20 Parallel Algorithms: Safe wrappers for std::execution policies
  • Cache Optimization: Thread-safe caching with lock-free fast paths

📖 Cross-Platform SIMD Support: Automatic detection and optimization for SSE2/AVX/AVX2/NEON instruction sets with runtime safety verification.

Quick Start

Quick Build

git clone https://github.com/OldCrow/libstats.git
cd libstats
mkdir build && cd build
cmake ..                    # Auto-detects optimal configuration
make -j$(nproc)            # Parallel build with auto-detected core count (macOS: use $(sysctl -n hw.ncpu))
ctest --output-on-failure  # Run tests

📖 For complete build information, including cross-platform support, SIMD optimization, and advanced configuration options, see docs/BUILD_SYSTEM_GUIDE.md.

Basic Usage

#include "libstats.h"
#include <iostream>
#include <numeric>  // std::iota
#include <span>     // std::span
#include <vector>

int main() {
    // Initialize performance systems (recommended)
    libstats::initialize_performance_systems();

    // Create distributions with safe factory methods
    auto gaussian_result = libstats::GaussianDistribution::create(0.0, 1.0);
    if (gaussian_result.isOk()) {
        auto& gaussian = gaussian_result.value;

        // Single-value operations
        std::cout << "PDF at 1.0: " << gaussian.getProbability(1.0) << std::endl;
        std::cout << "CDF at 1.0: " << gaussian.getCumulativeProbability(1.0) << std::endl;

        // High-performance batch operations (auto-optimized)
        std::vector<double> values(10000);
        std::vector<double> results(10000);
        // Fill the input with -5.0, -4.0, -3.0, ... (consecutive values one unit apart)
        std::iota(values.begin(), values.end(), -5.0);

        gaussian.getProbability(std::span<const double>(values),
                               std::span<double>(results));

        std::cout << "Processed " << values.size() << " values with auto-optimization" << std::endl;
    }
    return 0;
}

📖 For comprehensive parallel processing and batch operation guides, see docs/PARALLEL_BATCH_PROCESSING_GUIDE.md.

Project Structure

libstats/
├── include/           # Modular header architecture
│   ├── libstats.h        # Complete library (single include)
│   ├── core/             # Core mathematical and statistical components
│   ├── distributions/    # Statistical distributions (Gaussian, Exponential, etc.)
│   └── platform/         # SIMD, threading, and platform optimizations
├── src/              # Implementation files
├── tests/            # Comprehensive unit and integration tests
├── examples/         # Usage demonstrations
├── tools/            # Performance analysis and optimization utilities
├── docs/             # Complete documentation guides
└── scripts/          # Build and development scripts

📖 For detailed header organization and dependency management, see docs/HEADER_ARCHITECTURE_GUIDE.md.

Key Features Summary

🎯 Statistical Completeness

  • PDF, CDF, quantiles, parameter estimation, and validation
  • 6 distributions: Gaussian, Exponential, Uniform, Poisson, Discrete, Gamma
  • Beyond std:: distributions with full statistical interfaces

⚡ High Performance

  • Automatic SIMD optimization (SSE2, AVX, AVX2, NEON)
  • Intelligent parallel processing with auto-dispatch
  • Thread-safe batch operations with work-stealing pools
  • Smart caching and adaptive algorithm selection

🛡️ Enterprise Safety

  • Memory-safe operations with comprehensive bounds checking
  • Exception-safe error handling with safe factory methods
  • Thread-safe concurrent access with reader-writer locks
  • Numerical stability with log-space arithmetic
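
Log-space arithmetic matters because products of many small probabilities underflow to zero in double precision. One common building block is the log-sum-exp trick, sketched below as a standalone example. The name `log_sum_exp` is hypothetical, and this sketch is not taken from the libstats sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper for illustration only; not a libstats function.
// Computes log(sum_i exp(v_i)) without overflow or underflow by
// factoring out the largest term before exponentiating.
double log_sum_exp(const std::vector<double>& log_values) {
    assert(!log_values.empty());
    const double max_log = *std::max_element(log_values.begin(), log_values.end());
    // All inputs are -inf (the sum is zero) or some input is +inf:
    // the result equals the maximum and the subtraction below would yield NaN.
    if (std::isinf(max_log)) {
        return max_log;
    }
    double sum = 0.0;
    for (double v : log_values) {
        sum += std::exp(v - max_log);  // every exponent is <= 0, so no overflow
    }
    return max_log + std::log(sum);
}
```

With two log-probabilities of -1000, evaluating `std::log(std::exp(-1000.0) + std::exp(-1000.0))` directly gives negative infinity because each exponential underflows to zero, whereas the shifted form returns -1000 + ln 2.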

🔧 Modern C++20 Design

  • Zero external dependencies (standard library only)
  • C++20 concepts, std::span, and execution policies
  • Cross-platform: Windows, macOS, Linux with automatic optimization

Comparison with std:: Library

| Feature | std:: distributions | libstats |
|---|---|---|
| Random Sampling | ✅ Excellent | ✅ Uses std:: internally |
| PDF Evaluation | ❌ Not available | ✅ Complete implementation |
| CDF Evaluation | ❌ Not available | ✅ Complete implementation |
| Quantile Functions | ❌ Not available | ✅ Complete implementation |
| Parameter Fitting | ❌ Not available | ✅ MLE with diagnostics |
| Statistical Tests | ❌ Not available | ✅ Comprehensive validation |
| Thread Safety | ⚠️ Limited | ✅ Full concurrent access |
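
To illustrate the parameter-fitting row, the sketch below shows the closed-form maximum likelihood estimates for the exponential and Gaussian cases, which the standard library's distribution classes do not provide. The names `fit_exponential_rate`, `fit_gaussian`, and `GaussianFit` are hypothetical and used only for this example; they are not the libstats fitting API, and no diagnostics are computed.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical types and helpers for illustration only; not libstats API.
struct GaussianFit {
    double mean;
    double stddev;
};

// MLE of the rate of Exp(lambda): lambda_hat = n / sum(x).
double fit_exponential_rate(const std::vector<double>& x) {
    assert(!x.empty());
    const double total = std::accumulate(x.begin(), x.end(), 0.0);
    assert(total > 0.0);
    return static_cast<double>(x.size()) / total;
}

// MLE of N(mu, sigma^2): the sample mean and the square root of the
// variance computed with divisor n (not the unbiased n - 1).
GaussianFit fit_gaussian(const std::vector<double>& x) {
    assert(!x.empty());
    const double n = static_cast<double>(x.size());
    const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double sum_sq = 0.0;
    for (double v : x) {
        sum_sq += (v - mean) * (v - mean);
    }
    return GaussianFit{mean, std::sqrt(sum_sq / n)};
}
```

For the data {1, 2, 3}, the exponential rate estimate is 3 / 6 = 0.5, and the Gaussian fit gives a mean of 2 with a standard deviation of sqrt(2/3).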

Examples and Tools

📚 Examples (examples/ directory)

  • basic_usage.cpp - Core functionality demonstration
  • statistical_validation_demo.cpp - Advanced validation and testing
  • parallel_execution_demo.cpp - High-performance batch processing
  • Performance benchmarks for each distribution type

🔧 Analysis Tools (tools/ directory)

  • system_inspector - CPU capabilities and system information
  • parallel_threshold_benchmark - Optimal parallel threshold analysis
  • performance_dispatcher_tool - Algorithm performance comparison
  • simd_verification - SIMD correctness and performance testing

Testing

# Run all tests
ctest --output-on-failure

# Run specific test categories
ctest -R "test_gaussian"
ctest -R "test_performance"

# Run examples
./examples/basic_usage
./examples/parallel_execution_demo

System Requirements

  • C++20 compatible compiler: GCC 10+, Clang 14+, MSVC 2019+
  • CMake: 3.20 or later
  • Platform: Windows, macOS, Linux (automatic detection and optimization)

Common Build Configurations

| Configuration | Command | Use Case |
|---|---|---|
| Development (default) | cmake .. | Daily development with light optimization |
| Release | cmake -DCMAKE_BUILD_TYPE=Release .. | Production builds with maximum optimization |
| Debug | cmake -DCMAKE_BUILD_TYPE=Debug .. | Full debugging support |

Documentation

For complete information about libstats, refer to these comprehensive guides:

docs/BUILD_SYSTEM_GUIDE.md - Complete build system documentation covering:

  • Cross-platform build instructions (Windows, macOS, Linux)
  • SIMD detection and optimization
  • Parallel build configuration
  • Advanced CMake options
  • Troubleshooting and manual builds

docs/HEADER_ARCHITECTURE_GUIDE.md - Header organization and dependency management:

  • Modular header architecture
  • Consolidated vs individual includes
  • Development patterns for distributions, tools, and tests
  • Performance optimization through header design

docs/PARALLEL_BATCH_PROCESSING_GUIDE.md - High-performance parallel and batch processing:

  • Auto-dispatch vs explicit strategy control
  • SIMD and parallel processing APIs
  • Performance optimization guidelines
  • Thread safety and memory management

Windows development environment support:

  • Visual Studio and MSVC configuration
  • Windows-specific SIMD optimization
  • Build instructions for Windows platforms

Roadmap

Phase 1: Core Infrastructure ✅

  • Enhanced base class with thread safety
  • Basic distribution set (5 distributions)
  • Build system and project structure
  • C++20 upgrade with concepts and spans
  • Memory safety and numerical stability framework
  • Parallel processing capabilities (traditional and work-stealing)
  • Enterprise-grade safety features

Phase 2: Statistical Validation ✅

  • Goodness-of-fit tests (KS, AD, Chi-squared)
  • Information criteria (AIC/BIC)
  • Residual analysis
  • Cross-validation framework
  • Bootstrap confidence intervals

Phase 3: Performance Optimization ✅

  • SIMD bulk operations with cross-platform detection
  • Parallel algorithm implementations
  • Performance benchmarking tools
  • Grain size optimization tools
  • CPU feature detection and adaptive constants

Phase 4: Tools and Utilities ✅

  • Comprehensive performance benchmarks
  • CPU information and feature detection tools
  • Constants inspector for mathematical verification
  • Grain size optimizer for parallel performance tuning
  • Parallel threshold benchmarking

Phase 5: Optimization and Cross-Platform Tuning (In Progress) 🔧

  • Core performance analysis tools delivered
  • Parallel optimization with grain size tuning
  • SIMD acceleration with runtime detection
  • Cross-platform testing (Linux, Windows)
  • Compiler compatibility testing (GCC, MSVC)
  • Memory usage optimization
  • Cache efficiency improvements
  • Build system packaging

Phase 6: Future Enhancements (Planned)

  • Additional distributions (Beta, Chi-squared, Student's t)
  • Automatic distribution selection
  • Comprehensive API documentation
  • Real-world usage examples
  • Header-only distribution option

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project builds upon concepts and components from libhmm, adapting them for general-purpose statistical computing while maintaining the focus on modern C++ design and performance.


libstats - Bringing comprehensive statistical computing to modern C++
