This experiment provides a comprehensive comparison of different memory profiling techniques for Python applications, focusing on the accuracy and reliability of various measurement tools. It serves as the primary empirical study for evaluating memory profiling approaches in the Memory-Aware Chunking thesis.
The primary goal is to systematically compare memory profiling tools and techniques, specifically:
- Cross-validation of memory measurements between different profiling tools
- Accuracy assessment of internal vs. external memory profiling approaches
- Performance impact analysis of profiling overhead on application execution
- TraceQ framework evaluation as a unified profiling solution
- Best practices identification for memory profiling in scientific computing
This experiment uses a controlled comparative approach where the same computational workload (seismic envelope calculation) is executed multiple times using different memory profiling techniques. The results are then analyzed to identify discrepancies, patterns, and reliability characteristics.
- Synthetic Data Generation: Creates consistent seismic datasets for reproducible testing
- Multi-tool Profiling: Implements 8 different profiling approaches (4 direct + 4 via TraceQ)
- Statistical Analysis: Performs comprehensive statistical comparison of profiling results
- Visualization Suite: Generates detailed charts and reports for result interpretation
| Tool | Type | Implementation | Advantages | Limitations |
|---|---|---|---|---|
| psutil | External | Direct + TraceQ | Cross-platform, process-level | OS-dependent accuracy |
| resource | External | Direct + TraceQ | POSIX standard | Limited granularity |
| tracemalloc | Internal | Direct + TraceQ | Fine-grained Python tracking | Python-only allocations |
| kernel | External | Direct + TraceQ | Most accurate system view | Linux-specific |
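The internal/external distinction in the table can be illustrated with a minimal standard-library sketch. It assumes a POSIX system (the `resource` module is unavailable on Windows) and uses a hypothetical `workload` function as a stand-in, not the experiment's envelope calculation. `tracemalloc` reports only allocations made through Python's memory allocators, while `ru_maxrss` reports the peak resident set size of the whole process.

```python
import resource
import sys
import tracemalloc


def workload(n: int = 200_000) -> int:
    """Hypothetical stand-in workload: build and sum a list of integers."""
    data = list(range(n))
    return sum(data)


def peak_rss_bytes() -> int:
    """Peak resident set size of this process, as reported by the OS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def measure() -> dict:
    """Run the workload once and collect an internal and an external peak."""
    tracemalloc.start()
    result = workload()
    _, traced_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "result": result,
        "tracemalloc_peak_bytes": traced_peak,  # Python-level allocations only
        "rss_peak_bytes": peak_rss_bytes(),     # whole process, interpreter included
    }


if __name__ == "__main__":
    for key, value in measure().items():
        print(f"{key}: {value}")
```

The two numbers are not expected to agree: the external value includes the interpreter, loaded libraries, and any memory allocated outside Python's allocators.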
- Multiple runs: Each profiling tool executes 5 independent runs for statistical validity
- Controlled environment: Docker containers with fixed CPU allocation and isolated execution
- Consistent workload: Same synthetic seismic data processed by all tools
- Comprehensive metrics: Memory usage over time, peak consumption, and execution statistics
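As a rough illustration of why several independent runs are collected per tool, the sketch below summarizes the peak values of repeated runs with a mean, a sample standard deviation, and a coefficient of variation. The function name and the numbers in the usage example are placeholders for illustration only, not results from this experiment, and the analysis scripts may compute different statistics.

```python
import statistics


def summarize_peaks(peaks):
    """Summarize peak-memory values from repeated runs of one tool.

    Requires at least two runs, because the sample standard deviation
    is undefined for a single value.
    """
    if len(peaks) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(peaks)
    stdev = statistics.stdev(peaks)  # sample standard deviation (n - 1)
    return {
        "n_runs": len(peaks),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # relative spread, unitless
    }


if __name__ == "__main__":
    # Placeholder values (arbitrary units), one per run.
    print(summarize_peaks([100.0, 102.0, 98.0, 101.0, 99.0]))
```

A low coefficient of variation across runs suggests the tool measures consistently; it says nothing about whether the measured value is accurate.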
The experiment follows a modular pipeline architecture:
experiment/
├── generate_data.py # Synthetic seismic data generation
├── measure_with_psutil.py # Direct psutil profiling
├── measure_with_resource.py # Direct resource module profiling
├── measure_with_tracemalloc.py # Direct tracemalloc profiling
├── measure_with_kernel.py # Direct kernel-level profiling
├── measure_with_traceq.py # Unified TraceQ profiling
├── collect_results.py # Data aggregation and preprocessing
└── analyze_results.py # Statistical analysis and visualization
- Data Generation: Creates synthetic seismic datasets with configurable dimensions
- Profiling Phase: Executes each tool multiple times with identical inputs
- Collection Phase: Aggregates results from all profiling runs into unified datasets
- Analysis Phase: Performs statistical analysis and generates comprehensive visualizations
- Docker with BuildKit support
- Linux system (recommended for kernel-level profiling)
- Sufficient disk space for results (varies with dataset size)
Run the complete experiment pipeline:
cd experiments/01-measuring-memory-usage-of-python-programs
./scripts/experiment.sh
Key environment variables for customization:
# Dataset configuration
export DATASET_INLINES=600 # Number of inline traces
export DATASET_XLINES=600 # Number of crossline traces
export DATASET_SAMPLES=600 # Number of time samples
# Experiment configuration
export EXPERIMENT_N_RUNS=5 # Number of runs per tool
export CPUSET_CPUS=0 # CPU core allocation
# Output configuration
export OUTPUT_DIR="./out/results/$(date +%Y%m%d%H%M%S)"

python experiment/generate_data.py
Environment variables:
- OUTPUT_DIR: Output directory for generated data (default: ./out/inputs)
- DATASET_INLINES: Number of inline traces (default: 100)
- DATASET_XLINES: Number of crossline traces (default: 100)
- DATASET_SAMPLES: Number of time samples (default: 100)
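The sketch below shows one way these variables and their documented defaults could be read in Python. The `DatasetConfig` class and `load_config` function are hypothetical names introduced for illustration; how `generate_data.py` actually parses its environment is not shown here. The file name pattern follows the `{inlines}-{xlines}-{samples}.segy` layout listed in the output structure.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical container for the data-generation settings."""

    output_dir: str
    inlines: int
    xlines: int
    samples: int

    @property
    def n_values(self) -> int:
        """Total number of samples in the generated cube."""
        return self.inlines * self.xlines * self.samples

    @property
    def filename(self) -> str:
        """File name following the {inlines}-{xlines}-{samples}.segy pattern."""
        return f"{self.inlines}-{self.xlines}-{self.samples}.segy"


def load_config(env=None) -> DatasetConfig:
    """Read settings from a mapping (os.environ by default) with documented defaults."""
    env = os.environ if env is None else env
    return DatasetConfig(
        output_dir=env.get("OUTPUT_DIR", "./out/inputs"),
        inlines=int(env.get("DATASET_INLINES", "100")),
        xlines=int(env.get("DATASET_XLINES", "100")),
        samples=int(env.get("DATASET_SAMPLES", "100")),
    )


if __name__ == "__main__":
    config = load_config()
    print(os.path.join(config.output_dir, config.filename), config.n_values)
```

Passing a plain dictionary instead of `os.environ` keeps the parsing logic easy to check without modifying the real environment.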
Direct tool profiling:
# psutil profiling
python experiment/measure_with_psutil.py
# resource module profiling
python experiment/measure_with_resource.py
# tracemalloc profiling
python experiment/measure_with_tracemalloc.py
# kernel-level profiling
python experiment/measure_with_kernel.py
TraceQ unified profiling:
# TraceQ with different backends
python experiment/measure_with_traceq.py
Environment variables for profiling:
- SEGY_FILEPATH: Path to input seismic data file
- OUTPUT_RESULT_PATH: Output file for profiling results
- POLL_INTERVAL: Sampling interval in seconds (default: 0.05)
- APPEND_TIMESTAMP: Whether to append timestamp to output files
- TRACEQ_BACKEND: Backend for TraceQ (psutil, resource, tracemalloc, kernel)
- SESSION_ID: Session identifier for TraceQ profiling
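To make the role of `POLL_INTERVAL` concrete, the following sketch samples memory from a background thread at a fixed interval while a workload runs. It is an assumption-laden illustration, not the experiment's measurement code: `sample_while_running` and `workload` are hypothetical names, and `tracemalloc` is used as the data source only because it is available in the standard library.

```python
import os
import threading
import time
import tracemalloc


def sample_while_running(target, poll_interval: float):
    """Run target() while a background thread records (elapsed_s, bytes) samples."""
    samples = []
    done = threading.Event()
    start = time.perf_counter()

    def poll():
        while True:
            current, _ = tracemalloc.get_traced_memory()
            samples.append((time.perf_counter() - start, current))
            # wait() returns True as soon as the workload has finished.
            if done.wait(poll_interval):
                break

    tracemalloc.start()
    poller = threading.Thread(target=poll, daemon=True)
    poller.start()
    try:
        result = target()
    finally:
        done.set()
        poller.join()
        tracemalloc.stop()
    return result, samples


def workload():
    """Hypothetical workload: allocate a few buffers with pauses in between."""
    blocks = []
    for _ in range(5):
        blocks.append(bytearray(1_000_000))
        time.sleep(0.05)
    return len(blocks)


if __name__ == "__main__":
    interval = float(os.environ.get("POLL_INTERVAL", "0.05"))
    _, collected = sample_while_running(workload, interval)
    print(f"{len(collected)} samples at {interval}s interval")
```

A shorter interval yields more samples for the same workload, at the cost of more time spent in the polling thread.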
python experiment/collect_results.py
Aggregates individual profiling results into unified CSV files:
- profiles_detail.csv: Time-series memory usage data
- profiles_summary.csv: Peak memory usage statistics
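The relationship between the two files can be sketched as follows: the detail table keeps one row per sample, and the summary table reduces each run to its peak. The column names, the `build_tables` and `to_csv` helpers, and the sample values are assumptions for illustration; the real schema produced by `collect_results.py` may differ. The project lists pandas as a dependency, but this sketch uses the standard `csv` module to stay self-contained.

```python
import csv
import io


def build_tables(runs):
    """Build detail and summary rows.

    runs maps (tool, run_id) -> list of (elapsed_s, memory_kb) samples.
    """
    detail, summary = [], []
    for (tool, run_id), samples in sorted(runs.items()):
        for elapsed_s, memory_kb in samples:
            detail.append(
                {"tool": tool, "run_id": run_id,
                 "elapsed_s": elapsed_s, "memory_kb": memory_kb}
            )
        summary.append(
            {"tool": tool, "run_id": run_id,
             "peak_memory_kb": max((m for _, m in samples), default=0),
             "n_samples": len(samples)}
        )
    return detail, summary


def to_csv(rows):
    """Serialize a non-empty list of uniform dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder samples, not measured data.
    example = {
        ("psutil", 1): [(0.0, 100), (0.05, 250)],
        ("kernel", 1): [(0.0, 110), (0.05, 240), (0.10, 260)],
    }
    detail_rows, summary_rows = build_tables(example)
    print(to_csv(summary_rows))
```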
python experiment/analyze_results.py
Generates comprehensive analysis including:
- Memory usage timeline charts
- Statistical comparison tables
- Distribution analysis (KDE, CDF)
- Tool comparison visualizations
The experiment produces a comprehensive set of outputs organized in the following structure:
out/
├── inputs/ # Generated synthetic data
│ └── {inlines}-{xlines}-{samples}.segy
├── profiles/ # Raw profiling results
│ ├── psutil-{timestamp}.txt
│ ├── resource-{timestamp}.txt
│ ├── tracemalloc-{timestamp}.txt
│ ├── kernel-{timestamp}.txt
│ ├── traceq_psutil-{run_id}.prof
│ ├── traceq_resource-{run_id}.prof
│ ├── traceq_tracemalloc-{run_id}.prof
│ └── traceq_kernel-{run_id}.prof
├── results/ # Aggregated data
│ ├── profiles_detail.csv
│ └── profiles_summary.csv
└── analysis/ # Visualizations and reports
    ├── memory_history_no_traceq.pdf
    ├── compare_orig_vs_traceq_{tool}.pdf
    ├── kde_base_vs_traceq_{tool}.pdf
    ├── cdf_base_vs_traceq_{tool}.pdf
    ├── max_memory_orig_vs_traceq.pdf
    ├── boxplot_base_vs_traceq.pdf
    ├── points_collected.pdf
    ├── phase_comparison.csv
    └── traceq_comparison.csv
- Individual tool charts: Memory consumption over time for each profiling tool
- Comparative charts: Side-by-side comparison of original tools vs. TraceQ implementations
- Aggregate timeline: Combined view of all non-TraceQ tools
- Kernel Density Estimation (KDE): Probability density of memory usage values
- Cumulative Distribution Function (CDF): Cumulative probability distributions
- Box plots: Distribution comparison with quartiles and outliers
- TraceQ comparison: Detailed comparison of original tools vs. TraceQ implementations
- Phase analysis: Memory usage statistics across different execution phases
- Peak memory analysis: Maximum memory consumption comparison
- Data collection efficiency: Number of measurement points collected per tool
- Profiling overhead: Impact of profiling on execution performance
- Measurement consistency: Variance analysis across multiple runs
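For readers unfamiliar with the CDF plots listed above, the sketch below computes an empirical cumulative distribution function from a set of memory samples: for a value x, it returns the fraction of samples less than or equal to x. The `ecdf` helper and the example values are illustrative assumptions; the analysis script may build its CDF plots differently, for example through seaborn.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf requires at least one sample")

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


if __name__ == "__main__":
    # Placeholder memory samples (arbitrary units), not measured data.
    memory_samples = [120, 150, 150, 310, 480]
    cdf = ecdf(memory_samples)
    for threshold in (100, 150, 400, 500):
        print(f"F({threshold}) = {cdf(threshold):.2f}")
```

Plotting the functions for a direct tool and its TraceQ counterpart on the same axes shows whether the two sets of samples cover the same range of values with similar frequency.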
The experiment runs in a Docker-based execution environment to ensure reproducibility and isolation:
- Builder stage: Compiles TraceQ (Rust components) and installs Python dependencies
- Final stage: Creates minimal runtime environment with user permissions
- Docker-in-Docker (DinD): Provides isolated container execution for each profiling run
- Volume management: Persistent storage for results across container lifecycles
- Resource constraints: Fixed CPU allocation and memory limits for consistent measurements
- User permission mapping: Maintains host user permissions for output files
- Build context sharing: Efficient sharing of common libraries (TraceQ, common utilities)
- Environment isolation: Each profiling run executes in a fresh container instance
This experiment reveals important insights about memory profiling tool accuracy and reliability:
- Kernel-level monitoring provides the most accurate system-wide memory measurements
- psutil shows good correlation with kernel measurements but with some overhead
- tracemalloc captures Python-specific allocations but misses C-library memory usage
- resource module provides limited granularity but consistent peak measurements
- Measurement consistency: TraceQ implementations show high correlation with direct tool usage
- Overhead analysis: TraceQ introduces minimal additional profiling overhead
- Unified interface: Provides consistent API across different profiling backends
- Data quality: Maintains measurement accuracy while improving data collection efficiency
- Multiple tool validation: Cross-validation between tools improves measurement confidence
- Statistical significance: Multiple runs essential for reliable memory profiling
- Environment control: Containerized execution critical for reproducible results
- Sampling frequency: Higher sampling rates improve temporal resolution but increase overhead
Core dependencies (see requirements.txt):
- matplotlib: Visualization and chart generation
- pandas: Data manipulation and analysis
- psutil: System and process monitoring
- seaborn: Statistical data visualization
- traceq: Unified memory profiling framework (built from source)
- common: Shared utilities for seismic data processing (built from source)
This experiment supports the theoretical framework presented in the Memory-Aware Chunking thesis:
- Chapter 2: Memory Profiling Techniques and Tools
- Chapter 3: Empirical Evaluation of Memory Measurement Approaches
- Appendix A: Detailed Memory Profiling Methodology
- Appendix B: Statistical Analysis of Profiling Tool Accuracy
Create datasets with specific characteristics:
# Large dataset for stress testing
export DATASET_INLINES=1000
export DATASET_XLINES=1000
export DATASET_SAMPLES=1000
# Small dataset for quick testing
export DATASET_INLINES=100
export DATASET_XLINES=100
export DATASET_SAMPLES=100
Adjust profiling parameters:
# High-frequency sampling
export POLL_INTERVAL=0.01
# Extended profiling runs
export EXPERIMENT_N_RUNS=10
# Custom output location
export OUTPUT_DIR="/path/to/custom/output"
Configure container resources:
# CPU allocation
export CPUSET_CPUS="0,1" # Use specific CPU cores
# Memory limits (handled by Docker)
export MEMORY_LIMIT="4g"
When modifying this experiment:
- Maintain statistical validity: Ensure changes preserve the ability to perform meaningful statistical analysis
- Update documentation: Modify this README and any relevant analysis scripts
- Test thoroughly: Validate changes across different system configurations and dataset sizes
- Follow conventions: Use existing code style and naming patterns
- Preserve reproducibility: Ensure all changes maintain deterministic behavior
To add a new profiling tool:
- Create measure_with_{tool}.py following existing patterns
- Update collect_results.py to handle the new tool's output format
- Modify analyze_results.py to include the new tool in comparisons
- Update the experiment shell script to include the new tool in the pipeline
This experiment is part of the Memory-Aware Chunking thesis research project. Please refer to the main repository license for usage terms.