Purpose: Advanced statistical analysis, performance benchmarking, and complexity metrics calculation for GNN models
Pipeline Step: Step 16: Analysis (16_analysis.py)
Category: Statistical Analysis / Performance Evaluation
Status: ✅ Production Ready
Version: 1.0.0
Last Updated: 2026-01-21
- Perform comprehensive statistical analysis on GNN model structures
- Calculate complexity metrics and maintainability indices
- Generate performance benchmarks and comparison reports
- Extract and analyze variable distributions and correlations
- Provide technical debt assessment and optimization recommendations
- Generate ALL PyMDP visualizations from execution raw data (moved from Execute step)
- Statistical analysis of model variables and connections
- Complexity metrics calculation (cyclomatic, cognitive, structural)
- Performance benchmarking and profiling
- Model comparison and differential analysis
- Distribution analysis and correlation studies
- PyMDP Visualization - belief evolution, state sequences, performance metrics plots
- Cross-framework comparison - uses whatever execution (Step 12) produced; frameworks with no simulation data are reported as "[framework] No simulation data found". Python backends are installed by the core `uv sync`; Julia coverage requires Julia and its packages to be installed, then re-running Step 12.
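The presence check behind the "No simulation data found" message can be sketched as follows. The framework names and directory layout here are illustrative assumptions, not the pipeline's actual structure:

```python
from pathlib import Path

def find_framework_results(execution_dir, frameworks=("pymdp", "rxinfer", "activeinference_jl")):
    """Report which frameworks produced simulation data under the execution
    output directory. Directory names are illustrative, not the pipeline's
    actual layout."""
    execution_dir = Path(execution_dir)
    status = {}
    for fw in frameworks:
        fw_dir = execution_dir / fw
        # A framework "has data" if its directory exists and contains any JSON results.
        has_data = fw_dir.is_dir() and any(fw_dir.rglob("*.json"))
        status[fw] = "found" if has_data else f"[{fw}] No simulation data found"
    return status
```

Frameworks without results are reported rather than treated as errors, matching the behavior described above.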
process_analysis(target_dir: Path, output_dir: Path, logger: Optional[logging.Logger] = None, **kwargs) -> bool
Description: Main analysis processing function called by orchestrator (16_analysis.py). Performs comprehensive statistical analysis, complexity metrics, and performance benchmarking.
Parameters:
- target_dir (Path): Directory containing GNN files to analyze
- output_dir (Path): Output directory for analysis results
- logger (Optional[logging.Logger]): Logger instance for progress reporting (default: None)
- analysis_type (str, optional): Type of analysis ("comprehensive", "statistical", "performance", "complexity") (default: "comprehensive")
- include_performance (bool, optional): Include performance benchmarking (default: True)
- include_complexity (bool, optional): Include complexity metrics (default: True)
- include_quality (bool, optional): Include quality assessment (default: True)
- benchmark_iterations (int, optional): Number of benchmark iterations (default: 5)
- **kwargs: Additional analysis options
Returns: bool - True if analysis succeeded, False otherwise
Example:
from analysis import process_analysis
from pathlib import Path
import logging
logger = logging.getLogger(__name__)
success = process_analysis(
target_dir=Path("input/gnn_files"),
output_dir=Path("output/16_analysis_output"),
logger=logger,
analysis_type="comprehensive",
include_performance=True,
benchmark_iterations=10
)

Description: Perform comprehensive statistical analysis on a GNN file.
Parameters:
- file_path (Path): Path to the GNN file to analyze
- verbose (bool, optional): Enable verbose output (default: False)
Returns: Dict[str, Any] - Statistical analysis results with:
- variable_count (int): Total number of variables
- connection_count (int): Total number of connections
- type_distribution (Dict[str, int]): Distribution of variable types
- dimension_statistics (Dict[str, Any]): Dimension statistics
- density_metrics (Dict[str, float]): Connection density metrics
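The fields above could be computed roughly as in the sketch below. `sketch_statistics` is a hypothetical illustration of what these metrics mean, not the module's actual implementation:

```python
from collections import Counter

def sketch_statistics(variables, connections):
    """Illustrative computation of the documented statistics.
    `variables` is a list of dicts with 'name', 'type', and 'dimensions';
    `connections` is a list of (source, target) pairs."""
    n_vars = len(variables)
    n_conns = len(connections)
    # Distribution of variable types, e.g. {"matrix": 2, "vector": 3}.
    type_distribution = dict(Counter(v.get("type", "unknown") for v in variables))
    dims = [d for v in variables for d in v.get("dimensions", [])]
    dimension_statistics = {
        "max": max(dims) if dims else 0,
        "mean": sum(dims) / len(dims) if dims else 0.0,
    }
    # Density: fraction of possible directed edges that are present.
    possible = n_vars * (n_vars - 1)
    density = n_conns / possible if possible else 0.0
    return {
        "variable_count": n_vars,
        "connection_count": n_conns,
        "type_distribution": type_distribution,
        "dimension_statistics": dimension_statistics,
        "density_metrics": {"density": density},
    }
```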
calculate_complexity_metrics(model_data: Dict[str, Any], variables: List[Dict[str, Any]] = None, connections: List[Dict[str, Any]] = None) -> Dict[str, Any]
Description: Calculate various complexity metrics for GNN models.
Parameters:
model_data(Dict[str, Any]): Parsed GNN model datavariables(List[Dict[str, Any]], optional): Model variables (extracted if not provided)connections(List[Dict[str, Any]], optional): Model connections (extracted if not provided)
Returns: Dict[str, Any] - Complexity metrics with:
- cyclomatic_complexity (float): Cyclomatic complexity score
- cognitive_complexity (float): Cognitive complexity score
- structural_complexity (float): Structural complexity score
- maintainability_index (float): Maintainability index (0-100)
- technical_debt (float): Technical debt score
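A subset of these metrics can be sketched as below. This is only an illustration of each metric's flavor under assumed formulas; the real `calculate_complexity_metrics` works on the full parsed model:

```python
import math

def sketch_complexity(variables, connections):
    """Illustrative complexity scores (a sketch, not the module's formulas)."""
    n_vars, n_conns = len(variables), len(connections)
    # Cyclomatic-style score: edges - nodes + 2, clamped at 1, borrowed
    # from control-flow-graph complexity.
    cyclomatic = max(1, n_conns - n_vars + 2)
    # Structural score: total parameter volume (product of each variable's
    # dimensions, summed over variables).
    structural = sum(math.prod(v.get("dimensions", [1])) for v in variables)
    # Maintainability-index style score: shrinks as structure grows,
    # clamped to the documented 0-100 range.
    maintainability = max(0.0, 100.0 - 5.0 * math.log1p(structural + n_conns))
    return {
        "cyclomatic_complexity": float(cyclomatic),
        "structural_complexity": float(structural),
        "maintainability_index": maintainability,
    }
```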
- numpy - Numerical computations and statistical analysis
- pandas - Data manipulation and analysis
- scipy - Advanced statistical functions
- matplotlib - Statistical visualization (recovery: text-based reports)
- seaborn - Enhanced statistical plots (recovery: matplotlib)
- utils.pipeline_template - Standardized pipeline processing patterns
- pipeline.config - Pipeline configuration management
- ANALYSIS_PERFORMANCE_MODE - Performance analysis mode ("fast", "comprehensive")
- ANALYSIS_TIMEOUT - Maximum analysis time per model (default: 300 seconds)
- analysis_config.yaml - Custom analysis parameters and thresholds
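Reading these settings with their documented defaults might look like this; `load_analysis_env` is a hypothetical helper, not part of the module:

```python
import os

def load_analysis_env():
    """Read the analysis environment variables described above, falling back
    to the documented defaults when unset or malformed."""
    mode = os.environ.get("ANALYSIS_PERFORMANCE_MODE", "comprehensive")
    if mode not in ("fast", "comprehensive"):
        mode = "comprehensive"  # reject unknown modes rather than fail
    try:
        timeout = int(os.environ.get("ANALYSIS_TIMEOUT", "300"))
    except ValueError:
        timeout = 300  # documented default of 300 seconds
    return {"performance_mode": mode, "timeout_seconds": timeout}
```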
DEFAULT_COMPLEXITY_THRESHOLDS = {
'cyclomatic_complexity': {'low': 10, 'medium': 20, 'high': 50},
'cognitive_complexity': {'low': 5, 'medium': 15, 'high': 35},
'structural_complexity': {'low': 100, 'medium': 500, 'high': 1000}
}

from analysis.processor import process_analysis
success = process_analysis(
target_dir=Path("input/gnn_files"),
output_dir=Path("output/16_analysis_output"),
logger=logger,
analysis_type="comprehensive"
)

from analysis.analyzer import perform_statistical_analysis
stats = perform_statistical_analysis(variables, connections)
print(f"Variable count: {stats['variable_statistics']['count']}")
print(f"Connection density: {stats['connection_statistics']['density']}")

from analysis.analyzer import calculate_complexity_metrics
metrics = calculate_complexity_metrics(parsed_model)
print(f"Cyclomatic complexity: {metrics['cyclomatic_complexity']}")
print(f"Maintainability index: {metrics['maintainability_index']}")

- {model}_statistical_analysis.json - Comprehensive statistical analysis
- {model}_complexity_metrics.json - Complexity assessment results
- {model}_performance_benchmarks.json - Performance profiling data
- {model}_analysis_summary.md - Human-readable analysis report
- analysis_processing_summary.json - Pipeline step summary
output/16_analysis_output/
├── model_name_statistical_analysis.json
├── model_name_complexity_metrics.json
├── model_name_performance_benchmarks.json
├── model_name_analysis_summary.md
├── analysis_processing_summary.json
├── pymdp_visualizations/ # NEW: All PyMDP visualizations
│ └── {model_name}/
│ ├── discrete_states.png
│ ├── belief_evolution.png
│ ├── performance_metrics.png
│ └── action_sequence.png
└── comprehensive_visualizations/
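A downstream consumer could gather the per-model statistical results from this layout as sketched below; `summarize_analysis_outputs` is a hypothetical helper built on the documented file naming:

```python
import json
from pathlib import Path

def summarize_analysis_outputs(output_dir):
    """Collect the per-model *_statistical_analysis.json files from the
    analysis output directory and return {model_name: variable_count}."""
    summary = {}
    for path in Path(output_dir).glob("*_statistical_analysis.json"):
        data = json.loads(path.read_text())
        # Recover the model name by stripping the documented suffix.
        model = path.name.replace("_statistical_analysis.json", "")
        summary[model] = data.get("variable_count")
    return summary
```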
- Duration: ~2-5 seconds per model
- Memory: ~50-100MB for large models
- Status: ✅ Production Ready
- Fast Path: ~1-2s for basic statistical analysis
- Slow Path: ~5-10s for comprehensive complexity analysis
- Memory: ~20-50MB for typical models, ~100MB for large models
- No scipy: Simplified statistical analysis using numpy
- No matplotlib: Text-based statistical reports
- Large models: Sampling-based analysis with warnings
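The scipy degradation above follows the usual optional-dependency pattern, sketched here with an illustrative `correlation` helper (not the module's actual API):

```python
# Optional-dependency pattern: scipy is used when present, otherwise the
# analysis falls back to a plain-Python computation.
try:
    from scipy import stats as _scipy_stats
    HAVE_SCIPY = True
except ImportError:
    _scipy_stats = None
    HAVE_SCIPY = False

def correlation(xs, ys):
    """Pearson correlation; uses scipy when available, else a direct formula."""
    if HAVE_SCIPY:
        return float(_scipy_stats.pearsonr(xs, ys)[0])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)
```

Either branch yields the same value, so callers never need to know which path ran.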
- Statistical Errors: Invalid data types or missing values
- Complexity Errors: Model structure too complex for analysis
- Performance Errors: Timeout or resource exhaustion
- Script: 16_analysis.py (Step 16)
- Function: process_analysis()
- utils.pipeline_template - Standardized processing patterns
- pipeline.config - Configuration management
- tests.test_analysis_integration.py - Integration tests
- report.generator - Report generation uses analysis results
GNN Files → Analysis → Statistical Reports → Model Comparisons → Optimization Recommendations
- src/tests/test_analysis_overall.py - Module-level tests
- src/tests/test_analysis_post_simulation.py - Post-simulation analysis tests
- src/tests/test_analysis_extraction.py - Result extraction tests
- Current: 80%
- Target: 90%+
- Statistical analysis with various model sizes
- Complexity metric calculation accuracy
- Performance benchmarking under load
- Error handling with malformed data
process_analysis - Process analysis for GNN files in a directory
@mcp_tool("process_analysis")
def process_analysis_mcp(target_directory: str, output_directory: str, verbose: bool = False):
"""Process Analysis for GNN files. Exposed via MCP."""
# Implementation

src/analysis/mcp.py - MCP tool registrations
Symptom: Analysis times out or runs out of memory
Cause: Model too complex for comprehensive analysis
Solution:
- Use specific analysis types instead of "comprehensive"
- Disable performance benchmarking for large models
- Process models individually instead of batch
- Increase system memory or use sampling
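Processing models individually can be sketched like this; `analyze_individually` is a hypothetical wrapper that assumes GNN sources are `.md` files and stages each one into its own directory (`process_fn` stands in for `process_analysis`):

```python
import shutil
import tempfile
from pathlib import Path

def analyze_individually(target_dir, output_dir, process_fn):
    """Stage each GNN file into its own directory and analyze it alone,
    avoiding batch timeouts on large model sets."""
    results = {}
    for gnn_file in sorted(Path(target_dir).glob("*.md")):
        with tempfile.TemporaryDirectory() as staging:
            shutil.copy(gnn_file, staging)
            sub_out = Path(output_dir) / gnn_file.stem
            sub_out.mkdir(parents=True, exist_ok=True)
            results[gnn_file.name] = process_fn(
                target_dir=Path(staging),
                output_dir=sub_out,
                analysis_type="statistical",  # narrower than "comprehensive"
            )
    return results
```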
Symptom: Complexity calculations return zero or invalid values
Cause: Model structure not properly extracted or missing components
Solution:
- Verify GNN processing (step 3) completed successfully
- Check that model has variables and connections
- Use the --verbose flag for detailed extraction logs
Symptom: Cross-framework comparison reports errors
Cause: Execution results (step 12) not available or incomplete
Solution:
- Ensure execution step (12) completed successfully
- Verify framework outputs exist in execution results
- Check execution results format matches expected structure
Features:
- Statistical analysis
- Complexity metrics calculation
- Performance benchmarking
- Model comparison
- Framework output analysis
Known Issues:
- None currently
- Next Version: Enhanced visualization of analysis results
- Future: Real-time analysis dashboard
Last Updated: 2026-01-21
Maintainer: GNN Pipeline Team
Status: ✅ Production Ready
Version: 1.0.0
Architecture Compliance: ✅ 100% Thin Orchestrator Pattern