This guide details the architecture of the Generalized Notation Notation (GNN) system. It complements DOCS.md and doc/pipeline/README.md with an implementation-oriented perspective for developers.
Last Updated: 2026-03-23 Version: 1.3.0 Status: Maintained Pipeline Steps: 25 (0-24)
- Thin Orchestrators: Numbered scripts delegate to modules, maintaining clear separation of concerns
- Explicit Dependencies: All module dependencies are explicitly declared and validated
- Deterministic Outputs: Pipeline runs produce identical results given identical inputs
- Reproducible Runs: Complete audit trail and environment capture for scientific reproducibility
- Standardized Interfaces: Consistent exit codes, logging, and configuration patterns across all modules
- No Mock Testing: All tests use real code paths and actual data dependencies
- Comprehensive Coverage: >95% test coverage with performance and integration validation
- Real Data Processing: No synthetic or placeholder data in tests or examples
- Performance Standards: Sub-30-minute execution time, <2GB memory usage for standard workloads
- Error Rate Targets: <1% critical failure rate, >99% step completion success rate
- Specialized Capabilities: Each module provides distinct, well-defined agent capabilities
- Stateless Design: Prefer stateless agents for better testability and reliability
- Resource Awareness: Proper resource management with cleanup and monitoring
- Configuration Flexibility: Environment-driven configuration with validation
- Integration Ready: All modules integrate seamlessly with the 25-step pipeline
The GNN system implements a comprehensive 25-step pipeline that transforms GNN model specifications into executable simulations, visualizations, and analysis. The architecture follows the thin orchestrator pattern with complete separation between pipeline orchestration and domain-specific implementations.
graph TB
A["User/Researcher"] --> B["src/main.py<br/>Pipeline Orchestrator"]
B --> C["25 Numbered Scripts<br/>(0_template.py → 24_intelligent_analysis.py)"]
C --> D["Module set<br/>(see src/AGENTS.md)"]
D --> E["Structured Outputs<br/>(output/step_N_output/)"]
B --> F["Infrastructure Layer<br/>(utils/, pipeline/)"]
F --> B
D --> G["External Integrations<br/>(PyMDP, RxInfer.jl, JAX, ActiveInference.jl, PyTorch, NumPyro)"]
D --> H["AI Services<br/>(OpenAI, Anthropic, Ollama)"]
D --> I["Scientific Frameworks<br/>(JAX, DisCoPy, NetworkX)"]
%% styling intentionally omitted (theme-controlled)
The GNN pipeline implements a sophisticated execution model with comprehensive monitoring, error recovery, and performance optimization.
sequenceDiagram
participant U as User/Researcher
participant M as src/main.py
participant S as Step N Script
participant Mod as Agent Module
participant Out as output/*
participant Ext as External Services
U->>M: CLI arguments & config
M->>S: Invoke with parsed args + context
S->>Mod: Call domain-specific APIs
Mod->>Ext: Optional external integrations
Ext-->>Mod: Results or service responses
Mod-->>S: Structured results + diagnostics
S-->>M: Exit codes + performance metrics
M->>Out: Comprehensive output artifacts
M->>M: Update pipeline summary & health report
- Orchestrators handle: argument parsing, logging setup, output dir, calling module APIs, summarizing results
- Modules implement: domain logic, IO, validations, transformations, rendering, execution
graph LR
O["Orchestrator N_step.py"] -->|calls| API[("Module Public API")]
API --> Impl["Internal Logic"]
API --> IO["IO / Files"]
Impl --> Artifacts["Artifacts in output/"]
The GNN pipeline implements a sophisticated dependency graph that ensures proper execution order and data flow between modules.
graph TD
GNN[gnn] --> TYPE[type_checker]
GNN --> REG[model_registry]
GNN --> VAL[validation]
GNN --> EXP[export]
GNN --> ONT[ontology]
GNN --> RENDER[render]
GNN --> LLM[llm]
GNN --> AUDIO[audio]
GNN --> RES[research]
GNN --> GUI[gui]
TYPE --> VAL
VAL --> EXP
EXP --> VIS[visualization]
EXP --> AV[advanced_visualization]
RENDER --> EXEC[execute]
RENDER --> SEC[security]
EXEC --> ANA[analysis]
VIS --> ANA
LLM --> ANA
ML[ml_integration] --> ANA
ANA --> REP[report]
REP --> IA[intelligent_analysis]
GNN --> INT[integration]
GNN --> WEB[website]
GNN --> MCP[mcp]
%% styling intentionally omitted (theme-controlled)
Core Infrastructure:
src/main.py- Main pipeline orchestrator with comprehensive monitoringsrc/utils/- Complete utility library with logging, validation, and monitoringsrc/pipeline/- Full pipeline configuration and management systemsrc/tests/- Comprehensive test suite with real data validation
Agent Modules:
- All 25 pipeline steps (0-24) implemented with thin orchestrator pattern
- Agent/module coverage is tracked in
src/AGENTS.md - Complete MCP integration across all applicable modules
- Full test coverage with >95% coverage for all modules
Documentation:
AGENTS.md- Master agent scaffolding documentationAGENTS_TEMPLATE.md- Template for new modules.agent_rules- Development guidelines.env(optional local file) - local environment values when needed.gitignore- Comprehensive ignore patterns for scientific computing
- The architecture uses thin orchestrators over modular implementations.
- Pipeline, docs, and tests are maintained together.
- Use current runs and reports for exact performance and pass metrics.
flowchart TD
A["setup_step_logging"] --> B["Module Logger"]
B --> C["Per-step Logs"]
C --> D["Aggregated Pipeline Logs"]
flowchart LR
CLI["CLI Args"] --> Merge
CFG["config.yaml"] --> Merge
ENV["Env Vars"] --> Merge
Merge["Configuration Resolver"] --> Eff["Effective Config"]
Eff --> Steps["Steps 0..24"]
- Each step writes to
output/<step_subdir>/ get_output_dir_for_script()ensures consistent paths- Site and reports summarize artifacts across steps
- Exit codes: 0=success, 1=critical error, 2=success with warnings
- Continuation policy controlled via config (fail-fast vs continue)
- Rich diagnostics persisted alongside artifacts
flowchart LR
S["Step Start"] --> V{Valid?}
V -- no --> E["Log + Diagnostics"]
E --> P{Fail-fast?}
P -- yes --> STOP["Abort"]
P -- no --> CONT["Continue"]
V -- yes --> RUN["Run"]
RUN --> X{Exit Code}
X -->|0| OK["Success"]
X -->|2| WARN["Success+Warnings"]
X -->|1| E
Adding new pipeline steps and modules follows a well-established pattern that ensures consistency and maintainability:
- Define the module's purpose and integration points
- Identify dependencies on existing modules
- Determine resource requirements and performance characteristics
- Plan comprehensive test coverage and documentation
- Implement
src/N_newstep.pyfollowing the thin orchestrator pattern - Handle argument parsing, logging setup, and output directory management
- Delegate all domain logic to the module implementation
- Return standardized exit codes (0=success, 1=critical error, 2=success with warnings)
- Create
src/newstep/directory with complete module structure - Implement
__init__.pywith public API exports - Create core logic in
processor.pyor appropriately named files - Add
mcp.pyfor Model Context Protocol integration (if applicable) - Include comprehensive error handling and resource management
- Create integration tests in
src/tests/test_newstep_integration.py - Implement unit tests for all public functions
- Add performance tests with timing and memory validation
- Include error scenario testing with real failure modes
- Ensure >95% test coverage
- Create comprehensive
AGENTS.mdusing the enhanced template - Document all public APIs with examples and error conditions
- Include troubleshooting guide and performance characteristics
- Add usage examples for common scenarios
- Update pipeline documentation and cross-references
- Test the complete integration with existing pipeline
- Validate performance against established standards
- Ensure compliance with all coding standards and patterns
- Review security implications and access controls
- Update configuration files and environment templates as needed
The GNN system implements a sophisticated multi-agent architecture where each module provides specialized capabilities:
Processing Agents (Steps 3-9):
- GNN Agent: Multi-format parsing and semantic analysis
- Type Checker Agent: Static validation and resource estimation
- Validation Agent: Consistency checking and constraint verification
- Export Agent: Multi-format data transformation and serialization
- Visualization Agent: Graph generation and matrix visualization
- Advanced Visualization Agent: Interactive plots and 3D graphics
Simulation Agents (Steps 10-16):
- Ontology Agent: Active Inference knowledge processing and mapping
- Render Agent: Multi-framework code generation and optimization
- Execute Agent: Cross-platform simulation execution and monitoring
- LLM Agent: AI-enhanced analysis and natural language processing
- ML Integration Agent: Machine learning model training and evaluation
- Audio Agent: Multi-backend audio generation and sonification
- Analysis Agent: Advanced statistical processing and performance analysis
Integration Agents (Steps 17-23):
- Integration Agent: Cross-module coordination and data flow management
- Security Agent: Input validation, access control, and threat detection
- Research Agent: Experimental tools and workflow management
- Website Agent: Static site generation and documentation compilation
- MCP Agent: Protocol compliance and tool registration
- GUI Agent: Interactive interface generation and user experience
- Report Agent: Comprehensive analysis reporting and visualization
Dependency Resolution: Automatic dependency detection and inclusion Resource Management: Coordinated resource allocation and cleanup Error Propagation: Structured error handling with graceful degradation Performance Monitoring: Cross-agent performance tracking and optimization Configuration Management: Centralized configuration with module-specific overrides
Each agent implements comprehensive performance monitoring:
- Resource Tracking: Memory, CPU, and disk usage monitoring
- Execution Timing: Detailed timing for all major operations
- Error Metrics: Success rates, failure modes, and recovery statistics
- Integration Health: Dependency health and communication status
- Scalability Metrics: Performance characteristics across different input sizes
- Main Documentation: README.md — Project overview and quick start
- Pipeline Documentation: doc/PIPELINE_SCRIPTS.md — Detailed step-by-step descriptions
- Development Rules: .agent_rules — Canonical rules for scripts and modules
- Agent Registry: AGENTS.md — Master agent scaffolding and module registry
- Template Guide: AGENTS_TEMPLATE.md — Template for new modules
- Doc link checks: DOCS.md — “Documentation maintenance” (
doc/development/docs_audit.py)
Architecture Version: 1.3.0 Last Updated: 2026-03-23 Status: ✅ Production Ready Compliance: Thin orchestrator pattern Latest Validation: See current test and pipeline runs