Purpose: System integration and consistency validation using Graph Theory (NetworkX) and cross-module reference checking.
Pipeline Step: Step 17: Integration (17_integration.py)
Category: System Integration / Coordination
Status: ✅ Production Ready
Version: 1.0.0
Last Updated: 2026-01-21
- Coordinate cross-module interactions and data flow
- Provide recovery implementations for missing dependencies
- Manage system-wide configuration and state
- Enable seamless integration between pipeline steps
- Handle inter-module communication and data exchange
- Dependency Graph Construction: Uses networkx to build a directed graph of system components.
- Cycle Detection: Identifies circular dependencies that could cause infinite loops or initialization errors.
- Cross-Reference Validation: Ensures all referenced components are defined using explicit filename checks.
- System Stats: Reports node/edge counts and graph density.
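The graph features listed above can be illustrated with a minimal NetworkX sketch. The helper names build_dependency_graph and summarize, and the component names in the usage note, are hypothetical and not taken from this module; only nx.DiGraph, nx.simple_cycles, and nx.density are actual NetworkX calls. An edge from a to b is assumed to mean "a depends on b".

```python
from typing import Any, Dict, List

import networkx as nx


def build_dependency_graph(dependencies: Dict[str, List[str]]) -> nx.DiGraph:
    """Build a directed graph; an edge a -> b is assumed to mean 'a depends on b'."""
    graph = nx.DiGraph()
    for component, requires in dependencies.items():
        graph.add_node(component)  # keep components that have no dependencies
        for required in requires:
            graph.add_edge(component, required)
    return graph


def summarize(graph: nx.DiGraph) -> Dict[str, Any]:
    """Collect the statistics named above: node/edge counts, density, and cycles."""
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "density": nx.density(graph),
        "cycles": list(nx.simple_cycles(graph)),  # each cycle is a list of node names
    }
```

For example, summarize(build_dependency_graph({"render": ["parser"], "parser": []})) reports two nodes, one edge, and no cycles; adding an edge from parser back to render would make the cycles list non-empty.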
process_integration(target_dir: Path, output_dir: Path, verbose: bool = False, logger: Optional[logging.Logger] = None, **kwargs) -> bool
Description: Main integration processing function called by orchestrator (17_integration.py). Coordinates cross-module interactions and validates system consistency using graph theory.
Parameters:
- target_dir (Path): Directory containing pipeline outputs to integrate
- output_dir (Path): Output directory for integration results
- verbose (bool): Enable verbose logging (default: False)
- logger (Optional[logging.Logger]): Logger instance for progress reporting (default: None)
- integration_mode (str, optional): Integration mode ("coordinated", "standalone", "recovery") (default: "coordinated")
- system_coordination (bool, optional): Enable system-wide coordination (default: True)
- validate_dependencies (bool, optional): Validate module dependencies (default: True)
- detect_cycles (bool, optional): Detect circular dependencies (default: True)
- **kwargs: Additional integration options
Returns: bool - True if integration processing succeeded, False otherwise
Example:
from integration import process_integration
from pathlib import Path
import logging
logger = logging.getLogger(__name__)
success = process_integration(
    target_dir=Path("output"),
    output_dir=Path("output/17_integration_output"),
    logger=logger,
    verbose=True,
    integration_mode="coordinated",
    validate_dependencies=True
)

Description: Coordinates all pipeline modules for integrated operation using dependency graph analysis.
Returns: Dict[str, Any] - Coordination results with:
- modules (List[str]): List of coordinated modules
- dependency_graph (Dict): Module dependency graph
- cycles_detected (List[List[str]]): Detected circular dependencies
- status (str): Coordination status ("success", "partial", "failed")
- statistics (Dict[str, Any]): Graph statistics (nodes, edges, density)
- pathlib - Path manipulation and file system operations
- typing - Type hints and annotations
- logging - Logging and progress reporting

- psutil - System resource monitoring (recovery: basic monitoring)
- requests - HTTP communication (recovery: local only)

- utils.pipeline_template - Standardized pipeline processing patterns
- pipeline.config - Pipeline configuration management
- INTEGRATION_MODE - Integration coordination mode ("coordinated", "standalone")
- INTEGRATION_TIMEOUT - Maximum integration processing time (default: 60 seconds)
- INTEGRATION_VERBOSE - Enable verbose integration logging

- integration_config.yaml - Integration-specific settings
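As a rough illustration of how the environment variables above might be resolved, the sketch below reads them with defaults. The function name load_integration_env, the accepted truthy strings for INTEGRATION_VERBOSE, and the choice to return to 60 seconds on an unparsable timeout are all assumptions made for this example, not documented behavior of the module.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_integration_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve integration settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # Default mode taken from the documented default of integration_mode.
    mode = env.get("INTEGRATION_MODE", "coordinated")

    # Assumption: an unparsable timeout reverts to the documented 60 seconds.
    try:
        timeout = int(env.get("INTEGRATION_TIMEOUT", "60"))
    except ValueError:
        timeout = 60

    # Assumption: these strings count as "enabled"; anything else is treated as off.
    verbose = env.get("INTEGRATION_VERBOSE", "").strip().lower() in {"1", "true", "yes", "on"}

    return {"mode": mode, "timeout": timeout, "verbose": verbose}
```

Passing a plain dict instead of relying on os.environ keeps the helper easy to test, for example load_integration_env({"INTEGRATION_TIMEOUT": "15"}).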
DEFAULT_INTEGRATION_SETTINGS = {
    'coordination_enabled': True,
    'fallback_mode': True,
    'timeout': 60,
    'retry_attempts': 3,
    'parallel_processing': False
}

import logging
from pathlib import Path

from integration.processor import process_integration

logger = logging.getLogger(__name__)
success = process_integration(
    target_dir=Path("input/gnn_files"),
    output_dir=Path("output/17_integration_output"),
    logger=logger,
    integration_mode="coordinated"
)

- integration_processing_summary.json - Integration processing summary
- system_coordination_report.json - Cross-module coordination status
- integration_status.json - Current integration state
output/17_integration_output/
├── integration_processing_summary.json
├── system_coordination_report.json
└── integration_status.json
- Fast Path: <1s for basic graph validation
- Analysis Depth: O(N+E) complexity for cycle detection
- Memory: Proportional to graph size (Node/Edge count)
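The O(N+E) figure above corresponds to checking whether any cycle exists with a single depth-first traversal; listing every individual cycle can take longer on graphs that contain many of them. The sketch below is a plain standard-library illustration of that traversal using three node states. It is not the module's implementation, and the function name has_cycle is hypothetical.

```python
from typing import Dict, List

# Node states for the traversal.
UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph (adjacency lists) contains a cycle."""
    state: Dict[str, int] = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for neighbor in graph.get(node, []):
            neighbor_state = state.get(neighbor, UNVISITED)
            if neighbor_state == IN_PROGRESS:
                return True  # back edge to a node on the current path
            if neighbor_state == UNVISITED and visit(neighbor):
                return True
        state[node] = DONE  # every path from this node is cycle-free
        return False

    # Each node is visited once and each edge inspected once: O(N + E).
    return any(state[node] == UNVISITED and visit(node) for node in list(graph))
```

The recursion depth grows with the longest dependency chain, so a very deep graph would call for an iterative variant with an explicit stack.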
- No external dependencies: Local-only integration mode
- Module unavailable: Skip integration for that module
- Network issues: Recovery to local coordination only
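One common way to realize this kind of degradation for an optional dependency such as psutil is to guard the import and return a reduced result when it is missing. The sketch below assumes a hypothetical helper named resource_snapshot and a hypothetical result shape; it only illustrates the pattern and does not describe what the module actually reports.

```python
import os
from typing import Any, Dict


def resource_snapshot() -> Dict[str, Any]:
    """Report basic resource data, using psutil only when it is installed."""
    try:
        import psutil  # optional dependency
    except ImportError:
        # Basic monitoring only: the standard library cannot report memory usage here.
        return {"source": "basic", "cpu_count": os.cpu_count(), "memory_percent": None}

    return {
        "source": "psutil",
        "cpu_count": psutil.cpu_count(),
        "memory_percent": psutil.virtual_memory().percent,
    }
```

Callers can check the source field to tell whether the full or the reduced data was collected.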
- Coordination Errors: Unable to coordinate between modules
- Dependency Errors: Missing required integration dependencies
- Configuration Errors: Invalid integration settings
- Script: 17_integration.py (Step 17)
- Function: process_integration()
- utils.pipeline_template - Standardized processing patterns
- pipeline.config - Configuration management

- src/tests/test_integration_overall.py - Module-level integration tests
- main.py - Pipeline orchestration
Pipeline Steps → Integration Coordination → System State → Cross-Module Communication → Unified Output
- src/tests/test_integration_functional.py - Functional integration tests
- src/tests/test_integration_processor.py - Processor-level integration tests
- Current: 83%
- Target: 90%+
- Cross-module coordination with various step combinations
- Recovery mode operation when dependencies unavailable
- System state synchronization accuracy
- Error handling with partial module failures
- integration_status - Check integration system status
- integration_coordinate - Coordinate pipeline step execution
@mcp_tool("integration_status")
def get_integration_status():
    """Get current integration system status"""
    # Implementation

- src/integration/mcp.py - MCP tool registrations
Symptom: Cycle detection reports false positives or misses cycles
Cause: Dependency graph construction incomplete or incorrect
Solution:
- Verify all modules are properly discovered
- Check module import statements are correct
- Use the --verbose flag for a detailed dependency graph
- Review dependency graph visualization
Symptom: Valid references reported as missing
Cause: File path resolution issues or incorrect reference format
Solution:
- Verify file paths are relative to project root
- Check reference format matches expected pattern
- Ensure referenced files exist in expected locations
- Review cross-reference validation logs
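When debugging missing-reference reports, it can help to reproduce the check by hand. The sketch below resolves each reference against the project root and lists the ones that do not point at an existing file. The function name find_missing_references and the assumption that references are plain relative paths are illustrative; the module's actual reference format may differ.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_references(references: Iterable[str], project_root: Path) -> List[str]:
    """Return the references that do not resolve to a file under project_root."""
    root = project_root.resolve()
    missing: List[str] = []
    for reference in references:
        # Assumption: each reference is a path relative to the project root.
        if not (root / reference).is_file():
            missing.append(reference)
    return missing
```

Running this against the project root with the references taken from the validation log shows whether a reported reference is truly absent or only resolved from the wrong base directory.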
Features:
- Cross-module coordination
- Dependency graph construction
- Cycle detection
- Cross-reference validation
- System-wide configuration
Known Issues:
- None currently
- Next Version: Enhanced dependency analysis
- Future: Real-time integration monitoring
Last Updated: 2026-01-21
Maintainer: GNN Pipeline Team
Status: ✅ Production Ready
Version: 1.0.0
Architecture Compliance: ✅ 100% Thin Orchestrator Pattern