Skip to content

Latest commit

 

History

History
400 lines (264 loc) · 9.44 KB

File metadata and controls

400 lines (264 loc) · 9.44 KB

Export Module - Agent Scaffolding

Module Overview

Purpose: Multi-format export generation (JSON, XML, GraphML, GEXF, Pickle) from parsed GNN models

Pipeline Step: Step 7: Multi-format export (7_export.py)

Category: Data Export / Transformation

Status: ✅ Production Ready

Version: 1.1.3

Last Updated: 2026-01-21


Core Functionality

Primary Responsibilities

  1. Export parsed GNN models to multiple formats
  2. Generate graph-based representations (GraphML, GEXF)
  3. Create portable serializations (JSON, XML, Pickle)
  4. Validate export integrity
  5. Provide format-specific documentation

Key Capabilities

  • JSON export with schema validation
  • XML export with DTD/XSD
  • GraphML for network analysis tools
  • GEXF for Gephi visualization
  • Pickle for Python persistence

API Reference

Public Functions

generate_exports(target_dir, output_dir, verbose=False) -> bool

Description: Main export processing function. Exports all GNN .md files in target_dir to multiple formats.

Parameters:

  • target_dir (Path): Directory containing GNN files
  • output_dir (Path): Output directory for exports
  • verbose (bool): Enable verbose logging (default: False)

Returns: True if all exports succeeded

Example:

from export import generate_exports

success = generate_exports(
    target_dir=Path("input/gnn_files"),
    output_dir=Path("output/7_export_output"),
    verbose=True
)

Supported Export Formats

Standard Formats

  1. JSON: Human-readable, widely compatible
  2. XML: Schema-validated, industry standard

Graph Formats

  1. GraphML: Standard graph format (Cytoscape, yEd)
  2. GEXF: Gephi visualization format

Text Formats

  1. Plaintext Summary: Human-readable model overview
  2. Plaintext DSL: Round-trip GNN-like text

Binary Formats

  1. Pickle: Fast Python serialization

Configuration

Configuration Options

Export Format Selection

  • export_formats (List[str]): Formats to export (default: ["json", "xml", "graphml", "gexf", "pickle"])
    • "json": JSON export (human-readable)
    • "xml": XML export (schema-validated)
    • "graphml": GraphML format (Cytoscape, yEd)
    • "gexf": GEXF format (Gephi)
    • "pickle": Python pickle serialization
    • "txt": Plaintext summary
    • "dsl": Plaintext DSL

Export Options

  • include_metadata (bool): Include metadata in exports (default: True)
  • validate_schema (bool): Validate XML schema (default: True)
  • pretty_print (bool): Pretty-print JSON/XML (default: True)
  • compress (bool): Compress large exports (default: False)

Graph Export Configuration

  • graph_layout (str): Graph layout algorithm (default: "force")
    • Options: "force", "hierarchical", "circular"
  • include_weights (bool): Include edge weights in graphs (default: True)
  • node_attributes (List[str]): Node attributes to include (default: all)

Dependencies

Required Dependencies

  • json - JSON export
  • xml.etree.ElementTree - XML export
  • pickle - Pickle serialization

Optional Dependencies

  • networkx - Graph format export (recovery: basic XML-based export)

Usage Examples

Basic Usage

from export import generate_exports

success = generate_exports(
    target_dir=Path("input/gnn_files"),
    output_dir=Path("output/7_export_output"),
    verbose=True
)

Specific Formats

from export import export_model

results = export_model(
    model_data=parsed_data,
    output_dir=Path("output/7_export_output"),
    formats=["json", "graphml", "gexf"]
)

Output Specification

Output Products

  • {model}_export.json - JSON export
  • {model}_export.xml - XML export
  • {model}_graph.graphml - GraphML graph
  • {model}_graph.gexf - GEXF graph
  • {model}_model.pkl - Pickle serialization
  • export_summary.json - Export summary

Output Directory Structure

output/7_export_output/
├── model_name_export.json
├── model_name_export.xml
├── model_name_graph.graphml
├── model_name_graph.gexf
├── model_name_model.pkl
└── export_summary.json

Performance Characteristics

Latest Execution

  • Duration: 61ms
  • Memory: 28.7 MB
  • Status: SUCCESS_WITH_WARNINGS
  • Formats Generated: 5

Error Handling

Graceful Degradation

  • Format Unavailable: Skip unavailable format, log warning, continue with others
  • Schema Validation Failure: Export without validation, log warning
  • Large Model: Use compression or split exports, provide warnings
  • Invalid GNN Model: Return structured error, skip model

Error Categories

  1. Format Errors: Format not supported or unavailable (recovery: skip format)
  2. Validation Errors: Schema validation fails (recovery: export without validation)
  3. Serialization Errors: Cannot serialize model (return error)
  4. File I/O Errors: Cannot write export files (return error)

Error Recovery

  • Format Recovery: Automatically skip unavailable formats
  • Partial Export: Export what's possible, report failures
  • Resource Cleanup: Proper cleanup of export resources on errors

Integration Points

Pipeline Integration

  • Input: Receives parsed GNN models from Step 3 (gnn processing)
  • Output: Generates exports consumed by Step 8 (visualization), Step 11 (render), and Step 20 (website generation)
  • Dependencies: Requires GNN parsing results from 3_gnn.py output

Module Dependencies

  • gnn/: Reads parsed GNN model data for export
  • visualization/: Provides graph formats for visualization
  • render/: Provides model data for code generation
  • website/: Provides export data for website generation

External Integration

  • Cytoscape: GraphML format for network analysis
  • Gephi: GEXF format for graph visualization
  • NetworkX: Graph format conversion and analysis

Data Flow

3_gnn.py (GNN parsing)
  ↓
7_export.py (Multi-format export)
  ↓
  ├→ 8_visualization.py (Graph visualization)
  ├→ 11_render.py (Code generation)
  ├→ 20_website.py (Website integration)
  └→ output/7_export_output/ (Standalone exports)

Testing

Test Files

  • src/tests/test_export_overall.py

Test Coverage

  • Current: 86%
  • Target: 90%+

Key Test Scenarios

  1. Multi-format export generation
  2. Format validation and error handling
  3. Graph format conversion
  4. Export integrity verification

MCP Integration

Tools Registered

  • generate_exports — Generate multi-format exports for GNN files in a directory
  • export_single_gnn_file — Export a single GNN file to all supported formats
  • export.list_functions — List callable functions in the export module
  • export.call_function — Call any public function by name with keyword arguments

MCP File Location

  • src/export/mcp.py — Tool registrations and MCP wrappers

Troubleshooting

Common Issues

Issue 1: Export fails for specific format

Symptom: Export succeeds for some formats but fails for others
Cause: Missing optional dependency (networkx) or format-specific errors
Solution:

  • Check that required dependencies are installed: uv pip install networkx
  • Use --verbose flag to see detailed error messages
  • Check format-specific requirements in documentation

Issue 2: GraphML/GEXF export fails

Symptom: Graph formats fail to generate
Cause: Missing networkx dependency or invalid graph structure
Solution:

  • Install networkx: uv pip install networkx
  • Verify GNN model has valid connections section
  • Check that graph data is properly structured

Issue 3: Large model export timeout

Symptom: Export times out or runs out of memory
Cause: Model too large for single export operation
Solution:

  • Use compress=True option to reduce file size
  • Export formats individually instead of all at once
  • Process models in smaller batches

Performance Issues

Slow Export Performance

Symptoms: Export takes longer than expected
Diagnosis:

# Enable verbose logging
python src/7_export.py --target-dir input/ --verbose

Solutions:

  • Export only needed formats (don't export all formats if not needed)
  • Use pickle format for fastest serialization
  • Disable schema validation for faster XML export

Version History

Current Version: 1.1.3

Features:

  • Multi-format export (JSON, XML, GraphML, GEXF, Pickle, Plaintext Summary, Plaintext DSL)
  • Format validation and error handling
  • Graph format conversion via NetworkX
  • Export integrity verification
  • MCP tool integration

Known Issues:

  • None currently

Roadmap

  • Future: Streaming export for very large models

References

Related Documentation

External Resources


Last Updated: 2026-01-21 Maintainer: GNN Pipeline Team Status: ✅ Production Ready Version: 1.1.3 Architecture Compliance: ✅ 100% Thin Orchestrator Pattern


Documentation

  • README: Module Overview
  • AGENTS: Agentic Workflows
  • SPEC: Architectural Specification
  • SKILL: Capability API