Adriel007/matrioska

Matrioska v2 - LLM Orchestration System with File-Based Architecture


👤 Author: Adriel D. S. Andrade

📋 Overview

Matrioska v2 is an orchestration system for large language models (LLMs) built on a modular, file-based architecture with shared state. Inspired by Russian nesting dolls, it decomposes a complex task into specialized files that communicate through a shared whiteboard (shared_state).


🎯 Key Features

  • 📁 File-Based Architecture: Automatic decomposition of projects into ordered files.
  • 🧠 Shared State: Communication system between files via shared_state.
  • 💾 Full Persistence: Checkpoints of architecture and state between executions.
  • ⚡ Sequential Generation: Each file is generated in dependency order.
  • 🔗 Selective Context: Files access only relevant information from predecessors.
  • 📦 Optimized Code: Focus on minimal, complete, and efficient code using CDNs.

🏗️ Architecture

Core Components

  • LocalLLM - Wrapper for Mistral models with 4-bit quantization.
  • MatrioskaOrchestrator - Main pipeline orchestrator.
  • ContextManager - Manages shared state and persistence.
  • Architecture - Data structure for file-based planning.
  • FileSpec - Individual file specification.
  • FileArtifact - Generated file artifact.

Execution Flow

$$ \begin{array}{ccc} \text{PHASE 1: ARCHITECTURE} & \rightarrow & \text{PHASE 2: CODE GENERATION} \\ \downarrow & & \downarrow \\ \text{File Decomposition} & & \text{Sequential Generation} \\ & & \text{by Order/Dependency} \end{array} $$
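The two phases above can be sketched roughly as follows. This is a minimal illustration, not the actual MatrioskaOrchestrator code: the `plan` and `generate` callables are hypothetical stand-ins for the two LLM calls, and the stub functions exist only so the sketch runs without a model.

```python
from typing import Callable, Dict, List


def run_pipeline(
    task: str,
    plan: Callable[[str], List[dict]],
    generate: Callable[[dict, Dict[str, object]], str],
) -> Dict[str, str]:
    """Phase 1 plans the files; Phase 2 generates them in ascending `order`."""
    files = plan(task)  # PHASE 1: architecture (file decomposition)
    shared_state: Dict[str, object] = {}
    outputs: Dict[str, str] = {}
    # PHASE 2: sequential generation, lowest `order` first
    for spec in sorted(files, key=lambda f: f["order"]):
        filename = f"{spec['name']}.{spec['extension']}"
        outputs[filename] = generate(spec, shared_state)
    return outputs


# Hypothetical stubs standing in for the LLM-backed steps.
def fake_plan(task: str) -> List[dict]:
    return [
        {"name": "app", "extension": "js", "order": 3},
        {"name": "index", "extension": "html", "order": 1},
        {"name": "styles", "extension": "css", "order": 2},
    ]


def fake_generate(spec: dict, shared_state: Dict[str, object]) -> str:
    return f"/* generated {spec['name']} */"


result = run_pipeline("demo task", fake_plan, fake_generate)
print(list(result))  # ['index.html', 'styles.css', 'app.js']
```

Whatever order the planner returns the files in, generation follows the `order` field.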


🚀 How to Use

Installation

pip install -q json-repair transformers accelerate bitsandbytes torch sentencepiece protobuf

Environment Cleanup (Optional)

!rm -rf /content/log
!rm -rf /content/matrioska_artifacts
!rm -rf /content/matrioska_checkpoints

Basic Execution

from matrioska_v2 import LocalLLM, MatrioskaOrchestrator

# Initialize model
llm = LocalLLM("mistralai/Mistral-7B-Instruct-v0.3")
orchestrator = MatrioskaOrchestrator(llm, base_path="/content")

# Execute task
result = orchestrator.run("Create a library management system with authentication and dashboard")

Directory Structure

/content/
├── log/                        # Prompt and response logs
│   └── log.txt                 # Complete generation history
├── matrioska_artifacts/        # Generated files
│   ├── index.html
│   ├── styles.css
│   └── app.js
└── matrioska_checkpoints/      # State and architecture
    ├── shared_state.json       # Shared whiteboard
    └── architecture.json       # Architectural plan

📖 File System

File Specification (FileSpec)

from dataclasses import dataclass
from typing import List

@dataclass
class FileSpec:
    name: str                          # Name without extension
    extension: str                     # File extension
    order: int                         # Creation order (1, 2, 3...)
    shared_state_writes: List[str]     # Info this file defines
    shared_state_reads: List[str]      # Info this file needs
    content: str                       # Code generation prompt
    details: str                       # Functional requirements

Architecture Example

{
  "instructs": {
    "files": [
      {
        "name": "index",
        "extension": "html",
        "order": 1,
        "shared_state_writes": ["element_ids", "page_structure"],
        "shared_state_reads": [],
        "content": "Generate complete HTML structure for library system...",
        "details": "Responsive layout, login form, book catalog, dashboard"
      },
      {
        "name": "styles",
        "extension": "css",
        "order": 2,
        "shared_state_writes": ["css_classes", "color_scheme"],
        "shared_state_reads": ["element_ids", "page_structure"],
        "content": "Generate complete CSS using Tailwind CDN...",
        "details": "Modern design, dark mode, mobile-first"
      },
      {
        "name": "app",
        "extension": "js",
        "order": 3,
        "shared_state_writes": ["api_endpoints", "storage_keys"],
        "shared_state_reads": ["element_ids", "css_classes"],
        "content": "Generate JavaScript with authentication logic...",
        "details": "JWT auth, localStorage, CRUD operations"
      }
    ]
  }
}
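A plan like the one above can be loaded into FileSpec objects and checked for reads that no earlier file writes. The sketch below is an assumption about how such a check could look. `load_specs` and `unmet_reads` are hypothetical helper names, not part of the documented API, and the FileSpec defaults are added only to keep the example short.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSpec:
    name: str
    extension: str
    order: int
    shared_state_writes: List[str] = field(default_factory=list)
    shared_state_reads: List[str] = field(default_factory=list)
    content: str = ""
    details: str = ""


def load_specs(raw: str) -> List[FileSpec]:
    """Parse the architecture JSON and return FileSpecs sorted by `order`."""
    files = json.loads(raw)["instructs"]["files"]
    return sorted((FileSpec(**f) for f in files), key=lambda s: s.order)


def unmet_reads(specs: List[FileSpec]) -> List[str]:
    """Report every read key that no earlier file has written."""
    available, problems = set(), []
    for spec in specs:
        for key in spec.shared_state_reads:
            if key not in available:
                problems.append(
                    f"{spec.name}.{spec.extension} reads '{key}' before it is written"
                )
        available.update(spec.shared_state_writes)
    return problems


raw = """
{"instructs": {"files": [
  {"name": "app", "extension": "js", "order": 2,
   "shared_state_writes": ["storage_keys"],
   "shared_state_reads": ["element_ids", "css_classes"]},
  {"name": "index", "extension": "html", "order": 1,
   "shared_state_writes": ["element_ids"], "shared_state_reads": []}
]}}
"""
specs = load_specs(raw)
print([s.name for s in specs])  # ['index', 'app']
print(unmet_reads(specs))       # ["app.js reads 'css_classes' before it is written"]
```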

Shared State Communication Example

# File 1 (HTML) generates IDs
SHARED_STATE_UPDATE:
{
  "element_ids": ["#loginForm", "#bookList", "#dashboardStats"],
  "page_structure": {
    "login": "section#login",
    "catalog": "section#catalog",
    "dashboard": "section#dashboard"
  }
}

# File 2 (CSS) automatically consumes IDs
# The ContextManager provides only the keys specified in shared_state_reads
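The selective read described above amounts to filtering the shared state by the keys listed in shared_state_reads. A minimal sketch follows, assuming a plain dictionary as the state. `select_context` is a hypothetical helper, not the ContextManager's real method.

```python
from typing import Any, Dict, List


def select_context(shared_state: Dict[str, Any], reads: List[str]) -> Dict[str, Any]:
    """Return only the requested keys; keys not yet written are skipped."""
    return {key: shared_state[key] for key in reads if key in shared_state}


shared_state = {
    "element_ids": ["#loginForm", "#bookList", "#dashboardStats"],
    "page_structure": {"login": "section#login"},
    "color_scheme": {"primary": "#3b82f6"},
}

# The CSS file declares shared_state_reads = ["element_ids", "page_structure"].
css_context = select_context(shared_state, ["element_ids", "page_structure"])
print(sorted(css_context))  # ['element_ids', 'page_structure']
```

Keeping the context to the declared keys keeps each generation prompt short and makes the contract between files explicit.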

🔧 Model Configuration

4-bit Quantization

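import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and float16 compute
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)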

Generation Parameters

  • max_new_tokens: 20,000 (configurable via _MAX_TOKEN_)
  • temperature: 0.3
  • top_p: 0.85
  • do_sample: True
  • pad_token_id: set automatically to eos_token_id
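The parameters listed above map onto keyword arguments accepted by the Hugging Face `generate` method. The sketch below only collects them into a dictionary. `build_generation_kwargs` is a hypothetical helper, and the `eos_token_id=2` value is a placeholder (in practice it would come from `tokenizer.eos_token_id`).

```python
from typing import Any, Dict

# Constant name taken from the parameter list; 20,000 is the documented default.
_MAX_TOKEN_ = 20_000


def build_generation_kwargs(
    eos_token_id: int, max_new_tokens: int = _MAX_TOKEN_
) -> Dict[str, Any]:
    """Collect the documented sampling settings as generate() keyword arguments."""
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": 0.3,
        "top_p": 0.85,
        "do_sample": True,
        "pad_token_id": eos_token_id,  # pad with EOS, as the list above states
    }


gen_kwargs = build_generation_kwargs(eos_token_id=2)  # placeholder token id
# With a loaded model and tokenized inputs (not shown here):
#   output_ids = model.generate(**inputs, **gen_kwargs)
```

A low temperature with a moderate top_p keeps sampling close to the most likely tokens, which tends to suit code generation.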

📊 Architecture Prompt

The system uses ARCHITECT_SYSTEM_PROMPT, which instructs the LLM to:

  • Decompose the task into independent files
  • Define creation order based on dependencies
  • Specify contracts via shared_state_reads/writes
  • Generate complete prompts for each file
  • Focus on minimal code and use of CDNs/libraries

Mandatory Prompt Rules

  • Strict JSON structure with instructs root
  • order field defining creation sequence
  • shared_state_writes: information the file defines
  • shared_state_reads: information the file needs
  • content: complete code generation prompt
  • details: functional and non-functional requirements
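The rules above can be checked mechanically before any code is generated. The validator below is a sketch under assumptions: `validate_architecture` is a hypothetical function, and the uniqueness check on `order` is an added assumption that the rules do not state explicitly.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = {
    "name": str,
    "extension": str,
    "order": int,
    "shared_state_writes": list,
    "shared_state_reads": list,
    "content": str,
    "details": str,
}


def validate_architecture(plan: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    instructs = plan.get("instructs")
    files = instructs.get("files") if isinstance(instructs, dict) else None
    if not isinstance(files, list) or not files:
        return ["missing or empty 'instructs.files' list"]
    errors: List[str] = []
    for i, spec in enumerate(files):
        if not isinstance(spec, dict):
            errors.append(f"files[{i}]: must be an object")
            continue
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(spec.get(name), expected):
                errors.append(f"files[{i}]: '{name}' must be {expected.__name__}")
    # Assumption: each file gets its own position in the creation sequence.
    orders = [f["order"] for f in files
              if isinstance(f, dict) and isinstance(f.get("order"), int)]
    if len(set(orders)) != len(orders):
        errors.append("duplicate 'order' values")
    return errors


bad_plan = {"instructs": {"files": [{
    "name": "index", "extension": "html", "order": "1",
    "shared_state_writes": [], "shared_state_reads": [],
    "content": "Generate complete HTML structure...",
}]}}
print(validate_architecture(bad_plan))
# ["files[0]: 'order' must be int", "files[0]: 'details' must be str"]
```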

💡 Use Cases

Complete Web System

result = orchestrator.run('''
Create a complete e-commerce system with:
- Product catalog with search
- Shopping cart functionality
- User authentication
- Admin dashboard
- Responsive design with Tailwind CDN
''')

React/Vue Application

result = orchestrator.run('''
Build a task management app using React CDN with:
- Component-based architecture
- State management
- CRUD operations
- LocalStorage persistence
''')

Data Dashboard

result = orchestrator.run('''
Create an analytics dashboard with:
- Chart.js for visualizations
- Real-time data updates
- Export to CSV functionality
- Responsive grid layout
''')

🎨 Output Example

================================================================================
🪆 MATRIOSKA ORCHESTRATOR - File-Based Architecture
================================================================================

🏗️  PHASE 1: ARCHITECTURE
--------------------------------------------------------------------------------
📋 Task: 'Create a library management system with authentication and dashboard'

✓ Project: Project_3_Files
✓ Files: 3
   1. index.html 📖[] ✏️['element_ids', 'page_structure']
   2. styles.css 📖['element_ids', 'page_structure'] ✏️['css_classes', 'color_scheme']
   3. app.js 📖['element_ids', 'css_classes'] ✏️['api_endpoints', 'storage_keys']

⚡ PHASE 2: CODE GENERATION
--------------------------------------------------------------------------------

🎯 Generating: index.html (Order: 1)
💾 index.html → /content/matrioska_artifacts/index.html
🧠 [SHARED STATE] Updated: ['element_ids', 'page_structure']
   ✏️ Wrote: ['element_ids', 'page_structure']
   ✓ Generated (2847 chars)

🎯 Generating: styles.css (Order: 2)
   📖 Reading context: ['element_ids', 'page_structure']
💾 styles.css → /content/matrioska_artifacts/styles.css
🧠 [SHARED STATE] Updated: ['css_classes', 'color_scheme']
   ✏️ Wrote: ['css_classes', 'color_scheme']
   ✓ Generated (1923 chars)

🎯 Generating: app.js (Order: 3)
   📖 Reading context: ['element_ids', 'css_classes']
💾 app.js → /content/matrioska_artifacts/app.js
🧠 [SHARED STATE] Updated: ['api_endpoints', 'storage_keys']
   ✏️ Wrote: ['api_endpoints', 'storage_keys']
   ✓ Generated (3456 chars)

✅ FINAL RESULT
================================================================================

📦 Project_3_Files

📂 Generated Files: 3
   1. index.html
   2. styles.css
   3. app.js

🧠 SharedState Keys: ['element_ids', 'page_structure', 'css_classes', 'color_scheme', 'api_endpoints', 'storage_keys']
================================================================================

📁 Artifacts: /content/matrioska_artifacts
🧠 SharedState: /content/matrioska_checkpoints/shared_state.json

🔄 State Management

Shared State

  • Persistent: Saved in shared_state.json between executions.
  • Structured: JSON-serializable dictionary.
  • Selective: Files access only keys specified in shared_state_reads.
  • Incremental: Updated during the generation of each file.
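The persistent and incremental properties above can be pictured as a load, merge, and save cycle on shared_state.json. This is a minimal sketch, assuming a flat JSON dictionary. `load_state` and `update_state` are hypothetical helpers, and a temporary directory stands in for /content.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path) -> Dict[str, Any]:
    """Read the persisted state, or start empty on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def update_state(path: Path, updates: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one file's writes into the state and save it immediately."""
    state = load_state(path)
    state.update(updates)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "matrioska_checkpoints" / "shared_state.json"
    update_state(state_path, {"element_ids": ["#loginForm"]})    # after index.html
    update_state(state_path, {"css_classes": ["btn-primary"]})   # after styles.css
    final_state = load_state(state_path)

print(sorted(final_state))  # ['css_classes', 'element_ids']
```

Saving after every file means an interrupted run still leaves a usable state on disk.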

Checkpoints

  • Architecture: architecture.json - Complete project plan
  • SharedState: shared_state.json - Current shared state
  • Artifacts: Individual files in matrioska_artifacts/
  • Logs: Complete history of prompts and responses in log/log.txt

Shared State Example (shared_state.json)

{
  "element_ids": ["#loginForm", "#bookList", "#dashboard"],
  "page_structure": {
    "login": "section#login",
    "catalog": "section#catalog"
  },
  "css_classes": ["btn-primary", "card", "nav-item"],
  "color_scheme": {
    "primary": "#3b82f6",
    "secondary": "#8b5cf6"
  },
  "api_endpoints": {
    "login": "/api/auth/login",
    "books": "/api/books"
  },
  "storage_keys": ["authToken", "currentUser"]
}

📦 SharedState Updates Extraction

The system automatically detects updates in the format:

// At the end of the generated code
SHARED_STATE_UPDATE:
{
  "key1": "value1",
  "key2": ["item1", "item2"]
}

This marker is:

  • Extracted and processed by the ContextManager
  • Removed from the final code
  • Persisted in shared_state.json
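One way to implement the extract-and-remove step above is to split the model output at the last marker and decode the JSON object that follows it. The function below is a sketch, not the ContextManager's actual code. `split_shared_state_update` is a hypothetical name, and it uses the standard json module, whereas the project lists json-repair as a dependency and may parse more tolerantly.

```python
import json
from typing import Any, Dict, Tuple

MARKER = "SHARED_STATE_UPDATE:"


def split_shared_state_update(generated: str) -> Tuple[str, Dict[str, Any]]:
    """Return (code with the marker block removed, parsed updates)."""
    code, marker, tail = generated.rpartition(MARKER)
    if not marker:  # no marker: everything is code
        return generated.rstrip(), {}
    start = tail.find("{")
    if start == -1:  # marker without a JSON object
        return code.rstrip(), {}
    try:
        # raw_decode parses one JSON value and ignores trailing text
        updates, _ = json.JSONDecoder().raw_decode(tail[start:])
    except json.JSONDecodeError:
        return code.rstrip(), {}
    return code.rstrip(), updates if isinstance(updates, dict) else {}


generated = (
    'const form = document.querySelector("#loginForm");\n'
    "\n"
    "SHARED_STATE_UPDATE:\n"
    '{\n  "storage_keys": ["authToken", "currentUser"]\n}\n'
)
code, updates = split_shared_state_update(generated)
print(code)     # const form = document.querySelector("#loginForm");
print(updates)  # {'storage_keys': ['authToken', 'currentUser']}
```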

📄 Return Value

result = orchestrator.run("Create app...")

# `result` is a dictionary with three keys:
#   "architecture": Architecture         # object holding the project plan
#   "artifacts":    List[FileArtifact]   # list of generated files
#   "shared_state": Dict[str, Any]       # final shared state

🛠️ Technical Requirements

  • GPU: NVIDIA T4 (16GB VRAM) or better
  • RAM: 12GB+ recommended
  • Python: 3.8+
  • Libraries:
    • transformers (Hugging Face)
    • torch (PyTorch)
    • bitsandbytes (Quantization)
    • accelerate (Optimization)
    • json-repair (Robust Parsing)
    • sentencepiece, protobuf (Tokenization)

🔍 Logging and Debug

All prompts and responses are saved in /content/log/log.txt:

PROMPT:
==========================================
[Complete prompt sent to LLM]
==========================================
RESULT:
==========================================
[LLM Response]
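An entry in this layout could be appended with a few lines of Python. This is a sketch only: `append_log` is a hypothetical helper, the separator width is read off the sample above, and a temporary directory stands in for /content/log.

```python
import tempfile
from pathlib import Path

SEPARATOR = "=" * 42  # width assumed from the sample entry


def append_log(log_path: Path, prompt: str, response: str) -> None:
    """Append one PROMPT/RESULT entry to the log file."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entry = "\n".join(
        ["PROMPT:", SEPARATOR, prompt, SEPARATOR, "RESULT:", SEPARATOR, response, ""]
    )
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log" / "log.txt"
    append_log(log_path, "first prompt", "first response")
    append_log(log_path, "second prompt", "second response")
    log_text = log_path.read_text(encoding="utf-8")

print(log_text.count("PROMPT:"))  # 2
```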

🎯 Best Practices

  • File Order: HTML/DB first → CSS/Styles → JS/Logic → API/Backend
  • SharedState: Define clear contracts between files (IDs, classes, routes)
  • Detailed Prompts: The content field must be a complete generation prompt
  • CDNs: Prioritize libraries via CDN to reduce complexity
  • Minimal Code: Focus on minimal and functional code

🔮 Differences from v1

| Aspect | v1 (Modules) | v2 (Files) |
| --- | --- | --- |
| Basic Unit | ModuleSpec | FileSpec |
| Final Integration | Artifact assembly | Independent files |
| Structure | 3 phases | 2 phases |
| Focus | Conceptual modularity | Practical code generation |
| Output | Integrated result | Separate files |

📄 License

This project is intended for research and educational development purposes.

Matrioska v2: Transforming ideas into structured code 🪆✨
