Skip to content

Latest commit

 

History

History
350 lines (266 loc) · 8.96 KB

File metadata and controls

350 lines (266 loc) · 8.96 KB

Remote FlockParser Setup Guide

Date: 2025-10-16 Status: ✅ IMPLEMENTED

Overview

SynapticLlamas can now access FlockParser document knowledge base from remote machines via HTTP API. This allows distributed deployments where FlockParser and SynapticLlamas run on separate servers.

Architecture

┌─────────────────────┐         HTTP API          ┌─────────────────────┐
│  SynapticLlamas     │◄──────────────────────────►│    FlockParser      │
│  (Machine A)        │    Port 8765 (default)     │    (Machine B)      │
│                     │                            │                     │
│  - Agents           │                            │  - Document Index   │
│  - RAG queries      │                            │  - Knowledge Base   │
│  - Citation inject  │                            │  - 41 PDFs indexed  │
└─────────────────────┘                            └─────────────────────┘

Setup Instructions

1. Start FlockParser API Server

On the FlockParser machine, run:

cd /home/joker/FlockParser
python flockparser_api_server.py --host 0.0.0.0 --port 8765

Options:

  • --host: Host to bind to (default: 0.0.0.0 for external access)
  • --port: Port to listen on (default: 8765)
  • --path: Path to FlockParser installation (default: /home/joker/FlockParser)

Example output:

2025-10-16 14:12:36 - INFO - ✅ FlockParser API initialized (41 documents)
2025-10-16 14:12:36 - INFO - 🚀 Starting FlockParser API server on 0.0.0.0:8765
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)

2. Configure SynapticLlamas for Remote Mode

On the SynapticLlamas machine, update initialization:

Before (Local Mode):

from flockparser_adapter import FlockParserAdapter

adapter = FlockParserAdapter(
    flockparser_path="/home/joker/FlockParser"  # Local filesystem
)

After (Remote Mode):

from flockparser_adapter import FlockParserAdapter

adapter = FlockParserAdapter(
    flockparser_path="http://remote-host:8765"  # HTTP API URL
)

The adapter automatically detects the mode:

  • Local mode: If path is a filesystem path
  • Remote mode: If path starts with http:// or https://

3. Configuration File

Update ~/.synapticllamas.json or environment variable:

{
    "flockparser_url": "http://10.9.66.45:8765",
    "embedding_model": "mxbai-embed-large",
    "ollama_url": "http://localhost:11434"
}

Or set environment variable:

export FLOCKPARSER_URL="http://10.9.66.45:8765"

API Endpoints

The FlockParser API server provides these endpoints:

GET /health

Health check endpoint.

Response:

{
    "status": "healthy",
    "available": true,
    "document_index_exists": true
}

GET /stats

Get knowledge base statistics.

Response:

{
    "available": true,
    "documents": 41,
    "chunks": 6141,
    "document_names": ["quantum_mechanics_intro.pdf", ...]
}

POST /query

Query documents with pre-computed embedding.

Request:

{
    "query": "quantum entanglement",
    "query_embedding": [0.123, -0.456, ...],
    "top_k": 15,
    "min_similarity": 0.5
}

Response:

{
    "chunks": [
        {
            "text": "Entanglement generation...",
            "doc_name": "document_22.pdf",
            "similarity": 0.839,
            "doc_id": "doc_001"
        }
    ],
    "total_found": 25
}

GET /documents

Get complete document index (metadata only, not full text).

GET /chunk/{chunk_id}

Get specific chunk data by ID.

Testing

Test Local Mode

python test_remote_flockparser.py

Test Remote Mode

# Start API server first
cd /home/joker/FlockParser
python flockparser_api_server.py --port 8765 &

# Run test
python test_remote_flockparser.py http://localhost:8765

Expected output:

======================================================================
TEST SUMMARY
======================================================================
Local mode:  ✅ WORKING
Remote mode: ✅ WORKING
======================================================================

Network Configuration

Firewall Rules

Allow incoming connections on port 8765:

# UFW (Ubuntu/Debian)
sudo ufw allow 8765/tcp

# iptables
sudo iptables -A INPUT -p tcp --dport 8765 -j ACCEPT

Security Considerations

Current Implementation:

  • No authentication (suitable for trusted networks)
  • CORS enabled for all origins
  • HTTP (not HTTPS)

Production Recommendations:

  1. Add API key authentication
  2. Enable HTTPS with SSL/TLS
  3. Restrict CORS to specific origins
  4. Use reverse proxy (nginx/traefik)
  5. Rate limiting

Example with API key:

# flockparser_api_server.py (add this)
from fastapi import Header, HTTPException

API_KEY = os.getenv("FLOCKPARSER_API_KEY", "your-secret-key")

async def verify_api_key(x_api_key: str = Header(...)):
    if x_api_key != API_KEY:
        raise HTTPException(status_code=403, detail="Invalid API Key")
    return x_api_key

# Then add dependency to endpoints:
@app.post("/query", dependencies=[Depends(verify_api_key)])
async def query_documents(request: QueryRequest):
    ...

Performance

Benchmarks

Local Mode:

  • Query latency: ~3.3s (embedding generation + local file I/O)
  • Throughput: ~0.3 queries/sec

Remote Mode:

  • Query latency: ~2.5s (embedding generation + HTTP + remote file I/O)
  • Throughput: ~0.4 queries/sec
  • Network overhead: ~200ms

Notes:

  • Embedding generation (via Ollama) dominates latency
  • Remote mode slightly faster due to local Ollama access
  • HTTP overhead is minimal (~200ms)

Optimization Tips

  1. Use distributed embeddings: Pass hybrid_router_sync to route embeddings to nearest Ollama instance
  2. Increase top_k: Fetch more chunks in single request
  3. Cache embeddings: Store query embeddings to avoid recomputation
  4. Connection pooling: Reuse HTTP connections

Troubleshooting

Issue: "Remote FlockParser API not available"

Check:

  1. API server is running: curl http://remote-host:8765/health
  2. Firewall allows port 8765
  3. Network connectivity: ping remote-host
  4. API logs: /tmp/flockparser_api.log

Issue: "Failed to generate query embedding"

Check:

  1. Ollama is running: ollama list
  2. Embedding model is available: ollama pull mxbai-embed-large
  3. Ollama URL is correct in adapter config

Issue: Slow queries

Optimize:

  1. Reduce top_k from 15 to 5-10
  2. Increase min_similarity from 0.5 to 0.6
  3. Use local Ollama instance for embeddings
  4. Check network latency: ping remote-host

Integration Examples

Example 1: Distributed Research

# Machine A: SynapticLlamas
from distributed_orchestrator import DistributedOrchestrator

orchestrator = DistributedOrchestrator(
    model="llama3.2",
    enable_collaborative=True,
    flockparser_url="http://10.9.66.45:8765"  # Remote FlockParser
)

result = await orchestrator.run_collaborative(
    "Explain quantum entanglement with citations"
)

Example 2: Multi-Region Setup

# Region 1: US East (SynapticLlamas)
adapter_us = FlockParserAdapter("http://flockparser-us.example.com:8765")

# Region 2: EU West (SynapticLlamas)
adapter_eu = FlockParserAdapter("http://flockparser-eu.example.com:8765")

# Load balancing between regions
adapters = [adapter_us, adapter_eu]
current_adapter = random.choice(adapters)
results = current_adapter.query_documents(query)

Files Modified

New Files Created

  1. /home/joker/FlockParser/flockparser_api_server.py

    • FastAPI HTTP server
    • Endpoints: /health, /stats, /query, /documents, /chunk/{id}
    • CORS enabled for remote access
  2. /home/joker/SynapticLlamas/test_remote_flockparser.py

    • Test script for both local and remote modes
    • Usage: python test_remote_flockparser.py [api_url]

Modified Files

  1. /home/joker/SynapticLlamas/flockparser_adapter.py
    • Added remote_mode detection
    • New methods: _check_availability(), _query_remote(), _query_local()
    • Updated: __init__(), query_documents(), get_statistics(), _count_documents()

Summary

Remote FlockParser access implemented

  • SynapticLlamas can run on separate machine from FlockParser
  • HTTP API provides document query capabilities
  • Automatic mode detection (local vs remote)
  • Compatible with existing FlockParser JSON storage
  • No re-indexing required

Usage:

# Local
adapter = FlockParserAdapter("/home/joker/FlockParser")

# Remote
adapter = FlockParserAdapter("http://remote-host:8765")

Both modes support identical API - just change the path!