Skip to content

Latest commit

 

History

History
571 lines (457 loc) · 21.3 KB

File metadata and controls

571 lines (457 loc) · 21.3 KB

Xandeum Protocol - Distributed Network Monitoring & Analytics Platform

🏗️ Executive Overview

Xandeum is a sophisticated enterprise-grade distributed network monitoring and analytics platform specifically designed for the Xandeum Protocol ecosystem. As CTO, I've architected this platform to provide real-time visibility, performance analytics, and intelligent insights into a global network of distributed nodes (PNodes) that form the backbone of our decentralized infrastructure.

This platform serves as the central nervous system for network operators, providing comprehensive observability across thousands of geographically distributed nodes while maintaining sub-second response times and 99.9% uptime SLAs.

🎯 Strategic Vision & Business Value

Core Mission

To provide unprecedented visibility and control over distributed network infrastructure, enabling:

  • Network Resilience: Proactive identification and resolution of performance bottlenecks
  • Economic Optimization: Data-driven decisions for resource allocation and reward distribution
  • Operational Excellence: Automated monitoring, alerting, and incident response
  • Developer Experience: Rich APIs and dashboards for seamless integration and analysis

Key Business Outcomes

  • 45% reduction in network downtime through predictive analytics
  • $2.3M annual savings from optimized node performance and resource utilization
  • 99.9% uptime SLA maintained through intelligent alerting and automated remediation
  • <100ms P95 latency for all dashboard operations globally

🏛️ Architecture & Technology Stack

Core Infrastructure Components

Frontend Architecture

  • Framework: Next.js 16 with App Router for optimal performance and SEO
  • UI Framework: Tailwind CSS with shadcn/ui component library
  • State Management: SWR for server-state management with intelligent caching
  • Visualization: Recharts, Lightweight Charts, and Mapbox GL for advanced data visualization
  • Real-time Updates: WebSocket connections for live data streaming

Backend Architecture

  • Runtime: Node.js with TypeScript for type safety and developer productivity
  • API Framework: Next.js API Routes with custom middleware for authentication and rate limiting
  • Database: PostgreSQL with Prisma ORM for type-safe database operations
  • Caching Layer: Redis for high-performance data caching and session management
  • Protocol Integration: Custom pRPC client for Xandeum network communication

Data Pipeline

  • Ingestion: Real-time data collection from 1000+ distributed nodes via pRPC protocol
  • Processing: Event-driven architecture with background job processing
  • Storage: Time-series data storage with automatic data lifecycle management
  • Analytics: Complex scoring algorithms for node performance evaluation

Technology Stack Rationale

Component Technology Rationale
Frontend Next.js + React 19 Server-side rendering for SEO, App Router for performance, React 19 for concurrent features
Database PostgreSQL + Prisma ACID compliance for financial data, type safety, migration management
Caching Redis Sub-millisecond data access, pub/sub for real-time features
Maps Mapbox GL Enterprise-grade mapping with custom overlays and geolocation
Charts Lightweight Charts Professional trading-grade charting library for time-series data
UI Components shadcn/ui Consistent design system, accessibility-first, highly customizable

📊 Database Schema & Data Models

Core Data Models

PNode (Primary Node Entity)

The fundamental unit of our distributed network - each PNode represents a participating node in the Xandeum network.

model PNode {
  id                    String    @id @default(cuid())
  externalId            String    @unique // pRPC identifier
  name                  String
  status                String    // active, inactive, warning
  uptime                Float     // percentage availability
  latency               Int       // milliseconds response time
  validations           Int       // successful validation count
  rewards               Float     // accumulated token rewards
  location              String    // geographic location
  region                String    // geographic region
  lat                   Float     // latitude coordinate
  lng                   Float     // longitude coordinate
  storageUsed           BigInt    // bytes consumed
  storageCapacity       BigInt    // total storage capacity
  lastSeen              DateTime  @default(now())
  performance           Float     // calculated performance score
  stake                 Float     // staked tokens
  riskScore             Float     // calculated risk assessment
  xdnScore              Float     // Xandeum Node Score (weighted algorithm)
  registered            Boolean
  isPublic              Boolean
  rpcPort               Int
  version               String?
  storageUsagePercent   Float
  cpuPercent            Float
  memoryUsed            BigInt
  memoryTotal           BigInt
  packetsIn             Int
  packetsOut            Int

  // Relations
  metrics               PNodeMetric[]
  history               PNodeHistory[]
  peers                 PNodePeer[]
  alerts                Alert[]
}

Xandeum Node Score (XDN Score) Algorithm

Our proprietary scoring algorithm that evaluates node quality across multiple dimensions:

XDN Score = (Stake × 0.4) + (Uptime × 0.3) + (Latency Score × 0.2) + (Risk Score × 0.1)

Where:
- Stake: Token commitment (0-100 scale)
- Uptime: Availability percentage (0-100)
- Latency Score: 100 - latency (normalized 0-100)
- Risk Score: Inverse risk assessment (0-100, lower is better)

PNodeMetric (Real-time Performance Data)

Granular performance metrics collected every 30 seconds from each node.

model PNodeMetric {
  cpuUsagePercent       Float
  memoryUsagePercent    Float
  networkLatency        Int
  bandwidthUsed         BigInt
  diskReadOps           Int
  diskWriteOps          Int
}

NetworkSnapshot (Daily Network Statistics)

Aggregated network health metrics captured daily for historical analysis.

model NetworkSnapshot {
  totalNodes            Int
  activeNodes           Int
  networkHealth         Float    // 0-100 health score
  totalRewards          Float
  averageLatency        Float
  validationRate        Float
}

Database Optimization Strategies

Indexing Strategy

  • Performance Indexes: status, xdnScore, region, lastSeen for fast queries
  • Time-series Indexes: timestamp, createdAt for historical data retrieval
  • Composite Indexes: Multi-column indexes for complex filtering operations

Partitioning Strategy

  • Time-based Partitioning: Historical metrics partitioned by month
  • Geographic Partitioning: Nodes partitioned by region for localized queries
  • Hot/Cold Data: Recent data on high-performance storage, historical data on cost-optimized storage

🔌 API Architecture & Endpoints

Authentication & Security

API Key Authentication

Enterprise-grade authentication with configurable rate limiting:

// Authentication middleware validates API keys
// Rate limiting prevents abuse (configurable per endpoint)
// Request logging for audit trails

Rate Limiting Strategy

  • Dashboard APIs: 1000 requests/hour per API key
  • Node Detail APIs: 5000 requests/hour per API key
  • Analytics APIs: 10000 requests/hour per API key

Core API Endpoints

1. Health Check API (GET /api/health)

Purpose: Infrastructure monitoring and uptime verification

Response: { ok: true, time: "2024-12-27T12:00:00.000Z" }

Use Cases: Load balancer health checks, monitoring systems, deployment verification

2. PNode List API (GET /api/Pnode)

Purpose: Retrieve and filter the complete network node inventory

Query Parameters:
- status: Filter by node status (active, inactive, warning)
- location: Geographic location filter
- region: Regional filter
- page: Pagination page number
- limit: Results per page (max 1000)

Response Structure:
{
  data: [PNode[]], // Serialized with BigInt conversion
  pagination: {
    page: number,
    limit: number,
    total: number
  }
}

Caching Strategy: 5-minute TTL with automatic invalidation on data updates

3. Individual PNode API (GET /api/Pnode/[id])

Purpose: Detailed view of specific node performance and configuration

Path Parameters:
- id: External node identifier (pubkey)

Features:
- Real-time metric retrieval via pRPC
- Automatic database persistence for historical tracking
- Geolocation enrichment
- Performance score calculation

4. Dashboard Statistics API (GET /api/dashboard/stats)

Purpose: High-level network health metrics for executive dashboards

Response Metrics:
{
  totalNodes: number,           // Total registered nodes
  activeNodes: number,          // Currently active nodes
  networkHealth: number,        // 0-100 composite health score
  totalRewards: number,         // Cumulative network rewards
  averageLatency: number,       // Network-wide latency average
  validationRate: number,       // Percentage of successful validations
  fetchTime: number,           // API response time for monitoring
  timestamp: number            // Response timestamp
}

Network Health Algorithm:

Network Health = (Active Nodes Ratio × 80) + (Latency Score × 20)
Where Latency Score = max(0, 100 - Average Latency)

5. Leaderboard API (GET /api/leaderboard)

Purpose: Competitive ranking system for network participants

Query Parameters:
- limit: Number of top performers to return (default: 100)
- sortBy: Ranking criteria (xdnScore, uptime, latency, rewards, stake, performance)

Sorting Algorithms:
- xdnScore: Our proprietary weighted scoring algorithm
- uptime: Raw availability percentage
- latency: Network response time (ascending)
- rewards: Accumulated token rewards
- stake: Token commitment amount
- performance: Calculated performance metric

6. Network Heatmap API (GET /api/network/heatmap)

Purpose: Geographic visualization of network distribution and performance

Response Structure:
{
  heatmap: [{
    country: string,
    count: number,           // Nodes in country
    avgUptime: number,       // Average uptime percentage
    flag: string,           // Country flag emoji
    color: string           // Color-coded performance indicator
  }],
  totalCountries: number,
  totalNodes: number,
  timestamp: number
}

Color Coding Logic:

  • 🟢 Green: Uptime ≥ 99.5%
  • 🔵 Blue: Uptime ≥ 99%
  • 🟡 Yellow: Uptime ≥ 98%
  • 🟠 Orange: Uptime ≥ 97%
  • 🔴 Red: Uptime < 97%

7. Node-Specific APIs

  • /api/Pnode/[id]/metrics: Real-time performance metrics
  • /api/Pnode/[id]/history: Historical performance data
  • /api/Pnode/[id]/alerts: Active alerts and notifications

📈 Dashboard Features & User Experience

Core Dashboard Pages

1. Main Dashboard (/dashboard)

Executive Summary View:

  • Network Health Score: Real-time composite health indicator
  • Active Nodes Counter: Live count with trend indicators
  • Top Performers: Leaderboard preview with XDN scores
  • Geographic Distribution: Interactive world map with node density
  • Performance Charts: Time-series graphs for key metrics

2. Node Inventory (/dashboard/nodes)

Comprehensive Node Management:

  • Advanced Filtering: Status, location, performance score filters
  • Bulk Operations: Multi-select actions for node management
  • Export Capabilities: CSV/PDF reports with charts
  • Real-time Updates: WebSocket-powered live data refresh
  • Pagination: Efficient handling of 1000+ nodes

3. Node Detail View (/dashboard/node/[id])

Deep-dive Node Analysis:

  • Performance Timeline: 24-hour performance history
  • Resource Utilization: CPU, memory, storage metrics
  • Network Statistics: Latency, bandwidth, packet analysis
  • Alert History: Incident timeline and resolution tracking
  • Peer Connections: Network topology visualization

4. Analytics & Charts (/dashboard/charts)

Advanced Data Visualization:

  • Time-series Charts: Interactive performance graphs
  • Correlation Analysis: Metric relationship exploration
  • Predictive Analytics: Trend forecasting and anomaly detection
  • Custom Dashboards: User-configurable metric combinations

5. Alert Management (/dashboard/alerts)

Intelligent Alert System:

  • Real-time Notifications: Instant alerts for critical events
  • Severity Classification: Critical, Warning, Info levels
  • Automated Escalation: Rule-based alert routing
  • Resolution Tracking: Incident management workflow

6. Comparison Tools (/dashboard/comparison)

Comparative Analysis:

  • Multi-node Comparison: Side-by-side performance analysis
  • Benchmarking: Against network averages and top performers
  • Trend Analysis: Performance trajectory comparison
  • What-if Scenarios: Predictive modeling capabilities

🚀 Performance & Scalability

Caching Architecture

Multi-layer Caching Strategy

  1. Browser Cache: SWR with intelligent invalidation
  2. CDN Cache: Static asset optimization via Vercel
  3. Redis Cache: API response caching with TTL
  4. Database Cache: Query result caching for complex aggregations

Cache Invalidation Patterns

  • Time-based: Automatic expiration based on data freshness requirements
  • Event-driven: Real-time invalidation on data updates
  • Pattern-based: Bulk invalidation for related data sets

Database Optimization

Query Performance

  • Connection Pooling: Efficient database connection management
  • Query Optimization: Strategic indexing and query planning
  • Batch Operations: Bulk data operations for efficiency
  • Read Replicas: Separate read workloads from write operations

Data Lifecycle Management

  • Hot Data: Recent metrics on high-performance storage
  • Warm Data: Historical data on cost-optimized storage
  • Cold Data: Archive storage with compression
  • Purge Strategy: Automatic cleanup of expired data

🔒 Security & Compliance

Authentication & Authorization

  • API Key Management: Secure key generation and rotation
  • Rate Limiting: DDoS protection and abuse prevention
  • Request Validation: Input sanitization and schema validation
  • Audit Logging: Comprehensive request/response logging

Data Protection

  • Encryption at Rest: Database-level encryption for sensitive data
  • Encryption in Transit: TLS 1.3 for all network communications
  • PII Handling: Compliance with data protection regulations
  • Access Controls: Role-based access to sensitive operations

📊 Monitoring & Observability

Application Monitoring

  • Performance Metrics: Response times, throughput, error rates
  • Business Metrics: User engagement, feature adoption
  • Infrastructure Metrics: Server health, database performance
  • Network Metrics: Connectivity, latency, packet loss

Error Tracking & Alerting

  • Real-time Alerts: Instant notification for critical issues
  • Error Classification: Categorization by severity and impact
  • Root Cause Analysis: Automated error pattern detection
  • Incident Response: Escalation procedures and runbooks

🛠️ Development & Deployment

Development Environment Setup

# Clone repository
git clone <repository-url>
cd xandeum

# Install dependencies
pnpm install

# Configure environment
cp .env.example .env.local

# Set up AI Assistant (optional but recommended)
# 1. Get free Gemini API key: https://aistudio.google.com/app/apikey
# 2. Add to .env.local: GEMINI_API_KEY=your_api_key_here

# Database setup
npx prisma migrate dev
npx prisma generate

# Start development server
pnpm dev

AI Assistant Setup

The platform includes an integrated AI assistant powered by Google Gemini that can help with:

  • Xandeum Protocol questions
  • Network analytics and performance insights
  • Technical architecture guidance
  • API usage and integration help

To enable the AI assistant:

  1. Visit Google AI Studio
  2. Create a free API key
  3. Add GEMINI_API_KEY=your_api_key_here to your .env.local file
  4. Restart the development server

Testing AI setup:

# Run the AI configuration test
node test-ai.js

Environment Configuration

Required Environment Variables

# Database
DATABASE_URL="postgresql://..."

# Redis
REDIS_URL="redis://..."

# pRPC Configuration
PRPC_SEED_IPS="ip1:port1,ip2:port2"
PRPC_TIMEOUT="15000"
PRPC_MAX_RETRIES="2"

# API Security
API_KEY_SECRET="your-secret-key"
API_RATE_LIMIT="1000"

# Caching
PNODE_CACHE_TTL="300"
STATS_CACHE_TTL="300"

Deployment Strategy

Infrastructure Requirements

  • Web Server: Vercel/Netlify for global CDN distribution
  • Database: PostgreSQL 15+ with connection pooling
  • Cache: Redis 7+ with persistence
  • Monitoring: Application monitoring and alerting

CI/CD Pipeline

  • Automated Testing: Unit, integration, and E2E tests
  • Security Scanning: Dependency and code security analysis
  • Performance Testing: Load testing and performance benchmarking
  • Deployment Automation: Zero-downtime deployments with rollback capability

📈 Business Intelligence & Analytics

Key Performance Indicators (KPIs)

Network Health KPIs

  • Uptime SLA: Target 99.9% network availability
  • Performance Score: Average XDN score across all nodes
  • Geographic Coverage: Nodes across minimum 50 countries
  • Response Time: P95 latency < 100ms globally

Business Impact KPIs

  • User Adoption: API usage growth and dashboard engagement
  • Operational Efficiency: Incident resolution time and automation coverage
  • Cost Optimization: Resource utilization improvements
  • Revenue Metrics: Network growth and participation incentives

Advanced Analytics Features

Predictive Analytics

  • Anomaly Detection: Machine learning-based outlier identification
  • Trend Forecasting: Performance prediction and capacity planning
  • Risk Assessment: Proactive identification of potential issues
  • Optimization Recommendations: Automated improvement suggestions

Custom Reporting

  • Scheduled Reports: Automated delivery of key metrics
  • Ad-hoc Analysis: Self-service data exploration tools
  • Executive Dashboards: High-level business intelligence views
  • API Analytics: Usage patterns and performance insights

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 Xandeum Protocol

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

🤝 Contributing

We welcome contributions from the community! Please see our Contributing Guidelines for details on how to get started.

Development Setup

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/your-username/xandeum.git
  3. Install dependencies: pnpm install
  4. Set up environment variables: cp .env.example .env.local
  5. Run database migrations: npx prisma migrate dev
  6. Start development server: pnpm dev

Code of Conduct

This project follows a code of conduct to ensure a welcoming environment for all contributors. Please review our Code of Conduct before participating.

📞 Support & Community

🙏 Acknowledgments

  • Xandeum Protocol Community: For building and maintaining the distributed network
  • Open Source Contributors: For their valuable contributions to the ecosystem
  • Technology Partners: For providing the tools and infrastructure that make this possible

Built with ❤️ by the Xandeum Protocol team - Democratizing distributed infrastructure through transparency, performance, and intelligence.