A distributed, cross-platform monitoring system built in Go that enables centralized management and monitoring of multiple agent daemons across Windows, Linux, and macOS environments.
The master server is the central control plane, responsible for:
- Real-time Agent Monitoring: Tracks agent health, system metrics, and connection status
- WebSocket Hub: Maintains persistent bidirectional connections with all agents
- REST API: Handles callback responses and administrative operations
- In-Memory State: Maintains current agent status and metrics in memory
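As a rough illustration of the in-memory state described above, the master could keep one record per agent in a mutex-guarded map. This is a minimal sketch, not the project's actual code: `AgentStatus`, `Registry`, and their methods are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// AgentStatus is a hypothetical record of what the master knows about one agent.
type AgentStatus struct {
	ID       string
	OS       string
	LastSeen time.Time
	Online   bool
}

// Registry is a concurrency-safe in-memory store keyed by agent ID.
type Registry struct {
	mu     sync.RWMutex
	agents map[string]AgentStatus
}

// NewRegistry returns an empty registry ready for use.
func NewRegistry() *Registry {
	return &Registry{agents: make(map[string]AgentStatus)}
}

// Upsert records a registration or heartbeat, stamping it with the current time.
func (r *Registry) Upsert(id, osName string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.agents[id] = AgentStatus{ID: id, OS: osName, LastSeen: time.Now(), Online: true}
}

// Get returns a copy of the stored status and whether the agent is known.
func (r *Registry) Get(id string) (AgentStatus, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.agents[id]
	return s, ok
}

// Count returns the number of known agents.
func (r *Registry) Count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.agents)
}

func main() {
	reg := NewRegistry()
	reg.Upsert("agent-1", "linux")
	reg.Upsert("agent-2", "windows")
	s, ok := reg.Get("agent-2")
	fmt.Println(reg.Count(), s.ID, s.OS, s.Online, ok)
}
```

Because nothing is written to disk, this state is lost when the master restarts; agents would repopulate it on their next registration or heartbeat.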
Agents are lightweight, cross-platform daemons that run persistently on host machines:
- System Metrics Collection: CPU usage, disk space, memory, network stats, uptime, OS information
- Heartbeat Mechanism: Sends health signals every 5 minutes to confirm operational status
- Local Log Files: Maintains rotated logs locally for diagnosing failures and downtime events
- Bidirectional Communication: Receives commands via WebSocket, responds via REST callbacks
- Process Persistence: Installed as a native OS service via the kardianos/service wrapper, so the daemon keeps running across reboots and user logouts
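To make the metrics and heartbeat items above more concrete, here is a small sketch of a heartbeat payload built only from the Go standard library. The `Heartbeat` type and its field names are assumptions for illustration. Host-wide CPU, disk, and network figures need OS-specific calls or a third-party library and are left out; the uptime shown is the agent process uptime, not system uptime.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
	"time"
)

// startedAt marks process start; it stands in for a real uptime source.
var startedAt = time.Now()

// Heartbeat is a hypothetical payload; the field names are illustrative.
type Heartbeat struct {
	Hostname      string  `json:"hostname"`
	OS            string  `json:"os"`
	Arch          string  `json:"arch"`
	NumCPU        int     `json:"num_cpu"`
	HeapAllocMB   float64 `json:"heap_alloc_mb"`
	UptimeSeconds float64 `json:"uptime_seconds"`
}

// collectHeartbeat gathers the values the standard library exposes portably.
func collectHeartbeat() Heartbeat {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	var mem runtime.MemStats
	runtime.ReadMemStats(&mem)
	return Heartbeat{
		Hostname:      host,
		OS:            runtime.GOOS,
		Arch:          runtime.GOARCH,
		NumCPU:        runtime.NumCPU(),
		HeapAllocMB:   float64(mem.HeapAlloc) / (1024 * 1024),
		UptimeSeconds: time.Since(startedAt).Seconds(),
	}
}

// encodeHeartbeat serializes the payload as JSON for transmission.
func encodeHeartbeat(h Heartbeat) (string, error) {
	b, err := json.Marshal(h)
	return string(b), err
}

func main() {
	out, err := encodeHeartbeat(collectHeartbeat())
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

An agent would send a payload like this on each heartbeat tick, every 5 minutes in the default setup.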
Agent to master, over WebSocket:
- Initial connection and registration
- Periodic heartbeat (every 5 minutes)
- System metrics streaming
- Real-time status updates
Master to agent, over WebSocket:
- On-demand status requests (callback triggers)
- Log file retrieval requests
- Configuration updates
- Command execution requests
- Remote control signals
Agent to master, over REST:
- Callback responses (when master requests immediate status before next heartbeat)
- Log file uploads (upon server request)
- Large payload transfers
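The message kinds listed above could share a single JSON envelope whose `type` field tells the receiver how to route it. The sketch below is one possible shape, not the project's wire format: `Envelope`, the type strings, and `direction` are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope is a hypothetical wrapper for every WebSocket message.
type Envelope struct {
	Type    string          `json:"type"`
	AgentID string          `json:"agent_id"`
	Payload json.RawMessage `json:"payload,omitempty"`
}

// Message type strings are assumptions, named after the flows in this README.
const (
	TypeRegister      = "register"
	TypeHeartbeat     = "heartbeat"
	TypeStatusRequest = "status_request"
	TypeLogRequest    = "log_request"
)

// decodeEnvelope parses raw JSON and rejects messages without a type.
func decodeEnvelope(raw []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return Envelope{}, err
	}
	if env.Type == "" {
		return Envelope{}, errors.New("envelope has no type")
	}
	return env, nil
}

// direction reports which side is expected to send a given message type.
func direction(msgType string) string {
	switch msgType {
	case TypeRegister, TypeHeartbeat:
		return "agent->master"
	case TypeStatusRequest, TypeLogRequest:
		return "master->agent"
	default:
		return "unknown"
	}
}

func main() {
	env, err := decodeEnvelope([]byte(`{"type":"heartbeat","agent_id":"agent-1"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(env.Type, env.AgentID, direction(env.Type))
}
```

Keeping `Payload` as `json.RawMessage` lets the receiver defer decoding until it knows the message type.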
- ✅ Cross-platform agent support (Windows, Linux, macOS)
- ✅ WebSocket-based real-time communication
- ✅ Daemon process management via Kardianos
- ✅ System metrics collection and reporting
- ✅ Heartbeat monitoring (5-minute intervals)
- ✅ Local log rotation on agents
- ✅ Callback mechanism for on-demand queries
- ✅ Manual installation and uninstallation
- ✅ In-memory state management
- 🔲 Authentication and authorization layer
- 🔲 Remote agent installation/uninstallation
- 🔲 Encrypted WebSocket connections (WSS)
- 🔲 Agent auto-discovery and registration
- 🔲 Alert system for agent failures
- 🔲 Dashboard UI for visual monitoring
- 🔲 Agent command execution framework
- 🔲 Multi-tenancy support
- 🔲 Persistent storage layer (database)
- Go 1.21 or higher
- Supported OS: Windows, Linux, macOS
- Network connectivity between agents and master server
# Clone repository
git clone <your-repo>
cd master-agent
# Build master
cd master
go build -o master
# Run master
./master

# Build agent (from the repository root)
cd agent
go build -o agent
# Install as service (requires root/admin privileges)
sudo ./agent install
# Start service
sudo ./agent start

# config/master.yaml
server:
  host: "0.0.0.0"
  port: 8080
  ws_path: "/ws"
heartbeat:
  timeout: 600 # seconds (10 minutes)
  cleanup_interval: 300 # seconds

# config/agent.yaml
master:
  url: "ws://master-server:8080/ws"
  api_url: "http://master-server:8080/api"
metrics:
  interval: 300 # seconds (5 minutes)
logging:
  level: "info"
  rotation_size: 10 # MB
  max_backups: 5

The callback system enables the master to request immediate agent status outside the regular heartbeat cycle:
- Master sends callback request via WebSocket
- Agent receives request, gathers current metrics
- Agent POSTs response to Master's REST API endpoint
- Master processes and updates in-memory state
This allows for:
- Immediate health checks on-demand
- Quick response to administrative queries
- Reduced latency for critical operations
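The REST half of the callback flow (the agent POSTs its status, the master updates memory) can be sketched with the standard library alone. The WebSocket request that triggers it is omitted because Go's standard library has no WebSocket package. `StatusReport`, the `/api/callback` path, and the function names are hypothetical; a local `httptest` server stands in for the master.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// StatusReport is a hypothetical callback body sent by the agent.
type StatusReport struct {
	AgentID    string  `json:"agent_id"`
	CPUPercent float64 `json:"cpu_percent"`
}

// state stands in for the master's in-memory store.
type state struct {
	mu      sync.Mutex
	reports map[string]StatusReport
}

// callbackHandler accepts a POSTed report and stores it in memory.
func (s *state) callbackHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	var rep StatusReport
	if err := json.NewDecoder(r.Body).Decode(&rep); err != nil || rep.AgentID == "" {
		http.Error(w, "bad report", http.StatusBadRequest)
		return
	}
	s.mu.Lock()
	s.reports[rep.AgentID] = rep
	s.mu.Unlock()
	w.WriteHeader(http.StatusNoContent)
}

// postStatus is the agent side: send current metrics to the master's API.
func postStatus(url string, rep StatusReport) (int, error) {
	body, err := json.Marshal(rep)
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo runs both sides locally and returns the HTTP status code plus the
// CPU value the master ended up storing for the agent.
func demo() (int, float64, error) {
	s := &state{reports: make(map[string]StatusReport)}
	srv := httptest.NewServer(http.HandlerFunc(s.callbackHandler))
	defer srv.Close()

	rep := StatusReport{AgentID: "agent-1", CPUPercent: 12.5}
	code, err := postStatus(srv.URL+"/api/callback", rep)
	if err != nil {
		return 0, 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return code, s.reports["agent-1"].CPUPercent, nil
}

func main() {
	code, cpu, err := demo()
	fmt.Println(code, cpu, err)
}
```

Answering over REST rather than the WebSocket keeps the persistent connection free for small control messages.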
Each agent maintains its own local log files with automatic rotation:
- Local Storage: Logs are written to disk on the agent's host machine
- Log Rotation: Automatic rotation based on file size/age to prevent disk overflow
- On-Demand Upload: Master server can request log files via WebSocket
- Transmission: Agent sends requested log files back to master via REST POST
This architecture ensures:
- ✅ Minimal network overhead (logs only sent when needed)
- ✅ Local debugging capability even when disconnected
- ✅ Centralized log analysis when required by admin
- ✅ Efficient storage management on agent hosts
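Size-based rotation of the kind described above can be illustrated in a few dozen lines. This is a sketch under assumptions, not the agent's real logger: `RotatingLog` and its methods are hypothetical, it assumes `maxBackups` is at least 1, and it rotates on size only, not age. Its two limits play the roles of `rotation_size` and `max_backups` in `config/agent.yaml`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// RotatingLog is a hypothetical size-based rotating log writer.
type RotatingLog struct {
	path       string
	maxBytes   int64
	maxBackups int // assumed to be >= 1
	file       *os.File
	size       int64
}

func openAppend(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
}

// OpenRotatingLog opens (or creates) path and picks up its current size.
func OpenRotatingLog(path string, maxBytes int64, maxBackups int) (*RotatingLog, error) {
	f, err := openAppend(path)
	if err != nil {
		return nil, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	return &RotatingLog{path: path, maxBytes: maxBytes, maxBackups: maxBackups, file: f, size: info.Size()}, nil
}

// Write appends p, rotating first if a non-empty file would exceed maxBytes.
func (l *RotatingLog) Write(p []byte) (int, error) {
	if l.size > 0 && l.size+int64(len(p)) > l.maxBytes {
		if err := l.rotate(); err != nil {
			return 0, err
		}
	}
	n, err := l.file.Write(p)
	l.size += int64(n)
	return n, err
}

// rotate drops the oldest backup, shifts name.1 to name.2 and so on,
// then moves the live file to name.1 and starts a fresh one.
func (l *RotatingLog) rotate() error {
	if err := l.file.Close(); err != nil {
		return err
	}
	os.Remove(fmt.Sprintf("%s.%d", l.path, l.maxBackups)) // may not exist yet
	for i := l.maxBackups - 1; i >= 1; i-- {
		// Errors are ignored because earlier backups may not exist yet.
		os.Rename(fmt.Sprintf("%s.%d", l.path, i), fmt.Sprintf("%s.%d", l.path, i+1))
	}
	if err := os.Rename(l.path, l.path+".1"); err != nil {
		return err
	}
	f, err := openAppend(l.path)
	if err != nil {
		return err
	}
	l.file, l.size = f, 0
	return nil
}

// Close closes the live log file.
func (l *RotatingLog) Close() error { return l.file.Close() }

// demoRotation writes seven 10-byte lines with a 25-byte limit and two
// backups, then reports how many files remain in the temp directory.
func demoRotation() (int, error) {
	dir, err := os.MkdirTemp("", "rotate-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	l, err := OpenRotatingLog(filepath.Join(dir, "agent.log"), 25, 2)
	if err != nil {
		return 0, err
	}
	defer l.Close()
	for i := 0; i < 7; i++ {
		if _, err := fmt.Fprintf(l, "line-%04d\n", i); err != nil {
			return 0, err
		}
	}
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

func main() {
	n, err := demoRotation()
	fmt.Println("files on disk:", n, err)
}
```

In the demo the live file plus two backups remain, so disk usage stays bounded at roughly `maxBytes * (maxBackups + 1)` no matter how much is logged.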
- Admin requests logs for specific agent via Master
- Master sends log request to agent via WebSocket
- Agent reads requested log file from local disk
- Agent POSTs log file to Master's REST API endpoint
- Master serves logs to admin for analysis
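The upload part of this flow, where the agent reads a log file and POSTs it to the master, might look like the sketch below. The `/api/logs` path, the 50 MiB cap, and the function names are assumptions for illustration; a local `httptest` server plays the master, and the WebSocket request that triggers the upload is not shown.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
)

// maxUpload is a hypothetical cap on the size of one uploaded log file.
const maxUpload = 50 << 20 // 50 MiB

// uploadLog streams a local log file to the master without loading it
// fully into memory.
func uploadLog(url, path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	resp, err := http.Post(url, "text/plain", f)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// receiveLog returns a master-side handler that copies the upload into dst,
// refusing bodies larger than maxUpload.
func receiveLog(dst io.Writer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body := http.MaxBytesReader(w, r.Body, maxUpload)
		if _, err := io.Copy(dst, body); err != nil {
			http.Error(w, "upload too large or interrupted", http.StatusRequestEntityTooLarge)
			return
		}
		w.WriteHeader(http.StatusCreated)
	}
}

// demoUpload writes a one-line log file, uploads it to a local test server,
// and returns the HTTP status code and the text the server received.
func demoUpload() (int, string, error) {
	dir, err := os.MkdirTemp("", "agentlog")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "agent.log")
	if err := os.WriteFile(path, []byte("disk usage at 91%\n"), 0o644); err != nil {
		return 0, "", err
	}
	var received strings.Builder
	srv := httptest.NewServer(receiveLog(&received))
	defer srv.Close()
	code, err := uploadLog(srv.URL+"/api/logs", path)
	return code, received.String(), err
}

func main() {
	code, body, err := demoUpload()
	fmt.Printf("%d %q %v\n", code, body, err)
}
```

Streaming the file as the request body suits the "large payload transfers" case, since memory use does not grow with log size.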
When an agent experiences downtime or failures:
- Events are captured in local rotated log files
- Master detects missing heartbeats and flags the agent
- Admin can request log files from the agent (if it comes back online)
- Logs help diagnose root causes: network issues, system crashes, resource exhaustion, etc.
- For permanent failures, logs remain on agent host for manual retrieval
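Detecting missing heartbeats comes down to comparing each agent's last-seen time with a timeout. The sketch below uses the 600-second value from `heartbeat.timeout` in `config/master.yaml`; the function names and the sample agents are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// staleAgents returns, in sorted order, the IDs whose last heartbeat is
// strictly older than timeout as measured from now.
func staleAgents(lastSeen map[string]time.Time, now time.Time, timeout time.Duration) []string {
	var stale []string
	for id, seen := range lastSeen {
		if now.Sub(seen) > timeout {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

// demoStale checks three sample agents against a 600-second timeout.
func demoStale() []string {
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	lastSeen := map[string]time.Time{
		"agent-a": now.Add(-4 * time.Minute),  // within one heartbeat interval
		"agent-b": now.Add(-11 * time.Minute), // past the timeout
		"agent-c": now.Add(-10 * time.Minute), // exactly at the timeout, not stale
	}
	return staleAgents(lastSeen, now, 600*time.Second)
}

func main() {
	fmt.Println(demoStale())
}
```

With a 5-minute heartbeat and a 10-minute timeout, an agent is flagged only after missing two consecutive heartbeats, which tolerates a single delayed signal. A master could run such a check on a ticker at the configured `cleanup_interval`.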
# Windows
agent.exe stop
agent.exe uninstall

# Linux/macOS
sudo ./agent stop
sudo ./agent uninstall

Contributions are welcome! Please feel free to submit a Pull Request.
[Your License Here]
Note: This is an active development project. Authentication, persistent storage, and remote management features are planned for upcoming releases.