alexlux58/Network-Observability-Stack

Network Observability Stack

One-command ELK + Prometheus + Grafana stack for a single host. Tested with Docker Engine and snap-docker.

🚀 Quick Start

Get the stack running in 5 minutes:

# 1. Clone and enter the repo
git clone https://github.com/<your-username>/observability-stack-master.git
cd observability-stack-master

# 2. Run setup (creates directories, sets permissions, generates .env)
./manage.sh setup

# 3. Start all services
./manage.sh start

# 4. Check status
./manage.sh status

Access the dashboards:

  • Grafana: http://<your-server-ip>:3000 (default: admin / admin)
  • Kibana: http://<your-server-ip>:5601
  • Prometheus: http://<your-server-ip>:9090
  • Elasticsearch: http://<your-server-ip>:9200

🏗️ Architecture Overview

This repository wires together an end-to-end observability stack for a single host, combining log collection, metrics, and alerting.

Components

  • Filebeat - Collects system and Docker logs
  • Logstash - Processes and enriches log events
  • Elasticsearch - Stores log data
  • Kibana - Log exploration and visualization
  • Prometheus - Metrics collection and alerting
  • Alertmanager - Alert routing and notifications
  • Grafana - Unified dashboards for metrics and logs
  • cAdvisor - Container metrics exporter
  • Node Exporter - Host metrics exporter
  • Blackbox Exporter - Ping/uptime monitoring for remote hosts

Data Flow

System Logs → Filebeat → Logstash → Elasticsearch → Kibana
Docker Logs ↗                                    ↘ Grafana

Host Metrics → Node Exporter → Prometheus → Grafana
Container Metrics → cAdvisor ↗            ↘ Alertmanager

How Everything Fits Together

  1. Filebeat collects system and Docker logs and ships them to Logstash
  2. Logstash processes the events and indexes them in Elasticsearch
  3. Kibana queries Elasticsearch to explore and visualize log data
  4. Prometheus scrapes metrics from exporters and evaluates alert rules
  5. Alertmanager receives alerts from Prometheus and dispatches notifications
  6. Grafana dashboards read from both Elasticsearch and Prometheus for a unified view

📦 Installation

Requirements

  • Ubuntu (root or sudo access)
  • Docker Engine with Docker Compose v2 (docker compose)
  • For snap-docker, Filebeat needs the real Docker root dir (handled automatically by manage.sh setup)

Manual Setup (Alternative)

If you prefer manual setup instead of ./manage.sh setup:

# 1. Create .env file with Docker root directory
echo "DOCKER_ROOTDIR=$(docker info --format '{{.DockerRootDir}}')" > .env
echo "GRAFANA_ADMIN_USER=admin" >> .env
echo "GRAFANA_ADMIN_PASSWORD=admin" >> .env

# 2. Create data directories
sudo mkdir -p data/{elasticsearch,kibana,logstash,filebeat,prometheus,alertmanager,grafana}

# 3. Set permissions
sudo chown -R 1000:1000 data/elasticsearch data/kibana
sudo chown -R 1000:1000 data/logstash
sudo chown -R root:root data/filebeat
sudo chown -R 65534:65534 data/prometheus data/alertmanager
sudo chown -R 472:472 data/grafana

# 4. Set Filebeat config permissions
sudo chown root:root filebeat/filebeat.yml
sudo chmod 0640 filebeat/filebeat.yml

⚙️ Configuration

Environment Variables

Create a .env file (or use .env.example as a template):

# Docker root directory (required for snap-docker)
DOCKER_ROOTDIR=/var/lib/docker

# Grafana admin credentials (optional, defaults shown)
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=admin
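./manage.sh setup generates this file automatically. If you script it yourself, a slightly hardened variant might pick a random Grafana password instead of admin/admin; a minimal sketch (gen_env is illustrative, not part of manage.sh, and it falls back to /var/lib/docker when the docker CLI is unavailable):

```shell
#!/usr/bin/env bash
# Sketch: write a .env with a random Grafana password (not part of manage.sh).
gen_env() {
  local dir="${1:-.}" rootdir
  # Fall back to the standard root dir if the docker CLI is unavailable
  rootdir="$(docker info --format '{{.DockerRootDir}}' 2>/dev/null || echo /var/lib/docker)"
  {
    echo "DOCKER_ROOTDIR=${rootdir}"
    echo "GRAFANA_ADMIN_USER=admin"
    echo "GRAFANA_ADMIN_PASSWORD=$(head -c 18 /dev/urandom | base64)"
  } > "${dir}/.env"
}

# Example: gen_env .   # writes ./.env
```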

Service Configuration Files

  • filebeat/filebeat.yml - Log collection configuration
  • logstash/pipeline/logstash.conf - Log processing pipeline
  • prometheus/prometheus.yml - Metrics scraping and alerting rules
  • prometheus/alerts.yml - Alert definitions
  • alertmanager/alertmanager.yml - Alert routing configuration

Docker Compose Features

The docker-compose.yml includes several enhancements for better reliability and monitoring:

Health Checks:

  • All services have health checks configured
  • Docker runs the checks continuously and flags unhealthy containers in docker ps
  • Services wait for their dependencies to report healthy before starting
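For illustration, a healthcheck stanza for the Prometheus service could look like the following (a sketch, assuming the image ships wget as the busybox-based prom/prometheus image does; the repo's docker-compose.yml defines the actual checks):

```yaml
services:
  prometheus:
    # ...
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:9090/-/healthy || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
```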

Logging:

  • Automatic log rotation (10MB per file, max 3 files per service)
  • Prevents disk space issues from log growth
  • Logs accessible via docker logs or ./manage.sh logs <service>
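That rotation maps onto Docker's json-file logging driver options; per service the stanza looks roughly like this (a sketch):

```yaml
services:
  filebeat:
    # ...
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate once a log file reaches 10MB
        max-file: "3"     # keep at most 3 files (~30MB per service)
```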

Resource Limits:

  • Memory limits set for exporters (cAdvisor, Node Exporter)
  • Prevents resource exhaustion

Service Labels:

  • Services labeled with com.observability.service and com.observability.role
  • Helps organize and filter containers

Data Collection

Filebeat collects:

  • Ubuntu system logs (/var/log/syslog, /var/log/auth.log, /var/log/kern.log)
  • Docker container logs (JSON format with metadata)

Prometheus scrapes:

  • Node Exporter (host metrics)
  • cAdvisor (container metrics)
  • Prometheus itself (self-monitoring)
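A sketch of how those three jobs typically appear in prometheus/prometheus.yml (the service names here are assumptions based on common Compose naming; the repo's file is authoritative):

```yaml
scrape_configs:
  - job_name: 'prometheus'         # self-monitoring
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'      # host metrics
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'cadvisor'           # container metrics
    static_configs:
      - targets: ['cadvisor:8080']
```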

🎮 Usage

Management Script

Use ./manage.sh for all operations:

./manage.sh [command] [options]

Commands:
  setup      - Initial setup (permissions, directories, .env)
  start      - Start all services
  stop       - Stop all services
  restart    - Restart all services
  status     - Show service status and health
  logs       - View logs for a service (e.g., ./manage.sh logs filebeat)
  fix        - Fix common issues (lock files, permissions, stuck containers)
  clean      - Clean up containers/volumes (with options)
  health     - Run health checks on all services
  user       - Manage Grafana users (create, list, delete, change-password)

Examples

# Start services
./manage.sh start

# Check what's running
./manage.sh status

# View Filebeat logs
./manage.sh logs filebeat

# Fix Filebeat lock file issue
./manage.sh fix

# Clean up everything (keeps volumes)
./manage.sh clean

# Run health checks
./manage.sh health

# Create a Grafana user
./manage.sh user create john [email protected] password123 Editor

# List Grafana users
./manage.sh user list

# Change user password
./manage.sh user change-password john newpassword456

# Delete a Grafana user
./manage.sh user delete john

Manual Docker Compose Commands

You can also use docker compose directly:

# Start all services
docker compose up -d

# Stop all services
docker compose down

# View logs
docker compose logs -f [service-name]

# Restart a specific service
docker compose restart [service-name]

📊 Dashboard Setup

Grafana Dashboards

Step 1: Configure Data Sources

Add Prometheus Data Source:

  1. Log into Grafana at http://<your-server-ip>:3000
  2. Go to Configuration → Data sources → Add data source
  3. Select Prometheus
  4. Configure:
    • Name: Prometheus
    • URL: http://prometheus:9090 (use Docker service name)
    • Access: Server (default)
  5. Click Save & test

Add Elasticsearch Data Source (Optional - for log visualization):

  1. Go to Configuration → Data sources → Add data source
  2. Select Elasticsearch
  3. Configure:
    • Name: Elasticsearch
    • URL: http://elasticsearch:9200
    • Access: Server
    • Index name: filebeat-*
    • Time field name: @timestamp
  4. Click Save & test (may show error until data is indexed - see Troubleshooting)

Add AWS CloudWatch Data Source (Optional - for AWS S3 monitoring):

  1. Go to Configuration → Data sources → Add data source
  2. Select CloudWatch
  3. Configure AWS credentials:
    • Auth Provider: Choose one:
      • Access & Secret Key: Enter AWS Access Key ID and Secret Access Key
      • Credentials File: Path to AWS credentials file (e.g., /root/.aws/credentials)
      • Default: Use IAM role if running on EC2
    • Default Region: Select your AWS region (e.g., us-east-1)
  4. Click Save & test
  5. Import dashboard 22632 (AWS S3 CloudWatch) to monitor S3 buckets, storage metrics, and request metrics

Note: CloudWatch data source requires AWS credentials with appropriate permissions (CloudWatch read access, S3 metrics access).

Step 2: Import Pre-built Dashboards

Grafana has thousands of community dashboards. Recommended ones:

Host Monitoring:

  • Dashboard ID: 1860 (Node Exporter Full) - Comprehensive host metrics

Container Monitoring:

  • Dashboard ID: 11074 (Node Exporter for Prometheus) - Modern React-based dashboard
  • Dashboard ID: 6417 (Docker Container Stats) - Container metrics dashboard
  • Note: Dashboard 179 uses deprecated Angular and may not work in newer Grafana versions

Prometheus Stats:

  • Dashboard ID: 893 (Prometheus 2.0 Stats) - Prometheus monitoring dashboard
  • Note: Dashboard 3662 uses deprecated Angular and may not work in newer Grafana versions

Search Tips:

  • Look for dashboards marked as "React" or "Modern" in Grafana dashboard library
  • Filter by "Updated" date to find recently maintained dashboards
  • Avoid dashboards marked as "Angular" or "Legacy"

To import:

  1. Go to Dashboards → Import
  2. Enter the dashboard ID
  3. Select Prometheus data source
  4. Click Import

Browse more at: https://grafana.com/grafana/dashboards/

Step 3: Create Custom Dashboards

Host System Overview:

  • CPU Usage: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  • Memory Usage: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
  • Disk Usage: 100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
  • Network Traffic: rate(node_network_receive_bytes_total[5m]) and rate(node_network_transmit_bytes_total[5m])
  • Load Average: node_load1, node_load5, node_load15

Container Metrics:

  • Container CPU: rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100
  • Container Memory: container_memory_usage_bytes{name!=""}
  • Container Network I/O: rate(container_network_receive_bytes_total[5m]) and rate(container_network_transmit_bytes_total[5m])
  • Running Containers: count(container_last_seen{name=~".+"})

Useful Prometheus Queries:

# CPU usage percentage
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Container CPU usage
rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100

# Top 10 containers by CPU
topk(10, rate(container_cpu_usage_seconds_total{name!=""}[5m]))

Kibana Dashboards

Step 1: Create Data View

  1. Open Kibana at http://<your-server-ip>:5601
  2. Go to Analytics → Discover
  3. Create data view:
    • Name: filebeat-*
    • Time field: @timestamp
  4. Click Create data view

Step 2: Create Visualizations

Log Volume Over Time:

  • Visualization: Line chart
  • Y-axis: Count
  • X-axis: @timestamp (Date Histogram)
  • Split by: Optional - log.level or container.name

Top Container Logs:

  • Visualization: Data table
  • Metrics: Count
  • Buckets: Terms aggregation on container.name
  • Size: 10

Error Logs by Level:

  • Visualization: Pie chart
  • Slice by: Terms on log.level
  • Filter: log.level: error OR log.level: warning

Recent Error Logs:

  • Visualization: Data table
  • Filter: log.level: error
  • Columns: @timestamp, container.name, message
  • Sort: @timestamp descending

Alerting

Alert rules are configured in prometheus/alerts.yml and automatically loaded. The file includes:

  • Host Alerts: High CPU, memory, disk usage
  • Container Alerts: High CPU/memory, container restarts
  • Prometheus Alerts: Service down, scrape failures

To customize alerts, edit prometheus/alerts.yml and restart Prometheus:

docker compose restart prometheus
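As a reference for the rule format, a host CPU alert could be written like this (a sketch reusing the CPU query from the dashboard section; the thresholds in the repo's alerts.yml may differ):

```yaml
groups:
  - name: host-alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Host CPU usage above 90% for 10 minutes"
```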

🌐 Monitoring Remote Linux VMs

This stack can monitor remote Linux VMs for resource utilization, log errors, and uptime (ping monitoring).

Quick Start

On the remote Linux VM:

  1. Copy the install-node-exporter.sh script to your VM:

    scp install-node-exporter.sh user@your-vm-ip:/tmp/
  2. SSH into the VM and run the installation script:

    ssh user@your-vm-ip
    sudo bash /tmp/install-node-exporter.sh

    The script will:

    • Install node_exporter for system metrics (CPU, memory, disk, network)
    • Install Filebeat to send logs to your observability stack
    • Configure firewall rules (if applicable)
    • Set up systemd services

On the observability stack host (192.168.5.34):

  1. Edit prometheus/prometheus.yml and add your VM to the monitoring targets:

    For metrics (node_exporter):

    - job_name: 'remote_node_exporter'
      scrape_interval: 15s
      static_configs:
        - targets: ['192.168.5.8:9100']  # Replace with your VM IP
          labels:
            host: 'ansible-puppet'  # Replace with your VM hostname
            env: 'production'

    For uptime/ping monitoring (the relabel_configs block routes each probe through the Blackbox Exporter instead of scraping the target directly; adjust the replacement address to match the exporter's service name in docker-compose.yml):

    - job_name: 'blackbox_icmp'
      metrics_path: /probe
      params:
        module: [icmp]
      scrape_interval: 30s
      static_configs:
        - targets:
            - '192.168.5.8'  # Replace with your VM IP or hostname
          labels:
            host: 'ansible-puppet'  # Replace with your VM hostname
            env: 'production'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 'blackbox-exporter:9115'  # Blackbox Exporter service address
  2. Reload Prometheus configuration (this endpoint only works if Prometheus runs with --web.enable-lifecycle; otherwise use docker compose restart prometheus):

    curl -X POST http://localhost:9090/-/reload
  3. Verify targets are up in Prometheus:

    • Visit http://192.168.5.34:9090/targets
    • Check that remote_node_exporter and blackbox_icmp targets show as "UP"
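Hand-editing prometheus.yml gets tedious once there are several VMs; a tiny helper can print the static_configs entry to paste in (a sketch — remote_target is illustrative, not part of manage.sh):

```shell
#!/usr/bin/env bash
# Sketch: emit a static_configs entry for a new remote VM (paste into prometheus.yml).
# Usage: remote_target <ip> <hostname> [env]
remote_target() {
  local ip="$1" host="$2" env="${3:-production}"
  cat <<EOF
        - targets: ['${ip}:9100']
          labels:
            host: '${host}'
            env: '${env}'
EOF
}

remote_target 192.168.5.8 ansible-puppet
```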

What Gets Monitored

System Metrics (via node_exporter):

  • CPU usage, load average
  • Memory usage (total, available, swap)
  • Disk usage and I/O
  • Network traffic (bytes sent/received)
  • System uptime

Logs (via Filebeat):

  • System logs (/var/log/syslog, /var/log/messages, /var/log/auth.log)
  • Application logs (/var/log/*.log)
  • All logs are sent to Logstash and indexed in Elasticsearch
  • View logs in Kibana: http://192.168.5.34:5601

Uptime/Ping (via blackbox_exporter):

  • ICMP ping monitoring
  • Latency metrics
  • Availability status
  • Alerts when host is down
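Those checks are driven by blackbox/blackbox.yml; a minimal icmp module definition looks like this (a sketch following the blackbox_exporter config format — check the repo's file for the actual settings):

```yaml
modules:
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: ip4
```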

Grafana Dashboards

After adding your VM, import these dashboards in Grafana:

Host Metrics:

  • Dashboard ID: 1860 (Node Exporter Full) - Shows CPU, memory, disk, network
  • Filter by host="your-vm-hostname" to see your VM's metrics

Uptime Monitoring:

  • Create a custom dashboard with:
    • Uptime Panel: probe_success{host="your-vm-hostname"}
    • Latency Panel: probe_duration_seconds{host="your-vm-hostname"}
    • Status Stat: probe_success{host="your-vm-hostname"} (1 = up, 0 = down)

Logs:

  • In Kibana, create a data view for filebeat-*
  • Filter by hostname: "your-vm-hostname" to see logs from your VM

Alerts

The following alerts are automatically configured for remote VMs:

  • HostDown: Triggers when ping fails for 2 minutes
  • HostHighLatency: Triggers when ping latency > 1 second for 5 minutes
  • RemoteNodeExporterDown: Triggers when node_exporter is unreachable for 2 minutes
  • HighCPUUsage, HighMemoryUsage, DiskSpaceLow: Same as local host alerts

Troubleshooting Remote VMs

VM not showing up in Prometheus:

  1. Check firewall: Ensure port 9100 is accessible from 192.168.5.34
  2. Verify node_exporter is running: systemctl status node_exporter on the VM
  3. Test connectivity: curl http://vm-ip:9100/metrics from the stack host

Logs not appearing in Kibana:

  1. Check Filebeat status: systemctl status filebeat on the VM
  2. Verify connectivity: telnet 192.168.5.34 5044 from the VM
  3. Check Filebeat logs: journalctl -u filebeat -f on the VM

Ping monitoring not working:

  1. Ensure blackbox_exporter is running: docker ps | grep blackbox
  2. Verify ICMP is allowed (blackbox_exporter needs NET_RAW capability)
  3. Check Prometheus targets: http://192.168.5.34:9090/targets

Customizing the Installation Script

You can customize the observability stack host IP by setting an environment variable:

OBSERVABILITY_STACK_HOST=192.168.5.34 sudo bash install-node-exporter.sh

The script supports:

  • Ubuntu/Debian and RHEL/CentOS/Fedora
  • Automatic firewall configuration (UFW, firewalld, iptables)
  • Idempotent installation (safe to run multiple times)

🔧 Troubleshooting

Common Issues

Filebeat Lock File Error

Symptoms: Filebeat container restarting with error: "data path already locked by another beat"

Solution:

./manage.sh fix

Or manually:

docker stop observability-stack-master-filebeat-1
sudo rm -f ./data/filebeat/filebeat.lock
docker start observability-stack-master-filebeat-1

Logstash Permission Errors

Symptoms: Logstash container failing with permission denied errors

Solution:

sudo chown -R 1000:1000 ./data/logstash
docker compose restart logstash

Elasticsearch "No date field named @timestamp found" in Grafana

Symptoms: Grafana Elasticsearch data source shows error about missing @timestamp field

Causes:

  1. No data indexed yet (most common)
  2. Index pattern mismatch
  3. Services not running

Solution:

  1. Verify services are running: ./manage.sh status
  2. Check if indices exist:
    curl -s http://localhost:9200/_cat/indices/filebeat-*?v
  3. Check document count:
    curl -s 'http://localhost:9200/filebeat-*/_count'
  4. Wait 2-5 minutes for data to be indexed
  5. Verify Grafana configuration:
    • URL: http://elasticsearch:9200 (Docker service name)
    • Index name: filebeat-* (with wildcard)
    • Time field: @timestamp
  6. Click "Save & test" again in Grafana

If still failing:

  • Check Filebeat logs: ./manage.sh logs filebeat
  • Check Logstash logs: ./manage.sh logs logstash
  • Verify data pipeline: curl -s http://localhost:9600/_node/stats/pipelines?pretty

Containers Stuck in "Created" Status

Symptoms: Containers created but not starting

Solution:

./manage.sh start

This automatically handles stuck containers. Or manually:

docker ps -a --filter "status=created" --format "{{.Names}}" | xargs -r docker start

Elasticsearch Red/Unassigned Shards

Symptoms: Elasticsearch cluster health shows red status

Solution:

# On a single node, unassigned replica shards are expected; drop replicas to 0
curl -s -X PUT "http://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.number_of_replicas": 0}'

# Clear any read-only block (set automatically when the disk watermark is hit)
curl -s -X PUT "http://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'

# Retry failed shard allocations
curl -s -X POST "http://localhost:9200/_cluster/reroute?retry_failed=true"

Permission Denied Errors

Symptoms: Docker commands fail with permission denied

Solution:

# Add user to docker group
sudo usermod -aG docker $USER
# Log out and back in, or use:
newgrp docker

Diagnostic Commands

# Check all services status
./manage.sh status

# Check Elasticsearch health
curl -s http://localhost:9200/_cluster/health?pretty

# Check indices
curl -s http://localhost:9200/_cat/indices/filebeat-*?v

# Check document count
curl -s 'http://localhost:9200/filebeat-*/_count' | jq

# Check Filebeat output
docker exec observability-stack-master-filebeat-1 filebeat test output

# Check Logstash pipeline
curl -s http://localhost:9600/_node/stats/pipelines?pretty

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl,health,lastError}'

🔄 Maintenance

Health Checks

Run comprehensive health checks:

./manage.sh health

This checks:

  • All containers are running
  • Elasticsearch cluster health
  • Prometheus targets
  • Service endpoints

Note: All services have health checks configured in docker-compose.yml. Docker runs the checks continuously; health status is visible in docker ps output, and Compose uses it to order service startup.
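A stripped-down version of such an endpoint check can be done with bash's /dev/tcp (a sketch, not the logic manage.sh actually runs):

```shell
#!/usr/bin/env bash
# Sketch: report whether each service's TCP port accepts connections.
check_tcp() {
  # Returns 0 if host:port accepts a TCP connection within 2 seconds.
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

for svc in "elasticsearch 9200" "kibana 5601" "grafana 3000" "prometheus 9090"; do
  set -- $svc
  if check_tcp localhost "$2"; then echo "$1: up"; else echo "$1: DOWN"; fi
done
```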

User Management

Manage Grafana users via CLI:

Create a user:

./manage.sh user create <username> <email> <password> [role]
# Roles: Admin, Editor, Viewer (default: Viewer)
# Example:
./manage.sh user create alice [email protected] secret123 Editor

List all users:

./manage.sh user list

Change user password:

./manage.sh user change-password <username> <new-password>

Delete a user:

./manage.sh user delete <username>
# Requires confirmation

User Roles:

  • Admin - Full access to all features and settings
  • Editor - Can create and edit dashboards, data sources, and alerts
  • Viewer - Read-only access to dashboards and data sources
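Commands like these can be implemented against Grafana's admin HTTP API (POST /api/admin/users, authenticated as the admin account). A sketch of building the request body (user_payload is illustrative; the commented curl assumes the stack's default admin credentials):

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for Grafana's user-creation API.
# Note: values containing double quotes would need escaping.
user_payload() {
  local login="$1" email="$2" password="$3"
  printf '{"name":"%s","login":"%s","email":"%s","password":"%s"}' \
    "$login" "$login" "$email" "$password"
}

# Example call against the stack (assumes default admin/admin credentials):
# curl -s -u admin:admin -H 'Content-Type: application/json' \
#   -d "$(user_payload alice alice@example.com secret123)" \
#   http://localhost:3000/api/admin/users
```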

Log and Data Retention

All logs and data are automatically rotated/retained to prevent disk space issues:

Docker Container Logs:

  • Rotation: Automatic via Docker logging driver
  • Per service: Max 10MB per log file, keeps 3 files
  • Total per service: ~30MB maximum
  • Location: Managed by Docker, accessible via docker logs or ./manage.sh logs <service>
  • Configuration: Set in docker-compose.yml logging section

Elasticsearch Log Data:

  • Retention: 14 days (configurable via ILM policy)
  • Auto-deletion: Old indices automatically deleted after retention period
  • Setup: Run the ILM policy commands below (one-time setup)
  • Storage: ./data/elasticsearch/ directory

Prometheus Metrics:

  • Retention: 15 days OR 8GB (whichever comes first)
  • Auto-cleanup: Prometheus automatically deletes old data
  • Configuration: Set in docker-compose.yml Prometheus command
  • Storage: ./data/prometheus/ directory

Filebeat Registry:

  • Purpose: Tracks file positions to avoid re-reading logs
  • Growth: Minimal, but can be cleaned if needed
  • Location: ./data/filebeat/registry/
  • Cleanup: Only needed if switching log sources or troubleshooting

System Logs (collected by Filebeat):

  • Rotation: Handled by system logrotate (Ubuntu default)
  • Location: /var/log/syslog, /var/log/auth.log, etc.
  • System-managed: Automatically rotated by OS

To check disk usage:

# Check data directory sizes
du -sh ./data/*

# Check Docker log sizes
docker system df

# Check specific service logs
docker inspect <container-name> | grep -A 5 LogPath
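To turn the manual checks above into an automated warning, a small wrapper over du can flag directories past a limit (a sketch; check_sizes is illustrative and the 5 GB default is arbitrary):

```shell
#!/usr/bin/env bash
# Sketch: warn when any directory under a root exceeds a size limit (in KB).
check_sizes() {
  local root="$1" limit_kb="${2:-5242880}"   # default limit: 5 GB
  du -sk "${root}"/* 2>/dev/null | while read -r kb dir; do
    if [ "$kb" -gt "$limit_kb" ]; then
      echo "WARN: ${dir} is using ${kb} KB"
    fi
  done
}

# Example: check_sizes ./data
```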

Index Management

Set replicas to 0 (single node friendly):

curl -s -X PUT "http://localhost:9200/_index_template/filebeat-template" \
  -H 'Content-Type: application/json' -d '{
    "index_patterns": ["filebeat-*"],
    "template": { "settings": { "index.number_of_replicas": 0 } },
    "priority": 500
  }'

Configure 14-day retention (Recommended - prevents disk space issues):

# Create ILM policy (one-time setup)
curl -s -X PUT "http://localhost:9200/_ilm/policy/filebeat-retain-14d" \
  -H 'Content-Type: application/json' -d '{
    "policy": {
      "phases": {
        "hot": { "actions": {} },
        "delete": { "min_age": "14d", "actions": { "delete": {} } }
      }
    }
  }'

# Apply to template (includes replicas=0 for single node)
curl -s -X PUT "http://localhost:9200/_index_template/filebeat-template" \
  -H 'Content-Type: application/json' -d '{
    "index_patterns": ["filebeat-*"],
    "template": {
      "settings": {
        "index.number_of_replicas": 0,
        "index.lifecycle.name": "filebeat-retain-14d"
      }
    },
    "priority": 500
  }'

Verify retention is working:

# Check ILM policy exists
curl -s http://localhost:9200/_ilm/policy/filebeat-retain-14d?pretty

# Check index template
curl -s http://localhost:9200/_index_template/filebeat-template?pretty

# Monitor index lifecycle status
curl -s 'http://localhost:9200/_cat/indices/filebeat-*?v&h=index,creation.date,status'

Adjust retention period (optional):

  • Change "min_age": "14d" to your desired retention (e.g., "7d", "30d")
  • Re-run the ILM policy creation command with new value
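The recreation step can be wrapped in a helper that emits the policy body for a given retention (a sketch — ilm_policy is illustrative; the curl target is the same endpoint used in the setup commands above):

```shell
#!/usr/bin/env bash
# Sketch: print an ILM policy body for a given retention period (e.g. 7d, 30d).
ilm_policy() {
  local age="$1"
  cat <<EOF
{
  "policy": {
    "phases": {
      "hot": { "actions": {} },
      "delete": { "min_age": "${age}", "actions": { "delete": {} } }
    }
  }
}
EOF
}

# Example: recreate the policy with 7-day retention
# curl -s -X PUT "http://localhost:9200/_ilm/policy/filebeat-retain-14d" \
#   -H 'Content-Type: application/json' -d "$(ilm_policy 7d)"
```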

Backup and Restore

Backup Elasticsearch indices:

# Create snapshot repository (the location must be listed under path.repo in the
# Elasticsearch configuration and mounted into the container)
curl -X PUT "http://localhost:9200/_snapshot/backup" -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/backup"
  }
}'

# Create snapshot
curl -X PUT "http://localhost:9200/_snapshot/backup/snapshot_1?wait_for_completion=true"

Backup Grafana dashboards:

  • Export dashboards as JSON from Grafana UI
  • Or backup ./data/grafana directory

Cleanup

Clean containers and images (keeps volumes):

./manage.sh clean

Remove everything including volumes (DANGEROUS):

docker compose down -v
docker system prune -a -f
docker volume prune -f

Updates

Update services:

  1. Stop services: ./manage.sh stop
  2. Pull new images: docker compose pull
  3. Start services: ./manage.sh start

🔐 Security

Current Configuration

This stack runs without security for lab/testing use:

  • Elasticsearch: xpack.security.enabled=false
  • Grafana: Default admin/admin credentials
  • No TLS/HTTPS
  • No authentication on Prometheus/Elasticsearch

Production Hardening

Enable Elasticsearch Security:

  1. Edit docker-compose.yml:
    environment:
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=your-secure-password
  2. Update Kibana environment:
    environment:
      - ELASTICSEARCH_USERNAME=elastic
      - ELASTICSEARCH_PASSWORD=your-secure-password

Change Grafana Credentials:

  • Edit .env file:
    GRAFANA_ADMIN_USER=your-username
    GRAFANA_ADMIN_PASSWORD=your-secure-password

Add Reverse Proxy:

  • Use Nginx or Traefik with TLS
  • Add authentication (OAuth, basic auth)
  • Restrict access to internal network

Network Security:

  • Use Docker networks to isolate services
  • Restrict port exposure to necessary services only
  • Use firewall rules to limit access

📚 Additional Resources

Service Endpoints

  • Kibana: http://<IP>:5601
  • Grafana: http://<IP>:3000 (default admin / admin)
  • Prometheus: http://<IP>:9090
  • Elasticsearch: http://<IP>:9200
  • Alertmanager: http://<IP>:9093
  • cAdvisor: http://<IP>:8080
  • Node Exporter: http://<IP>:9100/metrics
  • Blackbox Exporter: http://<IP>:9115/metrics

Useful Grafana Dashboard IDs

Host Metrics:

  • 1860 - Node Exporter Full (comprehensive host metrics)
  • 11074 - Node Exporter for Prometheus (modern React-based)

Container Metrics:

  • 6417 - Docker Container Stats
  • 11074 - Node Exporter for Prometheus (includes container metrics)

Prometheus Monitoring:

  • 893 - Prometheus 2.0 Stats

AWS S3 Monitoring (requires CloudWatch data source):

  • 22632 - AWS S3 CloudWatch - Monitors S3 buckets, storage metrics, request metrics, and replication metrics
  • Note: Requires AWS CloudWatch data source configuration and AWS credentials

Note: Dashboards 179 and 3662 use deprecated Angular framework and may not work in Grafana 11+. Use the alternatives listed above instead.

Browse more at: https://grafana.com/grafana/dashboards/

Project Structure

.
├── manage.sh                 # Unified management script
├── install-node-exporter.sh  # Script to install monitoring on remote VMs
├── docker-compose.yml        # Main compose file
├── .env.example             # Environment variable template
├── README.md                # This file
├── alertmanager/
│   └── alertmanager.yml     # Alert routing config
├── blackbox/
│   └── blackbox.yml         # Blackbox exporter config (ping/uptime)
├── filebeat/
│   └── filebeat.yml        # Log collection config
├── logstash/
│   └── pipeline/
│       └── logstash.conf   # Log processing pipeline
└── prometheus/
    ├── prometheus.yml      # Scrape config
    └── alerts.yml          # Alert rules

🗺️ Roadmap

  • Metricbeat for host/container metrics into Elasticsearch
  • Curated Kibana dashboards for Docker + Ubuntu auth
  • Reverse proxy + SSO front-door (e.g., Traefik + OAuth)
  • Automated backup scripts
  • Health check automation and alerting

📝 License

See repository license file.

🤝 Contributing

Contributions welcome! Please open an issue or submit a pull request.
