Observability Hub

Observability Hub is a self-hosted platform engineering lab built with Kubernetes, GitOps, Terraform, OpenTelemetry, Prometheus, Grafana, Cilium/eBPF, PostgreSQL, and Go services.

It demonstrates an end-to-end platform ownership loop: declarative infrastructure runs host and cluster services, telemetry exposes their behavior, operators and agents diagnose issues, bounded remediation applies fixes, and ADRs/RCAs preserve operational memory.

Project Portal | Full Documentation

Case Studies

Case Study	Problem	How it was diagnosed	Result
Rust Telemetry Summarization Processor	Raw logs and metrics returned too much data for agent workflows	Added a Rust `obs-processor` and validated the language choice with ADR 023 benchmark evidence	Reduced token load while preserving investigation pivots
Worker Ingestion Blocked from MongoDB Atlas	Scheduled ingestion could not reach Atlas	Used worker logs and Cilium policy review to identify blocked egress	Added Atlas egress policy and documented prevention
Loki Gateway DNS Timeout	Grafana and agents could not reliably query logs	Traced the request path through gateway DNS resolution and Loki service routing	Fixed resolver config and added operational checks
SSH Lockout via Cilium IPAM Collision	Host access failed after networking drift	Correlated Cilium/IPAM state, pod readiness, and host reachability	Restored access and documented recovery path
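
To illustrate the idea behind the first case study, here is a minimal Go sketch of telemetry summarization: repeated log lines are collapsed into counted rows, and one trace ID per row is kept as an investigation pivot. This is not the project's Rust `obs-processor`; the `LogLine` and `Summary` types, their fields, and the grouping key are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// LogLine is a hypothetical, simplified log record.
// The real schema used by obs-processor is not described in this README.
type LogLine struct {
	Service string
	Level   string
	Message string
	TraceID string
}

// Summary is one collapsed row: how often a line occurred,
// plus the first trace ID seen so an operator can pivot to a full trace.
type Summary struct {
	Service      string
	Level        string
	Message      string
	Count        int
	FirstTraceID string
}

// Summarize groups lines by (service, level, message) and returns the
// groups sorted by descending count. Ties keep first-seen order.
func Summarize(lines []LogLine) []Summary {
	type key struct{ service, level, message string }
	index := make(map[key]int) // group key -> position in out
	var out []Summary
	for _, l := range lines {
		k := key{l.Service, l.Level, l.Message}
		if i, seen := index[k]; seen {
			out[i].Count++
			continue
		}
		index[k] = len(out)
		out = append(out, Summary{
			Service:      l.Service,
			Level:        l.Level,
			Message:      l.Message,
			Count:        1,
			FirstTraceID: l.TraceID,
		})
	}
	sort.SliceStable(out, func(a, b int) bool { return out[a].Count > out[b].Count })
	return out
}

func main() {
	lines := []LogLine{
		{"worker", "error", "dial timeout", "t1"},
		{"worker", "error", "dial timeout", "t2"},
		{"proxy", "info", "ready", "t3"},
	}
	for _, s := range Summarize(lines) {
		fmt.Printf("%s %s %q x%d (trace %s)\n", s.Service, s.Level, s.Message, s.Count, s.FirstTraceID)
	}
}
```

The point of the sketch is the trade-off named in the table: fewer rows reach the agent, but each row still carries an identifier that leads back to the raw data.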

Architecture

The main system flow starts from declarative source, runs through host and cluster runtimes, emits telemetry, drives diagnosis, and feeds remediations and lessons back into source control.

Path	Use case	Flow
Platform reconciliation	Keep host and cluster state aligned with Git	Git/Terraform/Kustomize/systemd -> Argo CD/Proxy -> Kubernetes/systemd runtime
Telemetry pipeline	Capture behavior across services and infrastructure	Go services/Kubernetes/Cilium -> OpenTelemetry/Prometheus/Loki/Tempo/Hubble -> Grafana/MCP
Agent diagnosis	Let operators query and repair live systems through bounded tools	MCP Hub -> telemetry/pod/network providers -> diagnosis or controlled remediation
Batch analytics	Convert runtime metrics and ingestion inputs into stored operational insight	Worker CronJobs -> Prometheus/Postgres/OpenBao -> analytics and ingestion records
Operational memory	Preserve the reasoning behind decisions and failures	Workflows/incidents -> ADRs/RCAs/notes -> future source changes

flowchart TB
    Source["Source of Truth<br/>Git, Terraform, Kustomize"]
    Runtime["Runtime<br/>Kubernetes, host services, databases"]
    Signals["Signals<br/>OTel, Prometheus"]
    Decisions["Decisions<br/>Grafana, MCP tools, workflows"]
    Actions["Actions<br/>GitOps sync, pod repair, service restart"]
    Memory["Memory<br/>ADRs, RCAs, notes, workflows"]

    Source --> Runtime
    Runtime --> Signals
    Signals --> Decisions
    Decisions --> Actions
    Actions --> Source
    Decisions --> Memory
    Memory --> Source
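
The flowchart above describes a closed loop: every stage eventually feeds back into the source of truth. As a small sketch, the same edges can be written as a Go adjacency map and checked for that property. The node names and edges are copied from the diagram; the `Reachable` and `ClosesLoop` helpers are illustrative and not part of the repository.

```go
package main

import "fmt"

// edges mirrors the arrows in the flowchart.
var edges = map[string][]string{
	"Source":    {"Runtime"},
	"Runtime":   {"Signals"},
	"Signals":   {"Decisions"},
	"Decisions": {"Actions", "Memory"},
	"Actions":   {"Source"},
	"Memory":    {"Source"},
}

// Reachable returns every node that can be reached from start by
// following one or more edges (iterative depth-first search).
func Reachable(graph map[string][]string, start string) map[string]bool {
	seen := make(map[string]bool)
	stack := append([]string(nil), graph[start]...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[n] {
			continue
		}
		seen[n] = true
		stack = append(stack, graph[n]...)
	}
	return seen
}

// ClosesLoop reports whether every node in the graph, including source
// itself, has a path that leads back to source.
func ClosesLoop(graph map[string][]string, source string) bool {
	for n := range graph {
		if !Reachable(graph, n)[source] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("loop closed:", ClosesLoop(edges, "Source"))
}
```

Removing either the `Actions` or the `Memory` edge back to `Source` makes `ClosesLoop` return false, which matches the intent of the diagram: both fixes and lessons are expected to land in source control.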

Tech Stack

Layer	Tools
Language	Go, Rust
Infrastructure	Kubernetes, Terraform, Helm, Docker, Argo CD
Data stores	PostgreSQL, Azure Blob Storage
Observability	OpenTelemetry, Prometheus, Grafana, Cilium
Security	Trivy, Tailscale
Testing	Go `testing` package, table-driven tests
CI/CD	GitHub Actions, Argo CD

Documentation

Local Setup

Create a local environment file and build the web, proxy, and MCP targets:

cp .env.example .env
make web-build
make proxy-build
make mcp-build

Run checks:

make test
make lint
make lint-configs

Plan infrastructure:

cd tofu
tofu init
tofu plan

