Kubernetes makes deploying applications easy.
But what about persistent storage — the data that actually matters?
That’s where things get tricky.
If you’ve ever struggled to make databases or stateful workloads reliable on Kubernetes, you’ve probably heard of Ceph — a powerful, distributed storage system that can handle block, file, and object storage at scale.
But deploying Ceph manually is complex.
That’s where Rook comes in — a Kubernetes operator that automates the entire Ceph lifecycle: deployment, scaling, healing, and upgrades — all using native Kubernetes resources.
In this guide, I’ll walk you through deploying Rook + Ceph from scratch and show you how to verify and manage it like a pro.
By the end of this guide, you’ll know how to:
- Deploy the Rook Operator and Ceph Cluster
- Verify Ceph cluster health from inside Kubernetes
- Choose and configure one storage type: RBD, CephFS, or Object Store
- Apply production-grade best practices
You’ll need:
- A running Kubernetes cluster (3+ nodes recommended)
- `kubectl` access
- Raw disks or block devices attached to your nodes (for OSDs)
- Basic knowledge of YAML and Kubernetes resources
💡 Pro tip: Use Minikube or KIND for learning, and real multi-node clusters with physical disks for production.
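Before going further, it helps to confirm that each node really has a disk Rook can use. The sketch below is a rough pre-check, not Rook's own detection logic: it reads `lsblk` output and prints whole disks that carry no filesystem signature and have no partitions or other child devices. The function name `raw_disk_candidates` is my own, and the column set is an assumption about what your `lsblk` version supports.

```shell
# Reads `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output on stdin and prints the
# names of whole disks that have no filesystem signature and no child devices
# (partitions, LVM volumes, and so on).
raw_disk_candidates() {
  awk '
    {
      name = ""; type = ""; fstype = ""; parent = ""
      for (i = 1; i <= NF; i++) {
        eq  = index($i, "=")
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
        gsub(/"/, "", val)
        if (key == "NAME") name = val
        else if (key == "TYPE") type = val
        else if (key == "FSTYPE") fstype = val
        else if (key == "PKNAME") parent = val
      }
      # Remember every device that is the parent of something else.
      if (parent != "") has_child[parent] = 1
      if (type == "disk" && fstype == "") candidates[++n] = name
    }
    END {
      for (j = 1; j <= n; j++)
        if (!(candidates[j] in has_child)) print candidates[j]
    }'
}
```

Run it on each storage node with `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | raw_disk_candidates`. Treat the result as a list of candidates to double-check, since a disk can still hold data that `lsblk` does not recognize.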
Rook extends Kubernetes with Custom Resource Definitions (CRDs) representing Ceph components like clusters, pools, filesystems, and gateways.
We’ll install those first, then deploy the Rook Operator.

    kubectl create -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/crds.yaml

Registers Ceph-related APIs with Kubernetes.

    kubectl create -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/common.yaml

Creates the rook-ceph namespace and RBAC roles required by the operator.

    kubectl create -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/operator.yaml

Check the pod:

    kubectl -n rook-ceph get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    rook-ceph-operator-6b5d6bb79f-h8x6j   1/1     Running   0          2m

The operator now watches for Ceph CRDs and orchestrates deployment.
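If you want to script these three steps, here is a minimal sketch. It builds the manifest URLs from a single git ref, so you can swap `master` for a release tag and keep all three files in sync, then waits for the operator Deployment to roll out. The variable `ROOK_REF` and both function names are my own inventions; the Deployment name comes from the pod output above.

```shell
# Git ref of the Rook repository to install from. Defaults to the branch used
# in this guide; set it to a release tag for repeatable installs.
ROOK_REF="${ROOK_REF:-master}"

# Print the raw GitHub URL of one example manifest at $ROOK_REF.
rook_manifest_url() {
  printf 'https://raw.githubusercontent.com/rook/rook/%s/deploy/examples/%s\n' \
    "$ROOK_REF" "$1"
}

# Create CRDs, common resources, and the operator, then wait for the rollout.
install_rook_operator() {
  local file
  for file in crds.yaml common.yaml operator.yaml; do
    kubectl create -f "$(rook_manifest_url "$file")" || return 1
  done
  kubectl -n rook-ceph rollout status deployment/rook-ceph-operator --timeout=300s
}
```

Calling `install_rook_operator` runs the same commands as above and returns only once the operator is available or the timeout expires.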
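The CephCluster resource defines how Rook should build and configure Ceph: monitors, managers, and OSDs. Save the following as ceph-cluster.yaml:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v18
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3
        allowMultiplePerNode: false
      mgr:
        modules:
          - name: pg_autoscaler
            enabled: true
      dashboard:
        enabled: true
      storage:
        useAllNodes: false
        useAllDevices: false
        # With useAllNodes: false, nodes and devices must be listed explicitly,
        # otherwise no OSDs are created. The names below are placeholders:
        # use each node's kubernetes.io/hostname label and its real device name.
        nodes:
          - name: "node-1"
            devices:
              - name: "sdb"

⚠️ Important: Avoid `useAllDevices: true` in production. Rook will claim every available raw device on the selected nodes.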
Apply:

    kubectl apply -f ceph-cluster.yaml

Check pods:

    kubectl -n rook-ceph get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    rook-ceph-mon-a-6f6b5d79f7-pt7qx   1/1     Running   0          3m
    rook-ceph-mgr-a-5d79bcbf7c-8b6dh   1/1     Running   0          2m
    rook-ceph-osd-0-7cc94b8b9c-hjs2t   1/1     Running   0          1m
    rook-ceph-osd-1-7cc94b8b9c-mtlxm   1/1     Running   0          1m

Your Ceph Cluster is now running. Let’s verify it.
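On a real cluster the pod list gets long, and it is easy to miss one that is stuck. As a small sketch, the filter below reads the pod listing and prints only pods that are not fully ready, skipping pods that finished with `Completed`. The function name `pods_not_ready` is my own, and it assumes the default column layout shown above.

```shell
# Reads `kubectl -n rook-ceph get pods --no-headers` on stdin and prints
# NAME, READY, and STATUS for every pod that is not Running with all of its
# containers ready. Pods in the Completed state are ignored.
pods_not_ready() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```

Use it as `kubectl -n rook-ceph get pods --no-headers | pods_not_ready`. Empty output means every listed pod is up.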
The Rook Toolbox is an administrative pod that includes the ceph CLI.
Deploy:

    kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/toolbox.yaml

When it’s ready:

    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

Check cluster health:

    ceph status
      cluster:
        id:     a1b2c3d4
        health: HEALTH_OK
      services:
        mon: 3 daemons, quorum a,b,c
        mgr: active: a
        osd: 3 osds: 3 up, 3 in

✅ Your Ceph cluster is healthy and operational!
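You don't have to open an interactive shell every time. As a sketch, the wrapper below runs a single `ceph` command inside the toolbox Deployment, and a second helper turns `ceph health` into a pass/fail check you could call from a script. Both function names are my own; the namespace and Deployment name are the ones used in this guide.

```shell
# Run one ceph CLI command inside the toolbox pod and print its output.
ceph_tools() {
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph "$@"
}

# Return 0 only when `ceph health` reports HEALTH_OK; otherwise print the
# reported state to stderr and return 1.
ceph_is_healthy() {
  local health
  health="$(ceph_tools health)" || return 1
  case "$health" in
    HEALTH_OK*) return 0 ;;
    *) echo "cluster not healthy: $health" >&2; return 1 ;;
  esac
}
```

For example, `ceph_tools osd tree` shows how OSDs map to hosts and `ceph_tools df` shows capacity per pool, while `ceph_is_healthy && echo ok` works as a quick gate before a deployment step.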
Once the cluster is healthy, choose the storage interface you want to start with.
| Storage Type | Description | Ideal Use Case |
|---|---|---|
| 🧊 RBD (Block Storage) | Dynamic PVCs backed by RBD volumes | Databases, StatefulSets |
| 📁 CephFS (Filesystem) | Shared POSIX-compliant filesystem | Shared app data, Prometheus |
| ☁️ Object Store (RGW) | S3-compatible object storage | Backups, S3 API apps |
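⚠️ Note: A single Ceph cluster can serve RBD, CephFS, and RGW side by side, but start with the one interface your application needs. Each additional interface adds daemons and pools to operate.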
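To make the RBD option concrete, here is a minimal sketch that writes a block pool, a StorageClass, and a small test PVC to a file. It is modeled on the Rook example manifests, but treat the details as assumptions to check against the Rook version you installed: the resource names `replicapool`, `rook-ceph-block`, and `demo-rbd-pvc` and the file name are mine, and a replica size of 3 with a `host` failure domain needs OSDs on at least three nodes.

```shell
# Write a CephBlockPool, a StorageClass, and a test PVC to the given file.
write_rbd_manifests() {
  cat > "$1" <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 1Gi
EOF
}

write_rbd_manifests rbd-storage.yaml
# Then, against a healthy cluster:
#   kubectl apply -f rbd-storage.yaml
#   kubectl get pvc demo-rbd-pvc
```

If the PVC reaches the `Bound` state, dynamic provisioning through RBD is working and you can reference the StorageClass from a StatefulSet.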
For separate environments (e.g., production and staging), apply a second cluster manifest:

    kubectl apply -f cluster-second.yaml

Guidelines:
- Different namespace (e.g., `rook-ceph-test`)
- Unique cluster name and FSID
- Separate node and disk sets
Here’s how Rook and Ceph integrate within a Kubernetes cluster 👇
┌────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Rook Operator │ │
│ │ (Manages Ceph CRDs and orchestrates daemons) │ │
│ └───────────────────────┬────────────────────────────┘ │
│ │ │
│ ┌───────────────▼────────────────┐ │
│ │ Ceph Cluster │ │
│ │ (MON / MGR / OSD Pods) │ │
│ └───────────────┬────────────────┘ │
│ │ │
│ Choose One Storage Interface │
│ ┌──────────────────┼───────────────────┐ │
│ │ │ │ │
│ ┌───▼───┐ ┌───▼────┐ ┌───▼────┐ │
│ │ RBD │ │ CephFS │ │ RGW │ │
│ │ Block │ │ File │ │ Object │ │
│ │ Store │ │ System │ │ Store │ │
│ └───────┘ └────────┘ └────────┘ │
└────────────────────────────────────────────────────────────┘
🧩 In short:
- Rook Operator manages Ceph lifecycle via CRDs.
- Ceph Cluster runs daemons (MON, MGR, OSD).
- You expose the interface you need (RBD, CephFS, or RGW) on top of that cluster.
Production best practices:
- Separate roles: Dedicate nodes to OSDs via taints/tolerations.
- Avoid `useAllDevices: true` in production.
- Enable monitoring: Scrape Ceph metrics with Prometheus and visualize them in Grafana.
- Replication: 3× replicas for critical data, erasure coding for bulk data.
- Routine checks: Run `ceph health detail` regularly.
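The replication choice above is mostly a capacity trade-off, and a little arithmetic makes it visible. This sketch uses the textbook ratios only: usable space is raw space divided by the replica count, or raw space times k / (k + m) for erasure coding with k data chunks and m coding chunks. It ignores Ceph's own overhead and the headroom you should keep free, and the function names are my own.

```shell
# Usable capacity with N-way replication: raw / N (integer arithmetic).
usable_replicated() {
  echo $(( $1 / $2 ))
}

# Usable capacity with erasure coding: raw * k / (k + m), where k is the
# number of data chunks and m the number of coding chunks.
usable_erasure_coded() {
  echo $(( $1 * $2 / ($2 + $3) ))
}

usable_replicated 12000 3        # 12000 GiB raw, 3 replicas  -> 4000
usable_erasure_coded 12000 4 2   # 12000 GiB raw, EC 4+2      -> 8000
```

With the same 12000 GiB of raw disk, 3× replication leaves about a third usable while a 4+2 erasure-coded pool leaves about two thirds, which is why erasure coding is attractive for bulk data.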
You’ve built a self-healing, scalable storage system inside Kubernetes — powered by Rook and Ceph.
Rook simplifies Ceph operations; Ceph delivers enterprise-grade resilience and performance.
Together they form a foundation for stateful, cloud-native workloads.
Next steps:
- Enable the Ceph Dashboard
- Connect Prometheus + Grafana
- Explore CephFS or RGW
- Learn backup and recovery workflows
Kubernetes isn’t just for stateless apps anymore — with Rook Ceph, it becomes a true data platform.
If you found this helpful, follow me for future posts in my Rook Ceph Deep Dive Series, where I’ll cover:
- 🧰 Ceph Dashboard setup and metrics
- 💾 CephFS and RGW performance tuning
- 🛡️ Recovery from OSD or MON failures
Hi there! I’m Dhinakaran J, a Site Reliability Engineer at the National Payments Corporation of India (NPCI) — where I help ensure the availability, scalability, and reliability of India’s critical payment infrastructure.
I work extensively with Kubernetes, Rook Ceph, Argo CD, Prometheus, Docker, RKE, Jenkins, MinIO, and Grafana, building automated and observable cloud-native systems that keep payment services running smoothly at scale.
My passion lies in infrastructure reliability, monitoring, and distributed storage systems — especially leveraging Rook Ceph to deliver highly resilient and transparent platforms in Kubernetes.
When I’m not optimizing clusters or debugging production workloads, I enjoy exploring open-source technologies, mentoring engineers, and writing hands-on DevOps guides to make complex systems simple for everyone.