Commit d8d643f

Live PVC storage (#3188)
* Live PVC storage
* removed tensorboard web app
* updated
* Only PVC storage
* Readme updated with PVC values
* Readme updated with PVC values

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
1 parent 8ba4ce9 commit d8d643f

2 files changed

Lines changed: 75 additions & 22 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -54,30 +54,30 @@ All components are deployable with `kustomize`. You can choose to deploy the ent
 
 This repository periodically synchronizes all official Kubeflow components from the respective upstream repositories. The following matrix shows the git version included for each component along with the resource requirements for each Kubeflow component, calculated as the maximum of actual usage and configured requests for CPU/memory as well as storage requirements from PVCs:
 
-| Component | Local Manifests Path | Upstream Revision | CPU (millicores) | Memory (Mi) | Storage (GB) |
+| Component | Local Manifests Path | Upstream Revision | CPU (millicores) | Memory (Mi) | PVC Storage (GB) |
 | - | - | - | - | - | - |
 | Training Operator | applications/training-operator/upstream | [v1.9.2](https://github.com/kubeflow/training-operator/tree/v1.9.2/manifests) | 3m | 25Mi | 0GB |
 | Notebook Controller | applications/jupyter/notebook-controller/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/notebook-controller/config) | 5m | 93Mi | 0GB |
-| PVC Viewer Controller | applications/pvcviewer-controller/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/pvcviewer-controller/config) | 15m | 128Mi | 1GB |
+| PVC Viewer Controller | applications/pvcviewer-controller/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/pvcviewer-controller/config) | 15m | 128Mi | 0GB |
 | Tensorboard Controller | applications/tensorboard/tensorboard-controller/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/tensorboard-controller/config) | 15m | 128Mi | 0GB |
 | Central Dashboard | applications/centraldashboard/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/centraldashboard/manifests) | 2m | 159Mi | 0GB |
 | Profiles + KFAM | applications/profiles/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/profile-controller/config) | 7m | 129Mi | 0GB |
 | PodDefaults Webhook | applications/admission-webhook/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/admission-webhook/manifests) | 1m | 14Mi | 0GB |
 | Jupyter Web Application | applications/jupyter/jupyter-web-app/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/crud-web-apps/jupyter/manifests) | 4m | 231Mi | 0GB |
 | Tensorboards Web Application | applications/tensorboard/tensorboards-web-app/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/crud-web-apps/tensorboards/manifests) | | | |
 | Volumes Web Application | applications/volumes-web-app/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/crud-web-apps/volumes/manifests) | 4m | 226Mi | 0GB |
-| Katib | applications/katib/upstream | [v0.18.0](https://github.com/kubeflow/katib/tree/v0.18.0/manifests/v1beta1) | 13m | 476Mi | 13GB |
+| Katib | applications/katib/upstream | [v0.18.0](https://github.com/kubeflow/katib/tree/v0.18.0/manifests/v1beta1) | 13m | 476Mi | 10GB |
 | KServe | applications/kserve/kserve | [v0.15.0](https://github.com/kserve/kserve/releases/tag/v0.15.0/install/v0.15.0) | 600m | 1200Mi | 0GB |
 | KServe Models Web Application | applications/kserve/models-web-app | [v0.14.0](https://github.com/kserve/models-web-app/tree/v0.14.0/config) | 6m | 259Mi | 0GB |
-| Kubeflow Pipelines | applications/pipeline/upstream | [2.5.0](https://github.com/kubeflow/pipelines/tree/2.5.0/manifests/kustomize) | 970m | 3552Mi | 100GB |
+| Kubeflow Pipelines | applications/pipeline/upstream | [2.5.0](https://github.com/kubeflow/pipelines/tree/2.5.0/manifests/kustomize) | 970m | 3552Mi | 35GB |
 | Kubeflow Model Registry | applications/model-registry/upstream | [v0.2.19](https://github.com/kubeflow/model-registry/tree/v0.2.19/manifests/kustomize) | 510m | 2112Mi | 20GB |
 | Spark Operator | applications/spark/spark-operator | [2.2.0](https://github.com/kubeflow/spark-operator/tree/v2.2.0) | 9m | 41Mi | 0GB |
 | Istio | common/istio | [1.26.1](https://github.com/istio/istio/releases/tag/1.26.1) | 750m | 2364Mi | 0GB |
 | Knative | common/knative/knative-serving <br /> common/knative/knative-eventing | [v1.16.2](https://github.com/knative/serving/releases/tag/knative-v1.16.2) <br /> [v1.16.4](https://github.com/knative/eventing/releases/tag/knative-v1.16.4) | 1450m | 1038Mi | 0GB |
 | Cert Manager | common/cert-manager | [1.16.1](https://github.com/cert-manager/cert-manager/releases/tag/v1.16.1) | 3m | 128Mi | 0GB |
 | Dex | common/dex | [2.41.1](https://github.com/dexidp/dex/releases/tag/v2.41.1) | 3m | 27Mi | 0GB |
 | OAuth2-Proxy | common/oauth2-proxy | [7.7.1](https://github.com/oauth2-proxy/oauth2-proxy/releases/tag/v7.7.1) | 3m | 27Mi | 0GB |
-| **Total** | | | **4372m** | **12198Mi** | **134GB** |
+| **Total** | | | **4372m** | **12198Mi** | **65GB** |
 
 
 
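The new **Total** of 65GB can be cross-checked against the per-component values in the updated column (10GB + 35GB + 20GB, every other row being 0GB or empty). The following is a minimal sketch of such a check, not part of the commit: `sum_storage_column` is a hypothetical helper, and the sample rows are abbreviated copies of the README rows with the path and revision columns dropped.

```python
import re

def sum_storage_column(table_lines):
    """Sum the last column (values like '35GB') of markdown table rows.

    Header, separator, empty-cell, and bold Total rows contribute nothing.
    """
    total = 0
    for line in table_lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if not cells or cells[0].startswith("**"):
            continue  # skip the bold Total row
        match = re.fullmatch(r"(\d+)GB", cells[-1])
        if match:
            total += int(match.group(1))
    return total

# Abbreviated rows (assumption: only the columns needed for the check are kept).
rows = [
    "| Component | CPU (millicores) | Memory (Mi) | PVC Storage (GB) |",
    "| - | - | - | - |",
    "| Katib | 13m | 476Mi | 10GB |",
    "| Kubeflow Pipelines | 970m | 3552Mi | 35GB |",
    "| Kubeflow Model Registry | 510m | 2112Mi | 20GB |",
    "| Tensorboards Web Application | | | |",
    "| **Total** | **4372m** | **12198Mi** | **65GB** |",
]
print(sum_storage_column(rows))  # 65
```

The same arithmetic applied to the pre-commit column (1GB + 13GB + 100GB + 20GB) gives the old total of 134GB.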
tests/metrics-server_resource_table.py

Lines changed: 70 additions & 17 deletions
@@ -5,6 +5,7 @@
 import sys
 import os
 import glob
+import json
 from collections import defaultdict
 
 try:
@@ -41,9 +42,9 @@
     'Jupyter Web Application': {
         'keywords': ['jupyter-web-app', 'jupyter']
     },
-    'Tensorboards Web Application': {
-        'keywords': ['tensorboards-web-app']
-    },
+    # 'Tensorboards Web Application': {
+    # 'keywords': ['tensorboards-web-app']
+    # },
     'Volumes Web Application': {
         'keywords': ['volumes-web-app']
     },
@@ -98,7 +99,7 @@
     'Profiles + KFAM',
     'PodDefaults Webhook',
     'Jupyter Web Application',
-    'Tensorboards Web Application',
+    # 'Tensorboards Web Application',
     'Volumes Web Application',
     'Katib',
     'KServe',
@@ -134,6 +135,42 @@ def run_kubectl_top():
         print("Error: kubectl command not found. Please install kubectl")
         sys.exit(1)
 
+def get_live_pvcs():
+    """Get live PVCs from the cluster and return storage information by component"""
+    try:
+        result = subprocess.run(['kubectl', 'get', 'pvc', '--all-namespaces', '-o', 'json'],
+                                capture_output=True, text=True, check=True)
+        pvc_data = json.loads(result.stdout)
+
+        component_storage = defaultdict(int)
+
+        for pvc in pvc_data.get('items', []):
+            metadata = pvc.get('metadata', {})
+            name = metadata.get('name', 'unknown')
+            namespace = metadata.get('namespace', 'default')
+
+            component = categorize_resource(namespace, name)
+            if not component:
+                continue
+
+            storage_str = None
+            if 'status' in pvc and 'capacity' in pvc['status']:
+                storage_str = pvc['status']['capacity'].get('storage', '0')
+            else:
+                storage_str = pvc.get('spec', {}).get('resources', {}).get('requests', {}).get('storage', '0')
+
+            storage_gb = parse_resource_value(storage_str, 'storage')
+            component_storage[component] += storage_gb
+
+        return dict(component_storage)
+
+    except subprocess.CalledProcessError as e:
+        print(f"Warning: Error getting live PVCs: {e}")
+        return {}
+    except (json.JSONDecodeError, FileNotFoundError) as e:
+        print(f"Warning: Error parsing PVC data: {e}")
+        return {}
+
 def parse_resource_value(value_str, resource_type):
     """Parse CPU (to millicores) or memory (to MiB) resource values"""
     if not value_str or value_str == '0':
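The capacity-then-request fallback in `get_live_pvcs` can be exercised without a cluster by feeding it a hand-written payload. The sketch below is an illustration under stated assumptions, not the script's code: `quantity_to_gib` and `pvc_size_gib` are hypothetical names, the sample JSON is made up, and the storage branch of the script's own `parse_resource_value` is not visible in this diff, so its unit handling and rounding may differ.

```python
import json

# Kubernetes quantity suffixes handled by this sketch (binary and decimal).
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

def quantity_to_gib(quantity):
    """Hypothetical helper: convert a quantity string such as '10Gi' to GiB."""
    # Try two-character suffixes first so 'Gi' is not mistaken for a bare number.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * _UNITS[suffix] / 2**30
    return float(quantity) / 2**30  # no suffix: plain byte count

def pvc_size_gib(pvc):
    """Prefer status.capacity.storage; fall back to the spec request."""
    status = pvc.get("status", {})
    if "capacity" in status:
        storage = status["capacity"].get("storage", "0")
    else:
        storage = (pvc.get("spec", {}).get("resources", {})
                      .get("requests", {}).get("storage", "0"))
    return quantity_to_gib(storage)

# Made-up payload shaped like `kubectl get pvc -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "bound", "namespace": "kubeflow"},
   "spec": {"resources": {"requests": {"storage": "10Gi"}}},
   "status": {"phase": "Bound", "capacity": {"storage": "20Gi"}}},
  {"metadata": {"name": "pending", "namespace": "kubeflow"},
   "spec": {"resources": {"requests": {"storage": "512Mi"}}},
   "status": {"phase": "Pending"}}
]}
""")
sizes = {p["metadata"]["name"]: pvc_size_gib(p) for p in sample["items"]}
print(sizes)  # {'bound': 20.0, 'pending': 0.5}
```

The second item shows why the fallback matters: a claim that is not yet bound typically reports no `status.capacity`, so only the requested size is available.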
@@ -205,7 +242,7 @@ def categorize_resource(namespace, name, filepath=""):
 
 def parse_kubectl_output(output):
     """Parse kubectl top output and categorize by component"""
-    lines = output.strip().split('\n')[1:] # Skip header
+    lines = output.strip().split('\n')[1:]
     component_resources = defaultdict(lambda: {'cpu': 0, 'memory': 0})
 
     for line in lines:
@@ -318,34 +355,46 @@ def calculate_max_resources(actual_usage, manifest_requests):
 
     return max_resources
 
-def generate_table(component_resources, storage_map, actual_usage, manifest_requests):
+def calculate_max_storage(manifest_storage, live_storage):
+    """Calculate maximum of manifest storage and live PVC storage"""
+    all_components = set(manifest_storage.keys()) | set(live_storage.keys())
+    max_storage = {}
+
+    for component in all_components:
+        manifest_val = manifest_storage.get(component, 0)
+        live_val = live_storage.get(component, 0)
+        max_storage[component] = max(manifest_val, live_val)
+
+    return max_storage
+
+def generate_table(component_resources, actual_usage, manifest_requests, live_storage=None):
     """Generate markdown table from component resources"""
     print("## Resource Usage by Components")
     print()
     print("The following table shows the resource requirements for each Kubeflow components:")
     print()
-    print("| Component | CPU (cores) | Memory (Mi) | Storage (GB) |")
-    print("|-----------|-------------|-------------|--------------|")
+    print("| Component | CPU (cores) | Memory (Mi) | PVC Storage (GB) |")
+    print("|-----------|-------------|-------------|------------------|")
 
-    totals = {'cpu': 0, 'memory': 0, 'storage': 0}
+    totals = {'cpu': 0, 'memory': 0, 'live_storage': 0}
 
     for component in COMPONENT_ORDER:
         if component in component_resources:
             resources = component_resources[component]
-            storage = storage_map.get(component, 0)
+            live_stor = live_storage.get(component, 0) if live_storage else 0
 
             totals['cpu'] += resources['cpu']
             totals['memory'] += resources['memory']
-            totals['storage'] += storage
+            totals['live_storage'] += live_stor
 
-            print(f"| {component} | {resources['cpu']}m | {resources['memory']}Mi | {storage}GB |")
+            print(f"| {component} | {resources['cpu']}m | {resources['memory']}Mi | {live_stor}GB |")
 
-    print(f"| **Total** | **{totals['cpu']}m** | **{totals['memory']}Mi** | **{totals['storage']}GB** |")
+    print(f"| **Total** | **{totals['cpu']}m** | **{totals['memory']}Mi** | **{totals['live_storage']}GB** |")
     print()
 
     print("### Notes")
     print("- CPU/Memory values are maximum of actual usage and configured requests")
-    print("- Storage values are total PVC allocations from manifest files")
+    print("- PVC Storage: Actual storage allocated to PVCs in the cluster")
     print("- Components not matching the official list are excluded from the table")
     print()
 
@@ -357,18 +406,22 @@ def main():
     kubectl_output = run_kubectl_top()
     actual_usage = parse_kubectl_output(kubectl_output)
 
+    live_storage = get_live_pvcs()
+
     if YAML_AVAILABLE:
         print("Parsing manifest files...")
         manifest_files = find_manifest_files()
-        manifest_requests, storage_map = parse_manifest_resources(manifest_files)
+        manifest_requests, manifest_storage = parse_manifest_resources(manifest_files)
         max_resources = calculate_max_resources(actual_usage, manifest_requests)
+        max_storage = calculate_max_storage(manifest_storage, live_storage)
     else:
         print("Using actual usage only (manifest parsing skipped)...")
         manifest_requests = defaultdict(lambda: {'cpu': 0, 'memory': 0})
+        manifest_storage = {}
         max_resources = actual_usage
-        storage_map = STORAGE_FALLBACK
+        max_storage = live_storage if live_storage else STORAGE_FALLBACK
 
-    generate_table(max_resources, storage_map, actual_usage, manifest_requests)
+    generate_table(max_resources, actual_usage, manifest_requests, live_storage)
 
 if __name__ == "__main__":
     main()
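
In the `main()` hunk as shown, `max_storage` is assigned in both branches but `generate_table` receives `live_storage`, so the printed column reflects live PVCs only, which is consistent with the "Only PVC storage" item in the commit message. The sketch below illustrates how the two inputs would differ if the per-component maximum were used instead. It is an illustration only: `max_by_component` is a hypothetical stand-in for `calculate_max_storage`, and the numbers are made up.

```python
def max_by_component(manifest_storage, live_storage):
    """Per-component maximum of two {component: GB} mappings.

    A component missing from one mapping counts as 0 there.
    """
    components = manifest_storage.keys() | live_storage.keys()
    return {
        c: max(manifest_storage.get(c, 0), live_storage.get(c, 0))
        for c in components
    }

# Illustrative values, not taken from a real cluster or manifest scan.
manifest_storage = {"Katib": 13, "Kubeflow Pipelines": 100}
live_storage = {"Katib": 10, "Kubeflow Model Registry": 20}

merged = max_by_component(manifest_storage, live_storage)
print(sorted(merged.items()))
# [('Katib', 13), ('Kubeflow Model Registry', 20), ('Kubeflow Pipelines', 100)]
print(sum(live_storage.values()), sum(merged.values()))  # 30 133
```

With these inputs, a live-only table would total 30GB while a merged one would total 133GB, so the choice of which mapping reaches the table determines the reported figure.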
