@vara-bonthu
Contributor

This PR adds a new blog post titled "🚀 Announcing the Kubeflow Spark Operator Benchmarking Results and Toolkit", focusing on performance benchmarking for the Spark Operator on Kubernetes. The blog highlights key scaling challenges, such as CPU saturation, API server slowdowns, and job scheduling inefficiencies, and provides best practices to optimize large-scale Spark workloads. It also introduces a Benchmarking Toolkit and a Grafana Dashboard to help users monitor and improve performance.

The motivation behind this blog is to share benchmarking insights and practical tuning strategies to help users efficiently run thousands of Spark jobs on Kubernetes. By implementing these optimizations, users can improve job throughput, resource utilization, and system stability. This post serves as a valuable resource for the community to enhance Spark Operator deployments.

@vara-bonthu
Contributor Author

@andreyvelich FYI

@vara-bonthu force-pushed the kubeflow-spark-operator-benchmarks branch 2 times, most recently from 6cdf9d1 to 865cdcb on March 16, 2025 02:06
@vara-bonthu force-pushed the kubeflow-spark-operator-benchmarks branch from 865cdcb to 38785aa on March 16, 2025 02:10
Member

@andreyvelich left a comment

Looks great @vara-bonthu, thank you for working on this!
@varodrig @franciscojavierarceo @hbelmiro @akgraner @kubeflow/kubeflow-steering-committee @kubeflow/wg-data-leads @jbottum Appreciate your review on this!

Member

@andreyvelich left a comment

I think we should be good to move this forward.
Let's address any changes in follow-up PRs if needed.
Thanks again for this great work @vara-bonthu and team!
/lgtm
/approve

@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow bot merged commit 950964a into kubeflow:master on Mar 24, 2025
7 checks passed