
Commit 1a9253b

deploy: c665e2a
1 parent 3666f03 commit 1a9253b

4 files changed (+16, -8 lines)

assets/css/style.css.map

Lines changed: 4 additions & 4 deletions
Generated file; diff not rendered.

assets/js/search-data.json

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@

,"post1": {
"title": "Kubeflow and Me: A Story Started with Push-based Metrics Collection",
17-
"content": "This summer, I gained a precious opportunity to participate in the Google Summer of Code(GSoC), in which I would contribute to Katib and fulfill a project named “Push-based Metrics Collection in Katib” within 12 weeks. Firstly, I got to know about GSoC and Kubeflow with the recommendation from the former active maintainer Ce Gao(gaocegege)’s personal blog. And I was deeply impressed by the idea of cloud native AI toolkits, I decided to dive into this area and learn some skills to enhance my career and future. In the blog, I’ll provide my personal insight into Katib, for those who are interested in cloud native, AI, and hyperparameters tuning. . Problem . The project aims to provide a Python SDK API interface for users to push metrics to Katib DB directly. . The current implementation of Metrics Collector is pull-based, raising design problems such as determining the frequency at which we scrape the metrics, performance issues like the overhead caused by too many sidecar containers, and restrictions on developing environments that must support sidecar containers and admission webhooks. And also, for data scientists, they need to pay attention to the format of metrics printed in the training scripts, which is error prone and may be hard to recognize. . Solution . We decided to implement a new API for Katib Python SDK to offer users a push-based way to store metrics directly into the Kaitb DB and resolve those issues raised by pull-based metrics collection. . . My Contributions during the GSoC . I raised numerous PRs for the Katib and Training-Operator project. Some of them are related to my GSoC project, and others may contribute to the completeness of UTs (Unit Tests), simplicity of dependency management, and the compatibility of the UI component. . For reference, the coding period can be rougly divided into 3 stages: . Convert the proposal to a KEP and discuss the architecture, API design, etc. (~4 weeks) with the mentors . 
| Develop a push-based metrics collection interface according to the KEP. (~8 weeks) . | Write some examples and documentation & Present my work to the Kubeflow Community. . | Also, I raised some issues not only to describe the problems and bugs I met during the coding period, but also to suggest the future enhancement direction for Katib and the Training-Operator. . There is an Github Issue tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details. . Lessons Learned . Think Twice, Code Once: Andrey taught me that we should think of the API specification and all the related details before coding. This can significantly reduce the workload of the coding period and avoid big refactor of the project. Meanwhile, my understanding of Katib got clear gradually during the over-and-over rounds of re-think and re-design of the architecture. . | Dive into the Source Code: Engineering projects nowadays are extremely complex and need much effort to understand them. The best way to get familiar with the project is to dive into the source code and run several examples. . | Communication: Communication is the most important thing when collaborating with others. Expressing your idea precisely and making others understand you easily are significant skills not only in the open source community but also in various scenarios such as at a company and in group work. . | In the End . Special Thanks: . To my mentors @andreyvelich @johnugeorge @tenzen-y, especially to Andrey. Your great knowledge about the code base and the industry impressed me a lot. Thanks for your timely response to my PRs and for always attending the weekly meetings to solve my pending problems, from which I benefited a lot. 
What’s more, I can well remember that, in that night, you explained the usage of Kubeflow in the industry to me with greate patience, and encouraged me not to doubt about myself, just do it and explore more, contribute more. You ignite the flame of my desire to contribute to cloud native AI. . | To @gaocegege. You recommend me to the Kubeflow Community. Thanks for your patient answers for my endless silly questions. . | To Google. Thanks for offering such a precious opportunity for me to begin my journey in the open source world! . | I hold a firm belief that every small step counts, and everybody in the community is unique and of great significance. There is no doubt that our joint efforts will surely contribute to the flourishing of our Kubeflow Community, make it the world-best community managing AI lifecycle on Kubernetes, and attract much more attention from the industry. Then, more and more new comers will pour in and work along with us. . Again, I’ll continue to contribute to Kubeflow. . Links . For more details about Kubeflow and the upcoming GSoC’25 event, please check: . What is Kubeflow? | Kubeflow GSoC’25 Event | .",
17+
"content": "This summer, I gained a precious opportunity to participate in the Google Summer of Code(GSoC), in which I would contribute to Katib and fulfill a project named “Push-based Metrics Collection in Katib” within 12 weeks. Firstly, I got to know about GSoC and Kubeflow with the recommendation from the former active maintainer Ce Gao(gaocegege)’s personal blog. And I was deeply impressed by the idea of cloud native AI toolkits, I decided to dive into this area and learn some skills to enhance my career and future. In the blog, I’ll provide my personal insight into Katib, for those who are interested in cloud native, AI, and hyperparameters tuning. . Problem . The project aims to provide a Python SDK API interface for users to push metrics to Katib DB directly. . The current implementation of Metrics Collector is pull-based, raising design problems such as determining the frequency at which we scrape the metrics, performance issues like the overhead caused by too many sidecar containers, and restrictions on developing environments that must support sidecar containers and admission webhooks. And also, for data scientists, they need to pay attention to the format of metrics printed in the training scripts, which is error prone and may be hard to recognize. . Solution . We decided to implement a new API for Katib Python SDK to offer users a push-based way to store metrics directly into the Kaitb DB and resolve those issues raised by pull-based metrics collection. . In the new design, users just need to set metrics_collector_config={"kind": "Push"} in the tune() function and call the report_metrics() API in their objective function to push metrics to Katib DB directly. There are no sidecar containers and restricted metric log formats any more. After that, Trial Controller will continuously collect metrics from Katib DB and update the status of Trial, which is the same as pull-based metrics collection. . 
If you are interested in it, please refer to this doc and example for more details. . . My Contributions during the GSoC . I raised numerous PRs for the Katib and Training-Operator project. Some of them are related to my GSoC project, and others may contribute to the completeness of UTs (Unit Tests), simplicity of dependency management, and the compatibility of the UI component. . For reference, the coding period can be rougly divided into 3 stages: . Convert the proposal to a KEP and discuss the architecture, API design, etc. (~4 weeks) with the mentors . | Develop a push-based metrics collection interface according to the KEP. (~8 weeks) . | Write some examples and documentation & Present my work to the Kubeflow Community. . | Also, I raised some issues not only to describe the problems and bugs I met during the coding period, but also to suggest the future enhancement direction for Katib and the Training-Operator. . There is a Github Issue tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details. . Lessons Learned . Think Twice, Code Once: Andrey taught me that we should think of the API specification and all the related details before coding. This can significantly reduce the workload of the coding period and avoid big refactor of the project. Meanwhile, my understanding of Katib got clear gradually during the over-and-over rounds of re-think and re-design of the architecture. . | Dive into the Source Code: Engineering projects nowadays are extremely complex and need much effort to understand them. The best way to get familiar with the project is to dive into the source code and run several examples. . | Communication: Communication is the most important thing when collaborating with others. 
Expressing your idea precisely and making others understand you easily are significant skills not only in the open source community but also in various scenarios such as at a company and in group work. . | In the End . Special Thanks: . To my mentors @andreyvelich @johnugeorge @tenzen-y, especially to Andrey. Your great knowledge about the code base and the industry impressed me a lot. Thanks for your timely response to my PRs and for always attending the weekly meetings to solve my pending problems, from which I benefited a lot. What’s more, I can well remember that, in that night, you explained the usage of Kubeflow in the industry to me with greate patience, and encouraged me not to doubt about myself, just do it and explore more, contribute more. You ignite the flame of my desire to contribute to cloud native AI. . | To @gaocegege. You recommend me to the Kubeflow Community. Thanks for your patient answers for my endless silly questions. . | To Google. Thanks for offering such a precious opportunity for me to begin my journey in the open source world! . | I hold a firm belief that every small step counts, and everybody in the community is unique and of great significance. There is no doubt that our joint efforts will surely contribute to the flourishing of our Kubeflow Community, make it the world-best community managing AI lifecycle on Kubernetes, and attract much more attention from the industry. Then, more and more new comers will pour in and work along with us. . Again, I’ll continue to contribute to Kubeflow. . Links . For more details about Kubeflow and the upcoming GSoC’25 event, please check: . What is Kubeflow? | Kubeflow GSoC’25 Event | .",
"url": "https://blog.kubeflow.org/gsoc-2024-project-6/",
"relUrl": "/gsoc-2024-project-6/",
"date": " • Sep 28, 2024"
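The post text in this diff notes that with the pull-based collector, data scientists must print metrics in a specific format, which is error prone. The sketch below is a toy illustration of that failure mode, not Katib's collector. It assumes a simplified `name=value` log pattern, and `METRIC_LINE` and `scrape_metrics` are hypothetical names. It shows a metric being silently dropped because it was logged with the wrong separator.

```python
import re

# Hypothetical, simplified "name=value" pattern standing in for a pull-based
# collector's log filter (an assumption, not Katib's exact default regex).
METRIC_LINE = re.compile(
    r"^\s*([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*$"
)


def scrape_metrics(log_lines, wanted):
    """Parse training-log lines the way a sidecar collector might.

    Only lines that match METRIC_LINE are kept. Every other line is
    ignored without any error, which is why log formatting matters
    in the pull-based model.
    """
    found = {}
    for line in log_lines:
        match = METRIC_LINE.match(line)
        if match and match.group(1) in wanted:
            # A later line overwrites an earlier one, so the last value wins.
            found[match.group(1)] = float(match.group(2))
    return found


logs = [
    "epoch 1 done",
    "accuracy=0.91",
    "loss: 0.25",  # wrong separator, so the collector never sees "loss"
]
print(scrape_metrics(logs, {"accuracy", "loss"}))
# {'accuracy': 0.91}
```

The training script gets no warning that `loss` was lost. That silent mismatch is the kind of problem the push-based design avoids by accepting metric values directly.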

feed.xml

Lines changed: 6 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="https://blog.kubeflow.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.kubeflow.org/" rel="alternate" type="text/html" /><updated>2025-03-03T04:17:08-06:00</updated><id>https://blog.kubeflow.org/feed.xml</id><title type="html">Kubeflow</title><subtitle>The Machine Learning Toolkit for Kubernetes.</subtitle><entry><title type="html">Synthetic Data Generation with Kubeflow Pipelines</title><link href="https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp.html" rel="alternate" type="text/html" title="Synthetic Data Generation with Kubeflow Pipelines" /><published>2025-02-16T00:00:00-06:00</published><updated>2025-02-16T00:00:00-06:00</updated><id>https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp</id><content type="html" xml:base="https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp.html">&lt;h3 id=&quot;synthetic-data-generation---why-and-how&quot;&gt;Synthetic Data Generation - Why and How?&lt;/h3&gt;
1+
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="https://blog.kubeflow.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.kubeflow.org/" rel="alternate" type="text/html" /><updated>2025-03-03T09:59:49-06:00</updated><id>https://blog.kubeflow.org/feed.xml</id><title type="html">Kubeflow</title><subtitle>The Machine Learning Toolkit for Kubernetes.</subtitle><entry><title type="html">Synthetic Data Generation with Kubeflow Pipelines</title><link href="https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp.html" rel="alternate" type="text/html" title="Synthetic Data Generation with Kubeflow Pipelines" /><published>2025-02-16T00:00:00-06:00</published><updated>2025-02-16T00:00:00-06:00</updated><id>https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp</id><content type="html" xml:base="https://blog.kubeflow.org/kfp/2025/02/16/synthetic-data-using-kfp.html">&lt;h3 id=&quot;synthetic-data-generation---why-and-how&quot;&gt;Synthetic Data Generation - Why and How?&lt;/h3&gt;

&lt;p&gt;When creating insights, decisions, and actions from data, the best results come from real data. But accessing real data often requires lengthy security and legal processes. The data may also be incomplete, biased, or too small, and during early exploration, we may not even know if it’s worth pursuing. While real data is essential for proper evaluation, gaps or limited access frequently hinder progress until the formal process is complete.&lt;/p&gt;

@@ -252,6 +252,10 @@ In the blog, I’ll provide my personal insight into Katib, for those who are in

&lt;p&gt;We decided to implement a new API for Katib Python SDK to offer users a push-based way to store metrics directly into the Kaitb DB and resolve those issues raised by pull-based metrics collection.&lt;/p&gt;

255+
&lt;p&gt;In the new design, users just need to set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metrics_collector_config={&quot;kind&quot;: &quot;Push&quot;}&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tune()&lt;/code&gt; function and call the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;report_metrics()&lt;/code&gt; API in their objective function to push metrics to Katib DB directly. There are no sidecar containers and restricted metric log formats any more. After that, Trial Controller will continuously collect metrics from Katib DB and update the status of Trial, which is the same as pull-based metrics collection.&lt;/p&gt;
256+
257+
&lt;p&gt;If you are interested in it, please refer to this &lt;a href=&quot;https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#push-based-metrics-collector&quot;&gt;doc&lt;/a&gt; and &lt;a href=&quot;https://github.com/kubeflow/katib/blob/master/examples/v1beta1/sdk/mnist-with-push-metrics-collection.ipynb&quot;&gt;example&lt;/a&gt; for more details.&lt;/p&gt;
258+
&lt;p&gt;&lt;img src=&quot;../images/2024-09-28-gsoc-2024-summary-push-based-metrics-collection/push-based-metrics-collection.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;my-contributions-during-the-gsoc&quot;&gt;My Contributions during the GSoC&lt;/h2&gt;
@@ -274,7 +278,7 @@ In the blog, I’ll provide my personal insight into Katib, for those who are in

&lt;p&gt;Also, I raised some issues not only to describe the problems and bugs I met during the coding period, but also to suggest the future enhancement direction for Katib and the Training-Operator.&lt;/p&gt;

277-
&lt;p&gt;There is an &lt;a href=&quot;https://github.com/kubeflow/katib/issues/2340&quot;&gt;Github Issue&lt;/a&gt; tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details.&lt;/p&gt;
281+
&lt;p&gt;There is a &lt;a href=&quot;https://github.com/kubeflow/katib/issues/2340&quot;&gt;Github Issue&lt;/a&gt; tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details.&lt;/p&gt;

&lt;h2 id=&quot;lessons-learned&quot;&gt;Lessons Learned&lt;/h2&gt;

gsoc-2024-project-6/index.html

Lines changed: 5 additions & 1 deletion
@@ -125,6 +125,10 @@ <h2 id="solution">Solution</h2>

<p>We decided to implement a new API for Katib Python SDK to offer users a push-based way to store metrics directly into the Kaitb DB and resolve those issues raised by pull-based metrics collection.</p>

128+
<p>In the new design, users just need to set <code class="language-plaintext highlighter-rouge">metrics_collector_config={"kind": "Push"}</code> in the <code class="language-plaintext highlighter-rouge">tune()</code> function and call the <code class="language-plaintext highlighter-rouge">report_metrics()</code> API in their objective function to push metrics to Katib DB directly. There are no sidecar containers and restricted metric log formats any more. After that, Trial Controller will continuously collect metrics from Katib DB and update the status of Trial, which is the same as pull-based metrics collection.</p>
129+
130+
<p>If you are interested in it, please refer to this <a href="https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#push-based-metrics-collector">doc</a> and <a href="https://github.com/kubeflow/katib/blob/master/examples/v1beta1/sdk/mnist-with-push-metrics-collection.ipynb">example</a> for more details.</p>
131+
<p><img src="../images/2024-09-28-gsoc-2024-summary-push-based-metrics-collection/push-based-metrics-collection.png" alt="" /></p>

<h2 id="my-contributions-during-the-gsoc">My Contributions during the GSoC</h2>
@@ -147,7 +151,7 @@ <h2 id="my-contributions-during-the-gsoc">My Contributions during the GSoC</h2>

<p>Also, I raised some issues not only to describe the problems and bugs I met during the coding period, but also to suggest the future enhancement direction for Katib and the Training-Operator.</p>

150-
<p>There is an <a href="https://github.com/kubeflow/katib/issues/2340">Github Issue</a> tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details.</p>
154+
<p>There is a <a href="https://github.com/kubeflow/katib/issues/2340">Github Issue</a> tracks the progress of developing push-based metrics collection for katib during the GSoC coding phase. If you are interested in my work or Katib, please can check this issue for more details.</p>

<h2 id="lessons-learned">Lessons Learned</h2>
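The Solution paragraph added in this commit describes the push-based flow. The objective function calls `report_metrics()` to write metrics into Katib DB, and the Trial Controller reads them back to update the Trial status. What follows is a minimal in-process model of that flow under stated assumptions. `InMemoryMetricsDB`, `push_metrics`, `objective`, and `observe_trial` are hypothetical stand-ins, not Katib SDK APIs. The latest/min/max summary is also only an assumption about what a status update might record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InMemoryMetricsDB:
    """Hypothetical stand-in for Katib DB: trial name -> list of metric rows."""

    rows: Dict[str, List[Dict[str, float]]] = field(default_factory=dict)

    def insert(self, trial: str, metrics: Dict[str, float]) -> None:
        self.rows.setdefault(trial, []).append(dict(metrics))

    def history(self, trial: str) -> List[Dict[str, float]]:
        return list(self.rows.get(trial, []))


def push_metrics(db: InMemoryMetricsDB, trial: str, metrics: Dict[str, float]) -> None:
    """Toy analogue of a push call: values go straight into the DB.

    No sidecar is involved and there is no log format to get right.
    """
    db.insert(trial, {name: float(value) for name, value in metrics.items()})


def objective(db: InMemoryMetricsDB, trial: str, lr: float) -> None:
    """Hypothetical training loop whose accuracy grows linearly per epoch."""
    for epoch in range(1, 4):
        accuracy = min(1.0, 0.5 + lr * epoch)
        push_metrics(db, trial, {"accuracy": accuracy})


def observe_trial(db: InMemoryMetricsDB, trial: str, metric: str) -> Optional[Dict[str, float]]:
    """Toy controller step: summarize the pushed values for one metric."""
    values = [row[metric] for row in db.history(trial) if metric in row]
    if not values:
        return None
    return {"latest": values[-1], "min": min(values), "max": max(values)}


db = InMemoryMetricsDB()
objective(db, "trial-a", lr=0.125)
print(observe_trial(db, "trial-a", "accuracy"))
# {'latest': 0.875, 'min': 0.625, 'max': 0.875}
```

In this model the objective and the controller share only the store. That is the decoupling the paragraph describes. In the real SDK, the post says you make the switch by passing `metrics_collector_config={"kind": "Push"}` to `tune()` and calling `report_metrics()` inside the objective function. The linked doc and example notebook show the actual signatures.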
